You are viewing the PKP Support Forum



Searching full text - pdf

Are you responsible for making OJS work -- installing, upgrading, migrating or troubleshooting? Do you think you've found a bug? Post in this forum.

Moderators: jmacgreg, btbell, michael, bdgregg, barbarah, asmecher

Forum rules
What to do if you have a technical problem with OJS:

1. Search the forum. You can do this from the Advanced Search Page or from our Google Custom Search, which will search the entire PKP site. If you are encountering an error, we especially recommend searching the forum for said error.

2. Check the FAQ to see if your question or error has already been resolved.

3. Post a question, but please, only after trying the above two solutions. If it's a workflow or usability question you should probably post to the OJS Editorial Support and Discussion subforum; if you have a development question, try the OJS Development subforum.

Re: Searching full text - pdf

Postby nevermind182004 » Thu Sep 08, 2011 12:55 am

asmecher wrote:Hi Rye,

Dumb question, but have you checked that the referenced command line tool is installed and in the right place on your server?

You might want to test out that part of the configuration by running tools/rebuildSearchIndex.php; this will cause the index to be fully rebuilt and should pick up your PDFs. If that works, then the patch is the next thing to try, as it'll cause PDFs to be indexed upon upload in one place where that call was omitted. If the rebuild script doesn't work, you'll have to look more closely at your PDF indexing tools.

Regards,
Alec Smecher
Public Knowledge Project Team


Hi Alec,

I've run tools/rebuildSearchIndex.php and it runs, but all of its output says:
sh:/usr/bin/pdftotext: No such file or directory
sh:/usr/bin/pdftotext: No such file or directory
sh:/usr/bin/pdftotext: No such file or directory
sh:/usr/bin/pdftotext: No such file or directory

What does it mean, and what should I do?

Thanks,
Rye
nevermind182004
 
Posts: 86
Joined: Mon Apr 20, 2009 6:02 pm

Re: Searching full text - pdf

Postby asmecher » Thu Sep 08, 2011 9:14 am

Hi Rye,

OJS is trying to call the "pdftotext" tool, which is not part of OJS and needs to be installed separately.
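A quick way to confirm this diagnosis is to check whether the helper exists at the configured path, and whether a binary of the same name was perhaps installed somewhere else on the PATH (e.g. /usr/local/bin). A minimal sketch, assuming the configured path is /usr/bin/pdftotext as in the error messages above; `helperStatus()` is a hypothetical helper, not part of OJS:

```php
<?php
// Hypothetical diagnostic (not part of OJS): is the configured PDF helper
// present and executable, and is a binary of the same name on the PATH?

function helperStatus(string $configuredPath): array
{
    $name = basename($configuredPath);
    $found = [];
    foreach (explode(PATH_SEPARATOR, (string) getenv('PATH')) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, '/') . '/' . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            $found[] = $candidate;
        }
    }
    return [
        'configured'    => $configuredPath,
        'configured_ok' => is_file($configuredPath) && is_executable($configuredPath),
        'found_on_path' => array_values(array_unique($found)),
    ];
}

// Run with the PHP command-line binary, e.g. `php check_helper.php`.
print_r(helperStatus('/usr/bin/pdftotext'));
```

If `configured_ok` is false but `found_on_path` lists another location, pointing the OJS configuration at that path is one option; if both are empty, the tool still needs to be installed.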

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8597
Joined: Wed Aug 10, 2005 12:56 pm

Re: Searching full text - pdf

Postby nevermind182004 » Wed Sep 14, 2011 7:51 pm

Hello Alec,

Noob question, but how can I install this pdftotext tool? Is it a third-party plugin that I need to place in OJS?

Thanks,
Rye

Re: Searching full text - pdf

Postby asmecher » Thu Sep 15, 2011 8:18 am

Hi Rye,

How to install this will depend on your operating system. In a Debian-based operating system, for example, you can use a package called poppler-utils or ghostscript. Basically, OJS can use any tool that can be invoked from the command line to take a PDF file and return its text-based contents.
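To make the "any command-line tool" idea concrete: an indexer only needs to run a command with the file path substituted in and read the text from standard output. A minimal sketch; the `%s` placeholder convention and the `extractText()` name are assumptions for illustration, so check the PDF indexing entry in your own config.inc.php for the exact syntax your OJS version expects:

```php
<?php
// Sketch: run a text-extraction command on one file and return its stdout.
// ASSUMPTION: the template contains "%s", replaced by the shell-escaped path.

function extractText(string $template, string $file): ?string
{
    $cmd = str_replace('%s', escapeshellarg($file), $template);
    $lines = [];
    $status = 0;
    exec($cmd, $lines, $status);   // stdout is captured line by line
    return $status === 0 ? implode("\n", $lines) : null;  // null on failure
}

// With poppler-utils installed, a template along these lines should work
// (pdftotext writes to stdout when the output file is "-"):
// echo extractText('/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s -', 'article.pdf');
```

Note that `exec()` drops trailing whitespace from each output line, which is harmless for search indexing.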

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Thu Oct 13, 2011 7:51 pm

Hello Alec,

Thanks for the reply! It took me a while to try this out because our hosting provider recently had internal problems with their servers and couldn't assist me right away. They have now managed to install poppler-utils and ghostscript on our server.

I've run rebuildSearchIndex.php and I think it's working now, but I'm getting some "illegal entry" and "incorrect password" messages, or something like that. About the incorrect password, my guess is that some of our PDFs are locked for subscriptions; is that right? And what about errors like "illegal entries", "Weird Encryption info", "Undetermined Strings", "Illegal Characters", etc.? What do they mean?

And to add: just as I was sending this post, I got another fatal error: Out of memory (allocated xxxxxxx) (tried to allocate 64 bytes) in home/xxx/xxx

Thanks,
Rye

Re: Searching full text - pdf

Postby asmecher » Fri Oct 14, 2011 10:20 am

Hi Rye,

Those messages are coming from the text extraction tool that OJS is invoking; some of the PDFs are probably password-protected (this is done outside of OJS, when you generate the PDF) and some of them probably have non-standard characters in them. The latter type of message probably isn't causing you any trouble, and the former just means those PDFs won't be indexed, though of course, if they're password-protected, anyone who tries to view them won't be able to either.

I'd suggest going through your PDFs to see which are giving you trouble; you may need to work with those files a little.
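One way to find the troublesome files is to run the extraction tool over each PDF and collect the ones that exit with an error or print warnings. A minimal sketch, assuming the same `%s`-style command template as the configured tool, a Unix-like server (for /dev/null), and a placeholder directory such as the files_dir set in config.inc.php; `findProblemPdfs()` is hypothetical:

```php
<?php
// Sketch: run an extraction command over every .pdf under $dir and report
// files whose run exits non-zero or writes anything to stderr
// (e.g. "Incorrect password" or encoding warnings).

function findProblemPdfs(string $dir, string $template): array
{
    $problems = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (strtolower($file->getExtension()) !== 'pdf') {
            continue;
        }
        $pdf = $file->getPathname();
        $cmd = str_replace('%s', escapeshellarg($pdf), $template);
        // Discard the extracted text; capture only diagnostics on stderr.
        $spec = [1 => ['file', '/dev/null', 'w'], 2 => ['pipe', 'w']];
        $proc = proc_open($cmd, $spec, $pipes);
        if (!is_resource($proc)) {
            $problems[$pdf] = ['status' => -1, 'messages' => 'could not start command'];
            continue;
        }
        $stderr = stream_get_contents($pipes[2]);
        fclose($pipes[2]);
        $status = proc_close($proc);
        if ($status !== 0 || trim($stderr) !== '') {
            $problems[$pdf] = ['status' => $status, 'messages' => trim($stderr)];
        }
    }
    ksort($problems);
    return $problems;
}

// Example (adjust both arguments to your server):
// print_r(findProblemPdfs('/path/to/files_dir', '/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s -'));
```

Sending the extracted text to /dev/null and reading only stderr avoids having to drain two pipes at once, which keeps the sketch simple.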

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Sun Oct 16, 2011 5:43 pm

Hi Alec,

Thanks for confirming. Yes, I guess some of the PDFs are password-encrypted and some are locked.

One more thing: I can't complete the search index rebuild because of this error: "Fatal error: Out of memory (allocated 13893632) (tried to allocate 64 bytes) in /users/..../public_html/...." Do you have any idea how to fix this? I've already tried memory limits from 64MB to 512MB, to no avail.

Thanks!
Rye

Re: Searching full text - pdf

Postby asmecher » Mon Oct 17, 2011 11:44 am

Hi Rye,

Hmm, sounds like a bit of a memory leak in the rebuild script. It shouldn't need an ever-increasing amount of memory, but then again, there's not much control over memory usage in PHP. Check to see if your web-based php.ini is different from your command-line-based php.ini -- this is often the case -- and consider setting memory_limit to -1 in your command-line php.ini, at least temporarily.
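Since the web server and the command line often load different php.ini files, a small script run both ways makes the difference visible. A minimal sketch using PHP's built-in `php_ini_loaded_file()`, `php_ini_scanned_files()` and `ini_get()`; `iniBytes()` is a hypothetical helper that expands shorthand such as `512M`:

```php
<?php
// Sketch: show which php.ini this PHP process loaded and its memory_limit.
// Run it once as `php check_ini.php` and once through the web server,
// then compare the two outputs.

function iniBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;  // -1 means "no limit"
    }
    $number = (int) $value;  // leading digits, e.g. 512 from "512M"
    switch (strtolower(substr($value, -1))) {
        case 'g':
            return $number * 1024 * 1024 * 1024;
        case 'm':
            return $number * 1024 * 1024;
        case 'k':
            return $number * 1024;
        default:
            return $number;
    }
}

if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain');
}
$limit = (string) ini_get('memory_limit');
printf("SAPI:             %s\n", PHP_SAPI);
printf("Loaded php.ini:   %s\n", php_ini_loaded_file() ?: '(none)');
printf("Extra .ini files: %s\n", php_ini_scanned_files() ?: '(none)');
printf("memory_limit:     %s (%d bytes)\n", $limit, iniBytes($limit));
```

For a one-off run, the PHP command-line binary also accepts `-d memory_limit=-1` (for example `php -d memory_limit=-1 tools/rebuildSearchIndex.php`), which overrides php.ini for that invocation only. Values set with `php_value` in .htaccess apply only when PHP runs as an Apache module, so they will not show up in the command-line output.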

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Wed Nov 09, 2011 6:37 pm

Hi Alec,

How can I do this? (Check to see if your web-based php.ini is different from your command-line-based php.ini.)

I also found out that the php.ini I've been configuring, located in my public_html directory, is not being used/honored, so I added a directive to our .htaccess to override our global php.ini. But rebuilding the search index still isn't successful.

Thanks,
Rye

Re: Searching full text - pdf

Postby asmecher » Wed Nov 09, 2011 7:16 pm

Hi Rye,

Check out your phpinfo() output to find out what php.ini is being used and whether your overrides are taking effect. To do this, put a script somewhere nearby:
<?php phpinfo(); ?>
...and run it from the command line.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Tue Nov 15, 2011 9:05 pm

Hi Alec,

I've checked phpinfo, and the php.ini being used is already the one I've been changing. What should I do next?

Thanks,
Rye

Re: Searching full text - pdf

Postby asmecher » Tue Nov 15, 2011 10:07 pm

Hi Rye,

The message quoted above says that PHP was only allowed to allocate somewhere around 12MB of memory, not the 64-512MB you mentioned; did the script ever fail with a message about exceeding 512MB?
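The byte counts in these fatal errors can be converted to MiB directly; the "allocated" figure is the memory PHP reports having already allocated when the failing request was made. A minimal sketch with hypothetical helper names, covering both the older message format and one that appends "bytes" to the allocated figure:

```php
<?php
// Sketch: extract the byte counts from a PHP "Out of memory" message
// and express them in MiB for comparison with memory_limit.

function parseOutOfMemory(string $message): ?array
{
    $pattern = '/allocated (\d+)(?: bytes)?\).*?allocate (\d+) bytes/';
    if (!preg_match($pattern, $message, $m)) {
        return null;
    }
    return ['allocated' => (int) $m[1], 'requested' => (int) $m[2]];
}

function toMiB(int $bytes): float
{
    return round($bytes / (1024 * 1024), 2);
}

$info = parseOutOfMemory('Fatal error: Out of memory (allocated 13893632) (tried to allocate 64 bytes)');
printf("%.2f MiB allocated; a further %d bytes could not be allocated\n",
    toMiB($info['allocated']), $info['requested']);
```

This prints 13.25 MiB for the first error; the same conversion gives 27.75 MiB for the 29097984-byte figure reported later in the thread. Both are far below the 64 to 512MB limits being configured.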

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Mon Jan 09, 2012 2:06 am

Hi Alec,

No, the script never failed with a message about exceeding 512MB. Here is the exact error:
Fatal error: Out of memory (allocated 29097984) (tried to allocate 40961 bytes) in /home/xxxxx/public_html/lib/pkp/lib/phputf8/utils/ascii.php on line 94


Is there any other way to just reset/recompile the search index?

Thanks much,
Rye

Re: Searching full text - pdf

Postby asmecher » Mon Jan 09, 2012 2:18 pm

Hi Rye,

I suspect you're modifying the wrong php.ini file or something similar -- that says your limit is still only 28MB or so. However, you might also be able to get through by disabling charset_normalization in config.inc.php; it's safe to do so.
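After editing config.inc.php, it can help to confirm which line actually sets `charset_normalization` and what it now reads, especially if there is more than one copy of the file on the server. A minimal line-scanning sketch; `findSetting()` is hypothetical, and it avoids `parse_ini_file()` in case other values in the file do not parse as standard INI:

```php
<?php
// Sketch: list every active (uncommented) assignment of $key in an
// INI-style file, with its line number and enclosing [section].

function findSetting(string $path, string $key): array
{
    $hits = [];
    $section = '';
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    foreach ($lines ?: [] as $index => $line) {
        $trimmed = trim($line);
        if ($trimmed === '' || $trimmed[0] === ';') {
            continue;  // blank line or comment
        }
        if (preg_match('/^\[(.+)\]$/', $trimmed, $m)) {
            $section = $m[1];
            continue;
        }
        $pattern = '/^' . preg_quote($key, '/') . '\s*=\s*(.*)$/';
        if (preg_match($pattern, $trimmed, $m)) {
            $hits[] = [
                'line'    => $index + 1,
                'section' => $section,
                'value'   => trim($m[1], " \t\""),
            ];
        }
    }
    return $hits;
}

// Example: print_r(findSetting('config.inc.php', 'charset_normalization'));
```

If the scan reports no match, or the value still reads `On`, the edit was probably made to a different copy of the file.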

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Searching full text - pdf

Postby nevermind182004 » Mon Jan 09, 2012 11:33 pm

I'll try that out, Alec. Thanks!

-- edited: no luck at all :S I'm still having the same problem :(
