PDF full-text indexing not working

Post by atlopes » Thu Jul 19, 2012 9:19 am

Solved: upgrading from 2.3.6 to 2.3.7 fixed it (found after digging deeper into the forum...)

Hi, all.

I can't get PDF full-text indexing to work on our OJS installation.

I wrote a small PHP program to test whether pdftotext is working properly. I tried to replicate, to the best of my knowledge, the OJS code that invokes the extractor. The test program is located at http://www.iuc-revistas.com/bin/info.php and goes like this:

// Report whether PHP safe mode is on
echo 'Safe mode status: ';
if (ini_get('safe_mode')) echo 'ON';
else echo 'OFF';

echo '<br />Mime type using mime_content_type function: ';
echo mime_content_type('/hsphere/local/home/revistasiuc/iuc-revistas.com/bin/test.pdf');

echo '<hr />Output from pdftotext:';
echo '<pre>';
$fp = popen('/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext /hsphere/local/home/revistasiuc/iuc-revistas.com/bin/test.pdf','r');
if ($fp === false) {
    echo 'popen failed';
} else {
    // Read all of the extractor's output, not only the first line
    while (($line = fgets($fp, 4096)) !== false) echo htmlspecialchars($line);
    pclose($fp);
}
echo '</pre>';


pdftotext is actually a wrapper that invokes the real program:
/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext.script -enc UTF-8 -nopgbrk $1 -

I set the PDF index entry in the search section of the configuration file as follows:
index[application/pdf] = "/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext %s"

Nevertheless, the PDF documents are not being indexed.
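
To rule out the command line itself, the index setting can also be exercised outside PHP. The following is only a rough sketch: check_index_cmd is a hypothetical helper, not part of OJS, and it assumes that the %s placeholder is simply replaced by the file path and that the path contains no spaces or quotes. It expands the template, runs it, and reports the exit status and how much text came back.

```shell
#!/bin/sh
# Hypothetical helper (not part of OJS): expand an index-style command
# template containing %s and run it against one file.
# Usage: check_index_cmd 'COMMAND TEMPLATE WITH %s' /path/to/file
check_index_cmd() {
    template=$1
    file=$2

    # Assumption: %s is replaced by the plain file path.
    cmd=$(printf "$template" "$file")

    # Run the expanded command and capture stdout and the exit status.
    # The && / || form keeps this working under "set -e" as well.
    out=$(sh -c "$cmd" 2>/dev/null) && status=0 || status=$?

    # Count the bytes of extracted text (trailing newlines are dropped
    # by the command substitution above).
    bytes=$(printf '%s' "$out" | wc -c | tr -d ' ')

    echo "exit=$status bytes=$bytes"

    # Succeed only if the command exited cleanly and produced some text.
    [ "$status" -eq 0 ] && [ "$bytes" -gt 0 ]
}
```

Running it on the server as the web server's user, for example check_index_cmd '/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext %s' /hsphere/local/home/revistasiuc/iuc-revistas.com/bin/test.pdf, would show whether the wrapper returns any text at all for that account.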

When I run the rebuildSearchIndex tool on my Windows box, against a local replica of our site, I can rebuild the index for the article files, but I can't get it to work on the web installation or at publication time.

Is there anything I am missing here?

Thanks in advance,

António Lopes