Hi, all.
I can't get PDF full-text indexing to work for our OJS installation.
I wrote a small PHP script to test whether pdftotext is working properly. To the best of my knowledge, it replicates the OJS code that invokes the extractor. The test script is located at http://www.iuc-revistas.com/bin/info.php and goes like this:
Code:
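<?php
// Report whether PHP safe mode is enabled.
echo 'Safe mode status: ';
echo ini_get('safe_mode') ? 'ON' : 'OFF';
echo '<br />Mime type using mime_content_type function: ';
echo mime_content_type('/hsphere/local/home/revistasiuc/iuc-revistas.com/bin/test.pdf');
echo '<hr />Output from pdftotext:';
echo '<pre>';
// Run the extractor on the test file; 2>&1 also captures its error messages.
$fp = popen('/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext /hsphere/local/home/revistasiuc/iuc-revistas.com/bin/test.pdf 2>&1', 'r');
if ($fp === false) {
    echo 'popen() failed';
} else {
    // Read the whole output rather than only the first line.
    while (($line = fgets($fp, 4096)) !== false) {
        echo htmlspecialchars($line);
    }
    pclose($fp);
}
echo '</pre>';
?>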
pdftotext is actually a wrapper that invokes the real program:
Code:
/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext.script -enc UTF-8 -nopgbrk $1 -
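One thing that may be worth ruling out in the wrapper is the unquoted $1, which gets split if the path contains spaces. Below is a minimal sketch of a more defensive wrapper, not a tested fix. The EXTRACTOR variable and the extract_pdf_text function name are made up for illustration; the options are the same ones used in the line above.

```shell
#!/bin/sh
# Hypothetical override variable; defaults to the real extractor path from the post.
EXTRACTOR="${EXTRACTOR:-/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext.script}"

# Hypothetical helper: extract text from the PDF given as $1 and write it to stdout.
extract_pdf_text() {
    # Fail early, with a message on stderr, if the file is missing or unreadable.
    if [ ! -r "$1" ]; then
        echo "cannot read: $1" >&2
        return 1
    fi
    # Quoting "$1" keeps a path with spaces as a single argument.
    "$EXTRACTOR" -enc UTF-8 -nopgbrk "$1" -
}

# When the file is run as the wrapper, forward the first argument.
if [ "$#" -gt 0 ]; then
    extract_pdf_text "$1"
fi
```

The early readability check is there only to make failures visible; without it, an unreadable file would show up as empty output.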
I set the PDF index entry in the [search] section of the configuration file this way:
Code:
index[application/pdf] = "/hsphere/local/home/revistasiuc/bin/pdftotext/pdftotext %s"
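To check this entry by hand, it can help to expand the %s placeholder and run the resulting command from a shell. The sketch below assumes that the indexer simply substitutes the file path for %s; that is my understanding of what happens, not something I have confirmed in the OJS source. The function name expand_index_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the command that results from replacing the
# single %s in the template ($1) with the single-quoted file path ($2).
expand_index_command() {
    # The template is deliberately used as the printf format string,
    # so it must contain exactly one %s and no other % sequences.
    printf "$1\n" "'$2'"
}
```

For example, calling expand_index_command with the template from the configuration entry and the path to test.pdf prints the full command line, which can then be pasted into a shell or passed to eval to see what the extractor returns.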
Nevertheless, the PDF documents are not being indexed.
When I run the rebuildSearchIndex tool on my Windows box, against a local replica of our site, I can recreate the index for the article files, but I can't get it to work on the web installation or at publication time.
Is there anything I am missing here?
Thanks in advance,
António Lopes
