We have journal archives that need to be added to our OJS site: text-based PDFs for the newer issues and image-only (scanned) PDFs for the older ones. We need all of these articles to be full-text searchable, so I have two questions:
1) We are now trying to get PDF search working, following viewtopic.php?t=1119:
- pdftotext.exe was added to C:\PHP\extras and tested; it works fine on its own;
- the following line was added to the [search] section of config.inc.php: index[application/pdf] = "C:/PHP/extras/pdftotext.exe %s -"
Now we are trying to run
We get no error messages; it just flashes for a second. Searching the PDF contents still does not work.
How can we debug this script? Where can we look to see the results of the text indexing?
php -v gives this info:
PHP 5.1.4 (cli) (built: May 4 2006 10:35:22)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.1.0, Copyright (c) 1998-2006 Zend Technologies
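One way to check the extraction step on its own, outside OJS, is to run the configured command against one of the PDFs and look at what comes back. The Python sketch below assumes that the indexer substitutes the PDF's path for %s and reads the extracted text from standard output (the trailing "-" tells pdftotext to write to stdout). It does not exercise OJS itself, and the script name check_pdf_index.py is hypothetical.

```python
"""Run the configured PDF-to-text command by hand and report what it returns.

Assumption: the indexer replaces %s in the index[application/pdf] template
with the PDF's path and reads the extracted text from the command's stdout.
"""
import subprocess
import sys

# Template copied from the [search] section of config.inc.php.
COMMAND_TEMPLATE = "C:/PHP/extras/pdftotext.exe %s -"


def build_command(template, pdf_path):
    """Split the template on whitespace and put the PDF path in place of %s.

    Simple whitespace splitting assumes the executable path has no spaces,
    as in the config line above.
    """
    return [pdf_path if part == "%s" else part for part in template.split()]


def extract_text(template, pdf_path, timeout=60):
    """Run the command; return (exit code, stdout text, stderr text)."""
    result = subprocess.run(build_command(template, pdf_path),
                            capture_output=True, timeout=timeout)
    text = result.stdout.decode("utf-8", errors="replace")
    err = result.stderr.decode("utf-8", errors="replace")
    return result.returncode, text, err


def main(pdf_path):
    code, text, err = extract_text(COMMAND_TEMPLATE, pdf_path)
    print("exit code:", code)
    if err.strip():
        print("stderr:", err.strip())
    print("characters extracted:", len(text))
    print("first 300 characters:", text[:300])
    # Non-zero status if the command failed or produced no text at all.
    return 0 if code == 0 and text.strip() else 1


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_pdf_index.py ARTICLE.pdf
    sys.exit(main(sys.argv[1]))
```

If this reports exit code 0 and a non-empty sample, the extraction step itself works and the problem lies elsewhere. An image-only PDF would come back with little or no text, which is expected, since pdftotext does not perform OCR.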
2) We would like to index our image-only PDFs without going to the trouble of creating HTML versions. Is it possible to either a) add an OCRed plain-text file for the purpose of building the index, but flag it so that it is hidden from users (a "hidden" type for this text galley), or b) generate an index for the article and insert it manually?
Another possibility: if a text galley is added and indexed, and the galley is then deleted, will its index entries be deleted too? Is there a way to force them to persist?
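For option b), a rough Python sketch of what "generating an index" from an OCRed text file could involve: split the text into words, lowercase them, drop short words and stopwords, and record where each keyword occurs. This does not reproduce OJS's own tokenizer, stopword list, or database tables; the names keyword_positions and STOPWORDS are hypothetical, and loading the result into OJS would depend on its schema.

```python
import re
from collections import defaultdict

# Hypothetical, minimal stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for"}


def keyword_positions(text, min_length=3):
    """Map each lowercased keyword to the word positions where it occurs.

    Positions count every word (stopwords included), so they reflect the
    keyword's place in the original OCR text.
    """
    index = defaultdict(list)
    # Runs of letters only; digits and punctuation act as separators.
    words = re.findall(r"[^\W\d_]+", text.lower())
    for position, word in enumerate(words):
        if len(word) >= min_length and word not in STOPWORDS:
            index[word].append(position)
    return dict(index)


def index_ocr_file(path):
    """Read an OCRed plain-text file (assumed UTF-8) and index its words."""
    with open(path, encoding="utf-8") as f:
        return keyword_positions(f.read())
```

For example, keyword_positions("The scanned article on soil erosion and soil quality") gives {"scanned": [1], "article": [2], "soil": [4, 7], "erosion": [5], "quality": [8]}.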