Searching through PDF Files

svillanueva
Posts: 74
Joined: Fri Jan 18, 2008 4:20 am
Location: Barcelona

Searching through PDF Files

Postby svillanueva » Fri Feb 01, 2008 2:03 am

Hi,

I've found this thread:

viewtopic.php?f=2&t=621&p=9620&hilit=searching+on+pdf+files#p9620

but the problem is that I'm testing the OJS 2.2.0 TestDrive site and I can't get any results when I search for an article by its content. Searching by title works fine, but when I search for a word that I know is contained in a PDF, the search engine returns no results. Why? Does searching through PDF files work on the TestDrive site?


Thanks in advance,

Sergi Villanueva.

jmacgreg
Posts: 4191
Joined: Tue Feb 14, 2006 10:50 am

Re: Searching through PDF Files

Postby jmacgreg » Fri Feb 01, 2008 10:55 am

Hi Sergi,

That is correct: we haven't set up the testdrive site to have PDF search functionality. It's an extra step, and the demo site has been set up as a basic example of an OJS installation.

If this is a feature you absolutely need to see before installing and testing the software yourself, I can PM you another test server with this functionality enabled.

Cheers,
James

svillanueva
Posts: 74
Joined: Fri Jan 18, 2008 4:20 am
Location: Barcelona

Re: Searching through PDF Files

Postby svillanueva » Mon Feb 04, 2008 1:16 am

Hi James,

Thank you, I will wait until I install the software myself. Is it a plugin or just a configuration setting for the journal?



Regards,

Sergi Villanueva.

jmacgreg
Posts: 4191
Joined: Tue Feb 14, 2006 10:50 am

Re: Searching through PDF Files

Postby jmacgreg » Mon Feb 04, 2008 10:25 am

Hi Sergi,

Indexing items like PDFs is covered in the FAQ:

2) How can I allow users to search non-text files, such as PDF or Microsoft
Word documents?

A: OCS supports indexing of non-text files via external conversion applications.
The "Search Settings" configuration section in config.inc.php can be modified
to enable indexing of certain binary file formats by setting a
"index[MIME_TYPE]" setting (with the desired file mime-type) to the path of
the appropriate external text converter for that file format.

Note that additional third-party software must be installed to use this
feature (such as "pdftotext" for PDF files).
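
As a rough illustration of the FAQ entry above, the following shell sketch looks up a converter on the PATH and prints a candidate "index[MIME_TYPE]" line for the "Search Settings" section of config.inc.php. The function name suggest_index_line is hypothetical, and the "%s -" argument form (a placeholder for the file, then "-" for standard output) is an assumption based on how pdftotext is usually invoked; the comments in your own config.inc.php are the authority on the exact syntax your version expects.

```shell
# Sketch only: print a candidate index[...] line for config.inc.php.
# Assumptions: "%s" is the placeholder replaced by the file path, and the
# converter writes plain text to stdout when given "-" as the output file.
suggest_index_line() {
    mime_type=$1
    converter=$2

    # Resolve the converter to a full path; fail clearly if it is missing.
    converter_path=$(command -v "$converter") || {
        echo "converter not found on PATH: $converter" >&2
        return 1
    }

    # "%%s" prints a literal "%s" in the generated config line.
    printf 'index[%s] = "%s %%s -"\n' "$mime_type" "$converter_path"
}

# Example call (needs pdftotext installed):
#   suggest_index_line application/pdf pdftotext
```

If the function reports that the converter is not on the PATH, the third-party software mentioned in the note above still needs to be installed before indexing can work.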


Cheers,
James

sdellis
Posts: 13
Joined: Wed Sep 26, 2007 6:55 am

Re: Searching through PDF Files

Postby sdellis » Fri Feb 08, 2008 11:44 am

I've got pdftotext installed and have made sure the config file is set up properly, but I'm still not able to create an index for searching. Are articles supposed to be indexed automatically when published, or do I need to run the rebuildSearchIndex.php script? When I try running that script I get the following errors:

PHP Warning:  main(includes/driver.inc.php): failed to open stream: No such file or directory in /srv/www/htdocs/journals/tools/includes/cliTool.inc.php on line 22
PHP Fatal error:  main(): Failed opening required 'includes/driver.inc.php' (include_path='/usr/share/php') in /srv/www/htdocs/journals/tools/includes/cliTool.inc.php on line 22


I noticed there is a file called driver.inc.php in an include directory one level up, but not in the tools/includes dir. Is this a bug or should I have a driver.inc.php file in that dir that somehow did not get placed there upon install?

Thanks,
Shaun

asmecher
Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm

Re: Searching through PDF Files

Postby asmecher » Sat Feb 09, 2008 12:42 pm

Hi Shaun,

PDFs should be indexed automatically according to your indexing configuration in config.inc.php at the time they are uploaded (e.g. by the Layout Editor or Editor). The rebuildSearchIndex tool shouldn't be necessary, but it's helpful if you change your configuration or need to rebuild your indexes for some other reason.

Are you running the command-line PHP client? It's often called php-cli (on *NIX systems) or php-cli.exe (on Windows systems). Make sure that you're running it from the OJS installation directory. Other issues may prevent the includes from working properly, such as running in Safe Mode with functions like ini_set disabled. Otherwise, OJS should automatically configure itself to include scripts from the right subdirectories.
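
The advice above can be sketched as a small shell helper that changes into the OJS installation directory before invoking the command-line PHP client. This is only a sketch under stated assumptions: the function name run_ojs_tool and the PHP_BIN override are hypothetical, and the tool is assumed to live under tools/ in the installation directory, as the paths in the error message suggest.

```shell
# Sketch only: run an OJS command-line tool from the installation directory.
# Assumptions: tools live in <ojs_dir>/tools/, and the CLI binary is "php"
# unless the (hypothetical) PHP_BIN variable names another one, e.g. php-cli.
run_ojs_tool() {
    ojs_dir=$1
    tool=$2

    if [ ! -f "$ojs_dir/tools/$tool" ]; then
        echo "tool not found: $ojs_dir/tools/$tool" >&2
        return 1
    fi

    # Use a subshell so the caller's working directory is left unchanged.
    # Relative includes such as includes/driver.inc.php can then be
    # resolved against the installation directory.
    ( cd "$ojs_dir" && "${PHP_BIN:-php}" "tools/$tool" )
}

# Example call, using the path from the error message in this thread:
#   run_ojs_tool /srv/www/htdocs/journals rebuildSearchIndex.php
```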

Regards,
Alec Smecher
Public Knowledge Project Team

