You are viewing the PKP Support Forum



OJS2 - Searching on PDF's

Are you an Editor, Author, or Journal Manager in need of help? Want to talk to us about workflow issues? This is your forum.

Moderators: jmacgreg, michael, vgabler, John

Forum rules
This forum is meant for general questions about the usability of OJS from an everyday user's perspective: journal managers, authors, and editors are welcome to post questions here, as are librarians and other support staff. We welcome general questions about the role of OJS and how the workflow works, as well as specific function- or user-related questions.

What to do if you have general, workflow or usability questions about OJS:

1. Read the documentation. We've written documentation covering everything from OJS basics to system administration and code development, and we encourage you to read it.

2. Take a look at the tutorials. We will continue to add tutorials covering OJS basics as time goes on.

3. Post a question. Questions are always welcome here, but if it's a technical question you should probably post to the OJS Technical Support subforum; if you have a development question, try the OJS Development subforum.


Post by helenw » Fri Dec 16, 2005 11:23 pm

Hi everyone, we're confused as to why our PDF files are not being indexed. I have tried searching on some terms that I know are in the articles (PDFs), but I keep getting no results.

Could this have something to do with our PDFs being locked?

Any advice would be greatly appreciated!

Helen Wolff
helenw
 
Posts: 18
Joined: Thu Dec 09, 2004 9:31 pm
Location: Swinburne University of Technology, Melbourne, Australia

Post by asmecher » Sat Dec 17, 2005 3:47 pm

Hi Helen,

Look in OJS's configuration file (config.inc.php): there are options there for the tools used to index PDFs and other file types. For example:
index[application/pdf] = "/usr/bin/pdftotext %s -"

uses the pdftotext tool to extract text from PDFs. Ensure this is properly configured and rebuild the index (using tools/rebuildSearchIndex.php); if you're still having trouble, look at the output of the pdftotext tool manually by invoking it on one of your PDFs. Depending on the fonts used, any PDF security settings, etc., the tool may be having trouble extracting text.
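To check the extraction by hand as suggested above, a small bash sketch like the following runs a command template from config.inc.php against one file and counts the words it prints. It assumes OJS fills in %s with the file's path, much as printf does; that is an assumption, since this thread doesn't say exactly how the substitution is done, and the function names are made up for illustration. A count of zero for a PDF you know contains text suggests the extraction step itself is failing, rather than the search index.

```bash
#!/bin/bash
# Hypothetical helpers for testing an index[...] command template by hand.
# Assumption: OJS substitutes the file path for %s, roughly like printf.

# build_index_command TEMPLATE FILE: print the command line that would run
build_index_command() {
    local template=$1 file=$2
    # %q shell-quotes the path so spaces and other special characters survive
    # shellcheck disable=SC2059  # the template is intentionally the format string
    printf "$template" "$(printf '%q' "$file")"
}

# count_extracted_words TEMPLATE FILE: print how many words the command outputs
count_extracted_words() {
    local cmd
    cmd=$(build_index_command "$1" "$2")
    # %q output is bash syntax, so run it with bash; hide tool warnings on stderr
    bash -c "$cmd" 2>/dev/null | wc -w | tr -d ' '
}
```

For example, `count_extracted_words '/usr/bin/pdftotext %s -' my-article-galley.pdf` should print a number well above zero for a PDF whose text can be extracted.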

Regards,
Alec Smecher
Open Journal Systems Team
asmecher
 
Posts: 8599
Joined: Wed Aug 10, 2005 12:56 pm

Searching without pdftotext

Post by pashton » Thu Jan 19, 2006 5:56 am

Hi

Is it possible to somehow manually place a text file somewhere on the server to be searched, if you don't have (and can't get) pdftotext working?

Kind Regards

Paul
pashton
 
Posts: 38
Joined: Fri Dec 17, 2004 5:51 pm

Post by asmecher » Thu Jan 19, 2006 11:07 am

Hi Paul,

The line in the configuration file can be used to execute any program, batch file or shell script; as long as it outputs some text, that text will be used to index the PDF. If you were to place text files in the same directories as the PDFs, with ".txt" appended to the filenames (e.g. for a PDF called my-article-galley.pdf, store the text in a file called my-article-galley.pdf.txt), you could use something like:
index[application/pdf] = "/bin/cat %s.txt"
(Or, if you were using Windows, use "type" instead of "/bin/cat".)
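With that approach every PDF needs a matching .pdf.txt file beside it, and a PDF without one would contribute no text. A minimal bash sketch (the function name is hypothetical) that lists the PDFs under a directory that don't have a sidecar text file yet:

```bash
#!/bin/bash
# missing_sidecars DIR: print each *.pdf under DIR that has no NAME.pdf.txt beside it
missing_sidecars() {
    if [ -z "$1" ]; then
        echo "usage: missing_sidecars DIR" >&2
        return 2
    fi
    # -print0 with read -d '' keeps unusual file names (spaces etc.) intact
    find "$1" -type f -name '*.pdf' -print0 |
        while IFS= read -r -d '' pdf; do
            [ -f "$pdf.txt" ] || printf '%s\n' "$pdf"
        done
}
```

Anything it prints still needs a text file before re-indexing will pick it up.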

Regards,
Alec Smecher
Open Journal Systems Team

PDF to text

Post by pashton » Thu Jan 19, 2006 4:46 pm

Hello Alec,

Thanks for the reply. It still does not seem to work though.
I have added the line:

index[application/pdf] = "/bin/cat %s.txt"

to my config file, below the existing lines.

I then saved a plain text file called 35-208-1-PB.pdf.txt in files/journals/1/articles/35/public (the PDF, whose name was assigned by the system, is 35-208-1-PB.pdf; I also tried naming the text file 35-208-1-PB.txt), and gave it 644 permissions.

When I search, I don't get any results.

Can you think of what I may be doing wrong? I am on a virtual host, if that makes any difference.

Regards

Paul

Post by asmecher » Thu Jan 19, 2006 5:34 pm

Hi Paul,

I haven't tested the setup I suggested above, so feedback on your success or lack thereof is much appreciated.

Have you re-indexed your submissions? Use tools/rebuildSearchIndex.php. It may take quite a while if you've got a lot of articles.
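The rebuild tool is a PHP script meant to be run with the PHP command-line interpreter. A minimal bash wrapper, assuming a `php` binary on the PATH and running from the OJS install directory (the function name is hypothetical):

```bash
#!/bin/bash
# rebuild_search_index OJS_DIR: run tools/rebuildSearchIndex.php with the PHP CLI
rebuild_search_index() {
    if [ -z "$1" ]; then
        echo "usage: rebuild_search_index OJS_DIR" >&2
        return 2
    fi
    if [ ! -f "$1/tools/rebuildSearchIndex.php" ]; then
        echo "rebuildSearchIndex.php not found under $1/tools" >&2
        return 1
    fi
    # subshell, so the caller's working directory is left unchanged
    (cd "$1" && php tools/rebuildSearchIndex.php)
}
```

For example, `rebuild_search_index /var/www/ojs`, where the path is only a placeholder for wherever OJS is installed.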

Regards,
Alec Smecher
Open Journal Systems Team

Re-indexing

Post by pashton » Thu Jan 19, 2006 6:00 pm

Hi Alec,

No, I haven't! I was unaware of this tool.

I tried to run it in the browser but I got the following message:

This script can only be executed from the command-line

I am not sure how to execute scripts from the command line, though (or even how to bring up a command line). Is there another way to rebuild the index?

Thanks again

Paul

Post by asmecher » Thu Jan 19, 2006 6:23 pm

Hi Paul,

If you've got SSH or telnet access to your host, connect and run the indexing tool from the command line. There's currently no web-based tool to rebuild the search index, but you can try running the command-line tool from the web by:
  • Disabling the check that ensures that tools aren't run from the web. See tools/includes/cliTool.inc.php, about line 43. I'd suggest disabling the test, calling the script from the web, and re-enabling the check *while the indexer is running* to ensure that you're secure.
  • You'll probably also need to add the following line of code to the indexer's initialization so that it doesn't time out:
    set_time_limit(0);

Note that if your host runs PHP in safe mode, you probably won't be able to get the index properly built this way, as set_time_limit has no effect in safe mode.
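If you do have SSH access, the usual worry with a long rebuild is the connection dropping partway through. One common approach, sketched here in bash with a hypothetical helper name, is to start the command under nohup in the background, with its output going to a log file you can check later:

```bash
#!/bin/bash
# run_detached LOGFILE CMD [ARGS...]: start CMD in the background, immune to
# hangups (e.g. a dropped SSH session), appending stdout and stderr to LOGFILE.
# Prints the background process ID.
run_detached() {
    if [ $# -lt 2 ]; then
        echo "usage: run_detached LOGFILE CMD [ARGS...]" >&2
        return 2
    fi
    local log=$1
    shift
    nohup "$@" >> "$log" 2>&1 < /dev/null &
    echo "$!"
}
```

From the OJS directory, something like `run_detached /tmp/rebuild.log php tools/rebuildSearchIndex.php` followed by `tail -f /tmp/rebuild.log` would start the rebuild and let you watch it; the log path is only an example.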

Regards,
Alec Smecher
Open Journal Systems Team

pdf to text

Post by pashton » Thu Jan 19, 2006 7:57 pm

Hello again,

I can now index the site via the browser:

Clearing index ... done
Indexing "Cosmos and History: The Journal of Natural and Social Philosophy" ... 40 articles indexed

but the search does not pull up the article that I have supplied text for. Any other suggestions?

Regards
Paul

Post by asmecher » Tue Jan 31, 2006 11:55 am

Hi Paul,

Unfortunately, the specifics of what you're trying to do will depend on your server, but if I were having trouble, I'd try creating a shell script that logs each call to a text file; something like:
#!/bin/bash
# Append one line per call, recording the arguments the indexer passed
echo "Someone called this script with parameters: $*" >> /tmp/calls.txt

Make sure the script is world-executable and that it has permission to write to /tmp/calls.txt. Then regenerate your index and look at how the script is being called. Once you get this working, add functionality step-by-step until you've got what you're looking for.
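Following that step-by-step advice, a next step might combine the logging with the .txt-sidecar idea from earlier in the thread. This is an untested bash sketch; the script path in the comment, the OJS_INDEX_LOG variable and the function name are made up for illustration:

```bash
#!/bin/bash
# Hypothetical wrapper, configured e.g. as:
#   index[application/pdf] = "/usr/local/bin/ojs-index.sh %s"
# Step 1 (as in the post above): log every call.
# Step 2: print the sidecar text file, if there is one, for the indexer to read.
LOG=${OJS_INDEX_LOG:-/tmp/calls.txt}

index_wrapper() {
    # record exactly which arguments the indexer passed in
    printf '%s called with: %s\n' "$(date '+%F %T')" "$*" >> "$LOG"
    local pdf=$1
    if [ -r "$pdf.txt" ]; then
        cat "$pdf.txt"
    else
        printf 'no sidecar for %s\n' "$pdf" >> "$LOG"
        return 1
    fi
}

# run only when executed directly with arguments, not when sourced
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
    index_wrapper "$@"
fi
```

If the log shows the indexer passing a path you didn't expect, that is the place to adjust before adding anything else.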

Regards,
Alec Smecher
Open Journal Systems Team

Re: OJS2 - Searching on PDF's

Post by dudu » Wed Jan 30, 2008 1:57 pm

Hi,

I changed config.inc.php as Alec described. However, I get the following error message when I try to run "php rebuildSearchIndex.php":

Fatal error: Call to undefined method PhpMyVisitesPlugin::getInstallSchemaFile() in /var/www/XXX/htdocs/YYY/ojs-2.2/classes/plugins/Plugin.inc.php on line 66


Do you have any idea about the cause of it and how to solve it? Thanks in advance.
dudu
 
Posts: 14
Joined: Mon Feb 27, 2006 7:54 am

Re: OJS2 - Searching on PDF's

Post by asmecher » Fri Feb 01, 2008 5:45 pm

Hi dudu,

Could you turn on the show_stacktrace option in config.inc.php and post the results?
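If it helps, here is a small bash sketch (hypothetical helper name) that switches such an option on without opening an editor, keeping a backup copy. It assumes the option appears as an uncommented line like show_stacktrace = Off, which is an assumption about your config.inc.php rather than something stated in this thread:

```bash
#!/bin/bash
# set_config_option FILE KEY VALUE: rewrite the uncommented "KEY = ..." line(s)
# in an ini-style file such as config.inc.php, keeping FILE.bak as a backup.
# KEY and VALUE go into a sed expression, so keep them to plain words
# (e.g. show_stacktrace, On); keys like index[application/pdf] won't work.
set_config_option() {
    local file=$1 key=$2 value=$3 tmp rc=0
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    cp -p "$file" "$file.bak" || return 1
    tmp=$(mktemp) || return 1
    # comment lines start with ';', so they don't match the optional-space-then-key pattern
    sed "s|^\([[:space:]]*$key[[:space:]]*=\).*|\1 $value|" "$file" > "$tmp" &&
        cat "$tmp" > "$file" || rc=1   # overwrite in place so ownership and permissions stay
    rm -f "$tmp"
    return "$rc"
}
```

For example, `set_config_option config.inc.php show_stacktrace On`, then re-run the command that failed; `config.inc.php.bak` holds the previous version.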

Regards,
Alec Smecher
Public Knowledge Project Team