Search not via PDFtoText

Post by pashton » Mon Sep 03, 2007 5:28 pm

I would absolutely love a feature that allows full-text searching on a system that cannot (or, I should say, will not) install pdftotext.

For me, the ideal would be to upload an HTML galley with a tick box that hides it from public view, or to just upload a text/HTML file that can be indexed.

P.S. I have seen the hack to gray out the HTML, but I am planning to bring a number of journals into the OJS framework, and hacks are undesirable when dealing with a larger number of journals.


Paul Ashton

Re: Search not via PDFtoText

Post by JasonNugent » Tue Sep 04, 2007 5:26 am


A while ago, we implemented something like what you want because we were having pdftotext issues. It involves creating a text file containing the full text of the PDF, which is uploaded as a hidden file and handed to the indexer for searching.

I'm attaching a unified diff to my post, which contains the code. It's a diff against OJS 2.1.0, and it may or may not apply cleanly against the latest branch. We've since moved away from it because our issue with pdftotext has been remedied, and we're using pdftotext now instead.
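
For anyone wanting to picture the general idea, here is a rough sketch of indexing a hidden text file in place of extracted PDF text. It is not the attached patch and not OJS code (OJS is written in PHP). The class name `SidecarIndex`, the function names, and the article ids are all made up for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SidecarIndex:
    """Toy inverted index (hypothetical): word -> set of article ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_sidecar(self, article_id, sidecar_text):
        # The sidecar text stands in for what pdftotext would have produced.
        # Only its words are stored; the text itself is never shown to readers.
        for word in tokenize(sidecar_text):
            self.postings[word].add(article_id)

    def search(self, query):
        # Return the ids of articles whose sidecar text contains every query word.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


# Hypothetical article ids and texts, for illustration only.
index = SidecarIndex()
index.add_sidecar(101, "Open access publishing and peer review.")
index.add_sidecar(102, "Full text search without PDF extraction.")
print(index.search("full text"))
```

The point of the design is that the search index only needs the words, so the text file can stay hidden from the public galley list while the PDF remains the only thing readers see.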


