


Indexing full text


Post by aovalle » Sun May 12, 2013 1:06 pm

Hello,

I have a journal in OJS and I'm trying to enable the full-text index. I have done the following:

1. I removed the semicolon at the start of the following line in the configuration file (config.inc.php) to enable it:
Code:
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"


2. I ran the following command in a terminal:
Code:
php rebuildSearchIndex.php


The problem is the output I get: the following errors repeat continuously, as if there were a problem with the PDF files:

Code:
[ps116191]$ php rebuildSearchIndex.php
Clearing index ... done
Indexing "Revista" ... Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error: May not be a PDF file (continuing anyway)...
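
The first message, "May not be a PDF file", is typically what poppler's pdftotext prints when it cannot find the %PDF- header near the start of a file, which hints that at least one file registered as a PDF may not really be one. As a rough cross-check that does not run pdftotext at all, a small shell function can look for that header itself. This is only a sketch: the function name is hypothetical, and the 1024-byte search window is an assumption modeled on poppler's behavior.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains the "%PDF-" header within its
# first 1024 bytes (an assumed window, similar to what poppler searches).
has_pdf_header() {
    # LC_ALL=C and -a make grep treat binary bytes as plain text.
    head -c 1024 "$1" | LC_ALL=C grep -a -q '%PDF-'
}

# Usage: report a file that would likely trigger "May not be a PDF file".
# has_pdf_header 1-4-1-PB.pdf || echo "1-4-1-PB.pdf has no PDF header"
```

A file that fails this check is a good candidate for the one pdftotext is complaining about, but a file that passes can still be damaged further in (the xref errors).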


Note that my OJS installation is on a VPS, and I think I have sufficient permissions to execute these commands on the system.

Has anyone had a similar situation?

Thank you!

Andrés
aovalle
 
Posts: 7
Joined: Thu Oct 04, 2012 11:14 am

Re: Indexing full text

Post by asmecher » Sun May 12, 2013 2:10 pm

Hi Andrés,

This is coming from the PDF indexing tools you have configured in config.inc.php; try running that tool on your files manually to identify whether one may be corrupted.
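
A minimal way to do that by hand is to run the same pipeline as the index[application/pdf] line, with a real file path in place of %s. The sketch below wraps it in a hypothetical function; the PDFTOTEXT override variable is an assumption added only so the tool's location can be changed. Any errors from pdftotext still appear on the terminal, since they go to stderr rather than through tr.

```shell
#!/bin/sh
# Hypothetical wrapper around the index[application/pdf] pipeline from
# config.inc.php: pdftotext writes the text to stdout ("-"), and tr turns
# control characters (including newlines) into spaces.
run_pdf_index_cmd() {
    # PDFTOTEXT is an assumed override; the config line uses /usr/bin/pdftotext.
    # The config line calls /usr/bin/tr; plain "tr" is used here for portability.
    "${PDFTOTEXT:-/usr/bin/pdftotext}" -enc UTF-8 -nopgbrk "$1" - | tr '[:cntrl:]' ' '
}

# Usage (path from this thread):
# run_pdf_index_cmd /myjournal_files/journals/1/articles/1/1-4-1-PB.pdf
```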

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 7746
Joined: Wed Aug 10, 2005 12:56 pm

Re: Indexing full text

Post by aovalle » Mon May 13, 2013 8:51 am

Thanks for the reply.

How can I run the tool manually?

Regards,

Re: Indexing full text

Post by asmecher » Mon May 13, 2013 9:25 am

Hi aovalle,

Check your config.inc.php to see what tools are configured -- typically something like pdf2ps or pdf2text. This will depend on your server's configuration. These tools are not part of OJS, but there should be lots of documentation online on using them.
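
To see which external tools are actually active, it is enough to list the index[...] lines that are not commented out with a leading semicolon. A sketch, assuming the lines follow the index[mime/type] = "command" form quoted earlier in this thread; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) index[...] lines from an
# OJS config.inc.php. Lines starting with ";" are comments and do not match.
show_index_tools() {
    grep -E '^[[:space:]]*index\[' "$1"
}

# Usage, from the OJS installation directory:
# show_index_tools config.inc.php
```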

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Indexing full text

Post by aovalle » Wed May 15, 2013 8:32 am

Hello, I have read up on the pdftotext command and did the following on my server:

1. I changed to the following directory:

/myjournal_files/journals/1/articles/1/

2. I ran the command to convert the PDF to text in several different ways, with the following results:

2.1. I executed the following command (exactly as the line appears in config.inc.php):

Code:
/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '

2.2. Result: the command does not work; it cannot find the file.

2.3. I executed the following command, replacing %s with the name of the file to convert:

Code:
/usr/bin/pdftotext -enc UTF-8 -nopgbrk 1-4-1-PB.pdf - | /usr/bin/tr '[:cntrl:]' ' '

2.4. Result: it finds the file, converts the PDF to text and displays the text on the screen.

2.5. I executed the following command, removing the - right after the PDF file name:

Code:
/usr/bin/pdftotext -enc UTF-8 -nopgbrk 1-4-1-PB.pdf | /usr/bin/tr '[:cntrl:]' ' '

2.6. Result: it finds the file and converts the PDF to text. With no output argument, pdftotext writes a text file named 1-4-1-PB.txt, with the correct content, into the article directory.
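
Step 2.1 fails because %s is a placeholder, not a file name: typed literally at the shell, pdftotext looks for a file called %s. When OJS runs the configured command it fills in the path of the file being indexed, presumably shell-quoted (that quoting detail is an assumption here). The hypothetical function below mimics such a substitution, quoting the path the way PHP's escapeshellarg() does on Unix, so the exact command can be reproduced by hand. It is a sketch, not OJS code.

```shell
#!/bin/sh
# Hypothetical helper: replace the first %s in an index[...] command template
# with a single-quoted file path, similar to PHP's escapeshellarg() on Unix.
expand_index_cmd() {
    template=$1
    # No placeholder: print the template unchanged.
    case $template in
        *%s*) ;;
        *) printf '%s\n' "$template"; return ;;
    esac
    # Quote the path: wrap it in '...' and turn each embedded ' into '\''.
    quoted="'$(printf '%s' "$2" | sed "s/'/'\\\\''/g")'"
    # Text before the first %s, and text after it.
    before=${template%%"%s"*}
    after=${template#*"%s"}
    printf '%s%s%s\n' "$before" "$quoted" "$after"
}
```

To actually run the result, pass it to the shell, for example eval "$(expand_index_cmd "$template" 1-4-1-PB.pdf)", where $template holds the command string from config.inc.php.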


***

This leads me to think that pdftotext works correctly, and that the problem occurs when the index is built.

I also do not understand which database table and field the index is stored in.

Does this information clarify the situation a little, and can it help solve the original problem?

Code:
[ps116191]$ php rebuildSearchIndex.php
Clearing index ... done
Indexing "Revista" ... Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error: May not be a PDF file (continuing anyway)...

Re: Indexing full text

Post by asmecher » Wed May 15, 2013 8:44 am

Hi aovalle,

You've verified that one PDF file is OK, but there are probably many others in your installation. I suspect one of them is corrupted; you'll have to check them all to find out which. If you're familiar with the command-line "find" command, and in particular the "-exec" parameter to it, using that will be a lot quicker than checking each with a separate command.
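
Following that suggestion, a find -exec sweep can run pdftotext over every PDF under the files directory and print the ones it complains about. A sketch, assuming the files carry a .pdf extension (as 1-4-1-PB.pdf does in this thread) and treating any non-zero exit status or any message on stderr as a problem; the function name and the PDFTOTEXT override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every *.pdf under DIR for which pdftotext exits
# with an error or writes anything to stderr.
find_bad_pdfs() {
    # The tool path is passed to the inner shell as $0; the files become "$@".
    find "$1" -type f -name '*.pdf' -exec sh -c '
        for f in "$@"; do
            # 2>&1 sends stderr into the capture, >/dev/null discards the text.
            errs=$("$0" -enc UTF-8 -nopgbrk "$f" - 2>&1 >/dev/null)
            status=$?
            if [ "$status" -ne 0 ] || [ -n "$errs" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' "${PDFTOTEXT:-/usr/bin/pdftotext}" {} +
}

# Usage, with your own files_dir from config.inc.php:
# find_bad_pdfs /myjournal_files
```

Using -exec ... {} + hands many files to one inner shell at a time, which is quicker than starting a new shell per file.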

Regards,
Alec Smecher
Public Knowledge Project Team

