You are viewing the PKP Support Forum

Articles not indexed

Postby chm » Tue Oct 01, 2013 1:55 am

Dear all,

I just uploaded some galley files (PDF) to an already published issue and realized that the files did not get indexed. Somehow I recall that in the past this indexing happened automatically (as it is also mentioned here: viewtopic.php?f=8&t=6610&p=25477&hilit=indexing+article#p25477).
In my config.inc.php "pdftotext" is enabled.
Any clues as to what is going on here?
Many thanks!

(Version: OJS 2.3.7.0)

Postby asmecher » Tue Oct 01, 2013 11:19 am

Hi chm,

Does running the tools/rebuildSearchIndex.php script resolve the problem?

Regards,
Alec Smecher
Public Knowledge Project Team

Postby chm » Tue Oct 01, 2013 11:21 pm

Dear Alec

Well, running rebuildSearchIndex.php is quite a major operation in this case. When I set up the journal some years ago, I had to run it overnight on my local machine because the server couldn't cope and constantly ran out of memory (there are quite a few articles). I could still modify the script to index only the newly added articles, but the automatic indexing in the past was just so convenient... Is there a way to check what happens with the index when I add new galley files?

Many thanks for your help!

Postby asmecher » Wed Oct 02, 2013 4:22 pm

Hi chm,

Does your installation have many journals, or just one large one? Recent releases allow selective rebuilding of the search engine. Also, there's a fix that solves a memory leak you were probably encountering (try patching https://github.com/pkp/pkp-lib/commit/cc4be218320d7cf4480a3dc1b15a54b29d3e5a6e).

Regards,
Alec Smecher
Public Knowledge Project Team

Postby chm » Fri Oct 04, 2013 1:01 am

Dear Alec

Many thanks for this information.
It's just one journal. But I have to move my installation to a different server in the next couple of days anyway and will try to rebuild the index there (after upgrading).
I'll post back if this doesn't solve the issue.

Thank you for your patience and assistance!

Postby chm » Mon Oct 14, 2013 6:57 am

Alright, I moved my installation to another server and ran an upgrade (now OJS 2.4.2.0).
Unfortunately, rebuilding the index with rebuildSearchIndex.php does not work. I get some 25000 keywords in article_search_keyword_list, but these are only keywords from the articles' titles.

pdftotext is enabled in config.inc.php:
Code:
index[application/pdf] = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"


The path to pdftotext is correct.
pdftotext is working properly when I run it from the command line.

I have no idea what's going wrong here and would very much appreciate your help with troubleshooting!

Update:
It looks like the script does not finish: it returns to the command prompt without printing a message like "XYZ articles indexed".
The PHP memory_limit is set to "no limit".

Postby chm » Wed Oct 16, 2013 12:09 am

Sorry for pushing this, but it's quite urgent. My old installation will be down in a couple of days, and by then the new one has to be up and running...
Any hints on how to debug this? I get no error messages and I don't see anything in the log files (I might be looking in the wrong place, though...).

Many thanks in advance!

Postby asmecher » Wed Oct 16, 2013 1:10 am

Hi chm,

Is the full command line...
Code:
/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '
...functioning when you try it with a PDF?

Regards,
Alec Smecher
Public Knowledge Project Team
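To reproduce what the indexer sees, rather than a bare pdftotext call, the %s placeholder has to be filled in with a real file path. A minimal Python sketch, assuming (not checked against the OJS 2.4 source) that OJS substitutes a shell-quoted path for %s and reads the pipeline's standard output; `expand_index_command` and `run_index_command` are hypothetical helpers, not OJS functions:

```python
import shlex
import subprocess

# The index[application/pdf] template quoted above from config.inc.php.
PDF_TEMPLATE = "/usr/bin/pdftotext -enc UTF-8 -nopgbrk %s - | /usr/bin/tr '[:cntrl:]' ' '"


def expand_index_command(template: str, path: str) -> str:
    """Replace the %s placeholder with a shell-quoted file path (assumed OJS behaviour)."""
    return template.replace("%s", shlex.quote(path))


def run_index_command(template: str, path: str) -> str:
    """Run the expanded pipeline through /bin/sh and return what it prints."""
    command = expand_index_command(template, path)
    # With a pipe, the exit status is that of the last command (tr), so a
    # pdftotext failure shows up as empty output rather than as an exception.
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_index_command(PDF_TEMPLATE, "/path/to/galley.pdf")` should return the same flattened text the terminal shows; an empty string would point at pdftotext failing inside the pipe.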

Postby chm » Wed Oct 16, 2013 1:23 am

Dear Alec,

Yes, it shows me the extracted content in the terminal.

Postby asmecher » Wed Oct 16, 2013 8:29 am

Hi chm,

Do you get any error messages in your PHP error log? I'd expect to see a security problem (e.g. execution of external tools disabled for security reasons), a permissions problem (e.g. file permissions), an out-of-memory situation, or something similar; all should be logged. You can use "php -i" to dump your PHP configuration, including logging setup, to find where those are recorded.

Regards,
Alec Smecher
Public Knowledge Project Team
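On Ubuntu the Apache module and the command-line interpreter read separate php.ini files (/etc/php5/apache2/php.ini and /etc/php5/cli/php.ini), so a command-line tool such as tools/rebuildSearchIndex.php follows the CLI settings, which `php -i` reports. The sketch below filters that output for the logging directives; it assumes the CLI's plain-text layout of `name => local value => master value`, and `logging_settings` and `cli_logging_settings` are hypothetical helpers:

```python
import subprocess

# Settings that decide whether, and where, PHP records errors.
WANTED = (
    "Loaded Configuration File",
    "log_errors",
    "error_log",
    "display_errors",
    "error_reporting",
)


def logging_settings(phpinfo_text: str) -> dict:
    """Pick the wanted settings out of plain-text `php -i` output."""
    settings = {}
    for line in phpinfo_text.splitlines():
        parts = [part.strip() for part in line.split("=>")]
        # Rows look like "name => value" or "name => local value => master value";
        # parts[1] is the value (or the local value) in both layouts.
        if len(parts) >= 2 and parts[0] in WANTED:
            settings[parts[0]] = parts[1]
    return settings


def cli_logging_settings() -> dict:
    """Ask the command-line PHP, the one that runs tools/*.php, for its settings."""
    output = subprocess.run(
        ["php", "-i"], capture_output=True, text=True, check=True
    ).stdout
    return logging_settings(output)
```

Comparing `cli_logging_settings()` with what a web page calling phpinfo() shows makes it obvious when the two interpreters are configured differently.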

Postby chm » Wed Oct 16, 2013 11:52 am

Hmm, this is weird. I can't get PHP to log anything.

This is what I did (Ubuntu 12.04 / Apache 2.2.22):
In /etc/php5/apache2/php.ini and /etc/php5/cli/php.ini I set
Code:
log_errors = On

together with
Code:
error_log = /var/log/php.log

in the Apache file and
Code:
error_log = /var/log/php_cli.log

in the CLI file.

These log files are owned by root, group adm; owner and group can write.
After that I restarted the server.

When I run rebuildSearchIndex.php, nothing is written to these log files, nor to /var/log/apache2/error.log.

phpinfo(); gives me:

Code:
Loaded Configuration File    /etc/php5/apache2/php.ini
error_log   /var/log/php.log


I am sorry for causing so much trouble!
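Because those log files are owned by root with group adm and give no write permission to others, PHP can only append to them when it runs as root or as a member of adm; Apache on Ubuntu normally runs as www-data, which is neither. The minimal check below, run as the user in question, reports whether that user may write to each file. `can_append` and `report` are hypothetical helpers, and they only inspect permission bits through `os.access`:

```python
import os


def can_append(path: str) -> bool:
    """True if the current user may write to `path`, or create it if it is missing."""
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    # A missing file can only be created in a writable, searchable directory.
    directory = os.path.dirname(os.path.abspath(path))
    return os.access(directory, os.W_OK | os.X_OK)


def report(paths) -> dict:
    """Map each log path to whether the current (real) user could write to it."""
    return {path: can_append(path) for path in paths}


if __name__ == "__main__":
    # The two log files configured above; adjust if yours differ.
    print(report(["/var/log/php.log", "/var/log/php_cli.log"]))
```

Saved as, say, check_logs.py (a hypothetical name), `sudo -u www-data python3 check_logs.py` answers the question for Apache, and a plain `python3 check_logs.py` answers it for the account that runs the CLI tools.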

Postby asmecher » Wed Oct 16, 2013 11:56 am

Hi chm,

Try stepping through the instructions at http://pkp.sfu.ca/wiki/index.php/PKP_Frequently_Asked_Questions#When_I_click_some_button_or_follow_some_link.2C_I.27m_left_with_a_blank_page._What_do_I_do.3F, but making the ini_set modification to the script you're running rather than the php.ini wrapper.

Regards,
Alec Smecher
Public Knowledge Project Team

Postby chm » Thu Oct 17, 2013 1:32 am

Ok, here we go.

1) In tools/rebuildSearchIndex.php I added: ini_set('display_errors', E_ALL);

When I run php tools/rebuildSearchIndex.php, I get 27268 entries in article_search_keyword_list and this output in the terminal:
[Screenshot: php_rebuild.png]


When I run sudo php tools/rebuildSearchIndex.php, I get 29177 entries in article_search_keyword_list and this output in the terminal:
[Screenshot: sudophp_rebuild.png]


In both cases nothing in the log files.

2) In addition I changed
if((@include_once BASE_SYS_DIR.'/'.$filePath) === false) {
to
if((include_once BASE_SYS_DIR.'/'.$filePath) === false) {

in /lib/pkp/includes/functions.inc.php

The result is the same as with step 1).

3) In addition, I changed
function import($class) {
to
function import($class) {
echo "Importing " . $class . "<br/>\n";

in /lib/pkp/includes/functions.inc.php and I get the following terminal output:
[Screenshot: importingclasses.png]


Again: nothing in the log files.

By the way: in step 3 I first forgot a quote, and that error was logged in /var/log/php_cli.log. So logging seems to work.

When I watched article_search_keyword_list being built during the process, I noticed that progress slowed down considerably towards the end, which seems to match the "Killed" output in the terminal.
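A bare "Killed" in the terminal is how the shell reports a process that ended on SIGKILL, a signal the process cannot catch or log, which would explain why PHP's own error log stays empty. When the script is launched from another program, the same condition shows up in the exit status: Python reports -9, a shell reports 137 (128 + 9). A minimal sketch; `was_sigkilled` and `run_and_classify` are hypothetical helpers:

```python
import signal
import subprocess


def was_sigkilled(returncode: int) -> bool:
    """Recognise SIGKILL as Python reports it (-9) and as a shell reports it (137)."""
    return returncode in (-signal.SIGKILL, 128 + signal.SIGKILL)


def run_and_classify(command: list) -> str:
    """Run `command` to completion and describe how it ended."""
    completed = subprocess.run(command)
    if completed.returncode == 0:
        return "finished"
    if was_sigkilled(completed.returncode):
        return "killed (SIGKILL, often the kernel's out-of-memory killer)"
    return f"failed with exit status {completed.returncode}"
```

Run from the OJS directory, `run_and_classify(["php", "tools/rebuildSearchIndex.php"])` would distinguish a clean finish, a PHP error exit, and an external kill.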

Postby asmecher » Thu Oct 17, 2013 8:48 am

Hi chm,

That "killed" message suggests an out-of-memory situation. Did you patch the memory leak with the patch indicated above?

Regards,
Alec Smecher
Public Knowledge Project Team
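When the kernel kills a process for lack of memory, it records this in its own log (visible with `dmesg` or, on Ubuntu, in /var/log/kern.log), so that is the place to confirm the diagnosis when PHP logs nothing. The exact wording differs between kernel versions, so the pattern below is an assumption to check against the real log; `oom_kills` is a hypothetical helper:

```python
import re

# Matches e.g. "Killed process 4242 (php) total-vm:..." as well as the
# "Out of memory: Kill process 4242 (php) score 912 ..." form; wording
# differs between kernel versions, so treat this pattern as a starting point.
KILL_PATTERN = re.compile(r"Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_kills(log_text: str) -> list:
    """Return unique (pid, command) pairs that the kernel log reports as killed."""
    kills = []
    for line in log_text.splitlines():
        match = KILL_PATTERN.search(line)
        if match:
            entry = (int(match.group(1)), match.group(2))
            if entry not in kills:
                kills.append(entry)
    return kills
```

Feeding it the contents of /var/log/kern.log or the output of `dmesg`, an entry naming php around the time of the "Killed" message would point to memory exhaustion.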

Postby chm » Fri Oct 18, 2013 1:14 am

Dear Alec

No, I haven't (shame on me). I was too intimidated to do this...
Actually I don't know how to apply this patch. I downloaded the commit by adding .patch to the URL (like this https://github.com/pkp/pkp-lib/commit/c ... 5a6e.patch)
But I am confused, I can't find two of the files that should be patched in my installation, namely
tests/classes/core/PKPRequestTest.php
tests/classes/core/PKPRouterTestCase.inc.php

I am sure I am doing something completely wrong...
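The two missing paths are pkp-lib's unit tests, which this installation does not appear to include, so their hunks can be left out without affecting the fix. Assuming pkp-lib lives in lib/pkp (as the functions.inc.php path above suggests), running `git apply --exclude='tests/*'` on the downloaded .patch file inside lib/pkp should do this directly. The sketch below does the same by hand, dropping every `diff --git` section whose path starts with a given prefix; `drop_paths` is a hypothetical helper and expects the git-style patch GitHub serves for a .patch URL:

```python
import re

# Start of a per-file section in a git-style patch.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)")


def drop_paths(patch_text: str, prefix: str = "tests/") -> str:
    """Remove every per-file section of a git-style patch whose path starts with `prefix`."""
    kept = []
    skipping = False
    for line in patch_text.splitlines(keepends=True):
        header = DIFF_HEADER.match(line)
        if header:
            # A new file section starts here; decide whether to keep it.
            skipping = header.group(1).startswith(prefix)
        if not skipping:
            kept.append(line)
    return "".join(kept)
```

The filtered text, written to a new file, can then be applied from inside lib/pkp with `patch -p1 < filtered.patch`, since the a/ and b/ path prefixes are one directory level deep.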