RebuildSearchIndex with Allowed memory size exhausted


Postby josipkp » Tue Nov 25, 2008 6:46 pm

Hi,

I am trying to enable searching inside PDF files in an OJS 2.2.2 installation: I uncommented the pdftotext line in config.inc.php and ran tools/rebuildSearchIndex.php.

I received messages like this:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 40961 bytes) in /var/www/ojs/classes/core/String.inc.php on line 397

I created a new function in JournalDAO.inc.php (getJournalsForRebuildIndex), very similar to getJournals but with the SQL changed to select only the journal_ids with few articles. It ran OK. I then ran it a second time with other journal_ids and with the table deletions commented out (in ArticleSearchIndex.inc.php) to keep the work done so far. That also ran OK, but the next run failed with a similar message.

I ran pdftotext manually to check whether the problem was in a PDF file; it ran fine.
The memory_limit in PHP 5 is set to 100M.
If I watch top, I can see the memory used climb steadily until the message appears.
If I comment out the pdftotext line in config.inc.php, rebuildSearchIndex.php runs OK.
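Watching top shows the whole process growing. A finer-grained check, sketched below under the assumption that the per-article indexing step can be wrapped in a callback, is to log PHP's own memory counters after each item: steady growth that never drops back suggests data is being kept alive between articles. `indexWithMemoryLog()` is a hypothetical helper, not part of OJS; `memory_get_usage()` and `memory_get_peak_usage()` are standard PHP functions.

```php
<?php
// Hypothetical diagnostic (not OJS code): run an indexing callback over a list
// of items and record how much memory PHP holds after each one.
function indexWithMemoryLog(array $items, $indexOne, $logEvery = 1)
{
    $baseline = memory_get_usage();
    $log = array();
    $count = 0;
    foreach ($items as $item) {
        call_user_func($indexOne, $item);
        $count++;
        if ($count % $logEvery === 0) {
            $log[] = array(
                'items'  => $count,
                'growth' => memory_get_usage() - $baseline, // bytes since start
                'peak'   => memory_get_peak_usage(),        // high-water mark
            );
        }
    }
    return $log;
}
```

In a modified rebuildIndex(), the callback would be whatever indexes a single article (for example the indexArticleMetadata()/indexArticleFiles() pair); that wiring is an assumption and is not shown here.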

Any suggestions about what could be wrong?
Thanks in advance,
Josi Perez
josipkp
 
Posts: 61
Joined: Fri Jun 27, 2008 8:51 am

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Tue Nov 25, 2008 9:26 pm

Hi Josi,

The error message you quote above indicates that the memory_limit is actually 32M (33554432 bytes = 32 × 1024 × 1024). I suspect there is another php.ini somewhere on your server for the command-line PHP interpreter, and the memory_limit in that file needs to be increased.
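To confirm which configuration the command-line interpreter is really using, a small script can be run with the same php binary that runs tools/rebuildSearchIndex.php. This is a sketch: `iniSizeToBytes()` is a hypothetical helper that converts php.ini shorthand into bytes so it can be compared with the number in the "Allowed memory size of N bytes exhausted" message; `php_ini_loaded_file()`, `php_sapi_name()` and `ini_get()` are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of OJS): convert php.ini shorthand such as
// "32M" or "1G" into bytes; "-1" means "no limit".
function iniSizeToBytes($value)
{
    $value = trim((string) $value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // leading digits, e.g. 32 from "32M"
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            $number *= 1024; // falls through to M and K
        case 'M':
            $number *= 1024;
        case 'K':
            $number *= 1024;
    }
    return $number;
}

// The CLI interpreter may load a different php.ini than the web server.
$iniFile = php_ini_loaded_file();
echo 'SAPI:         ', php_sapi_name(), "\n";
echo 'php.ini:      ', ($iniFile === false ? '(none loaded)' : $iniFile), "\n";
echo 'memory_limit: ', ini_get('memory_limit'), ' (',
     iniSizeToBytes(ini_get('memory_limit')), " bytes)\n";
```

For a one-off run, the CLI's `-d` switch overrides a setting without editing any file, e.g. `php -d memory_limit=256M tools/rebuildSearchIndex.php`, and `php --ini` lists the configuration files the command-line interpreter reads.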

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 7746
Joined: Wed Aug 10, 2005 12:56 pm

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby josipkp » Wed Nov 26, 2008 12:18 pm

How silly of me: I was so confident in the phpinfo shown on the OJS page that I didn't even look at the number in the error message :oops:

Now the script runs to the end and I can search for words inside a PDF file, but I got the error below for 3 or 4 articles:
Error: Illegal entry in bfchar block in ToUnicode CMap

Another question:
when a new issue is published, do we need to run the reindex, or is that part of the normal OJS process?
I added counters (sequence, journal title, submission_date and article_id) to try to catch the problem and I would like to keep them, but I am afraid this routine is called from somewhere else in OJS. I also changed the SQL to select only enabled journals. Is there anything wrong with this approach?

Thanks for your answer.
Josi Perez

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Wed Nov 26, 2008 12:33 pm

Hi Josi,

OJS will automatically index new files as they come in, so you shouldn't need to rebuild the index (though it's probably a good idea to do it occasionally anyway). I'm not sure whether or not the changes you've made will cause problems -- could you describe them further?

The error message you quoted ("Illegal entry in bfchar block in ToUnicode CMap") is probably coming from the text extraction tool, e.g. pdftotext, so I'd suggest investigating that further. It's most likely not something that'll cause a problem.
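One way to follow up is to run pdftotext on each suspect PDF directly and collect what it prints on standard error, where extraction warnings like this one usually appear. A minimal sketch, assuming a Unix-like system (it uses /dev/null) with pdftotext on the PATH; `runAndCollectStderr()` is a hypothetical helper built on PHP's standard `proc_open()`.

```php
<?php
// Hypothetical helper (not part of OJS): run a shell command, discard its
// standard output, and return its exit status plus the non-empty lines it
// wrote to standard error.
function runAndCollectStderr($command)
{
    $descriptors = array(
        1 => array('file', '/dev/null', 'w'), // throw away the extracted text
        2 => array('pipe', 'w'),              // capture warnings and errors
    );
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        return array(-1, array());
    }
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);
    $status = proc_close($process);
    $lines = array_values(array_filter(array_map('trim', explode("\n", $stderr)), 'strlen'));
    return array($status, $lines);
}

// Usage: php check_pdfs.php file1.pdf file2.pdf ...
$files = isset($argv) ? array_slice($argv, 1) : array();
foreach ($files as $pdf) {
    list($status, $messages) = runAndCollectStderr('pdftotext ' . escapeshellarg($pdf) . ' -');
    echo $pdf, ': exit ', $status, ', ', count($messages), " message(s)\n";
    foreach ($messages as $message) {
        echo '    ', $message, "\n";
    }
}
```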

Regards,
Alec Smecher
Public Knowledge Project Team

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby josipkp » Wed Nov 26, 2008 2:35 pm

About the changes:
1) In /ojs/classes/journal/JournalDAO.inc.php
created a new function, getJournalsForRebuildIndex($rangeInfo = null), based on getJournals($rangeInfo = null), to select specific journals (I thought the problem was in one journal):
    // Same as getJournals(), but returns only enabled journals
    function &getJournalsForRebuildIndex($rangeInfo = null) {
        $result = &$this->retrieveRange(
            'SELECT * FROM journals
            WHERE enabled=1
            ORDER BY journal_id', false, $rangeInfo
        );
        $returner = &new DAOResultFactory($result, $this, '_returnJournalFromRow');
        return $returner;
    }


2) In /var/www/ojs/classes/search/ArticleSearchIndex.inc.php
to add some counters; the added code is around the comments marked 081125
    function rebuildIndex($log = false) {
        // Clear index
        if ($log) echo 'Clearing index ... ';
        $searchDao = &DAORegistry::getDAO('ArticleSearchDAO');
        // FIXME Abstract into ArticleSearchDAO?
        $searchDao->update('DELETE FROM article_search_object_keywords');
        $searchDao->update('DELETE FROM article_search_objects');
        $searchDao->update('DELETE FROM article_search_keyword_list');
        $searchDao->setCacheDir(Config::getVar('files', 'files_dir') . '/_db');
        $searchDao->_dataSource->CacheFlush();
        if ($log) echo "done\n";

        // Build index
        $journalDao = &DAORegistry::getDAO('JournalDAO');
        $articleDao = &DAORegistry::getDAO('ArticleDAO');

        // 081125 to select specific journals
        // $journals = &$journalDao->getJournals();
        $journals = &$journalDao->getJournalsForRebuildIndex();

        $numTotal = 1;
        while (!$journals->eof()) {
            $journal = &$journals->next();
            $numIndexed = 0;

            print("\n--------\n");
            if ($log) echo "Indexing \"", $journal->getJournalTitle(), "\" ... ";

            // 081125 header
            print(str_pad(" ", 74 - strlen($journal->getJournalTitle()), "#"));
            print("\n--------\n");
            print(" TOTAL\t SEQ\tJOURNAL\t\t\t\tSUBMISSION DATE\t\tARTICLE_ID\n");

            $articles = &$articleDao->getArticlesByJournalId($journal->getJournalId());
            while (!$articles->eof()) {
                $article = &$articles->next();

                // 081125 counters
                printf("%10d\t%5d", $numTotal, $numIndexed + 1);
                printf("\t%-30s\t", substr($journal->getJournalTitle(), 0, 30));
                //-------------------

                if ($article->getDateSubmitted()) {
                    ArticleSearchIndex::indexArticleMetadata($article);
                    ArticleSearchIndex::indexArticleFiles($article);
                    $numIndexed++;
                    // 081125 counters
                    print($article->getDateSubmitted()); print("\t");
                } else {
                    print("\t\t\t");
                    //-------------------
                }

                // 081125 counters
                printf("%10d\n", $article->getArticleId());
                $numTotal++;
                //-------------------

                unset($article);
            }

            if ($log) echo $numIndexed, " articles indexed\n";
            unset($journal);
        }
    }

} // end of class ArticleSearchIndex


I will run pdftotext on the PDFs of those article_ids to learn more about the error message.
Thank you very much for your good project and good support.
Josi Perez

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Wed Nov 26, 2008 7:41 pm

Hi Josi,

Now that you've got the memory_limit increased, and now that the index is rebuilt, those changes shouldn't be necessary. I suspect you'll see those debug outputs e.g. when you upload a new PDF galley, as indexing occurs when a new galley is saved.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby peterdietz » Thu May 05, 2011 12:28 pm

Hi,

Sorry for posting in an old thread, but it was the most relevant.

We host multiple journals at our University, and one particular journal is much larger than the others. I noticed something was odd in the database, so on our development machine I ran tools/rebuildSearchIndex.php, and I got:
Indexing "Journal Name Goes Here" ... PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 40961 bytes) in /var/www/html/ojs/lib/pkp/lib/phputf8/utils/ascii.php on line 94
PHP Stack trace:
PHP 1. {main}() /var/www/html/ojs/tools/rebuildSearchIndex.php:0
PHP 2. rebuildSearchIndex->execute() /var/www/html/ojs/tools/rebuildSearchIndex.php:42
PHP 3. ArticleSearchIndex->rebuildIndex() /var/www/html/ojs/tools/rebuildSearchIndex.php:36
PHP 4. ArticleSearchIndex->indexArticleFiles() /var/www/html/ojs/classes/search/ArticleSearchIndex.inc.php:267
PHP 5. ArticleSearchIndex->updateFileIndex() /var/www/html/ojs/classes/search/ArticleSearchIndex.inc.php:231
PHP 6. ArticleSearchIndex->indexObjectKeywords() /var/www/html/ojs/classes/search/ArticleSearchIndex.inc.php:81
PHP 7. ArticleSearchIndex->filterKeywords() /var/www/html/ojs/classes/search/ArticleSearchIndex.inc.php:37
PHP 8. Core->cleanVar() /var/www/html/ojs/classes/search/ArticleSearchIndex.inc.php:112
PHP 9. String->utf8_strip_ascii_ctrl() /var/www/html/ojs/lib/pkp/classes/core/Core.inc.php:70
PHP 10. utf8_strip_ascii_ctrl() /var/www/html/ojs/lib/pkp/classes/core/String.inc.php:538
PHP 11. ob_start() /var/www/html/ojs/lib/pkp/lib/phputf8/utils/ascii.php:94


You can see that I've bumped the PHP memory limit to 512M to run this and it is still exhausted. I'm wondering whether this problem just needs more memory, or whether the job can be made to run more efficiently, without consuming so many resources.
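The trace ends with Core::cleanVar() calling String::utf8_strip_ascii_ctrl(), which calls phputf8's utf8_strip_ascii_ctrl(), and the failed allocation happened inside ob_start() at line 94 of ascii.php. The frame where the limit is hit is not necessarily the code that is holding the memory, so this alone does not identify the culprit. For comparison only, stripping ASCII control characters can be done in a single preg_replace() pass with no output buffering. The sketch assumes the same character set is wanted (ASCII controls except tab, LF and CR); `stripAsciiCtrl()` is a hypothetical name, not the phputf8 implementation.

```php
<?php
// Hypothetical alternative (not the phputf8 code): remove ASCII control
// characters except tab (0x09), line feed (0x0A) and carriage return (0x0D).
// Safe on UTF-8 input: every byte of a multi-byte UTF-8 sequence is >= 0x80,
// so none of them can match the control-character class below.
function stripAsciiCtrl($text)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);
}
```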

Thank you
peterdietz
 
Posts: 12
Joined: Mon Feb 15, 2010 12:09 pm

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Thu May 05, 2011 1:04 pm

Hi Peter,

What version of PHP are you using, and what version of OJS?

Regards,
Alec Smecher
Public Knowledge Project Team

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby peterdietz » Fri May 06, 2011 9:26 am

I bumped the memory limit up to 768M and it did complete. To us, that's a lot of memory.

-bash-3.2$ php tools/rebuildSearchIndex.php
Clearing index ... done
Indexing "JOURNAL-1" ... 6 articles indexed
Indexing "JOURNAL-2" ... 166 articles indexed
Indexing "JOURNAL-3" ... 71 articles indexed
Indexing "JOURNAL-4" ... 1074 articles indexed
Indexing "JOURNAL-5" ... 18 articles indexed

What might be the smoking gun is that the database has a total of 1,730,073 records and takes up 102.8 MiB.
The article_search_index_keywords table alone has 1,606,598 records, taking up 87.4 MiB.

Our uploads directory is 561M. A simple grep shows that it has about 300 .pdf files and 800 .doc files.

I do admit that the searching mechanism is very accurate.

OJS is on "Open Journal Systems 2.3.3.2".

PHP:
-bash-3.2$ php --version
PHP 5.2.9 (cli) (built: Jul 8 2009 06:03:36)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2009 Zend Technologies
with Xdebug v2.0.5, Copyright (c) 2002-2008, by Derick Rethans

This is on a Red Hat 5 virtual machine. It's possible that running on a VM rather than a physical machine means a "slower" disk, but we have several applications on this server and don't typically run into problems like these.


-bash-3.2$ mysql --version
mysql Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (i386) using readline 5.1

Our database is MyISAM, utf8_general_ci

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Fri May 06, 2011 9:53 am

Hi Peter,

The indexing tool goes through content in a linear fashion, so there's no need for it to keep everything "open" in order to perform the indexing. However, there may well be a bit of caching code that progressively builds up an in-memory representation, and perhaps that would be best disabled on a batch indexing operation. (It means things will run a lot slower, but there won't be a high memory requirement imposed.)
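Between an unbounded cache and no cache at all, one middle ground for a batch run is to cap the cache's size. The class below is a hypothetical illustration of that trade-off, not the cache OJS actually uses: it memoizes results until it holds `$max` entries, then discards everything and starts again, so memory stays bounded at the cost of recomputing evicted values.

```php
<?php
// Hypothetical illustration, not OJS code: a memoizing cache that never holds
// more than $max entries. When full, it is emptied rather than allowed to grow.
class BoundedCache
{
    private $max;
    private $entries = array();
    private $misses = 0;

    public function __construct($max)
    {
        $this->max = $max;
    }

    // Return the cached value for $key, computing it with $compute on a miss.
    public function get($key, $compute)
    {
        if (array_key_exists($key, $this->entries)) {
            return $this->entries[$key];
        }
        $this->misses++;
        if (count($this->entries) >= $this->max) {
            $this->entries = array(); // flush instead of growing without limit
        }
        $value = call_user_func($compute, $key);
        $this->entries[$key] = $value;
        return $value;
    }

    public function size()
    {
        return count($this->entries);
    }

    public function misses()
    {
        return $this->misses;
    }
}
```

A smaller `$max` means less memory held during the run but more recomputation; the right value would depend on how often the same keys recur.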

Often servers will support two php.ini configuration files, one for web-based operations and one for command-line tools, and often the web-based one will have limits set, such as memory and execution time, while the command-line one will allow more leeway.

I've posted a bug entry to review this -- see http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=6635 -- but in the meantime I'd suggest trusting the incremental indexing (which should happen automatically), and if need be, cranking up the memory available to perform a full re-index. We do have larger searching/indexing plans, including Lucene support for larger installs, but they haven't yet been scheduled for attention. Let us know if this becomes a problem and we'll try to prioritize based on feedback.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby ramon » Thu Oct 27, 2011 9:10 am

Hello all,

Could the error "Error: Illegal entry in bfchar block in ToUnicode CMap" be caused by mixed character sets in the database?
I'm assisting an institution that has failed to upgrade.
Checking their setup I confirmed that:
  1. No connection charset was defined in config.inc.php;
  2. Charset normalization was set to On;
  3. The browser (client) charset was set to utf-8;
  4. The MySQL database and its tables are latin1_general_ci, but the data has mixed and double encoding (we were unable to fully clean the dump by editing the file manually or by running iconv or other tools).
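A common form of "double" encoding is UTF-8 text that was re-encoded as if it were Latin-1, so that "café" ends up stored as "cafÃ©". A rough heuristic for spotting such values before cleanup is sketched below, assuming the mbstring extension is available; `looksDoubleEncoded()` is a hypothetical helper. It undoes one layer of UTF-8 encoding and checks whether the result is still valid UTF-8 containing non-ASCII bytes. Legitimate text that really contains sequences like "Ã©" would be a false positive, so flagged rows need a human look.

```php
<?php
// Hypothetical heuristic (not part of OJS): does $text look like UTF-8 that
// was encoded to UTF-8 a second time? Requires the mbstring extension.
function looksDoubleEncoded($text)
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false; // not UTF-8 at all (e.g. raw latin1 bytes)
    }
    // Undo one layer: map each character back to a single Latin-1 byte.
    $once = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    return $once !== $text
        && preg_match('/[\x80-\xFF]/', $once) === 1
        && mb_check_encoding($once, 'UTF-8');
}
```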
ramon
 
Posts: 923
Joined: Wed Oct 15, 2003 6:15 am
Location: Brasília/DF - Brasil

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby jmacgreg » Thu Nov 03, 2011 4:55 pm

Hi Ramon,

Have you investigated whether the bfchar error could have come from pdftotext, as Alec says above? On a side note, having different encodings scattered through the system, DB, etc. would certainly make things difficult.

Cheers,
James
jmacgreg
 
Posts: 4162
Joined: Tue Feb 14, 2006 10:50 am

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby ramon » Wed Nov 09, 2011 5:50 am

Hello all,

Just to keep you posted.
I've recommended fixing the character set definitions, "cleaning" up the database, upgrading to the latest version, and only then running the indexing.
I believe this will reduce the number of variables that may cause the problem.

We'll have to wait until they're done to check for indexing issues, if any.

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby raickonen » Thu Sep 12, 2013 12:10 pm

Hello OJS team,

Is this patch fully compatible with OJS 2.4.2? rebuildSearchIndex takes the server down by eating all its memory.

https://github.com/pkp/pkp-lib/commit/06bc11f5ecfb25ceffd612a3e7c8aef193c21903

Luciano
raickonen
 
Posts: 34
Joined: Tue Nov 16, 2010 10:39 am

Re: RebuildSearchIndex with Allowed memory size exhausted

Postby asmecher » Thu Sep 12, 2013 12:38 pm

Hi Luciano,

Try applying it with the --dry-run option first to test compatibility; if that works, then remove the --dry-run option to actually apply it.
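The dry-run-then-apply sequence can be scripted so the real run only happens if the dry run succeeds. The sketch assumes GNU patch (for the `-d` and `-i` options), that the commit has been saved as a unified diff (GitHub returns a commit as a diff if `.diff` is appended to its URL), and that it is applied with `-p1` from the root of the pkp-lib checkout, which is lib/pkp in OJS 2.4. `applyPatchWithDryRun()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: test a patch with --dry-run and apply it only if the
// dry run reports no problems. Assumes GNU patch and a -p1 (git-style) diff.
function applyPatchWithDryRun($patchFile, $targetDir, $patchBin = 'patch')
{
    // -d: run inside $targetDir; -i: read the diff from $patchFile.
    $command = escapeshellarg($patchBin)
        . ' -p1 -d ' . escapeshellarg($targetDir)
        . ' -i ' . escapeshellarg($patchFile);

    $output = array();
    exec($command . ' --dry-run 2>&1', $output, $status);
    if ($status !== 0) {
        echo "Dry run failed, nothing was changed:\n", implode("\n", $output), "\n";
        return false;
    }

    $output = array();
    exec($command . ' 2>&1', $output, $status);
    echo implode("\n", $output), "\n";
    return $status === 0;
}
```

For example, `applyPatchWithDryRun('/tmp/06bc11f5.diff', '/var/www/ojs/lib/pkp');`, where both paths are illustrative and should be adjusted to your installation. Taking a backup of lib/pkp first is still a good idea.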

Regards,
Alec Smecher
Public Knowledge Project Team
