XML Import very slow

Postby nef » Fri Nov 08, 2013 2:18 am

Hi
After upgrading to OJS 2.4.2, we are seeing a marked drop in import speed when we use the Articles & Issues XML Plugin. The import session also uses much more memory, so we can now import only a few files at a time.
Hope the problem will be fixed.
Best regards
Niels Erik
nef
 
Posts: 225
Joined: Fri Jun 01, 2007 2:56 am
Location: Aarhus, Denmark

Re: XML Import very slow

Postby asmecher » Fri Nov 08, 2013 10:25 am

Hi Niels Erik,

The speed is probably due to the full-text indexing; the import itself is reasonably fast but extracting and indexing the text may take quite a while depending on the amount of content in the installation. Unfortunately there are limits to the inverted-index approach we use; you might want to consider using Lucene if your installation is getting quite large.

The memory limit problem can probably be solved by patching https://github.com/pkp/pkp-lib/commit/cc4be218320d7cf4480a3dc1b15a54b29d3e5a6e.diff (don't worry about the tests, they aren't included with the tarball).

If you're doing a lot of batch importing, you can truncate the article_search_keyword_list, article_search_object_keywords, and article_search_objects tables and reindex the database afterwards using tools/rebuildSearchIndex.php when the importing is finished. That last step will take quite a while.
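
As a minimal sketch of the truncation step described above, assuming direct SQL access to the OJS database: the table names are the three listed above, and separate statements are used on the assumption that this form is accepted by both PostgreSQL and MySQL.

```sql
-- Empty the search index tables before a large batch import.
-- Assumption: no foreign keys in the schema reference these tables.
TRUNCATE TABLE article_search_object_keywords;
TRUNCATE TABLE article_search_objects;
TRUNCATE TABLE article_search_keyword_list;
```

Once the imports are finished, running tools/rebuildSearchIndex.php with the PHP command-line interpreter from the OJS installation directory (for example, php tools/rebuildSearchIndex.php) should repopulate these tables.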

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 7709
Joined: Wed Aug 10, 2005 12:56 pm

Re: XML Import very slow

Postby tgc99 » Tue Nov 12, 2013 1:47 am

asmecher wrote:
The speed is probably due to the full-text indexing; the import itself is reasonably fast but extracting and indexing the text may take quite a while depending on the amount of content in the installation. Unfortunately there are limits to the inverted-index approach we use; you might want to consider using Lucene if your installation is getting quite large.

According to Niels Erik we're not doing full-text indexing.
When I look at the system while the import is running, most of the time seems to be spent executing INSERT statements against the article_search_object_keywords table in the database (PostgreSQL 8.4), one row at a time.

Code:
postgres=# select current_query from pg_stat_activity ;
                                         current_query                                         
------------------------------------------------------------------------------------------------
 <IDLE>
 <IDLE>
 INSERT INTO article_search_object_keywords (object_id, keyword_id, pos) VALUES ( $1,  $2,  $3)
 select current_query from pg_stat_activity ;
(4 rows)

ojsdev=# select count(*) from article_search_object_keywords ;
  count   
----------
 23698189
(1 row)


Admittedly, that is quite a lot of rows.
The table seems to have grown from 21 million to 23 million rows since the upgrade, which I suppose could be why the import is now suddenly slower.

Point taken about Lucene/Solr, I'll investigate.

asmecher wrote:
The memory limit problem can probably be solved by patching https://github.com/pkp/pkp-lib/commit/cc4be218320d7cf4480a3dc1b15a54b29d3e5a6e.diff (don't worry about the tests, they aren't included with the tarball).

This seems to have completely fixed the memory use issue. Thank you.

-tgc
tgc99
 
Posts: 56
Joined: Thu Oct 18, 2007 3:50 am
Location: Aarhus, Denmark

