
slow harvesting?



Post by gerold » Fri Jul 21, 2006 5:33 am


Installation of harvester2 went well.

To harvest correctly, I had to increase the "memory_limit = 8M" setting in php.ini.

Problem: Harvesting is very slow. The first 1,000 papers are harvested in 1 minute. After 20,000 papers, it takes about 2 minutes for 100 papers.
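To put those two figures side by side, here is a quick sketch of the arithmetic. It only restates the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the post above; the variable names are illustrative.
early_rate = 1000 / 1   # papers per minute for the first 1,000 papers
late_rate = 100 / 2     # papers per minute after about 20,000 papers

# How many times slower the late phase is compared with the early phase.
slowdown = early_rate / late_rate

print(f"early: {early_rate:.0f} papers/min, late: {late_rate:.0f} papers/min")
print(f"slowdown factor: {slowdown:.0f}x")
```

So throughput falls from roughly 1,000 to roughly 50 papers per minute, about a 20-fold slowdown.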

I use SUSE 9.3, PHP 5.0.3, MySQL 4.1.10a, Apache 2.0.35

Is this normal?



Post by asmecher » Fri Jul 21, 2006 8:15 am

Hi Gerold,

Due to the way the Harvester indexes keywords, harvesting will proceed more slowly as more papers are indexed; this is an unavoidable part of indexing, although we'll likely be looking into further optimization in the next release. I'd suggest using the Harvester to perform daily, incremental updates -- we'll be providing a command-line tool for the next release that will make this easier. This way, large harvests will be unnecessary once the repository has been loaded initially.
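As a rough illustration of what a daily, incremental update looks like at the protocol level, here is a minimal sketch that assumes the archive is harvested over OAI-PMH and uses its selective-harvesting `from` argument. This is not the command-line tool mentioned above; the function name `incremental_request_url` and the endpoint `http://archive.example.org/oai` are hypothetical, and the sketch only builds the request URL without contacting any server.

```python
from datetime import date
from urllib.parse import urlencode


def incremental_request_url(base_url: str, last_harvest: date,
                            metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords URL limited to records changed since last_harvest."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # OAI-PMH selective harvesting: 'from' is an inclusive lower bound (UTC date).
        "from": last_harvest.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical archive endpoint, for illustration only.
url = incremental_request_url("http://archive.example.org/oai", date(2006, 7, 20))
print(url)
```

Requesting only records changed since the previous run keeps each daily batch small, so the slower per-record indexing on a large index matters much less than it does during the initial load.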

FYI, I've experienced massive differences in performance with different versions of MySQL on different platforms -- for example, MySQL 4.x on Windows performs abysmally.

Alec Smecher
Public Knowledge Project Team
