slow harvesting?

Open Harvester Systems support questions and answers, bug reports, and development issues.


Forum rules
The Public Knowledge Project Support Forum is moving to

This forum will be maintained permanently as an archived historical resource, but all new questions should be added to the new forum. Questions will no longer be monitored on this old forum after March 30, 2015.


Post by gerold » Fri Jul 21, 2006 5:33 am


The installation of Harvester2 went well.

To harvest correctly, I had to increase "memory_limit = 8M" in php.ini.
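For reference, php.ini accepts shorthand byte values, so "8M" means 8 × 1024 × 1024 = 8,388,608 bytes. Below is a minimal Python sketch for converting such values, assuming the K/M/G suffixes (case-insensitive, powers of 1024) that current PHP versions accept. The function name php_shorthand_to_bytes is hypothetical and not part of Harvester2 or PHP.

```python
def php_shorthand_to_bytes(value: str) -> int:
    """Convert a php.ini shorthand byte value such as '8M' to bytes.

    Hypothetical helper: K, M and G suffixes (any case) are treated as
    powers of 1024, as PHP does; '-1' (no limit) is returned as -1.
    """
    value = value.strip()
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in multipliers:
        # Strip the suffix and scale the numeric part.
        return int(value[:-1]) * multipliers[suffix]
    # No suffix: the value is already a plain byte count (or -1).
    return int(value)


print(php_shorthand_to_bytes("8M"))  # the old limit from this thread, in bytes
```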

Problem: Harvesting is very slow. The first 1,000 papers are harvested in about 1 minute, but after 20,000 papers it takes about 2 minutes per 100 papers.
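To put numbers on the slowdown, here is a small Python calculation using only the figures reported above (1,000 papers in about 1 minute at the start, about 100 papers per 2 minutes after 20,000). It shows a drop from roughly 1,000 to 50 papers per minute, i.e. about 20 times slower; the figures are approximate, so the ratio is too.

```python
def papers_per_minute(papers: int, minutes: float) -> float:
    """Harvest throughput in papers per minute."""
    return papers / minutes


early = papers_per_minute(1000, 1)  # first 1,000 papers in about 1 minute
late = papers_per_minute(100, 2)    # about 100 papers per 2 minutes after 20,000
slowdown = early / late

print(f"early: {early:.0f}/min, late: {late:.0f}/min, slowdown: {slowdown:.0f}x")
```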

I'm using SUSE 9.3, PHP 5.0.3, MySQL 4.1.10a, and Apache 2.0.35.

Is this normal?




Post by asmecher » Fri Jul 21, 2006 8:15 am

Hi Gerold,

Due to the way the Harvester indexes keywords, harvesting will proceed more slowly as more papers are indexed; this is an unavoidable part of indexing, although we'll likely be looking into further optimization in the next release. I'd suggest using the Harvester to perform daily, incremental updates -- we'll be providing a command-line tool for the next release that will make this easier. This way, large harvests will be unnecessary once the repository has been loaded initially.
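Harvester2 collects metadata over OAI-PMH, and OAI-PMH supports selective harvesting by date: a ListRecords request with a "from" argument returns only records added or changed since that date, which is what makes small daily updates possible. The Python sketch below only builds such request URLs. It assumes a hypothetical repository base URL (http://example.org/oai) and the oai_dc metadata format, and it is not the Harvester's own command-line tool, whose interface isn't described here.

```python
from datetime import date
from urllib.parse import urlencode


def incremental_list_records_url(base_url: str, last_harvest: date,
                                 metadata_prefix: str = "oai_dc") -> str:
    """OAI-PMH ListRecords request for records changed since last_harvest."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest.isoformat(),  # OAI-PMH day granularity: YYYY-MM-DD
    }
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, resumption_token: str) -> str:
    """Follow-up request for the next page of a large result set.

    Per OAI-PMH, a resumptionToken request carries only the verb and the token.
    """
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository; in practice last_harvest would be the previous run's date.
print(incremental_list_records_url("http://example.org/oai", date(2006, 7, 20)))
```

Because each daily run only asks for records changed since the previous run, the per-run workload stays small even as the index grows.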

FYI, I've experienced massive differences in performance with different versions of MySQL on different platforms -- for example, MySQL 4.x on Windows performs abysmally.

Alec Smecher
Public Knowledge Project Team
