by asmecher » Fri Jul 21, 2006 8:15 am
Hi Gerold,
Because of the way the Harvester indexes keywords, harvesting slows down as more papers are indexed. This is unavoidable with the current indexing approach, although we'll likely look into further optimization for the next release. I'd suggest using the Harvester to perform daily, incremental updates; the next release will include a command-line tool that makes this easier. That way, once the repository has been loaded initially, large harvests won't be necessary.
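To illustrate what "incremental" means here: assuming the source archive is harvested over OAI-PMH, an incremental run only asks for records added or changed since the last harvest, using the protocol's `from` argument, instead of re-fetching the whole archive. The sketch below only builds the request URLs. It is not the Harvester's own code or its upcoming command-line tool, and the function names, the `oai_dc` default and the example base URL are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def incremental_list_records_url(base_url: str, last_harvest: date,
                                 metadata_prefix: str = "oai_dc") -> str:
    """Hypothetical helper: OAI-PMH ListRecords request limited to recent changes."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # 'from' asks only for records added or modified on or after this date
        "from": last_harvest.isoformat(),
    }
    return f"{base_url}?{urlencode(params)}"


def continuation_url(base_url: str, resumption_token: str) -> str:
    """Hypothetical helper: fetch the next page of a large response.

    In OAI-PMH, resumptionToken is an exclusive argument, so only the verb
    is sent alongside it.
    """
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(incremental_list_records_url("http://example.org/oai", date(2006, 7, 20)))
```

Run once a day with the previous run's date, each request covers only a day's worth of changes, so the cost no longer grows with the size of the whole archive.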
FYI, I've seen large differences in performance across MySQL versions and platforms; for example, MySQL 4.x on Windows performs abysmally.
Regards,
Alec Smecher
Public Knowledge Project Team