Harvesting benchmark

Open Harvester Systems support questions and answers, bug reports, and development issues.


Post by sklima » Fri Mar 16, 2007 12:44 am

Hi,

I successfully installed Harvester2. It is installed on a machine with the following setup:

1. Linux Open Suse 10.2
2. Apache 2
3. PHP 5.x
4. Mysql 5.x

I created an OAI repository (OAICat) that initially contained 100 records. The harvest was successful, so I increased the number of records to 1000, then 10000.

I then got the fatal error "Allowed memory size...", so I raised PHP's memory limit to 256 MB. After that, the harvest was successful.

I increased the number of records to 20000, then 50000. The harvest failed: not all records were harvested. The harvest was launched in batch mode, and Harvester2 did not report any errors.

I have no idea what is causing this problem. Can you help me?

Thanks in advance.

Post by asmecher » Fri Mar 16, 2007 8:36 am

Hi Sklima,

Were you specifying an incremental update (i.e. "from=last") on the command line?

If not, and there were no error messages, please contact me personally with your OAI URL so I can test. Often this sort of problem is caused by illegal characters in the OAI XML.
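For a repository that cannot be shared, the illegal-character check can be done locally. The following is a minimal sketch, not part of Harvester2: it assumes the OAI response has already been saved and decoded to text, and the function name `find_illegal_xml_chars` is made up for illustration. It reports every character that falls outside the XML 1.0 "Char" production.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper for illustration; `text` is an already decoded
    OAI response (a str, not raw bytes).
    """
    return [
        (match.start(), ord(match.group()))
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]
```

An empty list means no forbidden characters were found in the decoded text. It does not rule out other problems, such as bytes that are invalid in the declared encoding, which would already fail at the decoding step.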

Regards,
Alec Smecher
Public Knowledge Project Team
---
Don't miss the First International PKP Scholarly Publishing Conference
July 11 - 13, 2007, Vancouver, BC, Canada
http://ocs.sfu.ca/pkp2007/

Post by sklima » Thu Mar 29, 2007 12:37 am

Hi Asmecher,

No, I didn't specify an incremental update on the command line. Here is an example of my command line:

php -f /srv/wwww/pkp/tools/harvest.xsp 10

I created the OAI repository myself, and it doesn't contain illegal characters. Here is an example of the OAI XML:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2007-03-29T07:32:59Z</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://localhost:8081/oaicat/OAIHandler</request>
<ListRecords>
<record>
<header>
<identifier>oai:oaicat.oclc.org:2001/ocm10723273.xml</identifier>
<datestamp>2007-01-25T13:37:14Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Titre 1</dc:title>
<dc:title>Titre 2</dc:title>
<dc:creator>Createur 1</dc:creator>
<dc:creator>Createur 2</dc:creator>
<dc:subject>Marriage law--Korea (North)</dc:subject>
<dc:subject>Domestic relations--Korea (North)</dc:subject>
<dc:description>Thesis (doctoral), 1975.</dc:description>
<dc:description>Thesis (doctoral)</dc:description>
<dc:description>Bibliography: p. 267-272.</dc:description>
<dc:date>1975</dc:date>
<dc:type>Electronic Thesis or Dissertation</dc:type>
<dc:identifier>id1</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
....

Unfortunately, my OAI repository is not installed in the DMZ, so it is not available externally.

Furthermore, harvesting this repository works well with another harvester, such as OCLC OAIHarvester2.
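To put a number on "not all records were harvested", the records the repository actually serves can be counted independently and compared with what ended up in the harvester. The sketch below is an assumption-laden illustration, not Harvester2 code: `count_records`, `http_fetch` and the `fetch` parameter are hypothetical names. It follows the standard OAI-PMH flow, where the first ListRecords request carries `metadataPrefix` and every follow-up request carries only the `resumptionToken`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(base_url, params):
    """Fetch one OAI-PMH response body as bytes (hypothetical helper)."""
    with urlopen(base_url + "?" + urlencode(params)) as response:
        return response.read()


def count_records(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Count <record> elements across all pages of a ListRecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    total = 0
    while True:
        root = ET.fromstring(fetch(base_url, params))
        total += len(root.findall(f"{OAI_NS}ListRecords/{OAI_NS}record"))
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return total
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

With the endpoint from the sample response, the call would be `count_records("http://localhost:8081/oaicat/OAIHandler")`. If parsing stops with an error on a particular page, that page is a good place to look for malformed XML.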

Regards,

Sébastien Klima

Post by asmecher » Thu Mar 29, 2007 9:11 am

Hi Sébastien,

Try specifying the "flush" and "verbose" options to the harvest, i.e.:
php -f /srv/wwww/pkp/tools/harvest.xsp 10 verbose flush
(FYI, using the "skipIndexing" option as well will improve performance for large harvests; you'll have to generate the search index afterwards using tools/rebuildSearchIndex.php.)

See how far it gets. I'd also suggest checking your PHP configuration to make sure that display_errors is enabled, error_reporting is set to E_ALL, and your memory_limit and max_execution_time parameters are set sufficiently high.
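One way to review those settings is to read them straight from the php.ini that the command-line PHP uses, which is often a different file from the one the Apache module reads (`php --ini` lists the files in use). The sketch below is only an illustration under that assumption; `parse_php_size` and `read_ini_settings` are hypothetical helper names, and the parser handles plain `name = value` lines only.

```python
import re

# PHP shorthand suffixes are powers of 1024.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_size(value):
    """Convert PHP shorthand such as '256M' to bytes; '-1' means no limit."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([KMG]?)\s*", value, re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def read_ini_settings(ini_text, names):
    """Return the last 'name = value' assignment found for each name."""
    found = {}
    for line in ini_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ';' comments
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in names:
                found[key] = value.strip('"')
    return found
```

For example, `parse_php_size(read_ini_settings(text, {"memory_limit"})["memory_limit"])` gives the configured limit in bytes, where `text` is the content of the php.ini file.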

Regards,
Alec Smecher
Open Journal Systems Team

