Harvesting benchmark


Post by sklima » Fri Mar 16, 2007 12:44 am

Hi,

I successfully installed Harvester2 on a machine with the following setup:

1. openSUSE Linux 10.2
2. Apache 2
3. PHP 5.x
4. MySQL 5.x

I created an OAI repository (OAICat) that initially contained 100 records. The harvest was successful, so I increased the number of records to 1000, then 10000.

At that point I got a fatal error, "Allowed memory size...". I increased the PHP memory limit to 256 MB, and after that the harvest was successful.

I then increased the number of records to 20000, then 50000. The harvest failed: not all of the records were harvested. The harvest was launched in batch mode, and Harvester2 did not report any errors.

I have no idea what is causing this problem. Can you help me?

Thanks in advance.
sklima
 
Posts: 3
Joined: Thu Feb 08, 2007 5:13 am
Location: France

Post by asmecher » Fri Mar 16, 2007 8:36 am

Hi Sklima,

Were you specifying an incremental update (i.e. "from=last") on the command line?

If not, and there were no error messages, please contact me personally with your OAI URL so I can test. Often this sort of problem is caused by illegal characters in the OAI XML.
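A quick way to rule out illegal characters before sharing the repository is to scan a saved OAI response for code points that XML 1.0 does not allow. The following is a minimal sketch, not part of Harvester2; the function name `find_illegal_xml_chars` is hypothetical, and it assumes the response has already been decoded to text.

```python
import re

# Matches any character outside the ranges XML 1.0 permits:
# tab, line feed, carriage return, U+0020-U+D7FF, U+E000-U+FFFD,
# and U+10000-U+10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML_CHARS.finditer(text)]


if __name__ == "__main__":
    sample = "<dc:title>Titre\x0b 1</dc:title>"  # contains a vertical tab
    for offset, codepoint in find_illegal_xml_chars(sample):
        print(f"illegal character U+{codepoint:04X} at offset {offset}")
```

An empty result only means that no forbidden control characters are present; it does not check well-formedness or encoding errors.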

Regards,
Alec Smecher
Public Knowledge Project Team
---
Don't miss the First International PKP Scholarly Publishing Conference
July 11 - 13, 2007, Vancouver, BC, Canada
http://ocs.sfu.ca/pkp2007/
asmecher
 
Posts: 9050
Joined: Wed Aug 10, 2005 12:56 pm

Post by sklima » Thu Mar 29, 2007 12:37 am

Hi Asmecher,

No, I didn't specify an incremental update on the command line. Here is an example of my command line:

php -f /srv/wwww/pkp/tools/harvest.xsp 10

The OAI repository I created doesn't contain illegal characters. Here is an example of the OAI XML:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2007-03-29T07:32:59Z</responseDate>
<request metadataPrefix="oai_dc" verb="ListRecords">http://localhost:8081/oaicat/OAIHandler</request>
<ListRecords>
<record>
<header>
<identifier>oai:oaicat.oclc.org:2001/ocm10723273.xml</identifier>
<datestamp>2007-01-25T13:37:14Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Titre 1</dc:title>
<dc:title>Titre 2</dc:title>
<dc:creator>Createur 1</dc:creator>
<dc:creator>Createur 2</dc:creator>
<dc:subject>Marriage law--Korea (North)</dc:subject>
<dc:subject>Domestic relations--Korea (North)</dc:subject>
<dc:description>Thesis (doctoral), 1975.</dc:description>
<dc:description>Thesis (doctoral)</dc:description>
<dc:description>Bibliography: p. 267-272.</dc:description>
<dc:date>1975</dc:date>
<dc:type>Electronic Thesis or Dissertation</dc:type>
<dc:identifier>id1</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
....

Unfortunately, my OAI repository is not in the DMZ, so it is not reachable from outside.

Furthermore, this repository harvests correctly with another harvester, OCLC's OAIHarvester2.
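When the repository cannot be reached from outside, one way to narrow down where a harvest stops is to save each ListRecords response locally and count the records per page, noting the resumptionToken that points to the next page. The sketch below is only an illustration under those assumptions: the function name `summarize_list_records` is hypothetical, and it relies on the standard OAI-PMH 2.0 namespace shown in the sample response above.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def summarize_list_records(xml_text):
    """Return (record count, resumption token or None) for one response page."""
    root = ET.fromstring(xml_text)
    records = root.findall(f".//{OAI_NS}record")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An absent or empty resumptionToken means this is the last page.
    token = None
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return len(records), token


if __name__ == "__main__":
    page = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        "<record><header><identifier>id1</identifier></header></record>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )
    print(summarize_list_records(page))
```

Summing the per-page counts and comparing the total with the number of records stored by the harvester shows whether the shortfall comes from a truncated page sequence or from records being dropped on import.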

Regards,

Sébastien Klima
sklima
 
Posts: 3
Joined: Thu Feb 08, 2007 5:13 am
Location: France

Post by asmecher » Thu Mar 29, 2007 9:11 am

Hi Sébastien,

Try specifying the "flush" and "verbose" options to the harvest, i.e.:
php -f /srv/wwww/pkp/tools/harvest.xsp 10 verbose flush
(FYI, using the "skipIndexing" option as well will improve performance for large harvests; you'll have to generate the search index afterwards using tools/rebuildSearchIndex.php.)

See how far it gets. I'd also suggest checking your PHP configuration to make sure that display_errors is enabled, error_reporting is set to E_ALL, and your memory_limit and max_execution_time parameters are set sufficiently high.
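PHP writes memory_limit in a shorthand notation (for example 256M, with K, M and G as suffixes, and -1 meaning no limit), so it helps to convert the value to bytes when comparing settings. This is a rough sketch with hypothetical helper names (`php_size_to_bytes`, `memory_limit_ok`); it does not read the PHP configuration itself, so the values have to be copied in from php.ini or from the output of `php -i`.

```python
def php_size_to_bytes(value):
    """Convert a PHP shorthand size such as '256M' to a number of bytes."""
    value = value.strip()
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1:].lower()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    # No recognised suffix: the value is a plain byte count (or -1).
    return int(value)


def memory_limit_ok(memory_limit, needed_bytes):
    """Return True if the configured limit is unlimited (-1) or large enough."""
    limit = php_size_to_bytes(memory_limit)
    return limit == -1 or limit >= needed_bytes


if __name__ == "__main__":
    print(php_size_to_bytes("256M"))
    print(memory_limit_ok("128M", php_size_to_bytes("256M")))
```

A value with an unrecognised suffix raises ValueError here, which makes a mistyped setting easy to spot.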

Regards,
Alec Smecher
Open Journal Systems Team