You are viewing the PKP Support Forum



Harvesting not all records

Open Harvester Systems support questions and answers, bug reports, and development issues.


Post by alexukua » Tue Dec 13, 2011 5:51 am

Hi,
I have an archive at http://eprints.zu.edu.ua/cgi/oai2 with 4262 records in total.
After harvesting, http://oai.org.ua/index.php/browse/index/23 shows only

1481 records.

Post by asmecher » Tue Dec 13, 2011 10:50 am

Hi alexukua,

Have you tried using the command-line harvester (tools/harvest.php)? It's generally more reliable for longer harvests; many web servers will be configured to stop processes that run for more than a few seconds.

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Tue Dec 13, 2011 11:38 am

Hi asmecher,

php /usr/share/harvester/tools/harvest.php 23
Selected archive: Zhytomyr State University Library
Fetching records...
Finished:
1487 records indexed
542 seconds elapsed
2.74 records per second
0 records kept from past harvests
1487 records total.

Looking at the details, the harvester does not index the first records.
For example, OAI record oai:eprints.zu.edu.ua:99 is not indexed by the harvester.
The first record in the harvester database (table records) is
oai:eprints.zu.edu.ua:1016
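To pin down exactly which records are missing (such as oai:eprints.zu.edu.ua:99 above), one can compare the identifiers the repository exposes with those stored in the harvester's records table. Below is a minimal Python sketch, assuming both lists have already been exported (for example, from an OAI-PMH ListIdentifiers harvest and a query on the records table). The helper names are hypothetical, and it assumes each identifier ends in a numeric ID, as it does in this repository.

```python
def numeric_suffix(identifier: str) -> int:
    """Trailing number of an identifier such as 'oai:eprints.zu.edu.ua:99'.

    Assumes the part after the last ':' is an integer, as in this repository.
    """
    return int(identifier.rsplit(":", 1)[1])


def missing_identifiers(source_ids, harvested_ids):
    """Identifiers exposed by the repository but absent from the harvester,
    sorted by their numeric suffix (hypothetical helper)."""
    missing = set(source_ids) - set(harvested_ids)
    return sorted(missing, key=numeric_suffix)


# Illustrative input only, using the two identifiers mentioned above.
source = ["oai:eprints.zu.edu.ua:99", "oai:eprints.zu.edu.ua:1016"]
harvested = ["oai:eprints.zu.edu.ua:1016"]
print(missing_identifiers(source, harvested))  # → ['oai:eprints.zu.edu.ua:99']
```

Sorting numerically rather than as strings keeps ":99" ahead of ":1016", so a gap at the start of the ID range is easy to spot.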

All the best
Alex

Post by asmecher » Tue Dec 13, 2011 12:17 pm

Hi Alex,

Could you try harvesting with the "verbose" option? Is there a chance that the repository is serving up duplicate record IDs to something else that's already been harvested?
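One way to test the duplicate-ID hypothesis is to count how often each header identifier appears across all pages of a ListRecords harvest. Below is a minimal Python sketch using only the standard library. It assumes the raw XML of each response has been saved, and `record_identifiers` / `duplicate_identifiers` are hypothetical helpers, not part of OHS.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def record_identifiers(page_xml):
    """Yield the <header><identifier> values from one ListRecords response."""
    root = ET.fromstring(page_xml)
    for ident in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS):
        yield (ident.text or "").strip()


def duplicate_identifiers(pages):
    """Return {identifier: count} for identifiers served more than once
    across all pages of a harvest."""
    counts = Counter(i for page in pages for i in record_identifiers(page))
    return {ident: n for ident, n in counts.items() if n > 1}
```

An empty result would suggest the repository is not repeating identifiers. Whether the harvester would actually overwrite a repeated record is not confirmed here.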

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Tue Dec 13, 2011 12:46 pm

Hi asmecher,
1. Flushed the metadata for the archive.
2. php /usr/share/harvester/tools/harvest.php 23
Selected archive: Zhytomyr State University Library
Fetching records...
Finished:
1377 records indexed
350 seconds elapsed
3.93 records per second
0 records kept from past harvests
1377 records total.
It took several days.
3. Flushed ALL archives; the records table is empty.
4. php /usr/share/harvester/tools/harvest.php 23 verbose
Selected archive: Zhytomyr State University Library
Fetching records...
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... fix=oai_dc
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D3658
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4041
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4175
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4290
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4415
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4581
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4698
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4812
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D4933
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D5075
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D5206
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D5320
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb= ... t%253D5487
Finished:
1271 records indexed
291 seconds elapsed
4.37 records per second
0 records kept from past harvests
1271 records total.

It seems to me that every day, OHS harvests fewer records than the day before.

Post by alexukua » Sun Dec 18, 2011 8:48 am

Do you have any ideas?

Post by asmecher » Mon Dec 19, 2011 10:45 am

Hi Alex,

I get the following:
# php tools/harvest.php 5 flush verbose
Selected archive: http://eprints.zu.edu.ua/cgi/oai2
Flushing metadata index for archive... 4 records deleted.
Fetching records...
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D3114
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4139
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4268
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4393
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4557
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4677
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4792
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D4912
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D5052
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D5183
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D5297
Harvest URL: http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken=metadataPrefix%253Doai_dc%2526offset%253D5447
Finished:
   1222 records indexed
   187 seconds elapsed
   6.53 records per second
   0 records kept from past harvests
   1222 records total.
The repository appears to be serving up 100 records per request, so 13 requests correspond to between 1200 and 1300 records. It seems to be behaving as expected. (The total harvest took just over 3 minutes.)
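The arithmetic above, and the offsets carried in the logged resumption tokens, can be checked with a short Python sketch. It assumes a fixed page size of 100 (as observed above). It also reads the token layout (`metadataPrefix=oai_dc&offset=N`, URL-encoded twice in the log) straight off these URLs. OAI-PMH itself treats resumption tokens as opaque, so this decoding is specific to this repository. Both function names are hypothetical.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def resumption_offset(harvest_url: str):
    """Offset inside a logged resumptionToken, or None if the URL has no token.

    Assumes the token is a URL-encoded 'metadataPrefix=...&offset=N' string,
    as in the log above (repository-specific, not part of OAI-PMH).
    """
    tokens = parse_qs(urlsplit(harvest_url).query).get("resumptionToken")
    if not tokens:
        return None
    # parse_qs removed the outer encoding layer; unquote the token itself.
    inner = parse_qs(unquote(tokens[0]))
    offsets = inner.get("offset")
    return int(offsets[0]) if offsets else None


def expected_record_range(requests: int, page_size: int):
    """If every page but the last is full (and the last is non-empty),
    n requests yield between (n - 1) * page_size + 1 and n * page_size records."""
    return (requests - 1) * page_size + 1, requests * page_size


url = ("http://eprints.zu.edu.ua/cgi/oai2?verb=ListRecords&resumptionToken="
       "metadataPrefix%253Doai_dc%2526offset%253D3114")
print(resumption_offset(url))          # → 3114
print(expected_record_range(13, 100))  # → (1201, 1300)
```

The 1222 records reported above fall inside that range.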

Regards,
Alec Smecher
Public Knowledge Project Team

Post by alexukua » Mon Dec 19, 2011 11:59 am

However, the repository (Zhytomyr State University Library) has more than 4000 records. Why didn't the harvester get all of them?

Post by asmecher » Mon Dec 19, 2011 12:29 pm

Hi Alex,

The response to the last request noted above contains no resumption token. That means the repository did not have more records available.
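In OAI-PMH 2.0, a harvester keeps requesting pages as long as each ListRecords response carries a non-empty resumptionToken. It stops when the token is empty or, in practice, absent. The loop below is a minimal Python sketch of that rule, not OHS's own harvester code (which is PHP). `list_records`, `http_fetcher`, and the injected `fetch` callable are hypothetical names, chosen so the loop can be exercised without a network.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Union
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"

Fetch = Callable[[dict], Union[str, bytes]]


def list_records(fetch: Fetch, metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> of a ListRecords harvest, following resumption
    tokens until the repository stops sending a non-empty one."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        body = root.find(f"{OAI}ListRecords")
        if body is None:
            # No ListRecords element: e.g. an <error code="noRecordsMatch"/> reply.
            return
        yield from body.findall(f"{OAI}record")
        token = body.find(f"{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            # Absent or empty token: the repository has no more records.
            return
        # Follow-up requests carry only the verb and the (exclusive) token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def http_fetcher(base_url: str) -> Fetch:
    """Build a fetch function that issues real HTTP GET requests."""
    def fetch(params: dict) -> bytes:
        with urlopen(f"{base_url}?{urlencode(params)}") as resp:
            return resp.read()
    return fetch
```

For example, `sum(1 for _ in list_records(http_fetcher("http://eprints.zu.edu.ua/cgi/oai2")))` would count the records the repository actually serves, independently of the harvester's database.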

Regards,
Alec Smecher
Public Knowledge Project Team