Harvesting less records than real

Open Harvester Systems support questions and answers, bug reports, and development issues.


Postby josipkp » Wed Jul 15, 2009 4:55 pm

Hi,

We are trying to harvest Actas from DSpace at UMinho, Portugal.
Browsing the DSpace community shows 42 items.
Harvesting by hand returns the same items:
http://repositorium.sdum.uminho.pt/oai/ ... l_1822_831

Using Harvester2, we collect only 3 records: the last 3 records visible in the manual harvest.
What could be happening?

Thanks in advance,
Josi Perez
josipkp
 
Posts: 61
Joined: Fri Jun 27, 2008 8:51 am

Re: Harvesting less records than real

Postby asmecher » Thu Jul 16, 2009 7:33 am

Hi Josi,

Are the harvest from / to dates blank when you submit the harvest form, and are any specific sets selected?

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 7737
Joined: Wed Aug 10, 2005 12:56 pm

Re: Harvesting less records than real

Postby josipkp » Thu Jul 16, 2009 5:57 pm

Hi,

It was the first harvest: no dates, but yes, we selected 2 specific sets.
When the harvester didn't collect all the expected records through the web interface, we tried harvesting with harvester/tools.

The command used:
php harvest.php 50 flush verbose set=hdl_1822_831 set=hdl_1822_830

Harvesting just the set hdl_1822_830 returns zero records, but there are around 50.
Dividing the date range did not help either:
php harvest.php 50 flush verbose set=hdl_1822_831 from=2000-01-01 (3 records)
php harvest.php 50 flush verbose set=hdl_1822_831 from=2000-01-01 until=2009-12-30 (3 records)

Changing the date granularity results in an error message.
Any suggestions?

Thank you for your support.
Josi Perez

Re: Harvesting less records than real

Postby asmecher » Fri Jul 17, 2009 10:33 am

Hi Josi,

What version of the harvester are you using?

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting less records than real

Postby josipkp » Fri Jul 17, 2009 1:47 pm

Hi,

Using Harvester2 version 2.3.0.0

Thank you for your efforts.
Josi Perez

Re: Harvesting less records than real

Postby asmecher » Fri Jul 17, 2009 3:42 pm

Hi Josi,

I managed to harvest 70 records from the two sets. Is it possible that you have several records from different archives that share the same identifier?

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting less records than real

Postby josipkp » Sat Jul 18, 2009 11:20 am

Gotcha!
We decided to divide DSpace at UMinho into more than one set, creating categories (Theses, Journals, Books and so on), and I kept the entire UMinho archive while each category had not yet been harvested. :oops:

Thank you,
Josi Perez
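
The overlap described above can be checked before harvesting. The following is a minimal sketch, assuming you can list (archive, identifier) pairs for the archives you have configured; the archive names and identifiers are made up for illustration, and nothing here reflects how Harvester2 itself stores or compares records.

```python
from collections import defaultdict


def shared_identifiers(records):
    """Return identifiers that occur in more than one archive.

    `records` is an iterable of (archive_name, oai_identifier) pairs.
    The result maps each shared identifier to the sorted archive names.
    """
    archives_by_id = defaultdict(set)
    for archive, identifier in records:
        archives_by_id[identifier].add(archive)
    return {
        identifier: sorted(archives)
        for identifier, archives in archives_by_id.items()
        if len(archives) > 1
    }


# Hypothetical data: one record is exposed both by the whole-repository
# archive and by a category archive.
records = [
    ("UMinho (entire)", "oai:example.org:1822/1001"),
    ("UMinho (entire)", "oai:example.org:1822/1002"),
    ("UMinho Actas", "oai:example.org:1822/1001"),
]
print(shared_identifiers(records))
```

Any identifier reported here belongs to more than one archive, which is the situation Alec asked about.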

Re: Harvesting less records than real

Postby josipkp » Fri Sep 25, 2009 6:04 pm

Hi,

I need help harvesting a journal from SciELO.
I received the following error message when collecting from tools/harvester.php:
Errors/Warnings:
Missing or empty metadataPrefix

I tried dividing the time range into smaller pieces and observed that the error occurs when the interval contains more than 30 records, the number SciELO sends on each page.
For example:
http://www.scielo.br/oai/scielo-oai.php ... 2009-01-01
is the first page and has a resumptionToken.
Executing the command manually in a web browser, we get a response:
http://www.scielo.br/oai/scielo-oai.php ... 2009-01-01
but I cannot find these records in Harvester2. For each interval I can collect only 30 records.

Can you help me find the solution?

Thanks in advance,
Josi Perez
Univerciencia.org

Re: Harvesting less records than real

Postby otuften » Thu Oct 01, 2009 12:20 am

Hi,
It seems to me to be an error in the OAI gateway scielo-oai.php.
The script demands metadataPrefix in addition to resumptionToken, but the OAI guidelines say that resumptionToken is an exclusive argument and should thus be the only argument besides the verb.
So the PKP harvester sends only verb=ListRecords and resumptionToken=..., which should be accepted.

Regards
Olav
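
The exclusive-argument rule can be sketched as a small URL builder. This is an illustration only, not the PKP harvester's code; the function name, the `http://example.org/oai` base URL, the `oai_dc` prefix, and the token value are assumptions made for the example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, first_args=None, resumption_token=None):
    """Build a ListRecords request URL.

    First request: verb plus metadataPrefix and any optional selective
    arguments (set, from, until) given in `first_args`.
    Follow-up request: verb plus resumptionToken only, because
    resumptionToken is an exclusive argument.
    """
    if resumption_token is not None:
        query = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        first_args = dict(first_args or {})
        if "metadataPrefix" not in first_args:
            raise ValueError("metadataPrefix is required on the first request")
        query = {"verb": "ListRecords", **first_args}
    return f"{base_url}?{urlencode(query)}"


# Hypothetical base URL and token, for illustration.
first = list_records_url(
    "http://example.org/oai",
    first_args={"metadataPrefix": "oai_dc", "from": "2009-01-01"},
)
follow_up = list_records_url("http://example.org/oai", resumption_token="abc123")
print(first)
print(follow_up)
```

A gateway that rejects the second URL with "Missing or empty metadataPrefix" is asking for an argument that the follow-up request is not supposed to carry.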
otuften
 
Posts: 5
Joined: Tue Mar 10, 2009 9:50 am

