additional archives not seen

Open Harvester Systems support questions and answers, bug reports, and development issues.

Re: additional archives not seen

Postby alexukua » Tue Dec 02, 2008 11:17 am

I cannot get all records from http://eprints.ksame.kharkov.ua/cgi/oai2
I get this error:
Code:
The metadata index could not be updated. The following error(s) occurred:

    * An unknown error occurred.
alexukua
 
Posts: 32
Joined: Thu Oct 16, 2008 3:27 am

Re: additional archives not seen

Postby asmecher » Tue Dec 02, 2008 12:49 pm

Hi alexukua,

I suspect that you're encountering either a PHP error or an XML parsing error (typically caused by a data source serving up invalid XML, usually because of invalid UTF-8 characters). Have you checked your PHP error log to see if something shows up there?

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8833
Joined: Wed Aug 10, 2005 12:56 pm
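
One way to test the invalid-XML / invalid-UTF-8 theory is to save one response from the data source and check it offline. This is a minimal sketch, not part of Harvester2; the function name `check_oai_payload` is made up for illustration, and it only assumes the response is available as raw bytes.

```python
import xml.etree.ElementTree as ET


def check_oai_payload(payload: bytes):
    """Return None if payload is valid UTF-8 and well-formed XML, else a short reason."""
    # Step 1: the bytes must decode as UTF-8.
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte {exc.start}"
    # Step 2: the document must be well-formed XML.
    try:
        ET.fromstring(payload)
    except ET.ParseError as exc:
        return f"XML parse error: {exc}"
    return None
```

If the function returns a reason string for a saved response, the problem is in the data the archive serves rather than in the harvester.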

Re: additional archives not seen

Postby alexukua » Tue Dec 02, 2008 1:34 pm

The Apache error.log is clean. The problem now looks different: for this archive, Harvester2 returns only 400 records (sometimes only 4 or 6 records after a flush).
I have tested on these systems:
Apache + WinXP + PHP (PHP Version 5.2.4, http://www.denwer.ru/)
Debian + PHP + Apache
Win2000 + PHP + IIS

P.S. The same problem occurs with http://repository.ibss.org.ua/dspace-oai/request (of 901 records, roughly 500 are returned).

Summary: does Harvester2 work correctly with only one archive (when the data provider is EPrints or DSpace)?

Re: additional archives not seen

Postby asmecher » Wed Dec 03, 2008 10:54 am

Hi alexukua,

This message is coming from the XML parser, so it might be helpful to use the "verbose" option to the command-line harvester. This will dump the harvest request URL to the screen whenever a request is made and may help to identify what content is causing the problem.

Regards,
Alec Smecher
Public Knowledge Project Team
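
For reading the verbose output, it may help to know how OAI-PMH request URLs are formed: the first ListRecords request carries a metadataPrefix, and each follow-up request carries only the resumptionToken returned by the previous page. The sketch below follows the OAI-PMH protocol as I understand it; `list_records_url` is a hypothetical helper, not a Harvester2 function.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces all other arguments."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)
```

Pasting such a URL into a browser shows exactly what the archive returned for that page of the harvest.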

Re: additional archives not seen

Postby alexukua » Wed Dec 03, 2008 1:59 pm

For example:

Code:
lili:/usr/share/harvester/tools# php ./harvest.php 5 verbose
Selected archive: IBSS Repository
Fetching records...
Harvest URL: http://repository.ibss.org.ua/dspace-oai/request?verb=ListRecords&metadataPrefix=oai_dc
50 records indexed.
100 records indexed.
Handling resumption token "0001-01-01T00:00:00Z/9999-12-31T23:59:59Z//oai_dc/100"
150 records indexed.
200 records indexed.
Handling resumption token "0001-01-01T00:00:00Z/9999-12-31T23:59:59Z//oai_dc/200"
250 records indexed.
300 records indexed.
Handling resumption token "0001-01-01T00:00:00Z/9999-12-31T23:59:59Z//oai_dc/300"
350 records indexed.
400 records indexed.
Handling resumption token "0001-01-01T00:00:00Z/9999-12-31T23:59:59Z//oai_dc/400"
450 records indexed.
500 records indexed.
Handling resumption token "0001-01-01T00:00:00Z/9999-12-31T23:59:59Z//oai_dc/500"
Finished:
        536 records indexed
        169 seconds elapsed
        3.17 records per second
        0 records kept from past harvests
        536 records total.


IBSS Repository (901 records)

Re: additional archives not seen

Postby asmecher » Wed Dec 03, 2008 2:31 pm

Hi alexukua,

There is no error message here -- the harvest completed without problems.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: additional archives not seen

Postby alexukua » Wed Dec 03, 2008 2:34 pm

But not all of the data is harvested, if you compare with
http://roar.eprints.org/?action=home&q= ... mit=Filter

If I have only one archive in Harvester2, it gets all the data (5357 records) from http://eprints.ksame.kharkov.ua/cgi/oai ... fix=oai_dc
If I have several archives in Harvester2 (as I do now), it gets only 4060 records.

Code:
lili:/usr/share/harvester/tools# php ./harvest.php 2 verbose
Selected archive: Цифровий репозиторій Харківської національної академії міського господарства (ХНАМГ)
Fetching records...
Harvest URL: http://eprints.ksame.kharkov.ua/cgi/oai2?verb=ListRecords&metadataPrefix=oai_dc
50 records indexed.
100 records indexed.
Handling resumption token "archive/100/54490251/oai_dc"
150 records indexed.
200 records indexed.
Handling resumption token "archive/200/54490251/oai_dc"
250 records indexed.
300 records indexed.
Handling resumption token "archive/300/54490251/oai_dc"
350 records indexed.
400 records indexed.
Handling resumption token "archive/400/54490251/oai_dc"
450 records indexed.
500 records indexed.
Handling resumption token "archive/500/54490251/oai_dc"
550 records indexed.
600 records indexed.
Handling resumption token "archive/600/54490251/oai_dc"
650 records indexed.
700 records indexed.
Handling resumption token "archive/700/54490251/oai_dc"
750 records indexed.
800 records indexed.
Handling resumption token "archive/800/54490251/oai_dc"
850 records indexed.
900 records indexed.
Handling resumption token "archive/900/54490251/oai_dc"
950 records indexed.
1000 records indexed.
Handling resumption token "archive/1000/54490251/oai_dc"
1050 records indexed.
1100 records indexed.
Handling resumption token "archive/1100/54490251/oai_dc"
1150 records indexed.
1200 records indexed.
Handling resumption token "archive/1200/54490251/oai_dc"
1250 records indexed.
1300 records indexed.
Handling resumption token "archive/1300/54490251/oai_dc"
1350 records indexed.
1400 records indexed.
Handling resumption token "archive/1400/54490251/oai_dc"
1450 records indexed.
1500 records indexed.
Handling resumption token "archive/1500/54490251/oai_dc"
1550 records indexed.
1600 records indexed.
Handling resumption token "archive/1600/54490251/oai_dc"
1650 records indexed.
1700 records indexed.
Handling resumption token "archive/1700/54490251/oai_dc"
1750 records indexed.
1800 records indexed.
Handling resumption token "archive/1800/54490251/oai_dc"
1850 records indexed.
1900 records indexed.
Handling resumption token "archive/1900/54490251/oai_dc"
1950 records indexed.
2000 records indexed.
Handling resumption token "archive/2000/54490251/oai_dc"
2050 records indexed.
2100 records indexed.
Handling resumption token "archive/2100/54490251/oai_dc"
2150 records indexed.
2200 records indexed.
Handling resumption token "archive/2200/54490251/oai_dc"
2250 records indexed.
2300 records indexed.
Handling resumption token "archive/2300/54490251/oai_dc"
2350 records indexed.
2400 records indexed.
Handling resumption token "archive/2400/54490251/oai_dc"
2450 records indexed.
2500 records indexed.
Handling resumption token "archive/2500/54490251/oai_dc"
2550 records indexed.
2600 records indexed.
Handling resumption token "archive/2600/54490251/oai_dc"
2650 records indexed.
2700 records indexed.
Handling resumption token "archive/2700/54490251/oai_dc"
2750 records indexed.
2800 records indexed.
Handling resumption token "archive/2800/54490251/oai_dc"
2850 records indexed.
2900 records indexed.
Handling resumption token "archive/2900/54490251/oai_dc"
2950 records indexed.
3000 records indexed.
Handling resumption token "archive/3000/54490251/oai_dc"
3050 records indexed.
3100 records indexed.
Handling resumption token "archive/3100/54490251/oai_dc"
3150 records indexed.
3200 records indexed.
Handling resumption token "archive/3200/54490251/oai_dc"
3250 records indexed.
3300 records indexed.
Handling resumption token "archive/3300/54490251/oai_dc"
3350 records indexed.
3400 records indexed.
Handling resumption token "archive/3400/54490251/oai_dc"
3450 records indexed.
3500 records indexed.
Handling resumption token "archive/3500/54490251/oai_dc"
3550 records indexed.
3600 records indexed.
Handling resumption token "archive/3600/54490251/oai_dc"
3650 records indexed.
3700 records indexed.
Handling resumption token "archive/3700/54490251/oai_dc"
3750 records indexed.
3800 records indexed.
Handling resumption token "archive/3800/54490251/oai_dc"
3850 records indexed.
3900 records indexed.
Handling resumption token "archive/3900/54490251/oai_dc"
3950 records indexed.
4000 records indexed.
Handling resumption token "archive/4000/54490251/oai_dc"
4050 records indexed.
4100 records indexed.
Handling resumption token "archive/4100/54490251/oai_dc"
4150 records indexed.
4200 records indexed.
Handling resumption token "archive/4200/54490251/oai_dc"
4250 records indexed.
4300 records indexed.
Handling resumption token "archive/4300/54490251/oai_dc"
4350 records indexed.
4400 records indexed.
Handling resumption token "archive/4400/54490251/oai_dc"
4450 records indexed.
4500 records indexed.
Handling resumption token "archive/4500/54490251/oai_dc"
4550 records indexed.
4600 records indexed.
Handling resumption token "archive/4600/54490251/oai_dc"
4650 records indexed.
4700 records indexed.
Handling resumption token "archive/4700/54490251/oai_dc"
4750 records indexed.
4800 records indexed.
Handling resumption token "archive/4800/54490251/oai_dc"
4850 records indexed.
4900 records indexed.
Handling resumption token "archive/4900/54490251/oai_dc"
4950 records indexed.
5000 records indexed.
Handling resumption token "archive/5000/54490251/oai_dc"
5050 records indexed.
5100 records indexed.
Handling resumption token "archive/5100/54490251/oai_dc"
5150 records indexed.
5200 records indexed.
Handling resumption token "archive/5200/54490251/oai_dc"
5250 records indexed.
5300 records indexed.
Handling resumption token "archive/5300/54490251/oai_dc"
5350 records indexed.
5400 records indexed.
Handling resumption token "archive/5400/54490251/oai_dc"
5450 records indexed.
5500 records indexed.
Handling resumption token "archive/5500/54490251/oai_dc"
5550 records indexed.
Handling resumption token "deletion/0/54492029/oai_dc"
5600 records indexed.
5650 records indexed.
Handling resumption token "deletion/100/54492029/oai_dc"
5700 records indexed.
5750 records indexed.
Handling resumption token "deletion/200/54492029/oai_dc"
5800 records indexed.
5850 records indexed.
Handling resumption token "deletion/300/54492029/oai_dc"
5900 records indexed.
5950 records indexed.
Handling resumption token "deletion/400/54492029/oai_dc"
6000 records indexed.
6050 records indexed.
Handling resumption token "deletion/500/54492029/oai_dc"
6100 records indexed.
6150 records indexed.
Handling resumption token "deletion/600/54492029/oai_dc"
6200 records indexed.
6250 records indexed.
Handling resumption token "deletion/700/54492029/oai_dc"
6300 records indexed.
6350 records indexed.
Handling resumption token "deletion/800/54492029/oai_dc"
6400 records indexed.
6450 records indexed.
Handling resumption token "deletion/900/54492029/oai_dc"
6500 records indexed.
6550 records indexed.
Handling resumption token "deletion/1000/54492029/oai_dc"
6600 records indexed.
6650 records indexed.
Handling resumption token "deletion/1100/54492029/oai_dc"
6700 records indexed.
6750 records indexed.
Handling resumption token "deletion/1200/54492029/oai_dc"
6800 records indexed.
6850 records indexed.
Handling resumption token "deletion/1300/54492029/oai_dc"
6900 records indexed.
6950 records indexed.
Handling resumption token "deletion/1400/54492029/oai_dc"
7000 records indexed.
7050 records indexed.
Handling resumption token "deletion/1500/54492029/oai_dc"
7100 records indexed.
7150 records indexed.
Handling resumption token "deletion/1600/54492029/oai_dc"
7200 records indexed.
7250 records indexed.
Handling resumption token "deletion/1700/54492029/oai_dc"
7300 records indexed.
7350 records indexed.
Handling resumption token "deletion/1800/54492029/oai_dc"
7400 records indexed.
7450 records indexed.
Handling resumption token "deletion/1900/54492029/oai_dc"
7500 records indexed.
7550 records indexed.
Handling resumption token "deletion/2000/54492029/oai_dc"
7600 records indexed.
7650 records indexed.
Handling resumption token "deletion/2100/54492029/oai_dc"
7700 records indexed.
7750 records indexed.
Handling resumption token "deletion/2200/54492029/oai_dc"
7800 records indexed.
7850 records indexed.
Handling resumption token "deletion/2300/54492029/oai_dc"
7900 records indexed.
7950 records indexed.
Handling resumption token "deletion/2400/54492029/oai_dc"
Finished:
        4057 records indexed
        2547 seconds elapsed
        1.59 records per second
        3 records kept from past harvests
        4060 records total.

Re: additional archives not seen

Postby asmecher » Thu Dec 04, 2008 12:45 am

Hi alexukua,

Is the other tool being used to harvest incrementally? If so, it may be a difference in how deletions are handled. If you're able to identify a record in one that's not in the other, it may help to track down the problem.

Regards,
Alec Smecher
Public Knowledge Project Team
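
To see how deletions affect the counts, one option is to split a saved ListRecords page into live and deleted records. In OAI-PMH a deleted record is marked with status="deleted" on its header element. This is only a sketch under that assumption; `split_live_and_deleted` is a hypothetical name and the code is not taken from Harvester2.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def split_live_and_deleted(response_xml: str):
    """Return (live_ids, deleted_ids) for one ListRecords or ListIdentifiers page."""
    root = ET.fromstring(response_xml)
    live, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        # Deleted records carry status="deleted" on the header.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted
```

Comparing the live list against what the other tool reports is one way to find a concrete record that appears in one and not the other.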

Re: additional archives not seen

Postby alexukua » Thu Dec 04, 2008 4:44 am

Hi asmecher.
OK. I think the problem is tied to the MySQL indexer. For example, records get mixed up between archives. See the attached file.
Attachments
Hatvester.zip
(115.13 KiB) Downloaded 61 times

Re: additional archives not seen

Postby asmecher » Thu Dec 04, 2008 10:34 am

Hi alexukua,

Is it possible that there are records in different archives with the same OAI identifier? If so, they will be replaced rather than duplicated.

Regards,
Alec Smecher
Public Knowledge Project Team
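
A toy model of "replaced rather than duplicated": if records are stored under their OAI identifier, a second archive that reuses an identifier overwrites the first archive's entry. The dictionary below is only an illustration of that behaviour, not the actual Harvester2 storage code, and the identifiers are invented.

```python
def index_records(index, archive_name, identifiers):
    """Store each record under its OAI identifier; a repeated identifier overwrites."""
    for identifier in identifiers:
        index[identifier] = archive_name
    return index


index = {}
index_records(index, "archive A", ["oai:example.org:1", "oai:example.org:2"])
index_records(index, "archive B", ["oai:example.org:1", "oai:example.org:3"])

# Four records were harvested, but only three survive,
# and the shared identifier now belongs to archive B.
print(len(index), index["oai:example.org:1"])
```

In this model the total comes out lower than the sum of the two archives, and the overwritten record shows up under the wrong archive.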

Re: additional archives not seen

Postby alexukua » Thu Dec 04, 2008 12:02 pm

Hi asmecher

All of the records (443) appear in Browse under the other archive.

Re: additional archives not seen

Postby asmecher » Thu Dec 04, 2008 12:26 pm

Hi alexukua,

This is consistent with duplicated OAI IDs. Have you checked for those?

Regards,
Alec Smecher
Public Knowledge Project Team
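
A quick way to check for duplicated OAI IDs, assuming you can list the identifiers each archive serves (for example from its ListIdentifiers responses), is to report every identifier that occurs in more than one archive. `shared_identifiers` is a hypothetical helper written for this post.

```python
from collections import defaultdict


def shared_identifiers(ids_by_archive):
    """Map each identifier seen in more than one archive to the archives serving it."""
    archives_for_id = defaultdict(set)
    for archive, identifiers in ids_by_archive.items():
        for identifier in identifiers:
            archives_for_id[identifier].add(archive)
    return {
        identifier: sorted(archives)
        for identifier, archives in archives_for_id.items()
        if len(archives) > 1
    }
```

An empty result means no identifiers collide; anything else lists the records that will overwrite each other.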

Re: additional archives not seen

Postby alexukua » Fri Dec 05, 2008 8:16 am

asmecher wrote: This is consistent with duplicated OAI IDs. Have you checked for those?

Problem solved. I needed to change the record identifier string in the EPrints 3 configuration file oai.pl, from
$oai->{v2}->{archive_id} = "generic.eprints.org";
to
$oai->{v2}->{archive_id} = "eprints.zu.edu.ua";

Thanks a lot!
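
Why the archive_id matters, as a sketch: assuming EPrints builds identifiers in the usual oai:<archive_id>:<record number> form, two installations left on the default archive_id produce identical identifiers for unrelated records. The second host name below is invented for illustration.

```python
def oai_identifier(archive_id, local_id):
    """Build an identifier in the oai:<namespace>:<local id> form."""
    return f"oai:{archive_id}:{local_id}"


# Two repositories left on the default archive_id collide on record 123.
default_a = oai_identifier("generic.eprints.org", 123)
default_b = oai_identifier("generic.eprints.org", 123)

# With a distinct archive_id per repository the identifiers no longer clash.
fixed_a = oai_identifier("eprints.zu.edu.ua", 123)
fixed_b = oai_identifier("eprints.example.org", 123)  # hypothetical second repository
```

Here `default_a == default_b`, while `fixed_a` and `fixed_b` differ, so the harvester can keep both records.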
