additional archives not seen

Open Harvester Systems support questions and answers, bug reports, and development issues.

Post by akossov » Wed Apr 25, 2007 5:52 am

Dear List,

With Harvester 2.0.0 on Windows XP, I have just created my first archive (477 records) and two more archives, which do not appear. The total number of records keeps growing and is now 950, but the count shown for the one visible archive does not change.

How can I display the other archives and find out the exact number of records?
Any help appreciated.

Truly yours
A. Kossovoi
akossov
 
Posts: 5
Joined: Wed Apr 25, 2007 5:49 am

Post by asmecher » Wed Apr 25, 2007 10:14 am

Hi akossov,

Please upgrade to version 2.0.1 -- there was a known issue with certain system configurations that caused this sort of problem. There are complete upgrade instructions in docs/README.

Regards,
Alec Smecher
Public Knowledge Project Team
---
Don't miss the First International PKP Scholarly Publishing Conference
July 11 - 13, 2007, Vancouver, BC, Canada
http://ocs.sfu.ca/pkp2007/
asmecher
 
Posts: 8316
Joined: Wed Aug 10, 2005 12:56 pm

Post by akossov » Thu Apr 26, 2007 12:50 am

Dear Mr. Smecher,

Thank you for the good advice.

Truly yours

A. Kossovoi

Post by akossov » Thu Apr 26, 2007 1:47 am

Dear Mr. Smecher,

I had been using the previous version of the Harvester. With the latest version on Windows XP SP2 and the latest version of XAMPP, however, I get an Apache problem when I try to log in, right after a successful installation of the Harvester.

The Apache error log says:

[Thu Apr 26 10:28:44 2007] [notice] Server built: Mar 5 2007 11:23:00
[Thu Apr 26 10:28:44 2007] [notice] Parent: Created child process 2748
[Thu Apr 26 10:28:47 2007] [notice] Child 2748: Child process is running
[Thu Apr 26 10:28:47 2007] [notice] Child 2748: Acquired the start mutex.
[Thu Apr 26 10:28:47 2007] [notice] Child 2748: Starting 250 worker threads.
[Thu Apr 26 10:28:47 2007] [notice] Child 2748: Starting thread to listen on port 80.
[Thu Apr 26 10:28:47 2007] [notice] Child 2748: Starting thread to listen on port 443.
Error in my_thread_global_end(): 254 threads didn't exit
[Thu Apr 26 10:29:11 2007] [notice] Parent: child process exited with status 3221225477 -- Restarting.
[Thu Apr 26 10:29:13 2007] [notice] Apache/2.2.4 (Win32) DAV/2 mod_ssl/2.2.4 OpenSSL/0.9.8e mod_autoindex_color PHP/5.2.1 configured -- resuming normal operations
[Thu Apr 26 10:29:13 2007] [notice] Server built: Mar 5 2007 11:23:00
[Thu Apr 26 10:29:13 2007] [notice] Parent: Created child process 3072

I reinstalled both XAMPP and the Harvester, but the problem remains.

Your advice is highly appreciated.

Truly yours

A. Kossovoi

Post by asmecher » Thu Apr 26, 2007 9:42 am

Hi akossov,

Looks like a PHP bug to me... a PHP script shouldn't be able to cause a segmentation fault, which looks like the case here. Two things to check:
  • See if Zend Optimizer is included in your XAMPP install, and if it is, try disabling it. Versions of Zend Optimizer older than the newest release are buggy.
  • Check your memory_limit in the php.ini configuration file; try increasing it to at least 12M or 16M. (Make sure you restart the web server to make the configuration changes take effect.)
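
One way to confirm the value currently set is to read it straight from the file. The following is a minimal shell sketch, not part of the Harvester: `ini_value` is a hypothetical helper, it understands only plain `name = value` lines, and the path to php.ini is an assumption that depends on the install.

```shell
# Print the value of a directive from a php.ini-style file.
# Usage: ini_value FILE DIRECTIVE
# Simplified reader (an assumption, not a full php.ini parser): it skips
# lines that start with ';', strips trailing ';' comments, and reports the
# last matching "name = value" line.
ini_value() {
    awk -F= -v key="$2" '
        /^[ \t]*;/ { next }
        {
            name = $1
            gsub(/[ \t]/, "", name)
            if (name == key) {
                value = $2
                sub(/;.*/, "", value)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                found = value
            }
        }
        END { if (found != "") print found }
    ' "$1"
}

# Example call (the path is a placeholder):
# ini_value /path/to/php.ini memory_limit
```

If the function prints nothing, the directive is not set in that file and PHP falls back to its built-in default.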
Regards,
Alec Smecher
Open Journal Systems Team

Post by akossov » Fri Apr 27, 2007 1:44 am

Dear Mr. Smecher,

Thank you for the advice. I think you are perfectly right: one should check both.

For the moment the problem is solved: instead of the latest downloadable XAMPP, I used the XAMPP from the author's book. There are no problems at all. The latest version of the Harvester works perfectly both on openSUSE 10.2 and on the book version of XAMPP for Windows.

One more question.
Is it possible to harvest only one set of an archive, as listed in the "Archive Manage" window, rather than the archive as a whole, and if so, how? That is, how can I harvest only the part of an archive limited to a certain specialty?

Thank you

Truly yours

A. Kossovoi

Post by asmecher » Fri Apr 27, 2007 9:12 am

Hi akossov,

Yes, you can harvest a single set either by using the web interface you mention, or by using the command-line interface and specifying the "set" parameter, e.g.:
Code:
php tools/harvest.php 5 set=some.set.spec
...where 5 is the archive ID you want to harvest.

Regards,
Alec Smecher
Public Knowledge Project Team

Post by akossov » Thu May 03, 2007 3:22 am

Dear Mr. Smecher,

Thank you for the precise and important tips.

Truly yours

A. Kossovoi

Post by ozp » Sun May 20, 2007 12:43 pm

I have a problem when I try to harvest from the command line:

Code:
[biblioteca@l30dnn0100 tools]$ php harvest.php 1 set=Tempo
PHP Notice:  Undefined index:  SCRIPT_NAME in /home/biblioteca/public_html/classes/core/Request.inc.php on line 96
PHP Notice:  Undefined index:  SCRIPT_NAME in /home/biblioteca/public_html/classes/core/Request.inc.php on line 194
Content-type: text/html

Selected archive: SciELO - Scientific Electronic Library Online
Fetching records...
<br />
<b>Notice</b>:  Undefined index:  SCRIPT_NAME in <b>/home/biblioteca/public_html/classes/core/Request.inc.php</b> on line <b>96</b><br />
<br />
<b>Notice</b>:  Undefined index:  SCRIPT_NAME in <b>/home/biblioteca/public_html/classes/core/Request.inc.php</b> on line <b>194</b><br />
Finished:
        0 records indexed
        0 seconds elapsed
        0.00 records per second
        0 records kept from past harvests
        0 records total.
Errors/Warnings:


[biblioteca@l30dnn0100 tools]$



If I harvest from the web interface, it works.
But I need to harvest only some sets, and this archive has more than 50 of them.

Also, some sets have names like these:

Trans/Form/Ação - Revista de Filosofia

Interface - Comunicação, Saúde, Educação

Is this a problem?


Lines 96 and 194 of Request.inc.php:

Code:
         $basePath = dirname($_SERVER['SCRIPT_NAME']);

         $requestPath = $_SERVER['SCRIPT_NAME'] . (isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] :
ozp
 
Posts: 51
Joined: Sat Apr 28, 2007 9:01 pm

Post by asmecher » Mon May 21, 2007 9:40 am

Hi ozp,

All those warnings about SCRIPT_NAME can be ignored; they're appearing because you're using the PHP CGI binary to run a command-line script, when you should be using a command-line binary. However, it's harmless.

Make sure you're using a valid setSpec. You can list sets by using a web browser to request the OAI URL with ?verb=ListSets at the end, e.g.:
Code:
http://my-server-name/ojs2/index.php/index/oai?verb=ListSets
The resulting XML document will list all setSpecs; make sure you use one of those verbatim when requesting a set.
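
To pull the setSpec values out of that XML from a shell, something like the following can work. It is a rough sketch rather than a proper XML parser: `list_set_specs` is a hypothetical helper, the sample input is made up, and the text matching assumes that no `<setSpec>` element is split across lines or carries attributes.

```shell
# Print one setSpec per line from an OAI-PMH ListSets response on stdin.
# Text-based extraction only; a real XML parser would be more robust.
list_set_specs() {
    grep -o '<setSpec>[^<]*</setSpec>' |
        sed -e 's/<setSpec>//' -e 's/<\/setSpec>//'
}

# Made-up sample shaped like a ListSets response.
list_set_specs <<'EOF'
<ListSets>
  <set><setSpec>journal-a</setSpec><setName>First sample journal</setName></set>
  <set><setSpec>journal-b</setSpec><setName>Second sample journal</setName></set>
</ListSets>
EOF
# prints the two sample setSpecs, journal-a and journal-b, one per line
```

Against a live repository the same function could be fed from a download tool, for example `curl -s 'http://my-server-name/ojs2/index.php/index/oai?verb=ListSets' | list_set_specs`. Repositories with many sets may split the list across several responses using a resumptionToken, which this sketch does not follow.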

Regards,
Alec Smecher
Public Knowledge Project Team

Post by ozp » Mon May 21, 2007 10:23 am

Hello asmecher!

I think this is a problem with the source repository.
It is not OJS; it's another journal system that publishes many journals using the BIREME system.

http://www.scielo.br/oai/scielo-oai.php?verb=ListSets
(this seems to be ok)

I think they are either blocking me or are unstable, because sometimes when I try to harvest I get an error from the web interface:

Code:
The metadata index could not be updated. The following error(s) occurred:

    *

Return to management


If I try to harvest this source from the command line, nothing happens (as I posted above).

I can harvest other sources running OJS with no problem, because there are not too many records to fetch.

But with big sources I'm having another problem:

Code:
[biblioteca@l30dnn0100 tools]$ php harvest.php 8
PHP Notice: Undefined index: SCRIPT_NAME in /home/biblioteca/public_html/classes/core/Request.inc.php on line 96
PHP Notice: Undefined index: SCRIPT_NAME in /home/biblioteca/public_html/classes/core/Request.inc.php on line 194
PHP Fatal error: Maximum execution time of 60 seconds exceeded in /home/biblioteca/public_html/classes/search/SearchDAO.inc.php on line 34
Content-type: text/html

Selected archive: BDTD Ibict
Fetching records...
<br />
<b>Notice</b>: Undefined index: SCRIPT_NAME in <b>/home/biblioteca/public_html/classes/core/Request.inc.php</b> on line <b>96</b><br />
<br />
<b>Notice</b>: Undefined index: SCRIPT_NAME in <b>/home/biblioteca/public_html/classes/core/Request.inc.php</b> on line <b>194</b><br />
<br />
<b>Fatal error</b>: Maximum execution time of 60 seconds exceeded in <b>/home/biblioteca/public_html/classes/search/SearchDAO.inc.php</b> on line <b>34</b><br />
[biblioteca@l30dnn0100 tools]$


This particular source is a thesis/dissertation repository covering many Brazilian universities.

I can choose some sets and harvest them one by one, but even some of the sets (the bigger ones) time out.

Post by asmecher » Tue May 22, 2007 11:21 am

Hi ozp,

Well, one thing to do is increase the execution time limit in your php.ini -- see the max_execution_time directive. If it's an unstable data source, it'll be tricky to debug; you'll probably need to generate a few OAI URLs yourself to see if you can duplicate the problem. Are you familiar enough with OAI to do this?
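
Building such test URLs by hand follows a simple pattern. Here is a minimal sketch, assuming the standard OAI-PMH verbs and arguments (`ListSets`, `ListRecords`, `metadataPrefix`, `set`); `oai_url` is a hypothetical helper, and the base URL and setSpec below are placeholders.

```shell
# Build an OAI-PMH request URL from a base URL, a verb, and optional
# key=value arguments. Values are appended as-is, so anything containing
# spaces, '&', or similar characters must be URL-encoded by the caller.
oai_url() {
    base=$1
    verb=$2
    shift 2
    url="${base}?verb=${verb}"
    for arg in "$@"; do
        url="${url}&${arg}"
    done
    printf '%s\n' "$url"
}

# Placeholder repository address and setSpec.
base='http://my-server-name/ojs2/index.php/index/oai'
oai_url "$base" ListSets
oai_url "$base" ListRecords metadataPrefix=oai_dc set=some.set.spec
```

Requesting the resulting URLs a few times in a browser shows whether the repository itself returns errors or stalls, independently of the Harvester.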

Regards,
Alec Smecher
Public Knowledge Project Team

Post by ozp » Tue May 22, 2007 4:25 pm

Hello! I think I'm close to solving the problem.

The source works if I use the setSpec instead of the set name:

<set>
<setSpec>0102-8839</setSpec>
<setName>
São Paulo em Perspectiva
</setName>
</set>

Now I'm asking my hosting company to increase the maximum execution time.

There are a lot of sets and sources to update.

Is there a way to write a PHP file that runs all the updates,
so that instead of adding them to cron one by one, you add a single PHP file that updates everything?

Post by asmecher » Wed May 23, 2007 9:22 am

Hi ozp,

Yes -- sorry, I wasn't clear in the posting above. You need to specify a set using its setSpec.

I do have a command-line PHP script to harvest multiple sets; I've uploaded it at http://pkp.sfu.ca/harvester2/download/harvest_multiple_sets.php.txt.
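
Independently of the linked script, whose contents are not shown here, the same idea can be sketched as a small shell wrapper suitable for a single cron entry. This is only a sketch under stated assumptions: `harvest_sets` and `DRY_RUN` are hypothetical names, and it relies on the command form `php tools/harvest.php <archive ID> set=<setSpec>` given earlier in the thread, run from the Harvester's install directory.

```shell
#!/bin/sh
# Harvest several setSpecs of one archive, one after another.
# Usage: harvest_sets ARCHIVE_ID SETSPEC [SETSPEC ...]
# Set DRY_RUN=1 to print the commands instead of running them.
harvest_sets() {
    archive_id=$1
    shift
    for set_spec in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "php tools/harvest.php $archive_id set=$set_spec"
        else
            # Keep going with the next set even if one harvest fails.
            php tools/harvest.php "$archive_id" "set=$set_spec" ||
                echo "harvest failed: archive $archive_id set $set_spec" >&2
        fi
    done
}

# Example call (archive ID and second setSpec are placeholders):
# harvest_sets 1 0102-8839 another-set-spec
```

Running it first with `DRY_RUN=1` shows exactly which commands would be issued, which helps catch a mistyped setSpec before the cron job goes live.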

Regards,
Alec Smecher
Public Knowledge Project Team

Post by ozp » Wed May 23, 2007 4:50 pm

Hello asmecher!

Thanks!!

I'll test it right now.