


Harvesting an ojs journal

Open Harvester Systems support questions and answers, bug reports, and development issues.

Moderators: jmacgreg, michael, John


Harvesting an ojs journal

Postby hundevater » Sat Sep 25, 2010 8:47 am

Hi,

Is it possible to harvest OA journals that support OAI-PMH?
If so, and I want to harvest an OJS journal, how do I find out its base URL if the journal editors don't provide it?

Thanks for your help,

Paul
hundevater
 
Posts: 5
Joined: Thu Aug 05, 2010 5:26 am

Re: Harvesting an ojs journal

Postby jmacgreg » Wed Sep 29, 2010 9:55 am

Hi Paul,

Yes, you can harvest a journal via OAI-PMH. You will have to ask the Journal Manager for the base URL and any other information you may need; this can be found in Journal Setup Step 3.5. The Site Administrator can also give you site-wide harvesting information.

Cheers,
James
jmacgreg
 
Posts: 4190
Joined: Tue Feb 14, 2006 10:50 am

Re: Harvesting an ojs journal

Postby dcomeaux » Sat Jan 19, 2013 3:26 pm

I'm the site admin, and I have no idea where to find the information I need to harvest the data. I don't see anything in the documentation about this. (I searched the latest documentation for the phrase "base url" and found no matches.) Is there anything?
dcomeaux
 
Posts: 17
Joined: Sat Jan 19, 2013 3:22 pm

Re: Harvesting an ojs journal

Postby asmecher » Sat Jan 19, 2013 3:37 pm

Hi dcomeaux,

What are you trying to harvest from?

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8575
Joined: Wed Aug 10, 2005 12:56 pm

Re: Harvesting an ojs journal

Postby dcomeaux » Sat Jan 19, 2013 3:50 pm

Hi! Thanks for answering!

I run a library website. We have an OJS instance set up at journals.tulane.edu. I'm trying to set up a harvest process from our web-scale discovery system (Primo, created by Ex Libris). I just found the base URL of one of our journals, "https://journals.tulane.edu/index.php/TJIA/oai", so I'm super excited that I might finally be on the right track. I assume the metadata format is OAI-PMH? Is there any more information out there?

Re: Harvesting an ojs journal

Postby asmecher » Sun Jan 20, 2013 1:29 pm

Hi dcomeaux,

Yes, you're on the right track. The part of the URL you quote between "index.php/" and "/oai" (here "TJIA") is the journal path, which appears in journal URLs after the "index.php" part; if you want all journals, use "index" instead. This information appears in Setup 3.5, "Register Journal for Indexing (Metadata Harvesting)", or on the Site Administrator's "Site Settings" page. The harvesting protocol is OAI-PMH, but that's a container format that doesn't prescribe a single metadata format: it requires Dublin Core (oai_dc), but OJS also supports MARC, MARCXML, RFC1807, and a subset of NLM Journal Article.
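To make that URL pattern concrete, here is a minimal Python sketch that builds the OAI-PMH base URL from the site URL and a journal path, then forms a few standard protocol requests. It assumes the default OJS 2.x URL layout with "index.php" in the path (no URL rewriting); the helper names oai_base_url and oai_request are made up for illustration. Identify, ListMetadataFormats, and ListRecords are standard OAI-PMH verbs, and ListMetadataFormats is the quickest way to see which formats a given install actually offers.

```python
from urllib.parse import urlencode


def oai_base_url(site_url: str, journal_path: str = "index") -> str:
    """Build an OJS 2.x OAI-PMH base URL: <site>/index.php/<journal path>/oai.

    Pass a journal path such as "TJIA" for one journal, or leave the default
    "index" for the site-wide endpoint that covers all journals.
    Assumes the default URL layout with "index.php" (no URL rewriting).
    """
    return f"{site_url.rstrip('/')}/index.php/{journal_path}/oai"


def oai_request(base_url: str, verb: str, **params: str) -> str:
    """Append an OAI-PMH verb and its arguments to a base URL as a query string."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


site = "https://journals.tulane.edu"
journal = oai_base_url(site, "TJIA")

print(oai_request(journal, "Identify"))             # repository description
print(oai_request(journal, "ListMetadataFormats"))  # formats this install offers
print(oai_request(oai_base_url(site), "ListRecords", metadataPrefix="oai_dc"))
```

Pasting the printed URLs into a browser is an easy way to check a base URL before configuring a harvester against it.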

This forum is for the OHS harvester application that we also maintain, which is capable of harvesting OJS as a data source, but isn't part of OJS itself; are you using OHS, or just OJS?

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting an ojs journal

Postby dcomeaux » Tue Jan 22, 2013 1:47 pm

Hi Alec,

Thanks for your help! I'm just using OJS. I couldn't find any help on the OJS forum. I was thrilled to finally stumble on this post.

So to harvest all journals, would I use "https://journals.tulane.edu/index.php", or "https://journals.tulane.edu/index"?

Thanks again,
Dave Comeaux

Re: Harvesting an ojs journal

Postby asmecher » Tue Jan 22, 2013 3:04 pm

Hi Dave,

Actually, it would be...
https://journals.tulane.edu/index.php/index/oai

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting an ojs journal

Postby dcomeaux » Sat Feb 02, 2013 5:56 pm

Awesome! Thanks Alec. I was able to create a pipe that harvests the data for an individual collection. I just saw your post with the correct url to harvest all the journals, so I just now created another pipe that I hope will work for all journals. Our discovery system indicated that the pipe ran successfully, but I won't see any records to verify this until at least tomorrow.
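For anyone scripting a harvest rather than using a discovery system's pipe, this is roughly what an OAI-PMH harvester has to do: request ListRecords, then keep following the resumptionToken until the repository returns an empty or missing one. This is a minimal Python sketch, not a description of how Primo works internally; list_records and http_fetch are hypothetical names, there is no retry or error handling, and the fetch function is a parameter so the paging logic can be exercised without a live server.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET with no retries or error handling."""
    with urlopen(url) as resp:
        return resp.read()


def list_records(base_url: str, prefix: str = "oai_dc",
                 fetch: Callable[[str], bytes] = http_fetch) -> Iterator[ET.Element]:
    """Yield every OAI <record> element, following resumptionTokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Calling list_records with the site-wide base URL would walk the records of every journal on the install. An OAI-PMH error response such as noRecordsMatch contains no records and no token, so in this sketch it simply yields nothing.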

I do have more questions, though. I looked at the OAI ListRecords output at this URL (https://journals.tulane.edu/index.php/T ... fix=oai_dc) and I see three Resource Type fields. It appears that our discovery system is correctly mapping the OJS "Resource Type" field to dc:type, but it's pulling the first one, and I'd like "Peer-reviewed Article" to display. I don't see where the author would submit this information. Can you explain how this information gets added? Is there any way I can ensure that there is only one Resource Type, or at least determine which one gets preference?

Here's the pertinent info copied from the listRecords link:

Resource Type info:eu-repo/semantics/article
Resource Type Peer-reviewed Article
Resource Type info:eu-repo/semantics/publishedVersion
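If changing what OJS exposes isn't an option, a harvester can also choose among repeated dc:type values itself. Below is a minimal Python sketch, assuming the harvest pipeline allows some post-processing step (not something Primo is known to offer here): it skips values starting with the info:eu-repo/ prefix seen in the list above and falls back to the first value. The function name and the prefix rule are illustrative assumptions, not OJS or Primo behavior.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def preferred_type(dc_xml: str, skip_prefix: str = "info:eu-repo/") -> Optional[str]:
    """Return the first dc:type that doesn't start with skip_prefix.

    Falls back to the first dc:type if every value starts with the prefix,
    or None if the record has no non-empty dc:type at all.
    """
    types = [el.text.strip()
             for el in ET.fromstring(dc_xml).iter(f"{DC_NS}type")
             if el.text and el.text.strip()]
    for value in types:
        if not value.startswith(skip_prefix):
            return value
    return types[0] if types else None


# The three dc:type values from the listRecords output above.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:type>info:eu-repo/semantics/article</dc:type>
  <dc:type>Peer-reviewed Article</dc:type>
  <dc:type>info:eu-repo/semantics/publishedVersion</dc:type>
</oai_dc:dc>"""

print(preferred_type(SAMPLE))  # Peer-reviewed Article
```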

Thanks, I'm really stumped by this.

Dave

Re: Harvesting an ojs journal

Postby asmecher » Sat Feb 02, 2013 9:32 pm

Hi Dave,

For OJS 2.3.x, look in plugins/oaiMetadataFormats/dc/OAIMetadataFormat_DC.inc.php in the toXml function (around line 125).

Cheers,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting an ojs journal

Postby dcomeaux » Sun Feb 03, 2013 3:30 pm

Thanks, Alec. I see that "driverType" and "driverVersion" are what is being displayed in my Resource Type field. Are these necessary? Will something blow up if I comment them out?

Re: Harvesting an ojs journal

Postby asmecher » Sun Feb 03, 2013 8:57 pm

Hi Dave,

No, you can modify those as you like.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting an ojs journal

Postby dcomeaux » Fri Feb 22, 2013 10:18 am

Hi Alec!

It's me again...

I was able to resolve the field formatting issue with the Resource Types by commenting out the "driver type" lines. So now the discovery tool is harvesting the resource type from OJS. Yippee!

But unfortunately, there's another problem. This system displays the journal in an iframe on the results page. This works fine. But it also provides an option to open the article in a new window. When that link is clicked, the new window returns the following error:

DB Error: Duplicate entry '19-http://tulprimo.hosted.exlibrisgroup.com:1701/primo_library/l' for key 'referral_article_id'

I asked the discovery tool support, and they responded:

From what I can tell, since the referrer is the same for both the in-Primo-page request (where the journals.tulane page loads in a frame) and the new page request (where the journals.tulane page is loaded in a new window), the value being inserted into the website database for 'referral_article_id' is being duplicated. I have no documentation on, or control over, how the journals.tulane page interacts with its database to store information when a journal page is requested, so I have limited capacity to investigate this error. I do notice that the page loads correctly if the user refreshes the error page.

Do you have any suggestions on how to tackle this?

Re: Harvesting an ojs journal

Postby asmecher » Fri Feb 22, 2013 11:23 am

Hi Dave,

This error message is coming from OJS's "Referral Plugin", which looks for incoming links to articles. I'm guessing, but it sounds to me like the journal is getting hit simultaneously by two requests, so the referral plugin's "check to see if it exists, then if not, insert it" is getting fooled into attempting a second insert. I've created a Bugzilla entry for this (see http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=8129), but in the meantime, I'd suggest disabling the referral plugin from Journal Management unless it's of prime importance to you.
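The failure described here is a classic check-then-insert race. Below is a minimal Python/SQLite sketch that replays two overlapping requests deterministically. The table is hypothetical and much simpler than the real OJS referral schema; only the UNIQUE key, analogous to 'referral_article_id' in the error, matters, and the referrer URL is a placeholder. The fix shown, letting the database enforce uniqueness in a single INSERT OR IGNORE statement, is one common approach and not necessarily what the Bugzilla fix does; on MySQL the rough equivalents are INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

# Hypothetical, simplified table; the real OJS referral schema differs.
# What matters is the UNIQUE key, analogous to 'referral_article_id'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE referrals (article_id INTEGER, url TEXT, "
             "UNIQUE (article_id, url))")

ART, URL = 19, "http://discovery.example.org/search"  # placeholder referrer


def exists(article_id: int, url: str) -> bool:
    row = conn.execute("SELECT 1 FROM referrals WHERE article_id = ? AND url = ?",
                       (article_id, url)).fetchone()
    return row is not None


def naive_insert(article_id: int, url: str) -> None:
    conn.execute("INSERT INTO referrals (article_id, url) VALUES (?, ?)",
                 (article_id, url))


# Replay two overlapping requests: both run the check before either inserts.
seen_by_a = exists(ART, URL)  # False
seen_by_b = exists(ART, URL)  # also False: request A hasn't inserted yet
if not seen_by_a:
    naive_insert(ART, URL)
duplicate_error = False
if not seen_by_b:
    try:
        naive_insert(ART, URL)  # the second insert violates the UNIQUE key
    except sqlite3.IntegrityError:
        duplicate_error = True


def safe_insert(article_id: int, url: str) -> None:
    """One statement, no gap between check and insert: duplicates are ignored."""
    conn.execute("INSERT OR IGNORE INTO referrals (article_id, url) VALUES (?, ?)",
                 (article_id, url))


safe_insert(ART, URL)  # row already exists: silently ignored instead of raising
rows = conn.execute("SELECT COUNT(*) FROM referrals").fetchone()[0]
print(duplicate_error, rows)  # True 1
```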

Regards,
Alec Smecher
Public Knowledge Project Team

Re: Harvesting an ojs journal

Postby dcomeaux » Sat Feb 23, 2013 3:42 pm

Thanks, Alec! Instead of disabling the Referral Plugin, I just went to the plugin settings and added the URL of our discovery system to the list of exclusions:

#^http://www.google.#
#^http://www.yahoo.#
#^http://tulprimo.hosted.exlibrisgroup.#

And it worked!
This is awesome!!
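For reference, the exclusion patterns above look like PCRE regular expressions using # as the delimiter. The minimal Python sketch below assumes exactly that (how the Referral Plugin parses the list isn't confirmed in this thread): it strips the delimiters and tests a referring URL against each pattern. The function names are hypothetical.

```python
import re

# The exclusion list as entered in the plugin settings. The '#' characters are
# assumed to be PCRE-style delimiters, so they are stripped before compiling.
EXCLUSIONS = [
    "#^http://www.google.#",
    "#^http://www.yahoo.#",
    "#^http://tulprimo.hosted.exlibrisgroup.#",
]


def compile_exclusion(line: str) -> re.Pattern:
    """Drop surrounding '#' delimiters, if present, and compile the rest."""
    body = line.strip()
    if len(body) >= 2 and body[0] == body[-1] == "#":
        body = body[1:-1]
    return re.compile(body)


PATTERNS = [compile_exclusion(p) for p in EXCLUSIONS]


def is_excluded(referrer: str) -> bool:
    """True if any exclusion pattern matches the referring URL."""
    return any(p.search(referrer) for p in PATTERNS)


print(is_excluded("http://tulprimo.hosted.exlibrisgroup.com:1701/primo_library/"))  # True
print(is_excluded("https://www.google.com/"))  # False: patterns require http://
```

Two things are worth noticing: the unescaped dots match any character, and every pattern is anchored to http://, so an https:// referrer from the same host would not be excluded. If the plugin does use PCRE, a pattern such as #^https?://tulprimo\.hosted\.exlibrisgroup\.# would cover both schemes.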

Thanks again,
Dave

