
Indexing by ListSets

Open Harvester Systems support questions and answers, bug reports, and development issues.

Moderators: jmacgreg, michael, John

Forum rules
The Public Knowledge Project Support Forum is moving to http://forum.pkp.sfu.ca

This forum will be maintained permanently as an archived historical resource, but all new questions should be added to the new forum. Questions will no longer be monitored on this old forum after March 30, 2015.


Postby Scott » Mon Jun 26, 2006 10:55 pm


I've been asked to set up a prototype harvester/search application to be able to expose collection-level metadata for a harvester. Using LANL's static repository gateway in conjunction with the PKP Harvester works well where there is no OAI-PMH capability within the repository. For existing data providers, it seems really useful to be able to index by the ListSets request, which would make it possible to expose collection-level metadata to the harvester. It looks like the main change needs to be in the OAIXMLHandler class: process the setDescription element in the same or a similar way to the metadata element, and the setSpec element in the same way as the header element (which isn't present in ListSets). Has anyone done this before and has a patch, or can the developers advise of any gotchas or whether this is correct? (I am not a PHP person.)

The other option is to force the data provider to generate a ListRecords response using a separate schema and/or setSpec to expose collection-level entities, but it seems nicer to be able to harvest using the logical grouping level of a repository. In the example, I'm using the DSpace repository software, which exposes its collections as sets.
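
For readers comparing the two options, the difference shows up in the OAI-PMH request itself. Below is a minimal Python sketch, assuming a hypothetical base URL and a hypothetical set name `collections` (neither comes from this thread). `verb`, `metadataPrefix` and `set` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a verb and optional arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied.
    query.update({key: value for key, value in params.items() if value is not None})
    return base_url + "?" + urlencode(query)


# Hypothetical data provider endpoint, for illustration only.
BASE_URL = "http://repository.example.org/oai/request"

# Option 1: ask for the set structure (and any setDescription metadata).
list_sets_url = oai_request_url(BASE_URL, "ListSets")

# Option 2: ask for records, restricted to a hypothetical set that holds
# collection-level entities.
list_records_url = oai_request_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="collections"
)

print(list_sets_url)
print(list_records_url)
```

The second request only helps if the data provider actually publishes collection-level records under such a set, which is the extra work for providers discussed in this thread.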

Posts: 23
Joined: Thu Jun 15, 2006 7:23 pm

Postby asmecher » Thu Jun 29, 2006 12:28 pm

Hi Scott,

This is an interesting idea, and while IMO it's a slight violation of the intention of the separation of delivery from metadata format, I can see its usefulness. This would probably make a good plug-in; if you haven't looked into the existing plugins that ship with OJS, take a look at the plugins/ directory and the technical reference (http://pkp.sfu.ca/harvester2/TechnicalReference.pdf).

In particular, look at the PKP DC extender for something that does a similar task. It's in plugins/generic/pkpdc/.

Alec Smecher
Open Journal Systems Team
Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm

Postby Scott » Thu Jun 29, 2006 7:00 pm

Hi Alec,

I know what you mean; it's clear from the spec that the purpose of ListSets is really to return the set structure for subsequent record harvesting. I couldn't really get a grasp on the purpose of setDescription from the implementation spec, though, and it seems a sensible idea to harvest metadata from it should someone provide it. Otherwise I think the only way of achieving a collection-level harvest is getting all the data providers to introduce a different level of record (i.e. collections rather than items) to expose. This may be a more correct way of implementing it, but I suspect it would result in more work for data providers (and that assumes they have control over the software beyond configurable crosswalks at the item level).

I'm also not looking at actually harvesting the ListSets markup per se; I'm really just after the setDescription metadata. It's much the same way as the Harvester currently uses the header identifier as the record index and then just harvests the record metadata.

I'll take a look at the plugin you mentioned, if I can develop this (mostly) in a plug-in that would be neat. If I get it working I'll post the mods, and then you may find them useful for a future release (or not!)


Posts: 23
Joined: Thu Jun 15, 2006 7:23 pm


Postby Scott » Sun Jul 02, 2006 11:09 pm

Hi Alec (or other PKP developers),

OK, I had a look at the PKPDC plugin and have some follow-up queries. The PKPDC plugin appears to be more concerned with handling a different schema. In the ListSets case I'm still dealing with a standard DC metadata harvest, but harvesting from a different OAI request (i.e. the DC is in the setDescription element rather than the metadata element), so I think it is more an extension/enhancement of the existing OAI plugin than a new one. An example of a ListSets "record":
<setName>Chinese Revolution (New)</setName>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Chinese Revolution</dc:title>
<dc:description>Scholarly Information Services/The Library at ANU holds a number of unique and in some cases rare and fragile collections in both print and microfilm relating to the Chinese Cultural Revolution period (1966 to 1976).</dc:description>

The setSpec content acts as the record identifier in this case (the same purpose as identifier in ListRecords), and there is no datestamp information. The harvested information is the setDescription content (the same as metadata in ListRecords).
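
The mapping described above (setSpec as the record identifier, setDescription content as the record metadata) can be sketched in Python as follows. This is an illustration only, not the Harvester's PHP code. The setSpec values, the second set, and the enclosing `<set>`/`<setDescription>` wrappers in the sample response are assumptions added to make the example complete.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListSets response; the setSpec values are made up.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>example_set_1</setSpec>
      <setName>Chinese Revolution (New)</setName>
      <setDescription>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Chinese Revolution</dc:title>
        </oai_dc:dc>
      </setDescription>
    </set>
    <set>
      <setSpec>example_set_2</setSpec>
      <setName>Set without a description</setName>
    </set>
  </ListSets>
</OAI-PMH>
"""


def sets_as_records(xml_text):
    """Turn each described set into a record keyed by its setSpec."""
    root = ET.fromstring(xml_text)
    records = []
    for set_el in root.findall("./oai:ListSets/oai:set", NS):
        dc_el = set_el.find("./oai:setDescription/oai_dc:dc", NS)
        if dc_el is None:
            # No Dublin Core in setDescription, so nothing to index.
            continue
        fields = {}
        for child in dc_el:
            local_name = child.tag.split("}", 1)[-1]  # drop the namespace URI
            fields.setdefault(local_name, []).append((child.text or "").strip())
        records.append({
            "identifier": set_el.findtext("oai:setSpec", namespaces=NS),
            "metadata": fields,
        })
    return records


records = sets_as_records(SAMPLE_RESPONSE)
print(records)
```

Sets that carry no setDescription are skipped here; whether to index them by setName alone is a separate design decision that this sketch does not make.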

I'd value the developers' opinion on how best to implement this so it is in a useful form for the core code base, then I can contribute it back. I have noted the changes I've made to get it working at the end of the email. Other than modifying the OAI harvester, an alternative is to use a separate harvester altogether that only implements ListSets harvesting but I'm not sure how much code duplication there would be or whether that is the right approach.

Anyhow, the mods I've made to get this to work in the harvester plugin include:
- adding the ListSets to the Index Method dropdown
- showing only "All Sets" in the set selection list where the index method is ListSets
- passing a "noindex" parameter in the OAIXMLHandler instantiation in the getSets method (this method appears to be used only for retrieving a list of sets, so there are no side-effects in doing this). I needed to do this to avoid a harvest when the set selection list is built
- modifying the OAIXMLHandler to harvest from the setDescription. This involved some slight changes in the case statements for the elements in the ListSets request
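
The changes listed above could look roughly like this in an event-driven parser. This is a Python sketch rather than the actual OAIXMLHandler patch. The class name, the `index` flag (standing in for the "noindex" parameter), and the sample response are all hypothetical, and the sketch assumes the Dublin Core elements use the `dc:` prefix.

```python
import xml.sax

# Hypothetical ListSets response; the setSpec values are made up.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>example_set_1</setSpec>
      <setName>Chinese Revolution (New)</setName>
      <setDescription>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Chinese Revolution</dc:title>
        </oai_dc:dc>
      </setDescription>
    </set>
    <set>
      <setSpec>example_set_2</setSpec>
      <setName>Undescribed set</setName>
    </set>
  </ListSets>
</OAI-PMH>
"""


class ListSetsHandler(xml.sax.ContentHandler):
    """Collects the set list and, unless index=False, setDescription records."""

    def __init__(self, index=True):
        super().__init__()
        self.index = index
        self.sets = []      # (setSpec, setName) pairs for the selection list
        self.records = []   # records built from setDescription content
        self._chars = []
        self._in_description = False
        self._current = None

    def startElement(self, name, attrs):
        self._chars = []
        if name == "set":
            self._current = {"setSpec": None, "setName": None, "metadata": {}}
        elif name == "setDescription":
            self._in_description = True

    def characters(self, content):
        self._chars.append(content)

    def endElement(self, name):
        text = "".join(self._chars).strip()
        self._chars = []
        if self._current is None:
            return
        if name in ("setSpec", "setName"):
            self._current[name] = text
        elif name == "setDescription":
            self._in_description = False
        elif self._in_description and name.startswith("dc:"):
            # Treated like an element inside <metadata> in ListRecords.
            self._current["metadata"].setdefault(name[3:], []).append(text)
        elif name == "set":
            self.sets.append((self._current["setSpec"], self._current["setName"]))
            if self.index and self._current["metadata"]:
                # setSpec plays the role of the header identifier.
                self.records.append({
                    "identifier": self._current["setSpec"],
                    "metadata": self._current["metadata"],
                })
            self._current = None


def parse_list_sets(xml_text, index=True):
    handler = ListSetsHandler(index=index)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


listing = parse_list_sets(SAMPLE, index=False)  # building the selection list
harvest = parse_list_sets(SAMPLE, index=True)   # actual harvest
print(listing.sets)
print(harvest.records)
```

Passing `index=False` corresponds to the case where only the set selection list is being built, so no records are produced as a side-effect.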

The metadata updates, searching and flushing all appear to function correctly, and no side-effects appear to have been introduced on the existing OAI harvesting.

I can either post these to the list for a closer look or mail to you direct if you want to have a look.

Posts: 23
Joined: Thu Jun 15, 2006 7:23 pm

Re: Indexing by ListSets

Postby cristianviza » Fri Nov 01, 2013 4:10 pm

Hello everybody,
I need help with metadata indexing in OJS. I want to show one more <set> when executing oai?verb=ListSets in my OJS.
For example, my ListSets response currently looks like this:
IMAGE: http://www.subimelafoto.com.ar/images/737snapshot39.png

I need to add new information to the set, the same for every journal. Can I do that by editing the file OAIMetadataFormat_DC.inc.php?

Regards, Cristian
Posts: 75
Joined: Tue Nov 06, 2012 10:22 am
Location: Argentina

Re: Indexing by ListSets

Postby asmecher » Fri Nov 01, 2013 4:55 pm

Hi Cristian,

OJS's "DRIVER" plugin (plugins/generic/driver) extends the list of built-in sets; I'd suggest using that plugin as an example. It's documented here (in German; I'd suggest using Google Translate).

Alec Smecher
Public Knowledge Project Team
Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm
