
Per-archive dc:identifier prefix.

Post by bdgregg » Fri Dec 10, 2010 1:48 pm

We are running an instance of OHS, and one of the sites we harvest supplies a dc:identifier that is not a URL. According to the OAI-PMH identifier guidelines (http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm), the following are valid identifiers and could be passed to the harvester instead of full URLs. When that happens, the link normally found in the record view of OHS fails to take the user anywhere, which of course makes for an irritated user.

Code:

oai:FOO.ORG:some-local-id-53     ;not the same as above,
                                 ;should not use foo.org _and_ FOO.ORG

oai:foo.org:Some-Local-Id-54     ;not the same as above, distinct identifier

oai:wibble.org:ab%20cd           ;space in internal id correctly escaped
oai:wibble.org:ab?cd             ;question mark should not be escaped

Note that the Identifier field within OHS is the link back to the original archive from which the item was harvested and usually what the user is looking for.

Would it be possible to add a setting on the individual archive settings page that prefixes an identifier supplied in the above format (e.g. oai:blah.blah:blah) with a specific URL portion, so that the record display URL is a concatenation of that URL portion and the OAI identifier, as in:

Code:
Archive Specific URL prefix: 'http://some.site.edu/cgi/executable?view=brief&id='
Archive Specific URL post-fix: '&someotherparam=10'

Thus the Record Display URL in OHS would be 'http://some.site.edu/cgi/executable?view=brief&id=oai:blah.blah:blah&someotherparam=10'

Additionally, it would be good practice to check whether the first 4 characters of the record identifier are 'http' (already a URL) and, if so, display it as is; otherwise build the URL using the Archive Specific URL prefix and post-fix.
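To make the request concrete, here is a rough sketch of the logic I have in mind. It is written in Python purely for illustration and is not OHS code; the function name `build_record_url` and its parameters are hypothetical stand-ins for the proposed per-archive settings. It concatenates the values as-is, as in the example above; a real implementation may also want to percent-encode the identifier.

```python
def build_record_url(identifier: str, prefix: str = "", postfix: str = "") -> str:
    """Return the link to show for a harvested record identifier.

    `prefix` and `postfix` stand in for the proposed (hypothetical)
    per-archive "URL prefix" and "URL post-fix" settings.
    """
    # An identifier whose first 4 characters are 'http' is assumed to
    # already be a URL and is returned unchanged.
    if identifier[:4] == "http":
        return identifier
    # Otherwise wrap the identifier in the archive-specific URL parts.
    return f"{prefix}{identifier}{postfix}"


url = build_record_url(
    "oai:blah.blah:blah",
    prefix="http://some.site.edu/cgi/executable?view=brief&id=",
    postfix="&someotherparam=10",
)
print(url)
```

With the settings from the example above, this produces the same Record Display URL as the one written out by hand.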

Or, if anyone has a suggestion for linking back to the individual record in the owning system that does not require an enhancement, I'm all ears/eyes.

Posts: 118
Joined: Wed Sep 15, 2004 8:21 am
Location: University of Pittsburgh

Re: Per-archive dc:identifier prefix.

Post by asmecher » Fri Dec 10, 2010 2:28 pm

Hi Brian,

Note that the Dublin Core identifier and the OAI identifier are different things -- I'm not sure if OAI has documented a recommended relationship between the two, but if there is, please let me know. Anyway, we implemented metadata format support for the harvester using plugins with the intention that it should be easy to modify the presentation of each metadata format (e.g. DC).

The Dublin Core plugin is implemented in plugins/schemas/dc and there are two places where URLs are taken from the "identifier" field:
  • In DublinCorePlugin::getUrl, which is used to display the link in the record list to jump directly to the record in the OAI data source. There's already a check here to see whether or not it looks like a URL, i.e. matches the regular expression /^[a-z]+:\/\//.
  • In the record view template, record.tpl, there's the following statement:
    Code:
    {if $name == 'identifier'}
        <a href="{$value|escape}">{$value|escape|default:"&mdash;"}</a>
    This presents all the metadata values for a particular metadata field in the record, watching in particular for the "identifier" field and formatting any entries as hyperlinks. As you can see, there is no checking to see whether or not it's actually a hyperlink.
I'd suggest adding a preg_match(...) to the {if} statement.
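As a rough illustration of what that check accepts and rejects, here is the same regular expression exercised in Python. This is only a sketch of the test, not harvester code (the real change would go into the Smarty template or the PHP plugin); the helper name `looks_like_url` and the sample URL are made up for the example.

```python
import re

# Same pattern as quoted above: a lowercase scheme followed by "://".
URL_LIKE = re.compile(r"^[a-z]+://")


def looks_like_url(value: str) -> bool:
    """Return True if the value starts with something like 'http://'."""
    return URL_LIKE.match(value) is not None


samples = (
    "http://foo.org/record/53",      # hypothetical URL-style identifier
    "oai:foo.org:Some-Local-Id-54",  # OAI identifier, no "://"
    "oai:wibble.org:ab?cd",
)
for value in samples:
    print(value, looks_like_url(value))
```

Only the first sample passes, so the OAI-style identifiers would be shown as plain text rather than as broken links. Note that the pattern as written is case-sensitive, so a scheme in upper case would not match either.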

Alec Smecher
Public Knowledge Project Team
Posts: 9920
Joined: Wed Aug 10, 2005 12:56 pm

Re: Per-archive dc:identifier prefix.

Post by bdgregg » Fri Dec 10, 2010 2:31 pm

Thanks Alec,

I'll take a look.
