"VIEW ORIGINAL" no longer showing in results

Open Harvester Systems support questions and answers, bug reports, and development issues.

Postby fredriley » Fri Dec 07, 2007 7:24 am

Hi again

This is a stumper. I'm managing the Harvester instance at http://www.rlo-cetl.ac.uk/harvester2/ which is hosted on a Windoze server. Until yesterday if you did a search (eg "Java") you got a list of titles with "VIEW RECORD" and "VIEW ORIGINAL" links, the latter of which was the public URL of the resource in the harvested repository. Then I made two admin changes:

1. Removed all bar the rlocetl archive (I'd harvested two other sources just as a demo)
2. Updated the rlocetl index.

Suddenly searches showed only "VIEW RECORD"; the link to the resource no longer appeared. Similarly, in the full record, the Identifier field used to contain a link to the public URL (eg http://www.rlo-cetl.ac.uk:8080/test/IntraLibrary?command=open-preview&learning_object_key=i08n13448t), and this had also disappeared. I'd thought this was a problem with the repository software (Intrallect's Intralibrary), but today I used a Harvester instance on a Macbook at home running OS X, updated the same metadata index, and the search results include "VIEW ORIGINAL". I've not yet carried out (1) on the local instance, as I don't want to delete archives unnecessarily. Is there any reason why "VIEW ORIGINAL" should disappear like this? As far as I'm aware there have been no tweaks to the repository software config, and the fact that it works fine on my home machine suggests the problem lies with Harvester. I've compared the two Harvester configs and they appear to be identical. I've not yet tried to compare the full MySQL and/or PHP configs.

On a related topic, whilst I have your attention :), in a full record display what is the Identifier element supposed to link to? For instance, in the rlocetl repository search, the identifier LTRI003 ("While loops") is a link pointing to http://www.rlo-cetl.ac.uk/harvester2/in ... ew/LTRI003, but clicking on it just returns the user to the Harvester home page. This happens with other repository harvests as well, so it's not just an issue with the rlo-cetl repository. For instance, on my home machine the identifier linking to http://127.0.0.1/harvester-2.0.1/index. ... map-rm2008 also goes to the Harvester home page when clicked.

Many thanks in advance, again, for any tips/suggestions/help. We've nearly got the thing up and running...

Cheers

Fred
fredriley
 
Posts: 27
Joined: Fri Sep 14, 2007 10:47 am

Re: "VIEW ORIGINAL" no longer showing in results

Postby asmecher » Fri Dec 07, 2007 9:05 am

Hi Fred,

Depending on what schema you're using, the URL for the original record is located using the getUrl function (e.g. in plugins/schema/dc/DublinCorePlugin.inc.php). I'd suggest looking there to determine what fields the Harvester is using to find URLs.

The second question is related; in Dublin Core, the DC "identifier" field is often used to store URLs to the original. It sounds like your data source behaves differently; have a look at the implementation of getUrl, and if there is a more appropriate field, make the change there.
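Alec's suggestion boils down to checking which identifier values are usable as links. As a rough illustration only, here is a minimal Python sketch, assuming a record's dc:identifier values have already been pulled out into a list. `pick_original_url` is a hypothetical helper, not the Harvester's getUrl, and the real plugin may consult other fields or rules:

```python
import re
from typing import Optional, Sequence

# Matches values that start with http:// or https:// (case-insensitive).
URL_PATTERN = re.compile(r"^https?://", re.IGNORECASE)


def pick_original_url(identifiers: Sequence[str]) -> Optional[str]:
    """Return the first identifier value that looks like an HTTP(S) URL.

    Hypothetical helper for illustration; not the Harvester's getUrl.
    """
    for value in identifiers:
        candidate = value.strip()
        if URL_PATTERN.match(candidate):
            return candidate
    # No URL-like identifier, so there is nothing for a "VIEW ORIGINAL" link.
    return None


if __name__ == "__main__":
    # Only a local code, like LTRI003 in this thread: prints None.
    print(pick_original_url(["LTRI003"]))
    # A local code plus the public preview URL: prints the URL.
    print(pick_original_url([
        "LTRI003",
        "http://www.rlo-cetl.ac.uk:8080/test/IntraLibrary?command=open-preview&learning_object_key=i08n13448t",
    ]))
```

If a record carries only a local code such as LTRI003, a check like this finds nothing to link to, which is consistent with the VIEW ORIGINAL link going missing.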

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 7698
Joined: Wed Aug 10, 2005 12:56 pm

Re: "VIEW ORIGINAL" no longer showing in results

Postby fredriley » Fri Dec 07, 2007 10:06 am

asmecher wrote: Hi Fred,

Depending on what schema you're using, the URL for the original record is located using the getUrl function (e.g. in plugins/schema/dc/DublinCorePlugin.inc.php). I'd suggest looking there to determine what fields the Harvester is using to find URLs.


Thanks, Alec. I'll have a look at the code on my Macbook (I don't have the privileges to do that on the live server), but I'm not sure that'll tell me why "VIEW ORIGINAL" was showing one minute then disappeared the next. Presumably the PHP script you mention is unchanged by any admin tweaks? If so, what could make it behave one way one moment and another way the next?

The second question is related; in Dublin Core, the DC "identifier" field is often used to store URLs to the original. It sounds like your data source behaves differently; have a look at the implementation of getUrl, and if there is a more appropriate field, make the change there.


Ah, well, that I can't do: I've got admin access to the Harvester instance, but not to the area of the server it's sitting on. I'd have to get the server admin to make changes to the PHP, but I can't know what changes to make because the problem isn't replicable on my home machine; a sort of Catch-22. Looking at the metadata in records in the data source (the RLO-CETL installation of Intralibrary), the Identifier field is simply a unique code, eg:

Record Identifier

LCTL0013


That's what appears in the field to a contributor, but I don't know what goes on 'behind the scenes' in the software. I have emailed the software company about this but won't get a response until next week as everyone's skived off for the long weekend.

Unfortunately this is a crucial problem to solve, as without direct access to the resources from the search results the harvester approach will have to be ditched, which would mean writing off some tens of hours of my time and spending tens of hours more on another approach. My heart sinks mightily at the very thought :(

Cheers

Fred

Re: "VIEW ORIGINAL" no longer showing in results

Postby asmecher » Fri Dec 07, 2007 11:39 am

Hi Fred,

One thought comes immediately to mind... Are you harvesting the archive on each machine using the same metadata format? It's possible that using different metadata formats would result in slightly different data sets being served, particularly depending on the data source's implementation.
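One way to check this against the data source itself is OAI-PMH's ListMetadataFormats verb, which asks a repository which metadata formats it can serve. A minimal Python sketch, assuming the repository's OAI base URL is known. The example.org URL and the abridged sample response are hypothetical, not taken from the RLO-CETL repository:

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_metadata_formats_url(base_url: str) -> str:
    """Build a ListMetadataFormats request for an OAI-PMH base URL."""
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def metadata_prefixes(response_xml: str) -> List[str]:
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    return [
        el.text.strip()
        for el in root.iterfind(".//oai:metadataFormat/oai:metadataPrefix", OAI_NS)
        if el.text
    ]


# Abridged example response from a hypothetical repository at example.org.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="ListMetadataFormats">http://example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_metadata_formats_url("http://example.org/oai"))
    print(metadata_prefixes(SAMPLE_RESPONSE))  # ['oai_dc']
```

If both Harvester instances harvest with the same prefix (oai_dc here), a difference in metadata format can be ruled out.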

Regards,
Alec Smecher
Public Knowledge Project Team

Re: "VIEW ORIGINAL" no longer showing in results

Postby fredriley » Mon Dec 10, 2007 4:23 am

asmecher wrote: Hi Fred,

One thought comes immediately to mind... Are you harvesting the archive on each machine using the same metadata format?


I think I must be, though I can't check on my home machine as I'm at work at the mo. The only metadata format available in the archive admin page is Dublin Core (I posted something about that here a while back, IIRC). However, I've just now re-harvested the metadata index after changing the Index Method from the default, ListRecords, to ListIdentifiers, and that seems to have cracked it: searching http://www.rlo-cetl.ac.uk/harvester2/ now shows VIEW ORIGINAL again. The well-written Harvester Help reads:

Index Method: The administrator can choose whether to harvest an OAI repository using the ListRecords or ListIdentifiers methods. ListRecords will generally be faster, but ListIdentifiers may be useful in some cases with repositories that are not 100% compatible.

and it looks like, in this case, that's worked out. I've looked at section 3.5 of the OAI-PMH spec (http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl) but my eyes started to swim and I'm none the wiser as to why one method should give different results from another. Not that it matters as long as it works.
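For anyone else puzzled by the difference: at the protocol level, ListRecords returns full metadata for a batch of records in each response, while ListIdentifiers returns only record headers (identifier and datestamp), after which a harvester must request each record's metadata with GetRecord. Both list verbs page through large result sets with a resumptionToken, which is what section 3.5 (Flow Control) describes. The Python sketch below illustrates only that protocol flow; it is not how the Harvester implements its Index Method setting, and the base URL, identifiers, sample pages and the caller-supplied `fetch` function are all hypothetical. Because ListIdentifiers delivers each record's metadata through a separate GetRecord request, a repository that builds those two kinds of response differently could plausibly return different fields, though the thread doesn't establish that this is what happened here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE = "http://example.org/oai"  # hypothetical repository base URL


def request_url(base_url: str, **params: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    return base_url + "?" + urlencode(params)


def list_request(base_url: str, verb: str, prefix: str = "oai_dc",
                 token: Optional[str] = None) -> str:
    """ListRecords/ListIdentifiers: the first page names the metadata format;
    later pages send only the resumptionToken (an exclusive argument)."""
    if token:
        return request_url(base_url, verb=verb, resumptionToken=token)
    return request_url(base_url, verb=verb, metadataPrefix=prefix)


def get_record_request(base_url: str, identifier: str, prefix: str = "oai_dc") -> str:
    """With ListIdentifiers, each record's metadata is fetched separately."""
    return request_url(base_url, verb="GetRecord", identifier=identifier,
                       metadataPrefix=prefix)


def parse_list_identifiers(response_xml: str) -> Tuple[List[str], Optional[str]]:
    """Return (identifiers, resumptionToken or None) from one ListIdentifiers page."""
    root = ET.fromstring(response_xml)
    ids = [el.text.strip()
           for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS) if el.text]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken element marks the last page of the list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token or None


def harvest_identifiers(base_url: str, fetch: Callable[[str], str],
                        prefix: str = "oai_dc") -> Iterator[str]:
    """Follow resumptionTokens until the repository reports no more pages.
    `fetch` is a caller-supplied function mapping a request URL to response XML."""
    token = None
    while True:
        page = fetch(list_request(base_url, "ListIdentifiers", prefix, token))
        ids, token = parse_list_identifiers(page)
        yield from ids
        if token is None:
            break


# Two abridged, made-up ListIdentifiers pages.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:LTRI003</identifier><datestamp>2007-12-06</datestamp></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:LCTL0013</identifier><datestamp>2007-12-06</datestamp></header>
    <resumptionToken/>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    # Stand-in for real HTTP requests: look each request URL up in a dict.
    pages = {list_request(BASE, "ListIdentifiers"): PAGE_1,
             list_request(BASE, "ListIdentifiers", token="page2"): PAGE_2}
    for identifier in harvest_identifiers(BASE, pages.__getitem__):
        print(get_record_request(BASE, identifier))
```

With ListRecords, the same paging loop would apply, but each page would already carry the metadata, so no per-record GetRecord requests are needed; that is one reason the Help describes ListRecords as generally faster.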

Cheers

Fred