PKP Support Forum

How do I add a NON-OAI repository to my Database?

Open Harvester Systems support questions and answers, bug reports, and development issues.

Post by chasan » Sat Jun 18, 2011 3:21 am

Hey all!

I am facing a problem and hope to find a solution through the forum.
I am using Harvester ( https://vivliothmmy.ee.auth.gr/chasan_pkp/ ) and I would like to import all the records from a NON-OAI repository.

The NON-OAI repository is the following: http://www.corgialenios.gr/library/default.asp .
For example, I want to extract the data from http://www.corgialenios.gr/library/thistype.asp?tid=26 using a data extraction application (DeiXTO, from http://www.deixto.com) and then somehow import these records into my Harvester database.
The data extraction tool I am using is really useful: I can extract the specific fields I want (for example title, author, year, etc.) to a specific file type (XML, TXT, or RSS).

How can I do that? Do I have to edit or interact with my SQL database directly? Which files do I have to edit in order to do that?
I want to give Harvester some kind of rules(?) so that it can fetch the metadata into the database. Is it possible to do that? How?

I am in a hurry, and I would appreciate it if you could help me as soon as possible.

Thank you very much!
chasan
 
Posts: 28
Joined: Sat May 15, 2010 4:45 pm

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Sat Jun 18, 2011 10:35 am

Hi chasan,

OHS currently includes only an OAI-PMH protocol harvester. It is implemented as a harvester plugin, with the intention of supporting additional plugins -- but you'd have to write one. See plugins/harvesters/oai for the OAI-specific code; you'll need to implement something similar for the other system.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 8599
Joined: Wed Aug 10, 2005 12:56 pm

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Sun Jun 19, 2011 3:25 pm

asmecher wrote:
Hi chasan,

OHS currently includes only an OAI-PMH protocol harvester. It is implemented as a harvester plugin, with the intention of supporting additional plugins -- but you'd have to write one. See plugins/harvesters/oai for the OAI-specific code; you'll need to implement something similar for the other system.

Regards,
Alec Smecher
Public Knowledge Project Team


FOR the other system or TO the other system? I hope I understood exactly what you meant; I just want to be sure.
And one more thing: how easy is it to create a new protocol?

Each independent online repository that does not support the OAI-PMH protocol has its own structure.
So if I write a protocol, it will work ONLY for that SPECIFIC REPOSITORY. Right?

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Mon Jun 20, 2011 9:53 am

Hi chasan,

You can think of the OHS software itself as a generic container for metadata. Metadata formats (e.g. Dublin Core, MARCXML, etc.) are implemented as plugins. Harvester protocols (e.g. OAI-PMH) used to get data from the data sources into OHS are also implemented as plugins.

It's hard to suggest whether or not you'd be better off writing one or several harvester plugins without knowing your requirements well, but be warned that writing a harvester plugin isn't trivial -- have a look at the OAI code referenced above to see what you're getting involved in.

Each repository can use a different combination of harvester plugin and metadata format plugin without problems. There are searching/indexing/browsing tools included to present them all together regardless of the technology used.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Wed Aug 10, 2011 6:25 am

So is it possible to convert the fields (date, author, pages, etc.) from the NON-OAI repository into an already known metadata format, and then add this repository to my database?

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Wed Aug 10, 2011 9:23 am

Hi chasan,

This would probably be easiest using an OAI static repository. See http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm for details. Basically, an OAI static repository is an XML file containing OAI data. The harvester can harvest from static repositories.
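
As a rough illustration of what producing such a file involves, here is a minimal Python sketch that builds a static repository XML document from a list of already-extracted records. It is not part of OHS. The function name `build_static_repository`, the layout of the record dictionaries, and the example.org values are assumptions made for illustration, and the element layout should be checked against the guidelines linked above before relying on the output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI static repositories and unqualified Dublin Core.
SR = "http://www.openarchives.org/OAI/2.0/static-repository"
OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Ask ElementTree to use the conventional prefixes when serializing.
for prefix, uri in [("", SR), ("oai", OAI), ("oai_dc", OAI_DC), ("dc", DC), ("xsi", XSI)]:
    ET.register_namespace(prefix, uri)


def _child(parent, ns, name, text=None):
    """Append a namespaced child element to parent and return it."""
    elem = ET.SubElement(parent, f"{{{ns}}}{name}")
    elem.text = text
    return elem


def build_static_repository(name, base_url, admin_email, records):
    """Return static repository XML as a string.

    records (hypothetical layout): a non-empty list of dicts with the keys
    'identifier', 'datestamp' (YYYY-MM-DD) and 'dc', a list of
    (Dublin Core element name, value) pairs.
    """
    root = ET.Element(f"{{{SR}}}Repository")
    root.set(f"{{{XSI}}}schemaLocation", f"{SR} {SR}.xsd")

    identify = _child(root, SR, "Identify")
    _child(identify, OAI, "repositoryName", name)
    _child(identify, OAI, "baseURL", base_url)
    _child(identify, OAI, "protocolVersion", "2.0")
    _child(identify, OAI, "adminEmail", admin_email)
    # YYYY-MM-DD strings sort chronologically, so min() gives the earliest.
    _child(identify, OAI, "earliestDatestamp", min(r["datestamp"] for r in records))
    _child(identify, OAI, "deletedRecord", "no")
    _child(identify, OAI, "granularity", "YYYY-MM-DD")

    formats = _child(root, SR, "ListMetadataFormats")
    fmt = _child(formats, OAI, "metadataFormat")
    _child(fmt, OAI, "metadataPrefix", "oai_dc")
    _child(fmt, OAI, "schema", "http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
    _child(fmt, OAI, "metadataNamespace", OAI_DC)

    listing = _child(root, SR, "ListRecords")
    listing.set("metadataPrefix", "oai_dc")
    for rec in records:
        record = _child(listing, OAI, "record")
        header = _child(record, OAI, "header")
        _child(header, OAI, "identifier", rec["identifier"])
        _child(header, OAI, "datestamp", rec["datestamp"])
        metadata = _child(record, OAI, "metadata")
        dc = _child(metadata, OAI_DC, "dc")
        dc.set(f"{{{XSI}}}schemaLocation",
               f"{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
        for element, value in rec["dc"]:
            _child(dc, DC, element, value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; replace them with real extracted data.
    example = [{
        "identifier": "oai:example.org:record-1",
        "datestamp": "2011-08-25",
        "dc": [("title", "An example title"), ("creator", "Doe, Jane")],
    }]
    print(build_static_repository("Demo repository", "http://example.org/static.xml",
                                  "admin@example.org", example))
```

The returned string can be saved as a UTF-8 encoded .xml file and placed on a web server.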

Regards,
Alec Smecher
Public Knowledge Project Team

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Thu Aug 11, 2011 8:22 am

Hmm, I'll check it right now. I hope it will help!
Thank you.

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Sun Aug 14, 2011 8:11 am

I read about static repositories and I think this might be the solution to the problem I described above.
One question: with the program I have (DeiXTO), I can extract data from a website to a specific file type (.txt, .xml, or .rss).
But as I understand it, I have to give a specific structure to the .xml file that is going to be my static repository.
Is that right? Did I understand correctly?

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Sun Aug 14, 2011 11:56 am

Hi chasan,

The best thing to do is probably to find an example of a static repository XML file and use that as a template. If you can find a DTD file for static repositories, you can also use that to validate your XML file using an XML validator.

Regards,
Alec Smecher
Public Knowledge Project Team

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Mon Aug 15, 2011 7:52 am

So in other words, I have to extract all the data I want from the NON-OAI repository and then create an XML file (a static repository) that I upload to a server, so that Harvester can fetch the metadata and all the valuable information from it. (I saw that when you click to add Archives, there is an option to select a static repository.)

Is it enough just to upload that file to a server and then fetch it, or do I have to create a static repository gateway (something that I did not fully understand)?

Many thanks for the help and the patience you are showing. I really appreciate it.

One more thing, and sorry for the really basic question: what is a DTD file (in simple words, no long explanations)?

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Mon Aug 15, 2011 8:07 am

Hi chasan,

No need for a gateway; just putting the XML file somewhere on a webserver, so that it can be accessed from a URL, is enough. You can enter that URL into the "Create Archive" form of the Harvester and it should be able to harvest it OK.

A DTD file describes the layout of an XML file. (For example it might say in a computer-readable way, "the <articles> node must contain one or more <article> nodes, and nothing else.") If you write an XML file for a particular purpose, you can use a validator to look at both the DTD and XML to check it for mistakes. It'll save you a lot of debugging headaches if you're able to check the file that way before you start trying to use it.
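
To give a feel for that kind of check, here is a small Python sketch that tests whether a file is well-formed XML and whether a few expected elements are present. It is only a rough stand-in for a real validator: Python's standard library does not validate against a DTD or schema, and the function name `check_static_repository` and the particular rules it checks are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces in ElementTree's "{uri}name" notation.
SR = "{http://www.openarchives.org/OAI/2.0/static-repository}"
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def check_static_repository(xml_text):
    """Return a list of human-readable problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Unclosed tags, stray characters inside a tag, and similar mistakes end up here.
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != SR + "Repository":
        problems.append(f"unexpected root element: {root.tag}")
    for section in ("Identify", "ListMetadataFormats", "ListRecords"):
        if root.find(SR + section) is None:
            problems.append(f"missing <{section}> section")
    for number, record in enumerate(root.iter(OAI + "record"), start=1):
        header = record.find(OAI + "header")
        for field in ("identifier", "datestamp"):
            if header is None or not (header.findtext(OAI + field) or "").strip():
                problems.append(f"record {number}: missing <oai:{field}>")
    return problems
```

Running it on the contents of a file before pointing the Harvester at the URL should catch the most basic mistakes; a file that passes may still be rejected for reasons this sketch does not check.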

Regards,
Alec Smecher
Public Knowledge Project Team

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Wed Aug 24, 2011 4:22 pm

Hey Alec,

Do you have any examples of online static repositories? I would like to see the structure and have an idea of how it works.
What do I have to add so that Harvester is able to fetch the .xml file? Does the XML file need any specific lines?
Do I have to use specific names for the fields (e.g. author, date, description, etc.)?
If yes, where can I find those names? If not, how will Harvester know (understand) which parameters I am giving it?

Re: How do I add a NON-OAI repository to my Database?

Post by asmecher » Wed Aug 24, 2011 4:47 pm

Hi chasan,

See section 3.6 in http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm (and the whole document, for that matter).
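
On the field names: inside an oai_dc record the elements come from the 15 elements of unqualified Dublin Core (title, creator, date and so on). The following Python sketch shows one way extracted fields might be renamed accordingly. The source field names (title, author, year) and the mapping itself are assumptions about what the extraction tool produces, not something defined by OHS or by the guidelines.

```python
# The 15 elements of unqualified (simple) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical mapping from extracted field names to Dublin Core element names.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "year": "date",
    "description": "description",
}


def to_dublin_core(extracted, mapping=FIELD_TO_DC):
    """Convert a dict of extracted fields into (Dublin Core element, value) pairs.

    Fields without a mapping and blank values are skipped; a mapping that
    points at a name outside Dublin Core raises ValueError.
    """
    pairs = []
    for field, value in extracted.items():
        element = mapping.get(field)
        text = str(value).strip()
        if element is None or not text:
            continue
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        pairs.append((element, text))
    return pairs
```

Each resulting pair would become one element in the record, for example ("creator", "Tacitus") becomes <dc:creator>Tacitus</dc:creator>.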

Regards,
Alec Smecher
Public Knowledge Project Team

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Wed Aug 24, 2011 4:55 pm

asmecher wrote:
Hi chasan,

See section 3.6 in http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm (and the whole document, for that matter).

Regards,
Alec Smecher
Public Knowledge Project Team


Thanks for the immediate answer. I will take a look at that and come back with questions if I cannot solve it on my own :)

Re: How do I add a NON-OAI repository to my Database?

Post by chasan » Fri Aug 26, 2011 4:22 am

Code:
<?xml version="1.0" encoding="UTF-8"?>
<Repository xmlns="http://www.openarchives.org/OAI/2.0/static-repository"
                   xmlns:oai="http://www.openarchives.org/OAI/2.0/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/static-repository.xsd">

   <Identify>
      <oai:repositoryName>Demo repository</oai:repositoryName>
      <oai:baseURL>https://vivliothmmy.ee.auth.gr/chasan_pkp/test001.xml</oai:baseURL>
      <oai:protocolVersion>2.0</oai:protocolVersion>
      <oai:adminEmail>chatzioglou@gmail.com</oai:adminEmail>
      <oai:earliestDatestamp>2011-08-25</oai:earliestDatestamp>
      <oai:deletedRecord>no</oai:deletedRecord>
      <oai:granularity>YYYY-MM-DD</oai:granularity>
   </Identify>

   <ListMetadataFormats>
      <oai:metadataFormat>
         <oai:metadataPrefix>oai_dc</oai:metadataPrefix>
         <oai:schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</oai:schema>
         <oai:metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</oai:metadataNamespace>
      </oai:metadataFormat>
   </ListMetadataFormats>

   <ListRecords metadataPrefix="oai_dc">
      <oai:record>
         <oai:header>
            <oai:identifier>oai:arXiv:cs/0112017</oai:identifier>
            <oai:datestamp>2001-12-14</oai:datestamp>
         </oai:header>
         <oai:metadata>
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                                                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                                                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                                 xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                                http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

               <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
               <dc:creator>Dushay, Naomi</dc:creator>
               <dc:subject>Digital Libraries</dc:subject>
               <dc:description>With the increasing technical sophistication of
            both information consumers and providers, there is
            increasing demand for more meaningful experiences of digital
            information. We present a framework that separates digital
            object experience, or rendering, from digital object storage
            and manipulation, so the rendering can be tailored to
            particular communities of users.
          </dc:description>
               <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
               <dc:date>2001-12-14</dc:date>
            </oai_dc:dc>
         </oai:metadata>
      </oai:record>
      <oai:record>
         <oai:header>
            <oai:identifier>oai:perseus:Perseus:text:1999.02.0084</oai:identifier>
            <oai:datestamp>2002-05-01</oai:datestamp>
         </oai:header>
         <oai:metadata>
            <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                                                 xmlns:dc="http://purl.org/dc/elements/1.1/"
                                                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                                 xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                                http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
               <dc:title>Germany and its Tribes</dc:title>
               <dc:creator>Tacitus</dc:creator>
               <dc:type>text</dc:type>
               <dc:source>Complete Works of Tacitus. Tacitus. Alfred John Church.
            William Jackson Brodribb. Lisa Cerrato. edited for Perseus.
            New York: Random House, Inc. Random House, Inc. reprinted 1942.
          </dc:source>
               <dc:identifier>http://www.perseus.tufts.edu/cgi-bin/ptext?doc=Perseus:text:1999.02.0083</dc:identifier>
            </oai_dc:dc>
         </oai:metadata>
      </oai:record>
   </ListRecords>
</Repository>


This is the test code that I am trying to fetch through Harvester in order to understand the basics of STATIC REPOSITORIES.
Harvester does not harvest it at all; it says that it is invalid. I saved it as an .xml file in the pkp_harvester folder on my server (is this a problem?).
You can see its address in the <oai:baseURL> element in the first lines of the test code.

Do I have to retrieve the SCHEMA plugin from elsewhere?

Finally, where does pkp-harvester save the XML files after it harvests a repository? Would it be helpful at all to study those XML files in order to construct my static repository?

Thanks again for the help you are offering.
