
You are viewing the PKP Support Forum | PKP Home Wiki



Integrate Harvester2 with another app?

Open Harvester Systems support questions and answers, bug reports, and development issues.




Postby nator » Thu Nov 29, 2007 10:46 am

Hi there! I'm excited to have found Harvester2 - it seems like a great bit of open source development that will (hopefully) fill a need in our project. I've installed the app and read the technical documentation (very nice!), but I have some questions for the community. First, some background:

I'm involved in building an online "meta-collection" of several art museums' online collections. We've chosen OAI as the transport mechanism so we can open it up to as many institutions as possible, and we'll be using several metadata formats in addition to the required oai_dc. I'm comfortable writing the schema plugins for these; that shouldn't be a big issue. What I'm unsure about is how to then transfer the harvested data into our system. That is, I'd like to use Harvester2 for the part it's good at (harvesting) and then have our system translate and import the data into actual art objects and supporting media. I see two ways to do this:

1. Make some fairly big code changes to Harvester2 to integrate it tightly with our system, doing the translation and object creation as the records are harvested. Could this be done via plugins? I see a hook for when an entry is added to a record, but doing it at that point seems cumbersome.
2. Leave Harvester2 alone, run it via cron, and have our system periodically check Harvester2's records table for new records. We'd perform the import and object creation at that point, and track datestamps for each archive so we'd know "where we were" in time.
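For option 2, the key piece is remembering "where we were" for each archive. Below is a minimal sketch of polling with a datestamp high-water mark. It is written in Python for brevity (an actual integration in this project would be PHP), and it assumes a simplified `records` table with hypothetical columns `record_id`, `archive_id`, `datestamp` and `contents`. Harvester2's real schema may differ, and `import_record` is a hypothetical stand-in for your own import code.

```python
import sqlite3

# Hypothetical stand-in for Harvester2's records table; the real table and
# column names may differ.
SCHEMA = """
CREATE TABLE records (
    record_id  INTEGER PRIMARY KEY,
    archive_id INTEGER NOT NULL,
    datestamp  TEXT NOT NULL,  -- one fixed ISO 8601 UTC format, e.g. 2007-11-29T10:46:00Z
    contents   TEXT
);
"""


def fetch_new_records(conn, archive_id, last_seen):
    """Return one archive's records that are newer than the saved high-water mark.

    last_seen is a (datestamp, record_id) pair. Records sharing the last
    datestamp are ordered by record_id, so they are neither skipped nor
    re-imported on the next run.
    """
    last_datestamp, last_id = last_seen
    return conn.execute(
        "SELECT record_id, datestamp, contents FROM records "
        "WHERE archive_id = ? "
        "AND (datestamp > ? OR (datestamp = ? AND record_id > ?)) "
        "ORDER BY datestamp, record_id",
        (archive_id, last_datestamp, last_datestamp, last_id),
    ).fetchall()


def import_batch(conn, archive_id, last_seen, import_record):
    """Hand each new record to import_record and return the updated mark."""
    for record_id, datestamp, contents in fetch_new_records(conn, archive_id, last_seen):
        import_record(record_id, contents)  # hypothetical hook into "our system"
        last_seen = (datestamp, record_id)
    return last_seen
```

The returned mark would be persisted per archive between cron runs, starting from `("", 0)` for an archive that has never been imported. Comparing datestamps as strings only works if every datestamp is stored in the same fixed ISO 8601 format.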

My questions: Has anyone done anything like this? Is there a better way to do what I'm after? Harvester2 seems to be (by far) the best PHP implementation of an OAI harvester I've found; in fact, it's one of the only ones. I was debating using a Perl library, or even Java, but I'd prefer to keep the whole project in PHP.

I'm leaning towards #2, since Harvester2 seems to be well-tested and very robust. I'd rather use a best-of-breed harvester than write one from scratch, even if it means some hacking to get records between the two systems.

Thanks for any input you might have!
Nate
nator
 
Posts: 3
Joined: Thu Nov 29, 2007 10:21 am

Re: Integrate Harvester2 with another app?

Postby asmecher » Thu Nov 29, 2007 12:17 pm

Hi Nate,

I would lean towards #1; the plugin system should be able to step in before content is indexed and essentially tell the harvester "thanks, but I'll take it from here". This will save you a lot of headaches relating to synchronizing databases, not to mention the extra storage etc.

I'm not aware of anyone who has taken this approach, so you'll have to do a little bit of legwork to figure out how the particulars of the plugin will need to operate, but I can suggest somewhere to start: try writing a preprocessor plugin to accept incoming records and shuffle them into your external system. If the preprocessEntry function returns true, the record should not be processed by the Harvester.
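The short-circuit described here can be modelled roughly as follows. This is a toy Python model, not Harvester2's actual plugin API: `MiniHarvester`, `add_entry` and `preprocess_entry` are hypothetical names standing in for the real `preprocessEntry` hook. Following the description above, a preprocessor that returns true tells the harvester to skip its own processing of that entry.

```python
class ExternalImportPreprocessor:
    """Hypothetical preprocessor that diverts every entry to an external system."""

    def __init__(self, sink):
        self.sink = sink  # stand-in for "our system"; here just a list

    def preprocess_entry(self, archive_id, record_id, field_name, value, attributes):
        self.sink.append((archive_id, record_id, field_name, value, dict(attributes)))
        return True  # "thanks, but I'll take it from here"


class PassThroughPreprocessor:
    """Hypothetical preprocessor that never claims an entry."""

    def preprocess_entry(self, archive_id, record_id, field_name, value, attributes):
        return False


class MiniHarvester:
    """Toy harvester loop that consults preprocessors before indexing an entry."""

    def __init__(self, preprocessors):
        self.preprocessors = preprocessors
        self.indexed = []  # stand-in for the harvester's own database

    def add_entry(self, archive_id, record_id, field_name, value, attributes=None):
        attributes = attributes or {}
        for plugin in self.preprocessors:
            if plugin.preprocess_entry(archive_id, record_id, field_name, value, attributes):
                return False  # a plugin claimed the entry; skip built-in indexing
        self.indexed.append((archive_id, record_id, field_name, value))
        return True
```

With only `ExternalImportPreprocessor` registered, nothing reaches `indexed`, which corresponds to using the harvester purely for fetching records.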

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 9083
Joined: Wed Aug 10, 2005 12:56 pm

Re: Integrate Harvester2 with another app?

Postby nator » Thu Nov 29, 2007 12:26 pm

Alec -

Thanks for the quick reply, and for suggesting a starting point! I'll give it a whirl with a preprocessEntry plugin and see what comes through. If I have any luck, I'll try to come back and post my work so other people can see how it integrates.

Thanks again,
Nate

Re: Integrate Harvester2 with another app?

Postby DevinTheDude » Mon Jun 02, 2008 2:04 pm

Hi Nate,

Any work you can post yet? :)
DevinTheDude
 
Posts: 2
Joined: Sun Jun 01, 2008 5:10 pm

Re: Integrate Harvester2 with another app?

Postby nator » Wed Jun 25, 2008 9:33 am

Sorry it's taken so long to get back to this... Yes, I've made good progress so far using a preprocessor plugin and returning false. I'll try to document it more fully later (the project is at the height of development and not stable yet), but I had to add a few more pieces: a metadata parser plugin for the metadata format we're using (CDWAlite), and then basically a whole mess of code in the preprocessor that watches for certain tags and builds the object. The big trick is that the metadata uses these "sets", or wrapper tags, so I have to set a flag when I see the first one and catch everything until the last one before adding it all at once. It's hard to explain, and the code is too embarrassing to show just yet... :)
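The wrapper-tag buffering described above amounts to a small state machine: set a flag when the wrapper opens, collect fields while the flag is set, and hand off the complete set when the wrapper closes. Here is a minimal Python sketch of that idea. It is driven by the standard library's `xml.etree.ElementTree.iterparse` rather than by Harvester2's preprocessor calls, and the tag names in the example (`titleWrap`, `title`) and the namespace URI are illustrative assumptions, not taken from the actual plugin.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an '{namespace}' prefix so namespaced tags compare by local name."""
    return tag.rsplit("}", 1)[-1]


class WrapperBuffer:
    """Buffer leaf fields between a wrapper's start and end, then emit them at once."""

    def __init__(self, wrapper_tag, on_complete):
        self.wrapper_tag = wrapper_tag
        self.on_complete = on_complete  # called with {tag: [values, ...]}
        self.active = False  # the flag set when the wrapper tag is first seen
        self.fields = {}

    def start(self, tag):
        if tag == self.wrapper_tag:
            self.active = True
            self.fields = {}

    def field(self, tag, value):
        if self.active:  # fields outside the wrapper are ignored
            self.fields.setdefault(tag, []).append(value)

    def end(self, tag):
        if tag == self.wrapper_tag and self.active:
            self.active = False
            self.on_complete(self.fields)


def feed_xml(xml_bytes, buffer):
    """Drive a WrapperBuffer from ElementTree's incremental start/end events."""
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        tag = local_name(elem.tag)
        if event == "start":
            buffer.start(tag)
        else:
            # Element text is only guaranteed to be complete on the "end" event.
            if len(elem) == 0 and elem.text and elem.text.strip():
                buffer.field(tag, elem.text.strip())
            buffer.end(tag)
```

This version does not handle a wrapper nested inside another wrapper with the same tag; a depth counter in place of the boolean flag would cover that case.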

So far so good - I had to fix a bunch of memory leaks, but they were all in the code from the rest of my project. The PKP harvester seems very solid, but it's strange to use so little of it - just the harvester, none of the front end or DB.

Nate

Re: Integrate Harvester2 with another app?

Postby asmecher » Wed Jun 25, 2008 10:17 am

Hi Nate,

The Harvester is a bit of a wildcard for us, since it can be used in so many ways (many of which were certainly not predicted by us). We'll continue to build flexibility into it over the coming months so that it'll be easier to pick and choose which pieces of it to use, and we hope to include a few big-ticket items such as support for external indexing tools (e.g. Lucene) which will add new dimensions.

Please consider contributing your modifications back to us if you think they'd be useful to the greater community!

Regards,
Alec Smecher
Public Knowledge Project Team

