by nator » Thu Nov 29, 2007 10:46 am
Hi there! I'm excited to have found Harvester2 - it seems like a great bit of open source development that will (hopefully) fill a need in our project. I've installed the app and read the technical documentation (very nice!), but I have some questions for the community. First, some background:
I'm involved in building an online "meta-collection" of several art museums' online collections. We've chosen OAI as the transport mechanism so we can open it up to as many institutions as possible, and will be using several metadata formats in addition to the required oai_dc. I'm comfortable writing these schema plugins, so that shouldn't be a big issue, but I'm unsure how to then transfer the harvested data into our system. That is, I'd like to use Harvester2 for the part it's good at (harvesting) and have our system translate and import the data into actual art objects and supporting media. I see two ways to do this:
1. Make some pretty big code changes to Harvester2 to integrate it tightly with our system, so that we'd do the translating and object creation as the records are harvested. Could this be done via plugins? I see a hook for when an entry is added to a record, but doing the translation at that point seems cumbersome.
2. Leave Harvester2 alone, run via cron, and have our system periodically check the records table in Harvester2 for new records. We'd perform the import and object creation at this time, and track datestamps along with the archives so we'd know "where we were" in time.
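To make option 2 concrete, here is a rough sketch of the polling step I have in mind. It's in Python purely for brevity (the real thing would be PHP), and the table and column names (`records`, `archive_id`, `datestamp`, `contents`) are placeholders I made up for illustration, not Harvester2's actual schema. `import_record` stands in for our own translate-and-import code, and `checkpoints` for wherever we'd persist the per-archive datestamp.

```python
import sqlite3


def fetch_new_records(conn, archive_id, last_seen):
    """Return rows for one archive with a datestamp after the checkpoint.

    Assumes datestamps are stored as ISO 8601 strings, which sort
    correctly as plain text. Table and column names are hypothetical.
    """
    cur = conn.execute(
        "SELECT record_id, datestamp, contents FROM records "
        "WHERE archive_id = ? AND datestamp > ? "
        "ORDER BY datestamp, record_id",
        (archive_id, last_seen),
    )
    return cur.fetchall()


def import_new_records(conn, checkpoints, archive_id, import_record):
    """Import unseen records for one archive and advance its checkpoint.

    `checkpoints` maps archive_id -> last imported datestamp.
    `import_record` is a callback into our own system (hypothetical).
    Returns the number of records imported on this run.
    """
    last_seen = checkpoints.get(archive_id, "")
    rows = fetch_new_records(conn, archive_id, last_seen)
    for record_id, datestamp, contents in rows:
        import_record(record_id, contents)
        # Advance only after a successful import, so a failure
        # partway through is retried on the next cron run.
        checkpoints[archive_id] = datestamp
    return len(rows)
```

One wrinkle I can already see: with a strict `>` comparison, a record that shows up later with the same datestamp as the checkpoint would be skipped, so we might need to compare with `>=` and de-duplicate on the record ID instead.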
My questions: Has anyone done anything like this? Is there a better way to do what I'm after? Harvester2 seems to be (by far) the best PHP implementation of an OAI harvester I've found, and in fact one of the only ones. I was debating using a Perl library, or even Java, but I'd prefer to keep the whole project in PHP.
I'm leaning towards #2, since Harvester2 seems to be well-tested and very robust. I'd rather use a best-of-breed harvester than write one from scratch, even if it means some hacking to get records between the two systems.
Thanks for any input you might have!
Nate