Difference between revisions of "Harvester Roadmap"

From PKP Wiki
m (PKP Harvester moved to Harvester Roadmap: cleanup)

Revision as of 13:56, 16 February 2009

Development Roadmap

Milestone 2.3 (Current - Q1 2009)

General functionality

  • Upgrade utility to migrate existing Harvester instances without having to re-enter archive details (but reharvesting OK?)
  • Ability to ingest multiple journals/conferences into a Harvester instance so it acts as an aggregator platform
    • Specifically, the ability to integrate the articles and other content from multiple OJS and OCS collections (OMP for collections of books?) into a single interface for searching and browsing. An important aspect of this is to allow browsing and searching of hierarchical data (e.g., national, regional journals; disciplines; etc.)
    • OAI-PMH, using a richer metadata schema than unqualified Dublin Core (uDC), would suffice for this.
    • Aggregating user data to enable social networking, sharing of annotations and workspaces, etc.
  • APT/Ubuntu package installation
  • Simple CMS plugin like OJS's (static pages plugin already ported for Harvester)
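
As a rough illustration of the aggregation item above, harvesting with a richer schema than uDC mostly comes down to sending a different metadataPrefix in the OAI-PMH ListRecords request. The sketch below only builds the request URL. The function name, the base URL and the "marcxml" prefix in the usage note are assumptions for illustration; which prefixes a given OJS or OCS installation actually offers has to be checked with a ListMetadataFormats request.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it must not
        # be combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)
```

For example, `list_records_url("http://journal.example.org/oai", "marcxml")` builds a request for MARCXML records from a hypothetical journal endpoint; an aggregator would issue one such request per harvested collection and follow resumption tokens until the list is complete.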

User interface

  • Faceted browsing by author, institution, research funder, year of publication, document type, and keyword (tag cloud).
  • Allow the creation of accounts for users who can submit and administer archives.
    • Flexible templating of results and record displays that can be configured by harvester admins. Alec and Siavash have done some work (instantiated in the CHODARR harvester) but more flexibility would be useful, e.g., for a given schema, allow repository-specific overrides on display templates; external template files for specific metadata elements; maybe URL parameterized invocation of templates.
  • Allow the creation of accounts for "readers", so they can get alerts of new items based on saved searches, rss feeds, etc.
  • Sorting of result sets by more factors (currently available only for title and date)
    • Other items in the 'Data flow management and manipulation' section of the IReL "Results" document


  • An OpenURL resolver interface that operates on URL-based requests (like CUFTS)
  • Ability to act as an OAI data source, and point to, e.g., another harvester
    • One issue here is provenance of records, a known weak area in the OAI-PMH protocol. Specifically, there is no standard place in the OAI meta-metadata to store the "breadcrumbs" documenting where a resource description has lived.
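
A minimal sketch of what a URL-based OpenURL resolver interface could look like, assuming OpenURL 0.1 style query keys (genre, issn, volume, spage) and harvested records held as plain dictionaries. The function names, the record layout and the example host are hypothetical; this is not how CUFTS or the Harvester implement it.

```python
from urllib.parse import urlsplit, parse_qs


def parse_openurl(url):
    """Turn the query string of an OpenURL request into a flat citation dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep only the first value of each.
    return {key: values[0] for key, values in query.items()}


def find_matches(citation, records, keys=("issn", "volume", "spage")):
    """Return the records that agree with the citation on every key it supplies."""
    wanted = {k: citation[k] for k in keys if k in citation}
    if not wanted:
        # Nothing to match on, so do not return the whole index.
        return []
    return [r for r in records
            if all(r.get(k) == v for k, v in wanted.items())]
```

A request such as `http://harvester.example.org/openurl?genre=article&issn=1234-5678&volume=3&spage=45` would be parsed into a citation and matched against the harvested records on ISSN, volume and start page.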

Harvesting, data model, data management

  • Lucene or other (Solr, Xapian) indexing back ends.
  • Provide a simple OAI XML validation tool that does a "pre-harvest" pass to identify any invalid tokens or other malformed XML, so that admins can fix it before actually committing a harvest to the database.
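
The "pre-harvest" check could start out as something like the sketch below, which uses only the Python standard library. It checks that a response is well-formed XML, that the root element is OAI-PMH, and whether the repository returned any OAI error elements. It does not validate against the OAI-PMH or metadata XML schemas, which a fuller tool would need a schema-aware parser for. The function name is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def preharvest_check(xml_text):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not well-formed: report where the parser gave up and stop.
        line, column = exc.position
        return [f"not well-formed at line {line}, column {column}: {exc}"]

    problems = []
    if root.tag != OAI_NS + "OAI-PMH":
        problems.append(f"unexpected root element: {root.tag}")
    # Repositories report protocol-level failures as <error code="..."> elements.
    for error in root.iter(OAI_NS + "error"):
        message = (error.text or "").strip()
        problems.append(f"OAI error {error.get('code')}: {message}")
    return problems
```

Running this on each response page before anything is written to the database gives the archive's admin a line and column to look at when a harvest would otherwise fail partway through.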