Difference between revisions of "Harvester Roadmap"

From PKP Wiki
(8 intermediate revisions by one user not shown)
Older revision (from line 1):

=Development Roadmap=

==Milestone 2.3 ('''Current - Q1 2009''')==

===General functionality===
* Upgrade utility to move from existing Harvester instances without having to repopulate archive details (but reharvesting OK?)
* Ability to ingest multiple journals/conferences into a Harvester instance so it acts as an aggregator platform
** Specifically, the ability to integrate the articles and other content from multiple OJS and OCS collections (OMP for collections of books?) into a single interface for searching and browsing. An important aspect of this is to allow browsing and searching of hierarchical data (e.g., national, regional journals; disciplines; etc.)
** OAI-PMH, using a richer metadata schema than uDC, would suffice for this.
** Aggregating user data to enable social networking, sharing of annotations and workspaces, etc.
* APT/Ubuntu package installation
* Simple CMS plugin like OJS's (static pages plugin [http://www.alperin.ca-a.googlepages.com/staticPagesPlugin-Harvester-1.0.tar.gz already ported for Harvester])

===User interface===
* Faceted browsing by author, institution, research funder, year of publication, document type, and keyword (tag cloud).
* Allow the creation of accounts for users who can submit and administer archives.
** Flexible templating of results and record displays that can be configured by harvester admins. Alec and Siavash have done some work (instantiated in the CHODARR harvester), but more flexibility would be useful, e.g., for a given schema, allow repository-specific overrides on display templates; external template files for specific metadata elements; maybe URL parameterized invocation of templates.
* Allow the creation of accounts for "readers", so they can get alerts of new items based on saved searches, RSS feeds, etc.
* Sorting of result sets by more factors (currently available for title and date)
** Other items in the 'Data flow management and manipulation' section of the IReL "Results" document

===Interoperability===
* An OpenURL resolver interface that operates on URL-based requests (like [http://cufts.lib.sfu.ca/ CUFTS])
** See, for example, [http://www.dlib.org/dlib/july03/young/07young.html http://www.dlib.org/dlib/july03/young/07young.html]
* Ability to act as an OAI data source, and point to, e.g., another harvester
** One issue here is the provenance of records, a known weak area in the OAI-PMH protocol: there is no standard place in the OAI meta-metadata to store the "breadcrumbs" documenting where a resource description has lived.

===Harvesting, data model, data management===
* Lucene or other (Solr, Xapian) indexing back ends.
* Provide a simple OAI XML validation tool that does a "pre-harvest" that identifies any invalid tokens or other bad tasting XML, so that admins can fix it before actually committing a harvest to the db.

Newer revision (from line 1):

=Development Roadmap=

You will find the Open Harvester Systems development roadmap for 2009 below. Please note that these dates are not fixed.

==Milestone 2.3 ('''Q1 2009''')==

Version 2.3 represents a major rewrite of several aspects of the Harvester 2.x line, particularly metadata storage and indexing. Improvements in metadata storage let the Harvester operate as an OAI Data Provider as well as an OAI Harvester, so it can work in a multi-tiered environment with the potential for conversion between metadata formats. Indexing is now supported via plugins, including Solr/Lucene in addition to the traditional MySQL inverted index. User accounts are now supported, so users can submit archives and then manage them, to a limited extent, themselves. Error handling during harvesting and indexing is also more robust, with new code to detect and correct UTF-8 errors. All of this has been built on the PKP Web Application Platform, which will underpin future releases of the Harvester as well as other well-known PKP applications such as Open Journal Systems (OJS) and Open Conference Systems (OCS).

You can view bug reports by type:
* [http://pkp.sfu.ca/bugzilla/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=OAI+Harvester&version=2.3&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&deadlinefrom=&deadlineto=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&emailtype1=exact&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=bug_status&type0-0-0=notequals&value0-0-0=UNCONFIRMED&field0-0-1=reporter&type0-0-1=equals&value0-0-1= All bug reports]
* [http://pkp.sfu.ca/bugzilla/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=OAI+Harvester&version=2.3&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&deadlinefrom=&deadlineto=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&bug_severity=trivial&bug_severity=enhancement&emailtype1=exact&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=bug_status&type0-0-0=notequals&value0-0-0=UNCONFIRMED&field0-0-1=reporter&type0-0-1=equals&value0-0-1= Feature requests/enhancements]
* [http://pkp.sfu.ca/bugzilla/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=PKP+Web+Application+Library&long_desc_type=allwordssubstr&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&deadlinefrom=&deadlineto=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&emailtype1=exact&email1=&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=bug_status&type0-0-0=notequals&value0-0-0=UNCONFIRMED&field0-0-1=reporter&type0-0-1=equals&value0-0-1= PKP WAL bug reports]

==Milestone 2.3.1 ('''Q1 2010''')==

This is a planned stability/bugfix release of the 2.3 line, roughly scheduled for March 2010. Very few new features will be included with this release.

* [http://pkp.sfu.ca/bugzilla/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=OAI+Harvester&version=2.3.1&long_desc_type=substring&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0= All bug reports]
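The older revision proposes a "pre-harvest" OAI XML validation tool that finds invalid tokens before anything is committed to the database. The following is a minimal Python sketch of what such a check could look like, not the Harvester's actual code: the function name `pre_harvest_check` is hypothetical, and it assumes the response body has already been fetched and is meant to be UTF-8, as OAI-PMH 2.0 requires. It only reports problems and stores nothing.

```python
import re
import xml.etree.ElementTree as ET

# Characters outside the XML 1.0 "Char" production: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF. Lone surrogates cannot survive a
# strict UTF-8 decode, so they need no separate check here.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def pre_harvest_check(raw: bytes) -> list[str]:
    """Return human-readable problems found in one OAI-PMH response body.

    Hypothetical helper: it only inspects the bytes and writes nothing.
    """
    problems: list[str] = []

    # 1. The body must be valid UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte {err.start}: {err.reason}")
        return problems  # the remaining checks need decodable text

    # 2. Every decoded character must be allowed in an XML 1.0 document.
    for match in _ILLEGAL_XML_CHARS.finditer(text):
        problems.append(
            f"illegal XML character U+{ord(match.group()):04X} at offset {match.start()}"
        )
    if problems:
        return problems  # the parser would only repeat these as "invalid token"

    # 3. The document must be well-formed.
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        line, column = err.position
        problems.append(f"not well-formed XML at line {line}, column {column}: {err}")
    return problems
```

A tool built this way could run the check on each page of a harvest response and show the resulting list to the archive's administrator, who fixes the source repository before a real harvest is attempted. Stopping at the first failing stage keeps the report short and avoids the parser restating the same fault as a generic "invalid token".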

Revision as of 13:21, 7 May 2010

Development Roadmap

You will find the Open Harvester Systems development roadmap for 2009 below. Please note that these dates are not fixed.

Milestone 2.3 (Q1 2009)

Version 2.3 represents a major rewrite of several aspects of the Harvester 2.x line, particularly metadata storage and indexing. Improvements in metadata storage let the Harvester operate as an OAI Data Provider as well as an OAI Harvester, so it can work in a multi-tiered environment with the potential for conversion between metadata formats. Indexing is now supported via plugins, including Solr/Lucene in addition to the traditional MySQL inverted index. User accounts are now supported, so users can submit archives and then manage them, to a limited extent, themselves. Error handling during harvesting and indexing is also more robust, with new code to detect and correct UTF-8 errors. All of this has been built on the PKP Web Application Platform, which will underpin future releases of the Harvester as well as other well-known PKP applications such as Open Journal Systems (OJS) and Open Conference Systems (OCS).
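The paragraph above mentions new code to detect and correct UTF-8 errors in harvested metadata. The Harvester's own implementation is not shown on this page; the following is a minimal Python sketch of one common approach, assuming that stray non-UTF-8 bytes in harvested records are usually fragments of Windows-1252 (cp1252) text. The function name `repair_utf8` is hypothetical.

```python
def repair_utf8(raw: bytes) -> tuple[str, list[int]]:
    """Decode raw bytes as UTF-8, re-reading each invalid sequence as cp1252.

    Returns the repaired text and the byte offsets where repairs were made.
    Hypothetical helper; the cp1252 fallback is an assumption about where
    stray non-UTF-8 bytes usually come from.
    """
    parts: list[str] = []
    repaired_at: list[int] = []
    pos = 0
    while pos < len(raw):
        try:
            parts.append(raw[pos:].decode("utf-8"))
            break  # the rest of the input was valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice raw[pos:].
            start, end = pos + err.start, pos + err.end
            parts.append(raw[pos:start].decode("utf-8"))  # valid prefix
            # Bytes undefined in cp1252 (e.g. 0x81) become U+FFFD.
            parts.append(raw[start:end].decode("cp1252", errors="replace"))
            repaired_at.append(start)
            pos = end
    return "".join(parts), repaired_at


text, offsets = repair_utf8(b"caf\xe9")
print(text, offsets)  # café [3]
```

Returning the repair offsets, rather than silently fixing the text, lets an administrator see which records were altered. Re-reading bad bytes as cp1252 is only a heuristic: if a repository actually used another legacy encoding (for example ISO-8859-2), the "repaired" characters will be wrong.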

You can view bug reports by type:

* All bug reports
* Feature requests/enhancements
* PKP WAL bug reports

Milestone 2.3.1 (Q1 2010)

This is a planned stability/bugfix release of the 2.3 line, roughly scheduled for March 2010. Very few new features will be included with this release.

* All bug reports