Lemon8-XML Roadmap

From PKP Wiki

Revision as of 08:52, 14 May 2010

Development Roadmap

Q1 2009

This is the initial release of the 1.x line, which will shortly move into maintenance mode. We will still track and address major and security-related bugs, and you are encouraged to browse our Bugzilla database.

Q3 2009

As of Q3 2009, development on L8X as a stand-alone application has been halted in favor of refactoring the L8X functionality into the PKP Web Application Library. The rationale for this approach is to provide direct integration with OJS and OCS, as well as functionality for the initial release of OMP. Users can expect a major change that brings the UI in line with the rest of the PKP suite while keeping much of the dynamic interface of 1.x.

Q4 2009

  • Add L8X's citation parsing/lookup/editing functionality to OJS
    • citation lookup and editing in submission/editing process ("citation grid")

Q1 2010

  • Implement citation output use-cases (see feature list below)
    • addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
    • generation of COinS (Context Object in Span) from citations, including Zotero integration
    • Allow readers to view citations in all existing citation output formats (EndNote, RefWorks integration)
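
As an illustration of the COinS use-case above, here is a minimal sketch in Python (not the PKP implementation). It assumes a citation is available as a plain dictionary; the field names (title, journal, year, volume, first_page, authors) and the function name are hypothetical. The ctx_ver and rft.* keys follow the OpenURL Z39.88-2004 key/encoded-value format for journal articles, and the Z3988 class is what tools such as Zotero look for.

```python
from html import escape
from urllib.parse import quote, urlencode


def citation_to_coins(citation: dict) -> str:
    """Render a journal-article citation as a COinS <span> (illustrative only)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Hypothetical citation field names mapped to OpenURL KEV journal keys.
    field_map = {
        "title": "rft.atitle",
        "journal": "rft.jtitle",
        "year": "rft.date",
        "volume": "rft.volume",
        "first_page": "rft.spage",
    }
    for field, key in field_map.items():
        if citation.get(field):
            pairs.append((key, str(citation[field])))
    # rft.au may be repeated, once per author.
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # Percent-encode the key/value pairs, then HTML-escape for use in an attribute.
    title = escape(urlencode(pairs, quote_via=quote), quote=True)
    return f'<span class="Z3988" title="{title}"></span>'
```

The resulting span carries no visible text; it can be placed next to the rendered citation in the article's HTML view.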

Q3 2010

  • Add document parsing/editing capability to OJS
    • automatic citation data extraction from ODT in submission process
    • add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
  • Implement XML-to-PDF and XML-to-HTML rendering
  • Add document conversion capability to OJS
    • automatic document conversion during submission process (*.*)->(*.odt) to allow automatic extraction for more formats

Q4 2010

  • Add L8X's meta-data extraction to OJS
    • automatic metadata extraction from ODT in submission process
  • Market migrated parsing/lookup code as a standalone library

Not yet scheduled

Additional Use Cases

  • Copyediting: Author matching, i.e. checking that author names used in the body of the text match the names used in the citations (spelling), and that text and bibliography are linked in both directions (flag authors mentioned with no reference, and references with no link to the body of the text);
  • Copyediting: Quotation checking, where a quote in the body of the text is checked against the web for accuracy, with candidates proposed for comparison and correction, as well as reference checking;
  • Plagiarism: Random checks of unquoted passages of text for matches that may indicate plagiarism.
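
The author-matching use case could look roughly like the following sketch. It makes strong simplifying assumptions: in-text references use a plain author-year style such as "(Smith, 2008)", and the bibliography is already parsed into dictionaries with hypothetical surname and year keys. It only reports exact mismatches in both directions and does not attempt spelling suggestions.

```python
import re

# Matches simple author-year references such as "(Smith, 2008)" or "(Smith 2008)".
IN_TEXT = re.compile(r"\(([A-Z][\w'-]+),? (\d{4})\)")


def check_references(body, bibliography):
    """Compare in-text author-year references with bibliography entries."""
    cited = {(m.group(1), m.group(2)) for m in IN_TEXT.finditer(body)}
    listed = {(entry["surname"], str(entry["year"])) for entry in bibliography}
    return {
        # Mentioned in the text but missing from the bibliography.
        "cited_but_not_listed": sorted(cited - listed),
        # Present in the bibliography but never referenced in the text.
        "listed_but_never_cited": sorted(listed - cited),
    }
```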

Usability

  • Let users "lock" citations once they are in their final state. Locked citations won't be overwritten by parser or lookup results.
  • Introduce a "batch processing mode" for citation parsing/lookup
    • keep the application responsive while citation parsing is going on in the background
    • do citation parsing/lookup during off-hours (e.g. every night)
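
A minimal sketch of how locking and batch processing might interact, assuming a simple in-memory citation record. The Citation class, its field names, and the lookup callable are all hypothetical and only illustrate the rule that locked citations are never overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    raw: str                                    # citation string as submitted
    fields: dict = field(default_factory=dict)  # parsed/looked-up metadata
    locked: bool = False                        # set once the citation is final


def apply_lookup_result(citation, result):
    """Merge parser/lookup output into a citation unless it is locked."""
    if citation.locked:
        return False
    citation.fields.update(result)
    return True


def process_batch(citations, lookup):
    """Run a lookup over many citations, e.g. from a nightly job.

    Returns the number of citations that were updated.
    """
    updated = 0
    for citation in citations:
        if citation.locked:
            continue  # skip the (possibly slow) lookup call entirely
        if apply_lookup_result(citation, lookup(citation.raw)):
            updated += 1
    return updated
```

Running process_batch from a scheduled task rather than inside a web request is one way to keep the application responsive while parsing runs in the background.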

Document Parsing

  • Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
    1. extract styles from a sample document
    2. extract sections from a sample document
    3. let the user attribute semantic information to styles and sections (e.g. the first section always contains author information)
    4. parse the document batch (metadata, citations, structure) based on these user definitions
  • Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
  • Integrate OpenCalais service for metadata identification and extraction.
    • using OpenCalais on the full-text of an article is less accurate, though it does a pretty good job of finding entities
    • use L8X to detect the front, body, and back matter of a document, then:
      1. send the front matter to Calais to be broken into metadata (more accurately than we do now)
      2. send the back matter to the L8X citation handling and associated parse/lookup services
      3. send the body to e.g. Lucene for full-text indexing and/or Calais for automatic keyword assignment (this works well, e.g. with medical terms in MeSH, etc.)
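
The front/body/back routing described above can be sketched as follows. This is only an illustration under naive assumptions: sections are detected by heading text alone (a line reading "Introduction" starts the body, and "References" or similar starts the back matter), and the handlers stand in for the real services (Calais, the L8X citation parser, Lucene). None of these function names come from L8X.

```python
BACK_HEADINGS = {"references", "bibliography", "works cited"}


def split_document(lines, body_heading="introduction"):
    """Naively split a document into front, body, and back matter by heading."""
    sections = {"front": [], "body": [], "back": []}
    current = "front"
    for line in lines:
        heading = line.strip().lower()
        if current == "front" and heading == body_heading:
            current = "body"
        elif current != "back" and heading in BACK_HEADINGS:
            current = "back"
        sections[current].append(line)
    return {name: "\n".join(part) for name, part in sections.items()}


def route_sections(sections, handlers):
    """Send each non-empty section to its handler and collect the results."""
    return {
        name: handlers[name](text)
        for name, text in sections.items()
        if text and name in handlers
    }
```

In use, handlers would map "front" to a metadata extraction call, "back" to citation parsing/lookup, and "body" to indexing or keyword assignment.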

Citation Parsing

  • Use machine-learning approaches (e.g. data mining/classifiers) to improve parser results

Citation Lookup

  • Integrate more citation lookup services: OAIster, CiteSeer, Amazon, LibraryThing, OpenLibrary, SRU/SRW, Z39.50
  • Generic OAI-DC lookup, possibly with a local Harvester acting as meta-data cache and as search interface
  • Port source adapters from Umlaut project, see http://umlaut.rubyforge.org/.

Citation Output

  • Implement citation output plug-ins for Chicago Manual of Style, American Medical Association, American Sociological Association and Council of Science Editors (see mails to pkp-support from Mark and John, 20/10/2009)
  • Auto-COinS plugin (WAL): generate COinS in HTML/abstract view for marked references in textarea
  • Apply reading tools to references within articles (provide additional information about cited works in RT sidebar)

Document Export

  • Additional XML schemas for export

Backporting to other Applications

  • Extend L8X functionality to OCS and OMP
  • Add citation support to Harvester
    • If a metadata element in Harvester looks like a citation, parse the citation and render it in HTML with COinS
    • use Harvester to retrieve additional citation meta-data that will be attached to the meta-data we already retrieve (i.e. every single harvester record may contain or point to additional citation records)
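
What "looks like a citation" means is left open above. One possible heuristic, shown purely as an assumption, is to count a few surface signals (a plausible publication year, a "Surname, I." author pattern, a page range) and treat the element as a citation when at least two are present. The patterns and the threshold are illustrative and would need tuning against real Harvester records.

```python
import re

YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")      # e.g. 1998, 2008
AUTHOR = re.compile(r"[A-Z][a-z]+, (?:[A-Z]\.\s?)+")  # e.g. "Smith, J. "
PAGES = re.compile(r"\b\d+\s?[-–]\s?\d+\b")           # e.g. 45-67


def looks_like_citation(value, threshold=2):
    """Return True if enough citation-like signals appear in a metadata value."""
    score = sum(bool(pattern.search(value)) for pattern in (YEAR, AUTHOR, PAGES))
    return score >= threshold
```

Values that pass this check could then be handed to the citation parser and rendered with COinS, while everything else is displayed unchanged.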

Additional Requirements

  • No new initial installation requirements
  • Maintain PHP4 compatibility for the initial installation; new installation requirements (additional software, PHP>4) are acceptable only for optional plug-ins. A notable example is the citation editor/parser/lookup, which requires at least PHP 5.0
  • Thorough documentation of additional installation / runtime environment requirements
  • Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
  • Closely integrate with OMP to make sure that the GUI components will work in OMP without adaptation
  • All contributions should be fully unit-test covered
  • All workflows should be fully web-test covered