Talk:Lemon8-XML Roadmap

From PKP Wiki
Revision as of 18:47, 7 October 2009 by Jerico.dev (Talk | contribs)

Priorities (descending)

  • Add L8X's citation parsing, lookup/correction and editing functionality to OJS
  • Add L8X's metadata extraction to OJS
  • Add document parsing/editing capability to OJS
  • Add document conversion capability to OJS
  • Don't kill L8X as a standalone application; integrate it with PKP WAL instead
  • Package/brand/SEOize the parser/lookup library separately for re-use in other document-based OSS applications (ECM)

Proposed Features / Integration Points (by descending priority)

  • automatic citation lookup and editing in submission process ("citation box")
  • allow readers to view citations in multiple citation formats (including Zotero integration)
  • addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
  • generation of COinS (ContextObjects in Spans) from citations
  • add citation editing and lookup to editorial process
  • automatic citation data extraction from ODT in submission process
  • automatic metadata extraction from ODT in submission process
  • add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
  • add automatic document conversion (* -> ODT) to the submission process so that automatic extraction works for more formats
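
The COinS item above can be sketched as follows. This is a minimal illustration in Python (the real code would live in PHP templates); the dictionary field names such as article_title and first_page are hypothetical, while the key/value names (ctx_ver, rft_val_fmt, rft.atitle, ...) come from the OpenURL Z39.88-2004 journal format that COinS embeds.

```python
from html import escape
from urllib.parse import urlencode


def citation_to_coins(citation):
    """Render a journal-article citation dict as a COinS <span>."""
    # Fixed OpenURL 1.0 (Z39.88-2004) keys for a journal article.
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Hypothetical internal field names mapped to OpenURL KEV keys.
    field_map = [
        ("article_title", "rft.atitle"),
        ("journal_title", "rft.jtitle"),
        ("year", "rft.date"),
        ("volume", "rft.volume"),
        ("first_page", "rft.spage"),
    ]
    for field, key in field_map:
        value = citation.get(field)
        if value:
            pairs.append((key, str(value)))
    # Authors may repeat: one rft.au entry per author.
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # URL-encode the context object, then HTML-escape it for the attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

Tools such as Zotero look for spans with the class Z3988, so emitting the span next to each rendered citation should be enough for them to pick it up.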

Proposed Architecture (citation integration only)

Citation Backend Services Library

  • Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
  • Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
  • Specific implementations extend a base object that enforces the API contract (template pattern), since PHP4 does not support interfaces
  • Make sure that the API can be used by all PKP applications
  • Use migrated code in L8X standalone
  • Make sure that the components can be integrated/extended for metadata/section parsers/editors later
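
A rough sketch of the template pattern mentioned above, written in Python for brevity (the actual classes would be PHP). All class and method names are hypothetical. The base class owns the public entry point and the shared steps, and a method that raises when not overridden stands in for the missing interface.

```python
import re


class CitationParser:
    """Base class; subclasses override parse_internal() (hypothetical name)."""

    def parse(self, raw_citation):
        # Template method: shared pre- and post-processing lives here.
        text = " ".join(raw_citation.split())  # normalize whitespace
        if not text:
            return None
        fields = self.parse_internal(text)
        if not isinstance(fields, dict):
            raise TypeError("parse_internal() must return a dict")
        fields.setdefault("raw", text)
        return fields

    def parse_internal(self, text):
        # Stands in for an interface method: fail loudly if not overridden.
        raise NotImplementedError("subclass must implement parse_internal()")


class RegexCitationParser(CitationParser):
    """Toy implementation: only pulls a four-digit year in parentheses."""

    YEAR = re.compile(r"\((\d{4})\)")

    def parse_internal(self, text):
        match = self.YEAR.search(text)
        return {"year": match.group(1)} if match else {}
```

Callers only ever use parse(), so a lookup service or a later metadata parser could follow the same shape without the calling code changing.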

Citation DAO Library

  • We can use the usual PKP DAO pattern for all citation data persistence requirements
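
To make the DAO idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the PKP DAO API; the class names, the citations table and its columns are assumptions made for illustration only.

```python
import sqlite3


class Citation:
    """Plain data object for one citation of an article."""

    def __init__(self, article_id, raw_text, citation_id=None):
        self.citation_id = citation_id
        self.article_id = article_id
        self.raw_text = raw_text


class CitationDAO:
    """Hypothetical DAO; table and column names are assumptions."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            "citation_id INTEGER PRIMARY KEY, "
            "article_id INTEGER NOT NULL, "
            "raw_text TEXT NOT NULL)"
        )

    def insert(self, citation):
        # Store the citation and write the generated id back to the object.
        cur = self.conn.execute(
            "INSERT INTO citations (article_id, raw_text) VALUES (?, ?)",
            (citation.article_id, citation.raw_text),
        )
        citation.citation_id = cur.lastrowid
        return citation.citation_id

    def get_by_article(self, article_id):
        # Return all citations of one article in insertion order.
        rows = self.conn.execute(
            "SELECT citation_id, article_id, raw_text FROM citations "
            "WHERE article_id = ? ORDER BY citation_id",
            (article_id,),
        )
        return [Citation(a, t, c) for c, a, t in rows]
```

All SQL stays inside the DAO, so pages and components only deal with Citation objects.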

Citation GUI Pages

  • We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup
  • Pages have to be application-specific, so we usually cannot share them between applications.
  • We will, however, try to move as much as possible into the GUI components library for re-use, so that each page consists only of a very high-level outline.
  • Apart from citation editing, I don't think we'll invent new pages; it's more about integrating new components into existing pages.

Citation GUI Components Library

L8X's editing capability is a lot more demanding than anything I have seen so far in OCS/OJS. I haven't looked at OMP yet (Alec, can you comment?)

Anyway... I think we'll have to think thoroughly about the GUI component library before we start implementing it.

My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages, we'll have to "componentize" the GUI more than it currently is in L8X.

These are my ideas for the GUI architecture:

  • Create a low-level AJAX-based GUI component library in WAL (re-usable in all PKP applications)
  • Create re-usable high-level template fragments in WAL (e.g. citation editor component, re-use in all PKP applications)
  • Create an L8X citation renderer template library
    • One Smarty template per citation style, including COinS
    • Migrate COinS plugin to use COinS template fragment
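
The "one template per citation style" idea might look roughly like this. The sketch uses Python's string.Template in place of Smarty; the two templates are crude approximations and not faithful APA or MLA implementations, and the field names are hypothetical.

```python
from string import Template

# One template per citation style (rough approximations only).
STYLE_TEMPLATES = {
    "apa": Template("$authors ($year). $title. $journal, $volume, $pages."),
    "mla": Template('$authors "$title." $journal $volume ($year): $pages.'),
}


def render_citation(citation, style):
    """Render a citation dict with the template registered for the style."""
    return STYLE_TEMPLATES[style].substitute(citation)
```

Adding a new style then means adding one template, with no change to the code that calls render_citation().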

AJAX Request Architecture

  • We should probably think of an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in the request processing for AJAX requests wherever possible (performance bottleneck!)
  • Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently
  • Make sure there is no AJAX security bypass (maintain the single point of entry and the common security infrastructure for all types of request)
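
One way to picture the points above, sketched in Python rather than PHP and with purely hypothetical class, method and role names: the base class owns the single entry point and the security check, while the page and AJAX handlers differ only in how they render the result.

```python
import json


class BaseHandler:
    """Shared request flow; every name here is hypothetical."""

    required_role = "editor"

    def handle(self, request):
        # Single point of entry: the role check runs for page and AJAX alike.
        if self.required_role not in request.get("roles", ()):
            return self.render({"error": "access denied"})
        return self.render(self.operation(request))

    def operation(self, request):
        # The shared work both request types need.
        return {"citation_id": request["citation_id"], "status": "checked"}

    def render(self, data):
        raise NotImplementedError("subclass must implement render()")


class PageHandler(BaseHandler):
    def render(self, data):
        # Full page: a real handler would run the template engine here.
        rows = "".join("<li>%s: %s</li>" % item for item in sorted(data.items()))
        return "<html><body><ul>%s</ul></body></html>" % rows


class AjaxHandler(BaseHandler):
    def render(self, data):
        # Shortcut: skip templates and layout, return only the payload.
        return json.dumps(data, sort_keys=True)
```

Because neither subclass overrides handle(), an AJAX request cannot skip the access check even though it skips the page rendering.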

Next Steps

  • Thoroughly analyze GUI requirements
  • Define integration points
  • Specify GUI library components

Installation Requirements and Compatibility

  • No new initial installation requirements
  • Maintain PHP4 compatibility for initial installation
  • New installation requirements (additional software, PHP>4) only for optional plug-ins
  • Thorough documentation of additional installation / runtime environment requirements
  • Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
  • Make sure that L8X standalone will continue working/improving by cleanly backporting/integrating migrated code to L8X (DRY!)
  • Use standard UI technology so that backporting the new OMP GUI will be easier
  • Comments have to follow Doxygen syntax (Alec, which tags are required?)

Further Ideas (Attic)

  • Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
    1. extract styles from a sample document
    2. extract sections from the sample document
    3. let the user attribute semantic information to styles and sections (e.g. first section = always contains author information)
    4. parse the batch of documents (metadata, citations, structure) based on these user definitions
  • Integrate more citation lookup services
  • Additional XML schemas for export
  • Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
  • Improve support for metadata schemas
  • Use machine-learning approaches (data mining technology) to improve parser robustness (citations, document structure, metadata)
  • Introduce a "batch processing mode" for citation parsing/lookup
    • keep the application responsive while citation parsing is going on in the background
    • do citation parsing/lookup during off-hours (e.g. every night)
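
The "batch processing mode" idea could be sketched like this in Python. The function name and the job format are hypothetical, and in a PHP deployment the background work would more likely be a scheduled task than a thread; the thread here only stands in for "somewhere other than the request".

```python
import queue
import threading


def start_batch_worker(parse, results):
    """Start a background worker; returns the queue to put jobs on."""
    jobs = queue.Queue()

    def worker():
        while True:
            citation_id, raw = jobs.get()
            try:
                results[citation_id] = parse(raw)
            except Exception as exc:
                # Keep the failure with the job instead of stopping the batch.
                results[citation_id] = exc
            finally:
                jobs.task_done()

    # Daemon thread: it does not block the application from exiting.
    threading.Thread(target=worker, daemon=True).start()
    return jobs
```

The submitting code only calls jobs.put((citation_id, raw_text)) and returns immediately; jobs.join() can be used where a caller really needs to wait for the whole batch.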