Talk:Lemon8-XML Roadmap

From PKP Wiki
Revision as of 20:12, 8 October 2009 (document implementation difference in move up/down)

Priorities (descending)

  • Add L8X's citation parsing, lookup/correction and editing functionality to OJS
  • Add L8X's metadata extraction to OJS
  • Add document parsing/editing capability to OJS
  • Add document conversion capability to OJS

Proposed Features / Integration Points (by descending priority)

  1. automatic citation lookup and editing in submission process ("citation box")
  2. addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
  3. allow readers to view citations in multiple citation formats (including Zotero integration)
  4. generation of COinS (ContextObjects in Spans) from citations
  5. add citation editing and lookup to editorial process
  6. automatic citation data extraction from ODT in submission process
  7. automatic metadata extraction from ODT in submission process
  8. add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
  9. add automatic document conversion (any format -> ODT) to the submission process to allow automatic extraction from more formats

Proposed Architecture (citation integration only)

Citation Backend Services Library

  • Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
  • Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
  • Specific implementations extend a base class that enforces the API contract (template method pattern), since interfaces are not available in PHP4
  • Make sure that the API can be used by all PKP applications
  • Use migrated code in L8X standalone
  • Make sure that the components can be integrated/extended for metadata/section parsers/editors later
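A minimal sketch of the template-method idea from the third bullet, written in Python for brevity rather than PHP4. All class and method names (`CitationParser`, `parse`, `_parse_raw`, `SimpleRegexParser`) are hypothetical, and the toy regex only handles strings shaped like `Doe, J. (2001). Title.`; a real parser service would be far more robust. The point is that the public `parse()` method fixes the API contract in the base class while subclasses only fill in a hook, with no interface declaration needed.

```python
import re


class CitationParser:
    """Hypothetical base class: the public parse() method fixes the API
    contract (template method); subclasses only override _parse_raw().
    No interfaces or ABCs are used, mirroring the PHP4 constraint."""

    REQUIRED_KEYS = ("authors", "title", "year")

    def parse(self, citation_text):
        text = citation_text.strip()
        if not text:
            raise ValueError("empty citation string")
        fields = self._parse_raw(text)
        # Enforce the output contract: every required key is present,
        # set to None when the concrete parser could not recognize it.
        result = {key: fields.get(key) for key in self.REQUIRED_KEYS}
        result["raw"] = text
        return result

    def _parse_raw(self, text):
        # "Abstract" hook: without interfaces, the base class fails
        # loudly at runtime if a subclass forgets to override it.
        raise NotImplementedError(type(self).__name__ + " must implement _parse_raw()")


class SimpleRegexParser(CitationParser):
    """Hypothetical toy parser for strings like 'Doe, J. (2001). Title.'"""

    PATTERN = re.compile(r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.\s*(?P<title>[^.]+)")

    def _parse_raw(self, text):
        match = self.PATTERN.match(text)
        return match.groupdict() if match else {}
```

Because the base class normalizes the output, every parser returns the same keys even when a field was not recognized, which keeps lookup services and the editor GUI independent of the concrete parser.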

Citation DAO Library

  • We can use the usual PKP DAO pattern for all citation data persistence requirements
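To illustrate the DAO bullet, a small sketch of the pattern in Python, using the standard-library `sqlite3` module with an in-memory database. The table layout, the `Citation` record and the method names are assumptions for illustration, not PKP's actual schema or DAO API; the idea is simply that all SQL for citations lives in one class and callers only exchange citation objects.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    submission_id: int
    raw_citation: str
    seq: int                        # position of the citation in the list
    citation_id: Optional[int] = None


class CitationDAO:
    """Hypothetical DAO: all SQL for citations is kept here; callers
    only see Citation objects."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            " citation_id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " submission_id INTEGER NOT NULL,"
            " raw_citation TEXT NOT NULL,"
            " seq INTEGER NOT NULL)"
        )

    def insert_citation(self, citation):
        cur = self.conn.execute(
            "INSERT INTO citations (submission_id, raw_citation, seq) VALUES (?, ?, ?)",
            (citation.submission_id, citation.raw_citation, citation.seq),
        )
        citation.citation_id = cur.lastrowid
        return citation.citation_id

    def get_citations_by_submission_id(self, submission_id):
        rows = self.conn.execute(
            "SELECT citation_id, submission_id, raw_citation, seq FROM citations"
            " WHERE submission_id = ? ORDER BY seq",
            (submission_id,),
        ).fetchall()
        return [Citation(submission_id=r[1], raw_citation=r[2], seq=r[3], citation_id=r[0])
                for r in rows]

    def delete_citation_by_id(self, citation_id):
        self.conn.execute("DELETE FROM citations WHERE citation_id = ?", (citation_id,))
```

Usage: `dao = CitationDAO(sqlite3.connect(":memory:"))`, then insert and fetch citations per submission, ordered by their position.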

Citation GUI Pages

  • We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup
  • Pages have to be application-specific, so we usually cannot share them between applications.
  • However, we'll try to move as much as possible into the GUI components library for re-use; the page itself will only consist of a very high-level outline.
  • Apart from citation editing, I don't think we'll invent new pages; it's more about integrating new components into existing pages.

Citation GUI Components Library

L8X's editing capability is far more demanding than anything I know of so far in OCS/OJS.

My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages, we'll have to "componentize" the GUI more than it currently is in L8X.

These are my ideas for the GUI architecture:

  • Create L8X-specific GUI components and template fragments in WAL (e.g. a citation editor component to be re-used in all PKP applications)
  • Create an L8X citation renderer template library
    • One Smarty template per citation style, including COinS
    • Migrate the COinS plugin to use the COinS template fragment
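A COinS fragment is an empty `<span class="Z3988">` whose `title` attribute carries an OpenURL (Z39.88-2004) ContextObject in key/encoded-value form. The sketch below shows, in Python rather than Smarty, roughly what the COinS template fragment would have to emit for a journal article. The input field names (`title`, `journal`, `authors`, ...) are hypothetical, and only a subset of the `rft.*` keys is covered.

```python
from html import escape
from urllib.parse import urlencode


def citation_to_coins(citation):
    """Render a journal-article citation dict as a COinS <span>."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Map hypothetical internal field names to OpenURL KEV keys.
    field_map = [
        ("title", "rft.atitle"),
        ("journal", "rft.jtitle"),
        ("date", "rft.date"),
        ("volume", "rft.volume"),
        ("issue", "rft.issue"),
        ("first_page", "rft.spage"),
    ]
    for field, key in field_map:
        if citation.get(field):
            pairs.append((key, str(citation[field])))
    # rft.au may be repeated, once per author.
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # The KEV string is URL-encoded, then HTML-escaped for the attribute.
    title_attr = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title_attr
```

Keeping this logic in a single template fragment is what would let the existing COinS plugin and the citation renderers share it.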

AJAX Request Architecture

  • We should probably think of an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in request processing for AJAX requests wherever possible (performance bottleneck!)
  • Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently
  • Make sure, of course, that there is no AJAX security bypass (maintain the single point of entry and the common security infrastructure for all types of request)
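The second and third bullets can be illustrated with a small class hierarchy. In this Python sketch (all names hypothetical, the request modeled as a plain dict), both handler types inherit the same `dispatch()` method, so the authorization check runs for every request; only the rendering step differs, with the AJAX variant skipping full-page template setup and returning a JSON fragment. Where exactly real shortcuts can be taken in PKP's request processing is still an open question.

```python
import json


class Handler:
    """Hypothetical shared base: dispatch() always calls validate(), so
    AJAX requests go through the same security checks as page requests."""

    def __init__(self, authorized_roles):
        self.authorized_roles = set(authorized_roles)

    def validate(self, request):
        if request.get("role") not in self.authorized_roles:
            raise PermissionError("access denied")

    def dispatch(self, request):
        self.validate(request)          # single security checkpoint
        return self.render(self.execute(request))

    def execute(self, request):
        raise NotImplementedError

    def render(self, data):
        raise NotImplementedError


class PageHandler(Handler):
    def render(self, data):
        # Full-page path: a real implementation would set up the
        # complete page template (navigation, headers, ...).
        return "<html><body>%s</body></html>" % data["html"]


class AjaxHandler(Handler):
    def render(self, data):
        # AJAX shortcut: no full-page template, just the fragment as JSON.
        return json.dumps({"status": True, "content": data["html"]})


class CitationEditorAjaxHandler(AjaxHandler):
    def execute(self, request):
        return {"html": "<div>citation %s</div>" % request["citation_id"]}
```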

Installation Requirements and Compatibility

  • No new initial installation requirements
  • Maintain PHP4 compatibility for initial installation
  • New installation requirements (additional software, PHP>4) only for optional plug-ins
  • Thorough documentation of additional installation / runtime environment requirements
  • Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
  • Make sure that L8X standalone will continue working/improving by cleanly backporting/integrating migrated code to L8X (DRY!)
  • Use standard UI technology so that backporting the new OMP GUI will be easier
  • Comments have to follow Doxygen syntax

GUI specification for Feature #1 (citation support in submission process)

Citation Extraction/Insertion

  1. must-have: copy & paste
    We can use the existing text field in the submission process for "bulk citation insert":
    • enter citations (text-only) -> disable TinyMCE-plugin for citation field
    • "parse" button will split up citations (one per line?) and send them to the configured parser services
    • new: parsing should be non-blocking if possible - alternatively: a progress bar should appear
    • citation editor appears as soon as citations have been recognized
  2. optional: automatic extraction
    Use document parser to extract citations:
    • recognize .odt file type and try to extract citations
    • show citations in citation editor if citations have been found
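The "parse" step of the copy & paste option starts by turning the pasted text into individual citations. A minimal Python sketch, assuming the one-citation-per-line convention the spec is still unsure about, and stripping a few common numbering prefixes (the prefix patterns are assumptions):

```python
import re

# Assumed numbering prefixes: "1.", "[2]", "3)" with optional whitespace.
PREFIX = re.compile(r"^\s*(\[\d+\]|\d+[.)])\s*")


def split_citations(bulk_text):
    """Split pasted reference-list text into one citation per line."""
    citations = []
    for line in bulk_text.splitlines():
        line = PREFIX.sub("", line).strip()
        if line:                        # skip blank lines
            citations.append(line)
    return citations
```

A citation that genuinely starts with a number followed by a period or parenthesis would lose that number, so a real implementation would need a more careful rule or rely on the user reviewing the result in the citation editor.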

Citation Parsing/Editing/Lookup

I think the current citation editor GUI is already very good. It has the following functionality:

  • open/close citation details (current bug: opening details for one citation should close all other citations)
  • save citation details
  • parse/lookup citation
  • text field for editing the unparsed text
  • moving citations up and down
  • new: allow users to move a citation anywhere in the editor
  • remove citation
  • add citation
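Moving a citation anywhere in the editor can be reduced to a single operation, "move the citation at position i so that it comes before the citation currently at position j" (or to the end), with move up and move down as special cases. A Python sketch on a plain list, with hypothetical function names; in this design the AJAX request would only need to carry the two positions, and the server would persist the new order.

```python
def move_citation(citations, index, before=None):
    """Return a new list with citations[index] placed before the item
    currently at position `before`; before=None moves it to the end."""
    if not 0 <= index < len(citations):
        raise IndexError("no citation at position %d" % index)
    items = list(citations)
    item = items.pop(index)
    if before is None:
        items.append(item)
    else:
        if not 0 <= before <= len(citations):
            raise IndexError("invalid target position %d" % before)
        # Popping the item shifts all later positions one to the left.
        target = before - 1 if before > index else before
        items.insert(target, item)
    return items


def move_up(citations, index):
    if index <= 0:
        return list(citations)
    return move_citation(citations, index, before=index - 1)


def move_down(citations, index):
    # One step down = insert before the item two positions further on.
    if index >= len(citations) - 1:
        return list(citations)
    return move_citation(citations, index, before=index + 2)
```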

Plugin Configuration

  • enable/disable L8X citation parsing
  • enable/disable automatic citation extraction
  • select/configure parsing services
  • select/configure lookup services
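The configuration options are not independent: automatic extraction only makes sense when L8X parsing is enabled, as the setup notes for step 4 state. A Python sketch of server-side validation, so that the rule also holds when the jQuery-driven show/hide of options is not available. The setting names, the service names and the `validate()` signature are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CitationPluginSettings:
    parsing_enabled: bool = False
    extraction_enabled: bool = False
    parser_services: List[str] = field(default_factory=list)
    lookup_services: List[str] = field(default_factory=list)

    def validate(self, available_parsers, available_lookups):
        """Return a list of error messages (empty if the settings are valid)."""
        errors = []
        # Dependency rule: extraction is only available with parsing on.
        if self.extraction_enabled and not self.parsing_enabled:
            errors.append("automatic extraction requires citation parsing")
        for name in self.parser_services:
            if name not in available_parsers:
                errors.append("unknown parsing service: " + name)
        for name in self.lookup_services:
            if name not in available_lookups:
                errors.append("unknown lookup service: " + name)
        return errors
```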


  1. citation insertion
    • use existing text-area for input in submission step 2 (metadata)
    • "parse"-button triggers AJAX request that will insert the citation editor on the same page
    • open question: non-blocking AJAX request vs. progress bar
  2. citation extraction
    • use a full-page request on file upload
    • if citations have been found then display a check-box (default: on) to enable citation extraction
    • citation editor will automatically appear in step two (metadata) with the extracted citations
  3. citation editor
    • port existing GUI to jQuery
    • "edit" triggers an AJAX request that inserts the citation field editor
    • "edit" closes any other open field editor
    • "edit" for an open citation closes it
    • implement a dirty-flag pattern to avoid losing user data when the editor is closed
    • "save details" and "save citation text" will become one single button ("save")
    • "save" triggers an AJAX request that persists citation text and citation fields to the database
    • "parse citation" and "lookup citation" will become one single button ("lookup")
    • "lookup" triggers an AJAX request for parsing and lookup that inserts lookup data into fields and provides the user with feedback for the parsing/lookup score
    • unparsed citation is implemented as text area
    • citation fields are implemented as input fields
    • "move up" and "move down" trigger AJAX requests that update the GUI accordingly (this differs from the current implementation, which triggers a full-page request that is not really usable)
    • "insert before" is a drop-down field that shows all citations by number, it has an entry "at the end..."
    • "remove citation" triggers an alert "do you really want to remove ...citation title...?" - if confirmed, an AJAX request will be triggered that persists the removal
    • "add citation" triggers an AJAX request that inserts a new citation into the GUI and opens the citation editor with empty fields
    • make sure that GUI conforms to PKP's standard design re-using existing CSS wherever possible
  4. configuration
    • check-box in setup step 4: enable/disable L8X citation parsing (if jQuery support is enabled, this will make the other options appear)
    • check-box in setup step 4: enable/disable automatic citation extraction (available only if L8X parsing is enabled)
    • select/configure parsing/lookup services: use the existing GUI elements from L8X (no AJAX required; if jQuery support is enabled, dependent sub-options will only appear when the main service is enabled)
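Several of the citation editor rules above interact: "edit" closes any other open editor, "edit" on an open citation closes it, and the dirty flag must guard both. It may help to pin down that state logic before porting the GUI. In the browser this would be jQuery code; the Python sketch below only models the state, with a hypothetical `confirm_discard` callback standing in for a confirmation dialog.

```python
class CitationEditorState:
    """Hypothetical state model: at most one open field editor, plus a
    dirty flag that guards against losing unsaved edits."""

    def __init__(self, confirm_discard):
        # confirm_discard(citation_id) -> bool, e.g. a confirm() dialog.
        self.confirm_discard = confirm_discard
        self.open_id = None
        self.dirty = False

    def mark_dirty(self):
        if self.open_id is not None:
            self.dirty = True

    def save(self):
        # A real implementation would send the "save" AJAX request here.
        self.dirty = False

    def close(self):
        """Close the open editor; return False if the user keeps it open."""
        if self.open_id is None:
            return True
        if self.dirty and not self.confirm_discard(self.open_id):
            return False
        self.open_id = None
        self.dirty = False
        return True

    def edit(self, citation_id):
        """Handle a click on "edit"; return the id of the open editor."""
        if citation_id == self.open_id:
            self.close()                 # "edit" on an open citation closes it
        elif self.close():               # close any other open editor first
            self.open_id = citation_id
        return self.open_id
```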

Next Steps

  • specify backend services / DAOs
  • co-ordinate with OMP-development
  • specify AJAX request architecture
  • specify GUI fragments/AJAX components
  • get specification approval from Alec, Brian, ...
  • start coding

@Alec: From my side, the specification for this first feature can be ready for approval by Wednesday next week at the latest. The only limiting factor is co-ordination with OMP development.

Further Ideas (Attic)

  • Don't kill L8X as a standalone application, integrate it with PKP WAL
  • Package/brand/SEOize parser/lookup library separately for re-use in other document based OSS applications (ECM)
  • Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
    1. extract styles from a sample document
    2. extract sections from a sample document
    3. let users attribute semantic information to styles and sections (e.g. the first section always contains author information)
    4. parse documents (metadata, citations, structure) in batches based on these user-specific definitions
  • Integrate more citation lookup services
  • Additional XML schemas for export
  • Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
  • Improve support for metadata schemas
  • Use machine-learning approaches (data mining technology) to improve parser robustness (citations, document structure, metadata)
  • Introduce a "batch processing mode" for citation parsing/lookup
    • keep the application responsive while citation parsing is going on in the background
    • do citation parsing/lookup during off-hours (e.g. every night)
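For the batch processing idea, one possible shape is a queue that the interactive request only appends to, and that a scheduled job (for example a nightly cron run) drains in bounded batches. A Python sketch with hypothetical names; `process_citation` stands in for the actual parsing/lookup call, and failed items are simply reported rather than retried.

```python
from collections import deque


class CitationBatchQueue:
    """Hypothetical queue: requests only enqueue citation ids; a scheduled
    job calls run_batch() to do the slow parsing/lookup work."""

    def __init__(self, process_citation):
        self.process_citation = process_citation
        self.pending = deque()

    def enqueue(self, citation_id):
        self.pending.append(citation_id)

    def run_batch(self, max_items):
        """Process up to max_items citations; return (processed, failed)."""
        processed, failed = [], []
        while self.pending and len(processed) + len(failed) < max_items:
            citation_id = self.pending.popleft()
            try:
                self.process_citation(citation_id)
                processed.append(citation_id)
            except Exception:
                failed.append(citation_id)
        return processed, failed
```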