Difference between revisions of "Talk:Lemon8-XML Roadmap"

From PKP Wiki
m (Installation Requirements and Compatibility)
m (Moved "additional ideas" to main page.)
 
(47 intermediate revisions by 9 users not shown)
==Priorities (descending)==
* Add L8X's citation parsing, lookup/correction and editing functionality to OJS
* Add L8X's metadata extraction to OJS
* Add document parsing/editing capability to OJS
* Add document conversion capability to OJS
* Don't kill L8X as a standalone application; integrate it with the PKP WAL
* Package/brand/SEOize the parser/lookup library separately for re-use in other document-based OSS applications (ECM)
  
==Proposed Features / Integration Points (by descending priority)==
* automatic citation lookup and editing in the submission process ("citation box")
* allow readers to view citations in multiple citation formats (including Zotero integration)
* addition of citation data to the XML export (e.g. for PubMed, Synergies, and CrossRef)
* generation of COinS (ContextObjects in Spans) from citations
* add citation editing and lookup to the editorial process
* automatic citation data extraction from ODT in the submission process
* automatic metadata extraction from ODT in the submission process
* add a section parser/editor to the editorial process in OJS (generate and edit the full semantic XML structure)
* add automatic document conversion (* -> ODT) to the submission process so that automatic extraction works for more formats
  
==Proposed Architecture (citation integration only)==
  
===Citation Backend Services Library===
* Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
* Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
* Specific implementations extend a base object that enforces the API contract (template method pattern); interfaces are a no-go in PHP4
* Make sure that the API can be used by all PKP applications
* Use the migrated code in L8X standalone
* Make sure that the components can later be integrated/extended for metadata/section parsers/editors
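The "base object that enforces the API contract" is the template method pattern. The base class fixes the public `parse()` sequence and leaves one hook method to each concrete parser. Because PHP4 has no interfaces, the base hook fails loudly when it is not overridden. The real code would be PHP4-compatible. This Python sketch only illustrates the pattern, and `CitationParser`, `_parse_raw` and the toy `YearOnlyParser` are hypothetical names, not L8X's actual classes.

```python
import re


class CitationParser:
    """Hypothetical base class: fixes the public API, leaves one hook to subclasses."""

    def parse(self, citation_text):
        # Template method: the fixed sequence every parser goes through.
        text = self.normalize(citation_text)
        if not text:
            return {}
        fields = self._parse_raw(text)  # parser-specific step
        # Enforce the contract: string keys, stripped string values, no empty fields.
        return {str(k): str(v).strip() for k, v in fields.items() if v}

    def normalize(self, citation_text):
        # Shared pre-processing: collapse runs of whitespace.
        return " ".join(citation_text.split())

    def _parse_raw(self, text):
        # Hook method. Without interfaces (PHP4), the base class fails loudly instead.
        raise NotImplementedError(type(self).__name__ + " must implement _parse_raw()")


class YearOnlyParser(CitationParser):
    """Toy implementation: extracts a four-digit year in parentheses."""

    def _parse_raw(self, text):
        match = re.search(r"\((\d{4})\)", text)
        return {"date": match.group(1) if match else None, "rawText": text}
```

For example, `YearOnlyParser().parse("Smith, J.  (2009). Title.")` yields `{"date": "2009", "rawText": "Smith, J. (2009). Title."}`. Lookup components could follow the same shape with a different hook.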
  
===Citation DAO Library===
* We can use the usual PKP DAO pattern for all citation data persistence requirements
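The DAO pattern keeps all SQL for one entity in a single class, so handlers and GUI components only deal with citation objects. PKP's DAOs are PHP classes. This Python/sqlite3 sketch only illustrates the shape of such a class. The table, its columns (`assoc_id`, `raw_text`, `seq`) and the method names are assumptions, not the actual PKP schema or API.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    assoc_id: int                      # e.g. the submission the citation belongs to (assumed)
    raw_text: str                      # the unparsed citation text
    seq: int = 0                       # position in the citation list
    citation_id: Optional[int] = None  # set by the DAO on insert


class CitationDAO:
    """Hypothetical DAO: the only place that knows about the citations table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            " citation_id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " assoc_id INTEGER NOT NULL, raw_text TEXT NOT NULL, seq INTEGER NOT NULL)"
        )

    def insert_citation(self, citation):
        cur = self.conn.execute(
            "INSERT INTO citations (assoc_id, raw_text, seq) VALUES (?, ?, ?)",
            (citation.assoc_id, citation.raw_text, citation.seq),
        )
        citation.citation_id = cur.lastrowid
        return citation.citation_id

    def get_citation(self, citation_id):
        row = self.conn.execute(
            "SELECT assoc_id, raw_text, seq, citation_id FROM citations WHERE citation_id = ?",
            (citation_id,),
        ).fetchone()
        # Column order matches the Citation field order.
        return Citation(*row) if row else None

    def get_citations_by_assoc_id(self, assoc_id):
        rows = self.conn.execute(
            "SELECT assoc_id, raw_text, seq, citation_id FROM citations"
            " WHERE assoc_id = ? ORDER BY seq",
            (assoc_id,),
        ).fetchall()
        return [Citation(*row) for row in rows]

    def update_citation(self, citation):
        self.conn.execute(
            "UPDATE citations SET raw_text = ?, seq = ? WHERE citation_id = ?",
            (citation.raw_text, citation.seq, citation.citation_id),
        )

    def delete_citation_by_id(self, citation_id):
        self.conn.execute("DELETE FROM citations WHERE citation_id = ?", (citation_id,))
```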
  
===Citation GUI Pages===
* We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup.
* Pages have to be application-specific, so we usually cannot share them between applications.
* However, we'll try to move as much as possible into the GUI components library for re-use; the page itself will only consist of a very high-level outline.
* Apart from citation editing, I don't think we'll invent new pages; it's more about integrating new components into existing pages.
  
===Citation GUI Components Library===
L8X's editing capability is a lot more demanding than anything I know of so far in OCS/OJS. I haven't looked at OMP yet (Alec, can you comment?).

Anyway, I think we'll have to think thoroughly about the GUI component library before we start implementing it.
  
My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages, we'll have to "componentize" the GUI more than it currently is in L8X.
  
These are my ideas for the GUI architecture:
* Create a low-level AJAX-based GUI component library in WAL (re-usable in all PKP applications)
* Create re-usable high-level template fragments in WAL (e.g. a citation editor component, re-used in all PKP applications)
* Create an L8X citation renderer template library
** One Smarty template per citation style, including COinS
** Migrate the COinS plugin to use the COinS template fragment
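A COinS fragment is an empty `<span class="Z3988">` whose `title` holds a URL-encoded OpenURL (Z39.88-2004) KEV ContextObject. Reference managers such as Zotero pick these spans up from the page. In the plan above this would be a Smarty template fragment. The Python function below stands in for it to show the encoding steps. The input field names (`title`, `journal`, `date`, `volume`, `firstPage`, `authors`) are assumptions about the citation data model.

```python
import html
from urllib.parse import urlencode


def coins_span(citation):
    """Render a COinS <span> for a journal-article citation (hypothetical field names)."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Map the (assumed) citation field names to OpenURL KEV keys; skip empty fields.
    mapping = [
        ("title", "rft.atitle"),
        ("journal", "rft.jtitle"),
        ("date", "rft.date"),
        ("volume", "rft.volume"),
        ("firstPage", "rft.spage"),
    ]
    for field, key in mapping:
        if citation.get(field):
            pairs.append((key, citation[field]))
    # rft.au may be repeated, once per author.
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # urlencode percent-encodes the values; html.escape turns "&" into "&amp;" for the attribute.
    title = html.escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

One template (here: one function) per output format keeps the citation data model independent of the rendering, which matches the "one template per citation style" idea.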
===AJAX Request Architecture===
* We should probably think about an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in request processing for AJAX requests wherever possible (a likely performance bottleneck!).
* Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently.
* Of course, we must make sure that there is no AJAX security bypass (maintain the single point of entry and a common security infrastructure for all types of requests).
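The three points above can be sketched together under a few assumptions: one `dispatch()` entry point, a shared `Handler` base class that owns the security check, and two subclasses. The page handler renders a full page. The AJAX handler takes a shortcut and returns only JSON. All names are hypothetical and are not PKP's actual handler classes. The point is that both paths pass through the same authorization step.

```python
import json


class Request:
    """Minimal stand-in for a request object (hypothetical)."""

    def __init__(self, path, user=None, is_ajax=False):
        self.path = path
        self.user = user
        self.is_ajax = is_ajax


class Handler:
    """Hypothetical base handler shared by page and AJAX requests."""

    requires_login = True

    def authorize(self, request):
        # Common security check: no request type may bypass it.
        return request.user is not None or not self.requires_login

    def handle(self, request):
        raise NotImplementedError


class PageHandler(Handler):
    def handle(self, request):
        # Full-page path: would set up the whole template/layout (stubbed here).
        return "<html><body>citation list for %s</body></html>" % request.path


class AjaxHandler(Handler):
    def handle(self, request):
        # Shortcut path: skip the page layout and return only what the widget needs.
        return json.dumps({"status": True, "content": "citation list for " + request.path})


def dispatch(request):
    """Single point of entry for all requests."""
    handler = AjaxHandler() if request.is_ajax else PageHandler()
    if not handler.authorize(request):
        # Same security decision for both request types, different response formats.
        if request.is_ajax:
            return json.dumps({"status": False, "error": "forbidden"})
        return "403 Forbidden"
    return handler.handle(request)
```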
==Next Steps==
* Thoroughly analyze GUI requirements
* Define integration points
* Specify GUI library components
==Installation Requirements and Compatibility==
* No new initial installation requirements
* Maintain PHP4 compatibility for the initial installation
* New installation requirements (additional software, PHP > 4) only for optional plug-ins
* Thorough documentation of additional installation/runtime environment requirements
* Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
* Make sure that L8X standalone continues working/improving by cleanly backporting/integrating migrated code into L8X (DRY!)
* Use standard UI technology so that backporting the new OMP GUI will be easier
* Comments have to follow Doxygen syntax
** See e.g. http://pkp.sfu.ca/cvs/cvsweb.cgi/ojs2/classes/article/Article.inc.php?rev=1.48;content-type=text%2Fplain for a standard code header
** Functions should at least include a general description as well as @param and @return tags as necessary.
==Further Ideas (Attic)==
* Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents:
# extract styles from a sample document
# extract sections from a sample document
# let the user attribute semantic information to styles and sections (e.g. the first section always contains author information)
# parse documents (metadata, citations, structure) in batches based on these user-specific definitions
* Integrate more citation lookup services
* Additional XML schemas for export
* Additional plugin-based file conversion (XSLT, ICE, GD, ImageMagick, etc.)
* Improve support for metadata schemas
* Use machine-learning (data mining) approaches to improve parser robustness (citations, document structure, metadata)
* Introduce a "batch processing mode" for citation parsing/lookup
** keep the application responsive while citation parsing is going on in the background
** do citation parsing/lookup during off-hours (e.g. every night)
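A "batch processing mode" could be a queue of pending citations that a scheduled job (e.g. a nightly cron run) drains a bounded chunk at a time. That keeps each run short, and failures simply stay queued for the next run. The sketch assumes a hypothetical `parse_and_lookup` callable and a default 22:00-06:00 window. Neither the window nor the names come from this roadmap.

```python
from collections import deque


def is_off_hours(hour, start=22, end=6):
    """True if `hour` (0-23) falls in the window [start, end), wrapping past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def run_batch(queue, parse_and_lookup, max_items):
    """Process at most `max_items` queued (citation_id, raw_text) pairs.

    Returns {citation_id: result}. Citations whose processing raises an exception
    are re-queued at the end so that a later run can retry them.
    """
    results = {}
    for _ in range(min(max_items, len(queue))):
        citation_id, raw_text = queue.popleft()
        try:
            results[citation_id] = parse_and_lookup(raw_text)
        except Exception:
            queue.append((citation_id, raw_text))
    return results
```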

Latest revision as of 11:42, 3 February 2010

GUI specification: Citation support in the submission/editorial process

Citation Extraction/Insertion

  1. must-have: copy & paste
    We can use the existing text field in the submission process for "bulk citation insert":
    • enter citations (text only); disable the TinyMCE plugin for the citation field
    • the "parse" button splits up the citations (one per line?) and sends them to the configured parser services
    • new: parsing should be non-blocking if possible; alternatively, a progress bar should appear
    • citation editor appears as soon as citations have been recognized
  2. optional: automatic extraction
    Use document parser to extract citations:
    • recognize the .odt file type and try to extract citations
    • show citations in citation editor if citations have been found
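The "parse" step for copy & paste can be sketched as follows. It assumes one citation per non-empty line, which the spec still marks as an open question, and it models each configured parser service as a plain callable. `split_citations`, `parse_all` and the service mapping are hypothetical names, and the real code would be PHP. The thread pool only makes the service calls concurrent on the server; the non-blocking behaviour in the browser would still have to come from the AJAX request.

```python
from concurrent.futures import ThreadPoolExecutor


def split_citations(text):
    """One citation per non-empty line (the 'one per line?' heuristic)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_all(text, parser_services, max_workers=4):
    """Send every citation to every configured parser service concurrently.

    `parser_services` maps a (hypothetical) service name to a callable(str) -> dict.
    Returns one {service_name: result} dict per citation, in input order.
    """
    citations = split_citations(text)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            {name: pool.submit(service, citation) for name, service in parser_services.items()}
            for citation in citations
        ]
        # result() waits for each call; any exception raised by a service is re-raised here.
        return [{name: f.result() for name, f in per_citation.items()} for per_citation in futures]
```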

Citation Parsing/Editing/Lookup

The current citation editor GUI is already very good. It has the following functionality:

  • open/close citation details (current bug: opening the details of one citation should close all other citations, but it doesn't)
  • save citation details
  • parse/lookup citation
  • text field for editing the unparsed text
  • moving citations up and down
  • new: allow users to move a citation anywhere in the editor
  • remove citation
  • add citation
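Moving citations up and down, and the new "move anywhere" feature, reduce to one list operation: take the citation out and re-insert it before a target, or at the end. The sketch below works on an ordered list of citation ids. The function names are hypothetical, and `sequence_numbers` only shows which `seq` values a DAO would then persist.

```python
def move_citation(citations, citation_id, before_id=None):
    """Move `citation_id` directly before `before_id`; before_id=None means "at the end...".

    `citations` is the ordered list of citation ids; a new list is returned.
    """
    remaining = [c for c in citations if c != citation_id]
    if before_id is None:
        return remaining + [citation_id]
    index = remaining.index(before_id)
    return remaining[:index] + [citation_id] + remaining[index:]


def move_up(citations, citation_id):
    i = citations.index(citation_id)
    if i == 0:
        return list(citations)
    return move_citation(citations, citation_id, before_id=citations[i - 1])


def move_down(citations, citation_id):
    i = citations.index(citation_id)
    if i == len(citations) - 1:
        return list(citations)
    # Moving down one slot = moving before the element two positions below (or to the end).
    after_next = citations[i + 2] if i + 2 < len(citations) else None
    return move_citation(citations, citation_id, before_id=after_next)


def sequence_numbers(citations):
    """The 1-based seq values to persist after a move."""
    return {citation_id: seq for seq, citation_id in enumerate(citations, start=1)}
```

For example, `move_citation([10, 20, 30, 40], 40, before_id=20)` gives `[10, 40, 20, 30]`.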

Plugin Configuration

  • enable/disable L8X citation parsing
  • enable/disable automatic citation extraction
  • select/configure parsing services
  • select/configure lookup services
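The options above depend on each other. Automatic extraction only makes sense when L8X parsing is on, and a service's sub-options only matter when that service is enabled. One way to keep that consistent on the server side, independent of whether the jQuery show/hide is active, is to normalize the submitted settings. The setting keys and service names below are hypothetical placeholders, not actual OJS setting names.

```python
def effective_settings(settings):
    """Apply the dependencies between the (hypothetical) citation plugin settings.

    - automatic extraction is only available if L8X citation parsing is enabled
    - a service's sub-options are kept only if that service itself is enabled
    """
    parsing = bool(settings.get("citationParsingEnabled"))
    result = {
        "citationParsingEnabled": parsing,
        "citationExtractionEnabled": parsing and bool(settings.get("citationExtractionEnabled")),
        "parserServices": {},
        "lookupServices": {},
    }
    for group in ("parserServices", "lookupServices"):
        for name, options in settings.get(group, {}).items():
            if parsing and options.get("enabled"):
                result[group][name] = dict(options)
    return result
```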

Implementation

  1. citation insertion
    • use the existing text area for input in submission step 2 (metadata)
    • the "parse" button triggers an AJAX request that inserts the citation editor on the same page
    • open question: non-blocking AJAX request or progress bar
  2. citation extraction
    • use a full-page request on file upload
    • if citations have been found, display a check-box (default: on) to enable citation extraction
    • the citation editor automatically appears in step 2 (metadata) with the extracted citations
  3. citation editor
    • port existing GUI to jQuery
    • "edit" triggers an AJAX request that inserts the citation field editor
    • "edit" closes any other open field editor
    • "edit" on an open citation closes it
    • implement a dirty-flag pattern to avoid losing user data when the editor is closed
    • "save details" and "save citation text" will be merged into a single button ("save")
    • "save" triggers an AJAX request that persists the citation text and citation fields to the database
    • "parse citation" and "lookup citation" will be merged into a single button ("lookup")
    • "lookup" triggers an AJAX request for parsing and lookup that inserts the lookup data into the fields and gives the user feedback on the parsing/lookup score
    • the unparsed citation is implemented as a text area
    • citation fields are implemented as input fields
    • "move up" and "move down" trigger AJAX requests that update the GUI accordingly (unlike the current implementation, which triggers a full-page request that is not really usable)
    • "insert before" is a drop-down field that lists all citations by number, plus an entry "at the end..."
    • "remove citation" triggers a confirmation prompt ("Do you really want to remove ...citation title...?"); if confirmed, an AJAX request persists the removal
    • "add citation" triggers an AJAX request that inserts a new citation into the GUI and opens the citation editor with empty fields
    • make sure that the GUI conforms to PKP's standard design, re-using existing CSS wherever possible
  4. configuration
    • check-box in setup step 4: enable/disable L8X citation parsing (if jQuery support is enabled, this will make the other options appear)
    • check-box in setup step 4: enable/disable automatic citation extraction (available only if L8X parsing is enabled)
    • select/configure parsing/lookup services: use the existing GUI elements from L8X (no AJAX required; if jQuery support is enabled, dependent sub-options will only appear when the main service is enabled)
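The dirty-flag pattern from the citation editor item can be sketched by keeping the last saved state next to the edited state. The editor is dirty whenever the two differ, and closing a dirty editor asks for confirmation first. In the actual GUI this would be jQuery code. Python is used here only to show the logic, and `CitationFieldEditor` and its method names are hypothetical.

```python
class CitationFieldEditor:
    """Dirty-flag sketch for one open citation editor (field names are hypothetical)."""

    def __init__(self, fields):
        self._saved = dict(fields)   # last persisted state
        self.fields = dict(fields)   # what the user currently sees

    def set_field(self, name, value):
        self.fields[name] = value

    @property
    def is_dirty(self):
        # Comparing states (rather than setting a flag on every keystroke) means that
        # reverting an edit by hand makes the editor clean again.
        return self.fields != self._saved

    def save(self, persist):
        # One "save" button: raw text and parsed fields are persisted together.
        persist(dict(self.fields))
        self._saved = dict(self.fields)

    def request_close(self, confirm_discard):
        """Return True if the editor may close; ask the user only if there are unsaved changes."""
        if not self.is_dirty:
            return True
        if confirm_discard():
            self.fields = dict(self._saved)
            return True
        return False
```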

The Role of Plug-ins

When do we need plug-ins?

  • non-standard installation requirements that need to be isolated
  • complex configuration or user-interface requirements that clutter the core interface for first-time users and should be kept out of the way
  • performance implications (when switching on a feature imposes an unavoidable performance cost)
  • isolation of application-specific citation adapter code in one place to keep the core code clean -> improved code modularization and maintainability

Where to place citation plug-ins?

  • We'll have to create a new citation plugin category if we need additional hooks or have many plug-ins.
  • Otherwise we prefer to use existing categories.