Talk:Lemon8-XML Roadmap
Contents
- 1 Priorities (descending)
- 2 Proposed Features / Integration Points (by descending priority)
- 3 Proposed Architecture (citation integration only)
- 4 Installation Requirements and Compatibility
- 5 GUI specification for Feature #1 (citation support in submission process)
- 6 Next Steps
- 7 Further Ideas (Attic)
Priorities (descending)
- Add L8X's citation parsing, lookup/correction and editing functionality to OJS
- Add L8X's metadata extraction to OJS
- Add document parsing/editing capability to OJS
- Add document conversion capability to OJS
Proposed Features / Integration Points (by descending priority)
- automatic citation lookup and editing in submission process ("citation box")
- addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
- allow readers to view citations in multiple citation formats (including Zotero integration)
- generation of COinS (Context Object in Span) from citations
- add citation editing and lookup to editorial process
- automatic citation data extraction from ODT in submission process
- automatic metadata extraction from ODT in submission process
- add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
- add automatic document conversion (* -> ODT) to the submission process to allow automatic extraction for more formats
Proposed Architecture (citation integration only)
Citation Backend Services Library
- Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
- Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
- Specific implementations extend a base object that enforces the API contract (template method pattern); interfaces are not available in PHP4
- Make sure that the API can be used by all PKP applications
- Use migrated code in L8X standalone
- Make sure that the components can be integrated/extended for metadata/section parsers/editors later
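The base-object idea above can be sketched as follows. This is a minimal illustration in Python rather than PHP4, and the names `CitationParser`, `_parse_internal` and `YearOnlyParser` are hypothetical, not existing L8X or PKP classes. The base class owns the public `parse()` method and the shared contract checks; concrete parsers only fill in one hook.

```python
import re


class CitationParser:
    """Hypothetical base class standing in for an interface."""

    def parse(self, raw_citation):
        # Template method: shared pre- and post-processing lives here,
        # so every concrete parser honours the same contract.
        text = " ".join(raw_citation.split())  # normalise whitespace
        if not text:
            return {}
        fields = self._parse_internal(text)
        if not isinstance(fields, dict):
            raise TypeError("parsers must return a dict of citation fields")
        fields.setdefault("raw", text)
        return fields

    def _parse_internal(self, text):
        # Hook that every concrete parser must override.
        raise NotImplementedError


class YearOnlyParser(CitationParser):
    """Toy implementation: extracts a four-digit year and nothing else."""

    def _parse_internal(self, text):
        match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", text)
        return {"year": match.group(1)} if match else {}
```

A lookup component could follow the same shape, with a `lookup()` template method and its own hook.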
Citation DAO Library
- We can use the usual PKP DAO pattern for all citation data persistence requirements
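As a rough illustration of what a citation DAO might look like, here is a sketch using Python and an in-memory SQLite database. The table layout, column names and method names are assumptions made for the example; they are not taken from the PKP schema or the PKP DAO base classes.

```python
import sqlite3


class CitationDAO:
    """Hypothetical DAO: all citation persistence goes through this class."""

    def __init__(self, connection):
        self.db = connection
        # Assumed table layout, for illustration only.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            "citation_id INTEGER PRIMARY KEY, "
            "article_id INTEGER NOT NULL, "
            "seq INTEGER NOT NULL, "
            "raw_text TEXT NOT NULL)"
        )

    def insert_citation(self, article_id, seq, raw_text):
        cursor = self.db.execute(
            "INSERT INTO citations (article_id, seq, raw_text) VALUES (?, ?, ?)",
            (article_id, seq, raw_text),
        )
        return cursor.lastrowid

    def get_citations_by_article(self, article_id):
        rows = self.db.execute(
            "SELECT citation_id, seq, raw_text FROM citations "
            "WHERE article_id = ? ORDER BY seq",
            (article_id,),
        )
        return [{"citation_id": r[0], "seq": r[1], "raw_text": r[2]} for r in rows]

    def delete_citation(self, citation_id):
        self.db.execute("DELETE FROM citations WHERE citation_id = ?", (citation_id,))
```

Usage would be along the lines of `dao = CitationDAO(sqlite3.connect(":memory:"))` followed by `dao.insert_citation(1, 1, "Doe, J. (2008) ...")`.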
Citation GUI Pages
- We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup
- Pages have to be application specific so we cannot usually share them between applications.
- We will, however, try to move as much as possible into the GUI components library for re-use; the page itself will only consist of a very high-level outline.
- Apart from citation editing I don't think we'll invent new pages; it's more about integrating new components into existing pages.
Citation GUI Components Library
L8X editing capability is a lot more demanding than anything I know so far in OCS/OJS.
My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages we'll have to "componentize" the GUI more than it currently is in L8X.
These are my ideas for the GUI architecture:
- Create L8X-specific GUI components and template fragments in WAL (e.g. citation editor component, re-use in all PKP applications)
- Create an L8X citation renderer template library
- One Smarty template per citation style, including COinS
- Migrate COinS plugin to use COinS template fragment
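To make the COinS fragment concrete, here is a sketch of what such a template would need to emit, written as a Python function rather than a Smarty template. The input field names (`title`, `journal`, `year`, `author`) are assumptions about the citation record; the `Z3988` class and the `ctx_ver`/`rft.*` keys follow the COinS convention of embedding an OpenURL ContextObject in the `title` attribute of an empty span.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(citation):
    """Render a journal-article citation dict as a COinS span."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Assumed mapping from citation fields to OpenURL keys.
    key_map = [
        ("title", "rft.atitle"),
        ("journal", "rft.jtitle"),
        ("year", "rft.date"),
        ("author", "rft.au"),
    ]
    for field, key in key_map:
        if citation.get(field):
            pairs.append((key, citation[field]))
    # URL-encode the key/value pairs, then HTML-escape the result
    # because it is placed inside an attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

Templates for other citation styles would take the same citation record and differ only in the markup they produce.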
AJAX Request Architecture
- We should probably think about an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in the request processing for AJAX requests wherever possible (performance bottleneck!)
- Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently
- Make sure that AJAX requests cannot bypass security (maintain the single point of entry + common security infrastructure for all types of request)
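One possible shape for this, sketched in Python with hypothetical names (`BaseHandler`, `PageHandler`, `AjaxHandler`, `dispatch`): the security check sits in the shared base class and in a single dispatcher, so the AJAX shortcut only skips the page rendering, never the check. The dict-based request and the `is_ajax` flag are simplifications for the example.

```python
class BaseHandler:
    """Shared base: the security check runs for every request type."""

    def handle(self, request):
        if not request.get("user"):
            return {"status": 403, "body": "login required"}
        return self.respond(request)

    def respond(self, request):
        raise NotImplementedError


def render_fragment(request):
    # Stand-in for rendering a GUI component (e.g. one citation editor).
    return "<div>citation %d</div>" % int(request["citation_id"])


class PageHandler(BaseHandler):
    def respond(self, request):
        # Full request cycle: the fragment is wrapped in the page layout.
        body = "<html><body>%s</body></html>" % render_fragment(request)
        return {"status": 200, "body": body}


class AjaxHandler(BaseHandler):
    def respond(self, request):
        # Shortcut: return the bare fragment, skipping the page layout.
        return {"status": 200, "body": render_fragment(request)}


def dispatch(request):
    """Single point of entry for both kinds of request."""
    handler = AjaxHandler() if request.get("is_ajax") else PageHandler()
    return handler.handle(request)
```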
Installation Requirements and Compatibility
- No new initial installation requirements
- Maintain PHP4 compatibility for initial installation
- New installation requirements (additional software, PHP>4) only for optional plug-ins
- Thorough documentation of additional installation / runtime environment requirements
- Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
- Make sure that L8X standalone will continue working/improving by cleanly backporting/integrating migrated code to L8X (DRY!)
- Use standard UI technology to make sure that backport of new OMP GUI will be easier
- Comments have to follow Doxygen syntax
- See e.g. http://pkp.sfu.ca/cvs/cvsweb.cgi/ojs2/classes/article/Article.inc.php?rev=1.48;content-type=text%2Fplain for a standard code header
- Functions should at least include a general description as well as @param and @return tags as necessary.
GUI specification for Feature #1 (citation support in submission process)
Citation Extraction/Insertion
- must-have: copy & paste
We can use the existing text field in the submission process for "bulk citation insert":
- enter citations (text-only) -> disable TinyMCE-plugin for citation field
- "parse" button will split up citations (one per line?) and send them to the configured parser services
- new: parsing should be non-blocking if possible - alternatively: a progress bar should appear
- citation editor appears as soon as citations have been recognized
- optional: automatic extraction
Use document parser to extract citations:
- recognize .odt file type and try to extract citations
- show citations in citation editor if citations have been found
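The "parse" step for pasted citations could look roughly like the sketch below. It assumes the one-citation-per-line convention (which the list above still marks as an open question) and models each configured parser service as a plain function returning a dict of fields; `split_citations` and `parse_all` are hypothetical names. When several services return the same field, the first configured service wins, which is just one possible merge policy.

```python
def split_citations(bulk_text):
    """Split pasted text into citations, assuming one citation per line."""
    return [" ".join(line.split()) for line in bulk_text.splitlines() if line.strip()]


def parse_all(bulk_text, parsers):
    """Send every citation to every configured parser and merge the results."""
    results = []
    for raw in split_citations(bulk_text):
        fields = {}
        for parser in parsers:
            for key, value in parser(raw).items():
                # Keep the value from the first service that supplied it.
                fields.setdefault(key, value)
        results.append({"raw": raw, "fields": fields})
    return results
```

The returned list is what the citation editor would be populated with, one entry per recognized citation.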
Citation Parsing/Editing/Lookup
I think the current citation editor GUI is already very good. It has the following functionality:
- open/close citation details (current bug: opening details for one citation should close all other citations)
- save citation details
- parse/lookup citation
- text field for editing the unparsed text
- moving citations up and down
- new: allow users to move a citation anywhere in the editor
- remove citation
- add citation
Plugin Configuration
- enable/disable L8X citation parsing
- enable/disable automatic citation extraction
- select/configure parsing services
- select/configure lookup services
Implementation
- citation insertion
- use existing text-area for input in submission step 2 (metadata)
- "parse"-button triggers AJAX request that will insert the citation editor on the same page
- open: non-blocking AJAX-request / progress-bar
- citation extraction
- use full-page-request on file upload
- if citations have been found then display a check-box (default: on) to enable citation extraction
- citation editor will automatically appear in step two (metadata) with the extracted citations
- citation editor
- port existing GUI to jQuery
- "edit" triggers an AJAX request that inserts the citation field editor
- "edit" closes other open field editor (if any)
- "edit" for an open citation closes it
- implement dirty-pattern to avoid losing user-data on editor close
- "save details" and "save citation text" will become one single button ("save")
- "save" triggers an AJAX request that persists citation text and citation fields to the database
- "parse citation" and "lookup citation" will become one single button ("lookup")
- "lookup" triggers an AJAX request for parsing and lookup that inserts lookup data into fields and provides the user with feedback for the parsing/lookup score
- unparsed citation is implemented as text area
- citation fields are implemented as input fields
- "move up" and "move down" trigger AJAX requests that update the GUI accordingly
- "insert before" is a drop-down field that shows all citations by number, it has an entry "at the end..."
- "remove citation" triggers an alert "do you really want to remove ...citation title...?" - if confirmed, an AJAX request will be triggered that persists the removal
- "add citation" triggers an AJAX request that inserts a new citation into the GUI and opens the citation editor with empty fields
- make sure that GUI conforms to PKP's standard design re-using existing CSS wherever possible
- configuration
- check-box in setup - step4: enable/disable L8X citation parsing (if jQuery support is enabled then this will trigger the other options to appear)
- check-box in setup - step4: enable/disable automatic citation extraction (available only if L8X parsing is enabled)
- select/configure parsing/lookup services: use the existing GUI elements from L8X (no AJAX required, if jQuery support is enabled then dependent sub-options will only appear when the main service is enabled)
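The citation editor rules above ("edit" toggles, only one open editor, dirty-pattern, "insert before") can be modelled independently of jQuery. The following Python sketch uses hypothetical names (`CitationEditorState`, `move_before`) and keeps everything in memory; in the real GUI each state change would be accompanied by the AJAX request described in the list.

```python
class CitationEditorState:
    """In-memory model of the editor: one open citation, one dirty flag."""

    def __init__(self, citations):
        self.citations = list(citations)
        self.open_index = None
        self.draft = None
        self.dirty = False

    def edit(self, index, discard_changes=False):
        # Dirty-pattern: refuse to close an editor with unsaved changes
        # unless the user has explicitly agreed to discard them.
        if self.dirty and not discard_changes:
            return False
        self.dirty = False
        self.draft = None
        # "edit" on the open citation closes it; otherwise it opens the
        # requested citation and implicitly closes any other one.
        self.open_index = None if self.open_index == index else index
        return True

    def change_text(self, text):
        if self.open_index is None:
            raise ValueError("no citation is open for editing")
        self.draft = text
        self.dirty = True

    def save(self):
        # Single "save" button: persist the draft and clear the dirty flag.
        if self.dirty:
            self.citations[self.open_index] = self.draft
        self.dirty = False
        self.draft = None


def move_before(citations, index, target=None):
    """Return a new list with citations[index] moved before position target.

    target=None corresponds to the "at the end..." entry of the drop-down.
    """
    result = list(citations)
    item = result.pop(index)
    if target is None:
        result.append(item)
    else:
        if target > index:
            target -= 1  # positions shift left after the pop
        result.insert(target, item)
    return result
```

For example, `move_before(["a", "b", "c", "d"], 0, 2)` places "a" directly before "c".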
Next Steps
- specify backend services / DAOs
- co-ordinate with OMP-development
- specify AJAX request architecture
- specify GUI fragments/AJAX components
- get specification approval from Alec, Brian, ...
- start coding
@Alec: From my side the specification for this first feature can be ready for approval by Wednesday next week at the latest. The only limiting factor is co-ordination with OMP-development.
Further Ideas (Attic)
- Don't kill L8X as a standalone application, integrate it with PKP WAL
- Package/brand/SEOize parser/lookup library separately for re-use in other document based OSS applications (ECM)
- Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
- extract styles from sample document
- extract sections from sample document
- let user attribute semantic information to styles and sections (e.g. first section = always contains author information)
- parse document (metadata, citations, structure) batch based on these specific user definitions
- Integrate more citation lookup services
- Additional XML schemas for export
- Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
- Improve support for metadata schemas
- Use machine-learning approaches (data mining technology) to improve parser robustness (citations, document structure, metadata)
- Introduce a "batch processing mode" for citation parsing/lookup
- keep the application responsive while citation parsing is going on in the background
- do citation parsing/lookup during off-hours (e.g. every night)
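A batch mode along these lines could be as simple as a queue that the web request only appends to, with the slow lookup done later by a scheduled job. The sketch below is in Python; `CitationBatchQueue`, its methods and the in-memory storage are assumptions for illustration (a real implementation would presumably persist the queue in the database).

```python
from collections import deque


class CitationBatchQueue:
    """Hypothetical queue: enqueue during requests, process off-hours."""

    def __init__(self, lookup):
        self.lookup = lookup  # slow parse/lookup function
        self.pending = deque()
        self.done = {}

    def enqueue(self, citation_id, raw_text):
        # Cheap operation, so the application stays responsive.
        self.pending.append((citation_id, raw_text))

    def run_batch(self, max_items=None):
        """Process queued citations; meant to be called by a nightly job."""
        processed = 0
        while self.pending and (max_items is None or processed < max_items):
            citation_id, raw_text = self.pending.popleft()
            self.done[citation_id] = self.lookup(raw_text)
            processed += 1
        return processed
```

The optional `max_items` limit lets a job process the backlog in slices instead of in one long run.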