Talk:Lemon8-XML Roadmap
From PKP Wiki
Revision as of 18:30, 8 October 2009 by Jerico.dev (Talk | contribs)
Priorities (descending)
- Add L8X's citation parsing, lookup/correction and editing functionality to OJS
- Add L8X's metadata extraction to OJS
- Add document parsing/editing capability to OJS
- Add document conversion capability to OJS
Proposed Features / Integration Points (by descending priority)
- automatic citation lookup and editing in submission process ("citation box")
- addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
- allow readers to view citations in multiple citation formats (including Zotero integration)
- generation of COinS (ContextObjects in Spans) from citations
- add citation editing and lookup to editorial process
- automatic citation data extraction from ODT in submission process
- automatic metadata extraction from ODT in submission process
- add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
- add automatic document conversion (any format -> ODT) to the submission process to allow automatic extraction from more formats
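To make the COinS item above more concrete, here is a minimal sketch of what generating a COinS span from parsed citation data could look like. It is written in Python for brevity (the real code would be PHP), and the citation field names (article_title, journal_title, etc.) are assumptions of mine, not L8X's actual data model. The span format itself (class "Z3988", OpenURL key/value pairs in the title attribute) follows the COinS convention.

```python
from html import escape
from urllib.parse import urlencode


def citation_to_coins(citation):
    """Render a journal-article citation dict as a COinS <span>."""
    # Fixed keys of an OpenURL (Z39.88-2004) ContextObject in KEV format.
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Hypothetical mapping from our citation fields to KEV journal keys.
    field_map = [
        ("article_title", "rft.atitle"),
        ("journal_title", "rft.jtitle"),
        ("year", "rft.date"),
        ("volume", "rft.volume"),
        ("first_page", "rft.spage"),
    ]
    for field, key in field_map:
        if citation.get(field):
            pairs.append((key, str(citation[field])))
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # URL-encode the pairs, then HTML-escape them for the title attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

Fields that are missing from the citation are simply left out of the span, so a partially parsed citation still yields valid (if sparse) COinS.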
Proposed Architecture (citation integration only)
Citation Backend Services Library
- Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
- Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
- Specific implementations extend a base class that enforces the API contract (template method pattern); interfaces are not an option in PHP4
- Make sure that the API can be used by all PKP applications
- Use migrated code in L8X standalone
- Make sure that the components can be integrated/extended for metadata/section parsers/editors later
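A rough sketch of the base-class idea from the list above, in Python for brevity (the real classes would be PHP under pkp/classes/citation/). The class and method names are placeholders I made up for illustration. The base class owns the public parse() workflow and subclasses only fill in one hook, which is how the contract gets enforced without language-level interfaces.

```python
import re


class CitationParser:
    """Base class: fixes the parse() workflow, subclasses fill in one step."""

    def parse(self, raw_citation):
        # Template method: shared pre- and post-processing around one hook.
        text = " ".join(raw_citation.split())  # normalize whitespace
        fields = self.parse_internal(text)
        if not isinstance(fields, dict):
            raise TypeError("parse_internal() must return a dict")
        fields.setdefault("raw", text)
        return fields

    def parse_internal(self, text):
        # Stand-in for an abstract method when interfaces are unavailable.
        raise NotImplementedError("subclasses must implement parse_internal()")


class RegexCitationParser(CitationParser):
    """Toy implementation: extracts only a four-digit year in parentheses."""

    def parse_internal(self, text):
        match = re.search(r"\((\d{4})\)", text)
        return {"year": match.group(1)} if match else {}
```

Callers only ever use parse(), so every implementation gets the same normalization and the same result shape. A lookup base class could follow the same scheme.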
Citation DAO Library
- We can use the usual PKP DAO pattern for all citation data persistence requirements
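For discussion, a minimal sketch of a citation DAO, in Python with SQLite purely for illustration. The table and column names are my assumptions, and the actual PKP DAO base classes are not reproduced here. The point is only that all citation SQL sits behind one class.

```python
import sqlite3


class CitationDAO:
    """Hypothetical DAO: all SQL for citations lives behind this class."""

    def __init__(self, connection):
        self.conn = connection
        # Assumed schema, created on demand to keep the sketch self-contained.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            "citation_id INTEGER PRIMARY KEY, "
            "article_id INTEGER NOT NULL, "
            "raw_citation TEXT NOT NULL)"
        )

    def insert_citation(self, article_id, raw_citation):
        """Store one raw citation and return its new id."""
        cursor = self.conn.execute(
            "INSERT INTO citations (article_id, raw_citation) VALUES (?, ?)",
            (article_id, raw_citation),
        )
        return cursor.lastrowid

    def get_citations_by_article(self, article_id):
        """Return all citations of an article, in insertion order."""
        rows = self.conn.execute(
            "SELECT citation_id, raw_citation FROM citations "
            "WHERE article_id = ? ORDER BY citation_id",
            (article_id,),
        )
        return [{"citation_id": r[0], "raw_citation": r[1]} for r in rows]
```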
Citation GUI Pages
- We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup
- Pages have to be application specific so we cannot usually share them between applications.
- We'll nevertheless try to move as much as possible into the GUI components library for re-use, so that each page consists only of a very high-level outline
- Apart from citation editing I don't think we'll invent new pages, it's more about integrating new components into existing pages.
Citation GUI Components Library
L8X editing capability is a lot more demanding than anything I know so far in OCS/OJS.
My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages we'll have to "componentize" the GUI more than it currently is in L8X.
These are my ideas for the GUI architecture:
- Create L8X-specific GUI components and template fragments in WAL (e.g. citation editor component, re-use in all PKP applications)
- Create an L8X citation renderer template library
- One Smarty template per citation style, including COinS
- Migrate COinS plugin to use COinS template fragment
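To illustrate the "one template per citation style" idea, a small sketch in Python, with string.Template standing in for Smarty. The style names and layouts are invented for the example and do not correspond to any real citation style.

```python
from string import Template

# One template per citation style; names and layouts are illustrative only.
STYLE_TEMPLATES = {
    "author-year": Template("$authors ($year). $title. $journal, $volume."),
    "numbered": Template("$authors. $title. $journal $year;$volume."),
}


def render_citation(citation, style):
    """Render a citation dict with the template registered for the style."""
    if style not in STYLE_TEMPLATES:
        raise KeyError("unknown citation style: %s" % style)
    return STYLE_TEMPLATES[style].substitute(citation)
```

Adding a style then means adding one template and no code, which is what would let readers switch between output formats cheaply.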
AJAX Request Architecture
- We should probably think about an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in the request processing for AJAX requests wherever possible (performance bottleneck!)
- Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently
- Make sure that AJAX requests cannot bypass security (maintain the single point of entry and the common security infrastructure for all types of request)
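A sketch of how the three points above could fit together, in Python for brevity. All class names and the request structure are assumptions for illustration. Both handler types share one base class that holds the security check, the AJAX handler takes the shortcut of skipping page layout, and everything still goes through a single dispatch function.

```python
import json


class BaseHandler:
    """Shared by page and AJAX handlers: the security check lives here."""

    def handle(self, request):
        # Common security infrastructure: runs for every request type.
        if not request.get("user"):
            return {"status": 403, "body": "forbidden"}
        return {"status": 200, "body": self.render(self.execute(request))}

    def execute(self, request):
        # Placeholder data; a real handler would load the citation here.
        return {"citation": "Doe (2009)"}

    def render(self, data):
        raise NotImplementedError


class CitationPageHandler(BaseHandler):
    def render(self, data):
        # Full page: wraps the data in layout markup.
        return "<html><body>%s</body></html>" % data["citation"]


class CitationAjaxHandler(BaseHandler):
    def render(self, data):
        # Shortcut: no layout or template processing, just the data.
        return json.dumps(data)


def dispatch(request):
    """Single point of entry for all request types."""
    if request.get("ajax"):
        handler = CitationAjaxHandler()
    else:
        handler = CitationPageHandler()
    return handler.handle(request)
```

Because the check sits in BaseHandler.handle(), an AJAX handler cannot skip it without overriding the entry method itself.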
Next Steps
- Thoroughly analyze GUI requirements for citation extraction and citation parsing/editing/lookup
- Define integration points
- Specify GUI library components
Installation Requirements and Compatibility
- No new initial installation requirements
- Maintain PHP4 compatibility for initial installation
- New installation requirements (additional software, PHP>4) only for optional plug-ins
- Thorough documentation of additional installation / runtime environment requirements
- Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
- Make sure that L8X standalone will continue working/improving by cleanly backporting/integrating migrated code to L8X (DRY!)
- Use standard UI technology to make sure that backport of new OMP GUI will be easier
- Comments have to follow Doxygen syntax
- See e.g. http://pkp.sfu.ca/cvs/cvsweb.cgi/ojs2/classes/article/Article.inc.php?rev=1.48;content-type=text%2Fplain for a standard code header
- Functions should at least include a general description as well as @param and @return tags as necessary.
Further Ideas (Attic)
- Don't kill L8X as a standalone application, integrate it with PKP WAL
- Package/brand/SEOize parser/lookup library separately for re-use in other document based OSS applications (ECM)
- Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
- extract styles from sample document
- extract sections from sample document
- let user attribute semantic information to styles and sections (e.g. first section = always contains author information)
- parse document (metadata, citations, structure) batch based on these specific user definitions
- Integrate more citation lookup services
- Additional XML schemas for export
- Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
- Improve support for metadata schemas
- Use machine-learning approaches (data mining technology) to improve parser robustness (citations, document structure, metadata)
- Introduce a "batch processing mode" for citation parsing/lookup
- keep the application responsive while citation parsing is going on in the background
- do citation parsing/lookup during off-hours (e.g. every night)
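The batch mode could be sketched roughly like this, in Python for brevity. A background thread is only a stand-in here; in a PHP web application a cron-style job working through a queue table would probably play the same role. The function name and the sentinel convention are my assumptions.

```python
import queue
import threading


def start_citation_worker(parse, jobs, results):
    """Parse queued citations in a background thread until None arrives."""

    def run():
        while True:
            raw = jobs.get()
            if raw is None:  # sentinel: stop the worker
                break
            results.append(parse(raw))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The submitting code only puts raw citations on the queue and returns immediately, which keeps the application responsive. The parsed results are collected later.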