Talk:Lemon8-XML Roadmap
From PKP Wiki
Revision as of 18:47, 7 October 2009 by Jerico.dev (Talk | contribs)
Priorities (descending)
- Add L8X's citation parsing, lookup/correction and editing functionality to OJS
- Add L8X's metadata extraction to OJS
- Add document parsing/editing capability to OJS
- Add document conversion capability to OJS
- Don't kill L8X as a standalone application; integrate it with PKP WAL instead
- Package/brand/SEOize the parser/lookup library separately for re-use in other document-based OSS applications (ECM)
Proposed Features / Integration Points (by descending priority)
- automatic citation lookup and editing in submission process ("citation box")
- allow readers to view citations in multiple citation formats (including Zotero integration)
- addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
- generation of COinS (Context Object in Span) from citations
- add citation editing and lookup to editorial process
- automatic citation data extraction from ODT in submission process
- automatic metadata extraction from ODT in submission process
- add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
- add automatic document conversion (* -> ODT) to the submission process, so that automatic extraction works for more formats
Proposed Architecture (citation integration only)
Citation Backend Services Library
- Move L8X citation parser components to pkp/classes/citation/CitationParser*.inc.php
- Move L8X citation lookup components to pkp/classes/citation/CitationLookup*.inc.php
- Specific implementations extend a base object that enforces the API contract (template method pattern); interfaces are not an option because PHP4 does not support them
- Make sure that the API can be used by all PKP applications
- Use migrated code in L8X standalone
- Make sure that the components can be integrated/extended for metadata/section parsers/editors later
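The "base object that enforces the API contract" idea from the list above could look roughly like the following. This is only a sketch in Python for brevity (the real code would be PHP4-compatible), and all class and method names (`CitationParser`, `_extract_fields`, `YearOnlyCitationParser`) are made up for illustration, not existing L8X or PKP classes.

```python
import re


class CitationParser:
    """Hypothetical base class; it fixes the workflow and checks the contract."""

    def parse(self, raw_citation):
        # Template method: normalization and validation are shared,
        # only the extraction step varies between implementations.
        text = " ".join(raw_citation.split())
        if not text:
            raise ValueError("empty citation")
        fields = self._extract_fields(text)
        if not isinstance(fields, dict):
            raise TypeError("_extract_fields() must return a dict")
        fields.setdefault("raw", text)
        return fields

    def _extract_fields(self, text):
        # Stands in for an interface method: subclasses have to override it.
        raise NotImplementedError("subclasses must implement _extract_fields()")


class YearOnlyCitationParser(CitationParser):
    """Toy implementation that only pulls out a four-digit year."""

    def _extract_fields(self, text):
        match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", text)
        return {"year": match.group(1)} if match else {}
```

Callers only ever use `parse()`, so every implementation gets the same normalization and result checks without needing language-level interfaces.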
Citation DAO Library
- We can use the usual PKP DAO pattern for all citation data persistence requirements
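For readers who don't know the DAO pattern: all SQL for one entity lives in one object, and the rest of the application only calls its methods. A minimal sketch follows, in Python with SQLite for brevity; the table layout and the names `CitationDAO`, `insert_citation` and `get_citations_by_article` are assumptions for illustration and not the actual PKP schema or API.

```python
import sqlite3


class CitationDAO:
    """Hypothetical data access object for raw citations of an article."""

    def __init__(self, connection):
        self._conn = connection
        # Assumed table layout, for illustration only.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            "citation_id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "article_id INTEGER NOT NULL, "
            "raw_citation TEXT NOT NULL)"
        )

    def insert_citation(self, article_id, raw_citation):
        # Parameter binding keeps user-supplied text out of the SQL string.
        cursor = self._conn.execute(
            "INSERT INTO citations (article_id, raw_citation) VALUES (?, ?)",
            (article_id, raw_citation),
        )
        self._conn.commit()
        return cursor.lastrowid

    def get_citations_by_article(self, article_id):
        rows = self._conn.execute(
            "SELECT citation_id, raw_citation FROM citations "
            "WHERE article_id = ? ORDER BY citation_id",
            (article_id,),
        )
        return [{"citation_id": r[0], "raw_citation": r[1]} for r in rows]
```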
Citation GUI Pages
- We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup
- Pages have to be application-specific, so we usually cannot share them between applications.
- However, we'll try to move as much as possible into the GUI components library for re-use; the page itself will consist only of a very high-level outline.
- Apart from citation editing I don't think we'll invent new pages. It's more about integrating new components into existing pages.
Citation GUI Components Library
L8X's editing capability is a lot more demanding than anything I've seen so far in OCS/OJS. I haven't looked at OMP yet (Alec, can you comment?).
Anyway... I think we'll have to think the GUI component library through thoroughly before we start implementing it.
My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages we'll have to "componentize" the GUI more than it currently is in L8X.
These are my ideas for the GUI architecture:
- Create a low-level AJAX-based GUI component library in WAL (re-usable in all PKP applications)
- Create re-usable high-level template fragments in WAL (e.g. citation editor component, re-use in all PKP applications)
- Create an L8X citation renderer template library
- One Smarty template per citation style, including COinS
- Migrate COinS plugin to use COinS template fragment
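To make the COinS output concrete: a COinS is an empty span with class "Z3988" whose title attribute carries a URL-encoded OpenURL ContextObject. The sketch below shows what a COinS renderer would have to produce. It is written in Python for brevity rather than as a Smarty template, and the function name `render_coins` and the layout of the citation dictionary are assumptions for illustration.

```python
from html import escape
from urllib.parse import urlencode


def render_coins(citation):
    """Render a journal-article citation (a plain dict) as a COinS span."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Only emit the keys that are actually present in the citation.
    for key in ("atitle", "jtitle", "date"):
        if citation.get(key):
            pairs.append(("rft." + key, citation[key]))
    # rft.au may be repeated, once per author.
    for author in citation.get("authors", []):
        pairs.append(("rft.au", author))
    # URL-encode the key/value pairs, then HTML-escape the result
    # so that "&" becomes "&amp;" inside the attribute.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

A call such as `render_coins({"atitle": "A Title", "date": "2009", "authors": ["Doe, J."]})` yields one span that tools like Zotero can pick up from the rendered page.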
AJAX Request Architecture
- We should probably think about an AJAX-specific high-performance MVC controller architecture. This means implementing shortcuts in the request processing for AJAX requests wherever possible (performance bottleneck!)
- Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently
- Make sure there is no AJAX security bypass, of course (maintain the single point of entry and a common security infrastructure for all types of request)
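The three points above could fit together roughly as follows. This is a sketch under assumptions, in Python for brevity: the names `BaseHandler`, `PageHandler`, `AjaxHandler`, `dispatch` and `load_citations` are hypothetical, and the authorization check is reduced to "is there a user in the request".

```python
import json


def load_citations(request):
    # Placeholder for the real work; both handler types share it.
    return {"citations": ["Doe (2009)"]}


class BaseHandler:
    """Shared base class: the authorization check lives here, once."""

    def handle(self, request):
        # Every request type passes the same security check first.
        if not request.get("user"):
            return 403, self.render({"error": "not authorized"})
        return 200, self.render(load_citations(request))

    def render(self, data):
        raise NotImplementedError


class PageHandler(BaseHandler):
    def render(self, data):
        # Full page: the real application would render a template here.
        return "<html><body>%s</body></html>" % json.dumps(data)


class AjaxHandler(BaseHandler):
    def render(self, data):
        # Shortcut: no page template, just the data the component needs.
        return json.dumps(data)


def dispatch(request):
    """Single point of entry for both page and AJAX requests."""
    handler = AjaxHandler() if request.get("ajax") else PageHandler()
    return handler.handle(request)
```

Because `handle()` is defined only in the base class, an AJAX handler can skip rendering work but not the security check.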
Next Steps
- Thoroughly analyze GUI requirements
- Define integration points
- Specify GUI library components
Installation Requirements and Compatibility
- No new initial installation requirements
- Maintain PHP4 compatibility for initial installation
- New installation requirements (additional software, PHP>4) only for optional plug-ins
- Thorough documentation of additional installation / runtime environment requirements
- Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
- Make sure that L8X standalone will continue working/improving by cleanly backporting/integrating migrated code to L8X (DRY!)
- Use standard UI technology to make sure that backport of new OMP GUI will be easier
- Comments have to follow Doxygen syntax (Alec, which tags are required?)
Further Ideas (Attic)
- Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents
- extract styles from sample document
- extract sections from sample document
- let user attribute semantic information to styles and sections (e.g. first section = always contains author information)
- parse documents (metadata, citations, structure) in batches, based on these user-specific definitions
- Integrate more citation lookup services
- Additional XML schemas for export
- Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
- Improve support for metadata schemas
- Use machine-learning approaches (data mining technology) to improve parser robustness (citations, document structure, metadata)
- Introduce a "batch processing mode" for citation parsing/lookup
- keep the application responsive while citation parsing is going on in the background
- do citation parsing/lookup during off-hours (e.g. every night)
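The "batch processing mode" idea boils down to a job queue that is filled by the application and drained by a separate worker. A minimal sketch follows, in Python with a background thread for brevity; in a PHP application the worker would more likely be a scheduled job, and the names `start_citation_worker`, `jobs` and `results` are assumptions for illustration.

```python
import queue
import threading


def start_citation_worker(parse, jobs, results):
    """Start a background thread that parses queued citations one by one."""

    def run():
        while True:
            citation = jobs.get()
            try:
                if citation is None:
                    # None is the agreed signal to shut the worker down.
                    break
                try:
                    results.append((citation, parse(citation)))
                except Exception as error:
                    # A failing citation must not stop the whole batch.
                    results.append((citation, error))
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

The caller keeps running after `jobs.put(...)` returns, and picks up the parsed citations from `results` later, for example after `jobs.join()`.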