Difference between revisions of "Lemon8-XML Roadmap"

From PKP Wiki
(Adapted L8X roadmap to reflect re-prioritization of OMP development.)
 
Latest revision as of 17:02, 7 September 2010

Development Roadmap

Q1 2009

This is the initial release of the 1.x line, which will shortly move into maintenance mode. We will still track and address major and security-related bugs, and you are encouraged to browse our Bugzilla database (http://pkp.sfu.ca/bugzilla).

Q3 2009

As of Q3 2009, development on L8X as a stand-alone application has been halted in favor of refactoring the L8X functionality into the PKP Web Application Library. The rationale for this approach is to provide direct integration with OJS and OCS, as well as functionality for the initial release of OMP. Users can expect a major change to bring the UI in line with the rest of the PKP suite, while keeping much of the dynamic interface in 1.x.

Q4 2009/Q1 2010

  • Port all L8X's citation parsing/lookup/editing functionality to OJS
    • citation lookup filters
    • citation parsers
  • Specify and develop supporting infrastructure
    • metadata framework
    • filter framework

Q2 2010

  • Specify and implement citation assistant user interface
  • Implement citation output use-cases
    • addition of citation data in XML export (NLM/PubMed, Synergies)
    • Allow readers to view citations in citation output formats (APA, MLA, Vancouver)
  • Initial release of the citation markup assistant in OJS

Not yet scheduled

Originally Scheduled for 2010 (Pushed back in favor of OMP development)

  • Additional citation output use cases:
    • addition of citation data in XML export (e.g. for PubMed, Synergies, and CrossRef)
    • generation of COinS (Context Object in Span) from citations, including Zotero integration
    • Allow readers to view citations in all existing citation output formats (EndNote?, RefWorks? integration)
  • Add document parsing/editing capability to OJS
    • automatic citation data extraction from ODT in submission process
    • add section parser / editor to editorial process (generate and edit full semantic XML structure) in OJS
  • Implement XML-to-PDF and XML-to-HTML rendering
  • Add document conversion capability to OJS
    • automatic document conversion during submission process (*.*)->(*.odt) to allow automatic extraction for more formats
  • Add L8X's meta-data extraction to OJS
    • automatic metadata extraction from ODT in submission process
  • Market migrated parsing/lookup code as a standalone library

Additional Use Cases

  • Copyediting: Match author names used in the body of the text against the names used in the citations, checking both spelling and the reference links between text and bibliography (authors with no reference; references with no link to the body of the text).
  • Copyediting: Quotation checking, where a quote in the body of the text is checked against the web for accuracy, with candidates proposed for comparison and correction, as well as reference checking.
  • Plagiarism: Randomly check unquoted passages of text for matches that may indicate plagiarism.

Usability

  • Let users "lock" citations once they are in their final state. Locked citations won't be overwritten by parser or lookup results.
  • Introduce a "batch processing mode" for citation parsing/lookup
    • keep the application responsive while citation parsing is going on in the background
    • do citation parsing/lookup during off-hours (e.g. every night)
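
The locking rule above can be illustrated with a minimal Python sketch. It is not taken from the L8X code base: the Citation class, its field names, and the lookup callable are all assumptions made for illustration. The point is only that a batch run skips locked citations, so parser or lookup results never overwrite them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Citation:
    """Hypothetical citation record; not the actual L8X data model."""
    raw: str                                   # unparsed citation string
    fields: Dict[str, str] = field(default_factory=dict)
    locked: bool = False                       # True once the user finalizes it


def apply_result(citation: Citation, result: Dict[str, str]) -> bool:
    """Merge parser/lookup output into a citation unless it is locked."""
    if citation.locked:
        return False
    citation.fields.update(result)
    return True


def batch_process(citations: List[Citation],
                  lookup: Callable[[str], Dict[str, str]]) -> int:
    """Run a lookup over all unlocked citations; return how many changed."""
    updated = 0
    for citation in citations:
        if citation.locked:
            continue                           # never touch finalized entries
        if apply_result(citation, lookup(citation.raw)):
            updated += 1
    return updated
```

A function like batch_process could be called from a nightly scheduled job to cover the off-hours case; the scheduling mechanism itself is left open here.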

Document Parsing

  • Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents:
    1. extract styles from a sample document
    2. extract sections from a sample document
    3. let the user attribute semantic information to styles and sections (e.g. first section = always contains author information)
    4. batch-parse documents (metadata, citations, structure) based on these user definitions
  • Additional file conversion based on plugins: XSLT, ICE, GD, ImageMagick, etc.
  • Integrate OpenCalais service for metadata identification and extraction.
    • using OpenCalais on the full-text of an article is less accurate, though it does a pretty good job of finding entities
    • use L8X to detect the front, body, and back matter of a document, then:
      1. send the front matter to Calais to be broken into metadata (more accurately than we do now)
      2. send the back matter to the L8X citation handling and associated parse/lookup services
      3. send the body to e.g. Lucene for full-text indexing and/or Calais for automatic keyword assignment (this works well e.g. with medical terms in MeSH)
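
The front/body/back routing described above might be sketched as follows. This is only an illustration in Python under stated assumptions: sections are taken to be already labeled by a detection step that is not shown, and the handlers are placeholders for Calais, the L8X citation services, and Lucene rather than real client calls.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an upstream detection step has already labeled every section
# as "front", "body", or "back". That detection is not shown here.
Section = Tuple[str, str]  # (matter, text)


def group_by_matter(sections: List[Section]) -> Dict[str, str]:
    """Concatenate section texts per matter type, preserving order."""
    grouped: Dict[str, List[str]] = {"front": [], "body": [], "back": []}
    for matter, text in sections:
        grouped[matter].append(text)
    return {matter: "\n".join(texts) for matter, texts in grouped.items()}


def dispatch(sections: List[Section],
             handlers: Dict[str, Callable[[str], object]]) -> Dict[str, object]:
    """Send each part of the document to the handler registered for it."""
    grouped = group_by_matter(sections)
    return {matter: handler(grouped[matter])
            for matter, handler in handlers.items()}
```

Here the handlers dictionary would map "front" to a metadata extractor, "back" to the citation parser, and "body" to an indexer. Keeping the handlers pluggable means each service can be swapped without touching the routing.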

Citation Parsing

  • Use machine-learning approaches (e.g. data mining/classifiers) to improve parser results

Citation Lookup

  • Integrate more citation lookup services: OAIster, CiteSeer, Amazon, LibraryThing, OpenLibrary, SRU/SRW, Z39.50
  • generic OAI-DC: maybe with a local Harvester as meta-data cache and as a search interface?
  • Port source adapters from Umlaut project, see http://umlaut.rubyforge.org/.

Citation Output

  • Implement citation output plug-ins for Chicago Manual of Style, American Medical Association, American Sociological Association and Council of Science Editors (see mails to pkp-support from Mark and John, 20/10/2009)
  • Auto-COinS plugin (WAL): generate COinS in HTML/abstract view for marked references in textarea
  • Apply reading tools to references within articles (provide additional information about cited works in RT sidebar)
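
As a rough illustration of the COinS output mentioned above, the following Python sketch builds a COinS span for a journal article. COinS embeds an OpenURL ContextObject (Z39.88-2004, key/encoded-value format) in the title attribute of an empty span with class "Z3988". The coins_span function and the citation dictionary layout are assumptions for this example, not part of L8X or the planned plug-in.

```python
from html import escape
from urllib.parse import urlencode

# Journal-article keys from the OpenURL KEV journal format that this
# sketch knows how to emit; others are ignored.
JOURNAL_KEYS = ("atitle", "jtitle", "aulast", "aufirst",
                "date", "volume", "issue", "spage")


def coins_span(citation: dict) -> str:
    """Render a citation dict (hypothetical layout) as a COinS span."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for key in JOURNAL_KEYS:
        value = citation.get(key)
        if value:
            pairs.append(("rft." + key, value))
    # URL-encode the key/value pairs, then HTML-escape the result so the
    # "&" separators are valid inside an attribute value.
    title = escape(urlencode(pairs), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title
```

Tools such as Zotero detect spans of this form in a page, which is why COinS generation and Zotero integration are paired elsewhere in this roadmap.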

Document Export

  • Additional XML schemas for export

Backporting to other Applications

  • Extend L8X functionality to OCS and OMP
  • Add citation support to Harvester
    • If a metadata element in Harvester looks like a citation, parse the citation and render it in HTML with COinS
    • use Harvester to retrieve additional citation meta-data that will be attached to the meta-data we already retrieve (i.e. every single harvester record may contain or point to additional citation records)

Additional Requirements

  • No new initial installation requirements
  • Maintain PHP4 compatibility for the initial installation; new installation requirements (additional software, PHP>4) are allowed only for optional plug-ins. A notable example is the citation editor/parser/lookup, which requires at least PHP 5.0.
  • Thorough documentation of additional installation / runtime environment requirements
  • Make sure that L8X functionality will be easily portable to other PKP products (OMP, OCS, Harvester)
  • Closely integrate with OMP to make sure that the GUI components will work in OMP without adaptation
  • All contributions should be fully unit-test covered
  • All workflows should be fully web-test covered