Lemon8-XML is a web-based application that helps non-technical editors and authors convert scholarly papers from common word-processor formats, such as MS-Word .DOC and OpenOffice .ODT, into structured publishing formats such as the open, industry-standard NLM Journal Publishing XML format.

To use Lemon8-XML, you don't need to understand XML. All you need is a little time and a general understanding of how scholarly articles are structured. In general, this means a document with:

  1. some information about the article and authors at the top, and usually an abstract
  2. several sections, often titled "introduction", "methods", "results", etc.
  3. optional figures or tables, either in-text or as appendices
  4. a list of references or citations in a standardized format (e.g., MLA or APA)
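
For readers curious about what the structured result looks like, the four parts above map roughly onto the front, body, and back of an NLM-style XML article. The following is a minimal sketch, not Lemon8-XML's actual output: the function name build_article_skeleton and its parameters are hypothetical, the element names are loosely modeled on the NLM Journal Publishing tag set and vary between DTD versions, and figures and tables (item 3) are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_article_skeleton(title, authors, abstract, sections, references):
    """Build a simplified, unvalidated NLM-style article tree.

    authors    -- list of (surname, given_names) tuples
    sections   -- list of (heading, paragraph_text) tuples
    references -- list of plain citation strings
    """
    article = ET.Element("article")

    # 1. Article and author information, plus the abstract
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    contribs = ET.SubElement(meta, "contrib-group")
    for surname, given_names in authors:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given_names
    abstract_el = ET.SubElement(meta, "abstract")
    ET.SubElement(abstract_el, "p").text = abstract

    # 2. Titled sections such as "Introduction", "Methods", "Results"
    body = ET.SubElement(article, "body")
    for heading, text in sections:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = heading
        ET.SubElement(sec, "p").text = text

    # 4. Reference list (the citation element name is an assumption;
    #    it differs between versions of the tag set)
    back = ET.SubElement(article, "back")
    ref_list = ET.SubElement(back, "ref-list")
    for number, citation in enumerate(references, start=1):
        ref = ET.SubElement(ref_list, "ref", {"id": f"ref{number}"})
        ET.SubElement(ref, "mixed-citation").text = citation

    return article
```

Calling ET.tostring(build_article_skeleton(...), encoding="unicode") on the result yields the XML as a string. A real conversion would also need to validate the output against the chosen DTD, which this sketch does not attempt.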


Lemon8-XML is being developed by the Public Knowledge Project as part of its efforts to ensure that the latest web-based technologies are directed toward improving not only the quality but also the global and public reach of scholarly communication. Lemon8-XML is a stand-alone system that will serve publishing processes more generally, as well as complement the workflow of Open Journal Systems.

For more information, have a look at the FAQ or participate in the Lemon8-XML discussion forum.

Lemon8-XML was inspired by early work undertaken for the Journal of Medical Internet Research, and is currently sponsored by the Canada Foundation for Innovation, as part of the Synergies Initiative.


Lemon8-XML has been developed using 100% free and Open Source software and technology.

  Lemon8-XML is built on the flexible CakePHP framework using PHP5.
  Document conversion is done using Docvert and Google Docs.
  Actual document format conversion is done using OpenOffice.org.
  The citation parser incorporates the ParaCite, FreeCite, and ParsCit parsing services.
  PDF preview rendering is done with XSL-FO via Apache FOP.
  Book citation correction uses data from the ISBNdb and WorldCat databases.
  Journal citation correction uses data from the PubMed and CrossRef databases.