Lemon8-XML Community Documentation

From PKP Wiki

Five steps to an XML document

In this overview, we walk through the steps involved in converting a scholarly article from Microsoft Word format to XML.

Pre-Editing and Format Conversion

The first thing you should do is make sure your document is edited into the format of a "typical" scholarly article: front matter (title, author names, etc.), followed by body matter with headings (distinguished by their formatting), followed by a list of references (numbered or unnumbered), appendices, etc. Tables and images (figures) can appear anywhere within the document text.

The exact layout is not crucial, but the more care you put into formatting your document up front, the better Lemon8-XML will be able to do its job.

Lemon8-XML can handle documents in a wide range of word-processing formats (e.g., Word .DOC, Word .XML, Rich Text .RTF, WordPerfect .WPD, DocBook, HTML), but it works best with the OpenDocument format; articles uploaded in other formats are converted automatically. Proprietary document content such as Microsoft WordArt or RefMan/EndNote OLE references may not be converted properly (or at all) for technical and licensing reasons.
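If you would rather convert a document to OpenDocument yourself before uploading it, one option is LibreOffice's headless converter. This is an assumption on our part, not part of Lemon8-XML: the sketch below only builds and runs the standard `soffice --headless --convert-to odt` command, and the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def odt_conversion_command(source: Path, outdir: Path, soffice: str = "soffice") -> list[str]:
    """Build a LibreOffice command line that converts `source` to OpenDocument Text (.odt)."""
    return [
        soffice,
        "--headless",             # run without opening a GUI window
        "--convert-to", "odt",    # target format: OpenDocument Text
        "--outdir", str(outdir),  # directory that receives the converted file
        str(source),
    ]


def convert_to_odt(source: Path, outdir: Path) -> Path:
    """Run the conversion and return the path where the .odt file is expected."""
    soffice = shutil.which("soffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(odt_conversion_command(source, outdir, soffice), check=True)
    return outdir / (source.stem + ".odt")


if __name__ == "__main__":
    # Show the command without running it.
    print(" ".join(odt_conversion_command(Path("article.doc"), Path("converted"))))
```

Converting locally lets you inspect the .odt result before upload. Proprietary objects such as WordArt may still be lost at this stage.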

Lemon8 quirks

As of this writing, the Lemon8 document parser has several quirks. Following these guidelines may improve the accuracy of the converted document:

  • References should be double-spaced, especially if they use automatic numbering
  • Main headings should be 14-point bold
  • Sub-headings should be 12-point bold
  • Give your first article section a heading, even if it does not have one (e.g., "Introduction")
  • To be detected correctly, all in-text references must be in square brackets (e.g., text matter.[1,2] More text,[3-5] and yet more text.[4])
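To illustrate why the heading and bracket conventions matter, here is a toy sketch of how a parser might use them. It is not Lemon8-XML's parser: the function names are hypothetical, the 14/12-point thresholds simply restate the guidelines above, and the regular expression only recognises bracketed numbers and ranges such as [1,2] or [3-5].

```python
import re


def classify_heading(size_pt: float, bold: bool) -> str:
    """Label a paragraph as 'main', 'sub' or 'body' from its font size and weight."""
    if bold and size_pt >= 14:
        return "main"
    if bold and size_pt >= 12:
        return "sub"
    return "body"


# A bracketed list of numbers or ranges, e.g. [4], [1,2], [3-5], [1, 3–5].
CITATION = re.compile(r"\[(\d+(?:\s*[-–]\s*\d+)?(?:\s*,\s*\d+(?:\s*[-–]\s*\d+)?)*)\]")


def cited_numbers(text: str) -> list[int]:
    """Return the reference numbers cited in `text`, in order of first appearance."""
    seen: list[int] = []
    for match in CITATION.finditer(text):
        for part in match.group(1).split(","):
            bounds = re.split(r"\s*[-–]\s*", part.strip())
            start, end = int(bounds[0]), int(bounds[-1])
            for n in range(start, end + 1):  # expand ranges such as 3-5
                if n not in seen:
                    seen.append(n)
    return seen


if __name__ == "__main__":
    print(cited_numbers("text matter.[1,2] More text,[3-5] and yet more text.[4]"))
    # prints [1, 2, 3, 4, 5]
```

Citations written in parentheses or as superscripts without brackets would be missed by a pattern like this, which is why the bracket convention helps.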

Editing Document Metadata

Article metadata, or "front matter" as it is often called, may exist in a myriad of formats. As a result, it is exceptionally difficult to automatically extract metadata with a high degree of accuracy. Lemon8-XML will do its best to correctly identify and extract your document's metadata, but it will likely need some correction.

It is in your best interest to ensure that the article metadata is as complete and accurate as possible, as this is the information that will be used for finding your article in bibliographic indexes and repositories.

The metadata editor lets you edit, correct, add, and remove metadata quickly and in a structured way. You can edit the ordered list of authors, list the authors' affiliated institutions and contact information, add keywords and standardized article IDs, and edit full-text fields such as the abstract and conflicts of interest.
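The kind of structured metadata the editor manages can be pictured as a small data model. The sketch below is hypothetical (field names, the `move_author` helper and the "empty field" check are our own, not Lemon8-XML's internal schema); it shows why author order is kept as a list and how missing fields might be flagged.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    given_names: str
    surname: str
    affiliation: str = ""
    email: str = ""


@dataclass
class ArticleMetadata:
    title: str = ""
    authors: list[Author] = field(default_factory=list)       # order matters: first author first
    keywords: list[str] = field(default_factory=list)
    article_ids: dict[str, str] = field(default_factory=dict)  # e.g. {"doi": "...", "pmid": "..."}
    abstract: str = ""
    conflicts_of_interest: str = ""

    def move_author(self, old_index: int, new_index: int) -> None:
        """Reorder the author list in place."""
        self.authors.insert(new_index, self.authors.pop(old_index))

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty (a hypothetical completeness check)."""
        return [name for name, value in vars(self).items() if not value]
```

For example, after `ArticleMetadata(title="T", authors=[...])`, `missing_fields()` lists the keywords, IDs, abstract and conflict-of-interest fields that still need filling in.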

For articles published using Open Journal Systems (OJS), the Lemon8-XML OJS plugin automatically pre-populates the article metadata with the information entered in OJS during article submission.

Ordering and Renaming Sections

Lemon8-XML automatically detects the sections, tables, and figures within your document and provides an overview of their order and hierarchy (e.g., sub-headings). The section editor lets you re-order and re-organize your document as you like. For example, if figures are embedded as appendices, you can move them into the relevant sections, or even upload new images to replace or supplement the existing ones.

For now, the section editor offers only a very general preview, although we aim to provide comprehensive section and table editing features in the future. Even so, the word count and overview should give you an idea of how your article is structured and let you confirm that all of the pertinent sections are in the right places.
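A section overview of this kind is essentially a tree of headings with per-section word counts, where re-ordering means moving siblings. The following sketch is a hypothetical model, not the section editor's implementation; word counts here are simple whitespace splits.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)

    def word_count(self) -> int:
        """Words in this section plus all of its sub-sections."""
        return len(self.text.split()) + sum(c.word_count() for c in self.children)


def outline(sections: list[Section], depth: int = 0) -> list[str]:
    """Indented overview of titles and word counts, one line per section."""
    lines: list[str] = []
    for s in sections:
        lines.append(f"{'  ' * depth}{s.title} ({s.word_count()} words)")
        lines.extend(outline(s.children, depth + 1))
    return lines


def move(sections: list[Section], old_index: int, new_index: int) -> None:
    """Reorder sibling sections in place."""
    sections.insert(new_index, sections.pop(old_index))


if __name__ == "__main__":
    doc = [
        Section("Introduction", "Why this matters."),
        Section("Methods", "", [Section("Participants", "Forty adults took part.")]),
        Section("Appendix A", "Figure 1."),
    ]
    move(doc, 2, 1)  # bring the appendix material up next to the introduction
    print("\n".join(outline(doc)))
```

Because a parent's count includes its sub-sections, the overview makes it easy to spot an unexpectedly empty or oversized section.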

Editing and Correcting Citations

As any editor or author knows, checking that citations (references) are complete and correct is extremely tedious and time-consuming work. When your document is uploaded, its references are automatically detected, parsed, and compared against online bibliographic databases to check their completeness and correctness.

Each citation is marked with an icon showing its completeness: a red 'X' means it must be edited manually; a blue '!' means manual editing may be required; and a green check mark means it was parsed correctly or corrected ("looked up"). At a glance, you can see which citations need editing, which minimizes your work.

Each citation is passed through four parsers, which together match over 400 citation styles, and is assigned an accuracy score. The Lemon8-XML administrator can set the threshold for "correctness" to adjust the parsers' sensitivity. Likewise, each citation is matched against two online databases (currently PubMed and CrossRef, with more planned) to verify its completeness. If a match is found, the citation is assigned a correction score that reflects how closely the original citation matches the online record. Again, only citations that meet the threshold are actually corrected.
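The interplay between scores, thresholds and the three icons can be sketched as follows. The exact rules Lemon8-XML uses are not documented here, so the threshold values, the extra "review" threshold and the score-to-icon mapping below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical administrator settings (scores in 0..1); not Lemon8-XML's real defaults.
PARSE_THRESHOLD = 0.8   # best parser score needed to count as "correctly parsed"
REVIEW_THRESHOLD = 0.5  # below this, the citation must be edited by hand
LOOKUP_THRESHOLD = 0.9  # correction score needed before a lookup result is applied


@dataclass
class CitationResult:
    raw: str
    parse_scores: dict[str, float] = field(default_factory=dict)  # one score per parser
    lookup_score: Optional[float] = None  # closeness to a PubMed/CrossRef record, if matched


def should_apply_lookup(result: CitationResult) -> bool:
    """Only corrections that meet the lookup threshold are applied."""
    return result.lookup_score is not None and result.lookup_score >= LOOKUP_THRESHOLD


def status(result: CitationResult) -> str:
    """Map scores to the editor's icons: 'check', '!' or 'X' (an assumed mapping)."""
    best_parse = max(result.parse_scores.values(), default=0.0)
    if should_apply_lookup(result) or best_parse >= PARSE_THRESHOLD:
        return "check"  # looked up, or parsed with high confidence
    if best_parse >= REVIEW_THRESHOLD:
        return "!"      # parsed, but manual editing may be required
    return "X"          # must be edited manually
```

Raising a threshold makes the system more conservative: fewer citations are auto-corrected or marked green, and more are left for manual review.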

When manually editing citations, the editor allows you to either fill in fields individually, or edit the complete citation and re-parse it as you like. If you feel the citation is correct and want to see if it can be enhanced by information from an online index, click "Lookup citation" to attempt an automatic correction.

Fill in as many fields as you can, as accurately and completely as possible. This gives the highest-quality export and the best chance that the citation can be looked up automatically. Although it takes a little time, it will ensure the best quality for your article. We plan to add a more dynamic, AJAX-based citation editor in the near future.

Citation tips

  • Much grey literature, as well as some books and magazines, is not looked up automatically
  • If a citation is not looked up correctly, search for it on PubMed or CrossRef. Nothing is more useful than a PMID or DOI for filling in all the other information.
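Once you have a DOI or PMID, its full record can be retrieved from the public CrossRef and PubMed services. The sketch below only builds the lookup URLs (the CrossRef REST `works` endpoint and the NCBI E-utilities `esummary` endpoint); the identifier patterns are deliberately loose approximations, and this is not how Lemon8-XML itself performs lookups.

```python
import re
from urllib.parse import quote, urlencode

# Loose sketches of the identifier formats, not the official grammars.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
PMID_PATTERN = re.compile(r"^\d{1,8}$")


def lookup_url(identifier: str) -> str:
    """Return a public metadata URL for a DOI (CrossRef) or a PMID (PubMed E-utilities)."""
    identifier = identifier.strip()
    if DOI_PATTERN.match(identifier):
        # Keep the "/" separating DOI prefix and suffix; escape anything unsafe.
        return "https://api.crossref.org/works/" + quote(identifier, safe="/")
    if PMID_PATTERN.match(identifier):
        query = urlencode({"db": "pubmed", "id": identifier, "retmode": "json"})
        return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?" + query
    raise ValueError(f"not a recognisable DOI or PMID: {identifier!r}")
```

Fetching either URL (for example with `urllib.request.urlopen`) returns JSON with the title, authors, journal and year, which are exactly the fields you would otherwise type in by hand.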

Previewing and Exporting the XML

The final step in editing your document is previewing it in HTML and PDF formats and ensuring that it renders as you would expect it to appear in an online journal or institutional archive.

This is a good opportunity to double-check that your edits to the article metadata, section order, and citations are all reflected correctly in the final preview. The previews are generated dynamically, so any changes you make are reflected immediately.

Once you are satisfied that your article is complete, you can export it as fully structured XML. At the moment, only the NLM Journal Publishing DTD is supported, although we plan to add a number of alternative export formats (including ODT) in the future. Lemon8-XML has been developed in close collaboration with PubMed Central and strives to meet its stringent archival-quality standards.
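To give a feel for what "fully structured" means, here is a minimal sketch that emits an article skeleton using element names from the NLM Journal Publishing tag set (`article`, `front`, `article-meta`, `contrib-group`, `body`, `sec`). It is a hypothetical illustration, not Lemon8-XML's exporter, and its output is not valid against the DTD as-is: a real export also needs `journal-meta`, references and much more.

```python
import xml.etree.ElementTree as ET


def nlm_skeleton(title: str, authors: list[tuple[str, str]], sections: list[tuple[str, str]]) -> str:
    """Build a minimal NLM-style article; authors are (given_names, surname) pairs,
    sections are (heading, paragraph) pairs."""
    article = ET.Element("article", {"article-type": "research-article"})

    # Front matter: title and ordered author list.
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    ET.SubElement(ET.SubElement(meta, "title-group"), "article-title").text = title
    contribs = ET.SubElement(meta, "contrib-group")
    for given, surname in authors:
        contrib = ET.SubElement(contribs, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # Body matter: one <sec> per heading, in the order chosen in the section editor.
    body = ET.SubElement(article, "body")
    for heading, paragraph in sections:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = heading
        ET.SubElement(sec, "p").text = paragraph

    return ET.tostring(article, encoding="unicode")
```

Each editing step above maps onto part of this tree: metadata fills `front`, section ordering determines the `sec` sequence in `body`, and corrected citations would populate a reference list in `back`.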