Lemon8 Updates, Decembertime

Forum for PKP's Lemon8-XML.

Postby mj » Sun Dec 02, 2007 7:46 pm

Hello everyone,

As always, sincere apologies for the lack of updates on both the mailing list and the forum. Just because it's been quiet on the email front doesn't mean we haven't been working hard.

For those of you new to the list, welcome! If you don't have credentials to log in to http://lemon8.org, please let me know. A few people have commented on the wiki, but I'd appreciate more feedback; please share your opinions as we move ahead.

== What's Changed ==

Lemon8-XML (or L8X as we've come to call it) is almost constantly in flux, although more recently it's been refinement rather than overhaul. You may have noticed that the "login" mechanism has changed from a site-based to a session-based profile. Lemon8-XML now uses strong ACL user management, and once we hit public beta, each user will have their own distinct profile.

You may have also noticed the "english | español | français" links in the top menu. Lemon8-XML has been partially (soon to be fully) internationalized. At the moment, we have sample (sometimes inaccurate, often funny) translations for Spanish and French -- however, if you are able to contribute by writing a proper translation, we would be very interested in hearing from you. Lemon8-XML is designed to be a language-agnostic application.

== Documentation ==

Every developer hates to write documentation -- however, it's one of the most important aspects of any application. You'll hopefully find a more complete set of information now at: http://www.lemon8.org/pages/docs

In particular, "Five Steps to an XML Document" (http://www.lemon8.org/pages/docs#5steps) should be much easier to follow (no more 'lorem ipsum'), and the recent "Technology Used" (http://www.lemon8.org/pages/docs#technology) should give an indication of the kind and breadth of technology that is incorporated into Lemon8-XML. Pretty logos abound.

== Docvert ==

Yes, it's true -- the important document conversion aspect of Lemon8 is finally online -- the latest versions of Matthew Cruickshank's Docvert and OpenOffice.org are now built into Lemon8-XML. You can upload documents in almost any file type and they will be automatically converted to a format that Lemon8-XML can understand. There are still limitations with proprietary tools such as Refman/Endnote embedded OLE, but for the vast majority of documents, Lemon8-XML will take what you throw at it.

Just as a teaser/preview, Docvert is experimenting with accepting a web URL as a document source -- this means that, if you don't have the original .DOC file available for conversion, you can point Lemon8-XML at an online version of the article to process. The technology is still experimental, but it shows great promise. More to come...

== XML Export ==

One of the most powerful aspects of Lemon8-XML (if not the most powerful) is its ability to export fully structured XML adhering to a specific DTD. For many of our partner journals, this is the NLM Journal Publishing DTD, which is a prerequisite for archiving in PubMed Central. We have gathered an incredible amount of detailed information from NLM through our partnership with Open Medicine. Although the current XML export is somewhat rudimentary, we have a long list of details that, once addressed, will ensure that XML exported from Lemon8-XML adheres to the stringent standards of PubMed Central. For those of you interested, hang on: improvements to the XML export are soon to follow.
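For anyone who hasn't looked at this kind of XML before, here is a rough sketch in Python of what "structured" means in practice. It is not Lemon8-XML's export code. The element names (article, front, article-meta, title-group, article-title, body, sec, title, p) come from the NLM Journal Publishing tag set, but the build_article helper is hypothetical, and the skeleton leaves out required metadata such as journal-meta, so it would not validate against the DTD as-is.

```python
import xml.etree.ElementTree as ET


def build_article(title, sections):
    """Build a bare-bones article tree using NLM-style element names.

    `sections` is a list of (heading, [paragraph, ...]) tuples.
    Hypothetical helper for illustration; a real export needs far more
    metadata (journal-meta, contributors, dates, references, ...).
    """
    article = ET.Element("article")

    # Front matter: only the article title is filled in here.
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    # Body: one <sec> per section, each with a <title> and <p> elements.
    body = ET.SubElement(article, "body")
    for heading, paragraphs in sections:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = heading
        for para in paragraphs:
            ET.SubElement(sec, "p").text = para

    return article
```

Calling ET.tostring(build_article("My Title", [("Introduction", ["Some text."])]), encoding="unicode") returns the serialized markup, with special characters such as & escaped automatically.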

== Citation Parser ==

Wow, where to begin. The citation parsing mechanism has been completely overhauled to incorporate the excellent ParaTools parsing algorithms, in addition to the tested regular-expression parsing module. All citations are now parsed with a confidence score as well as a lookup score, and we are confident (with over 400 citation formats) that the quality has been vastly improved as well. At present, the excellent PubMed eUtils and ISBNdb web services are used for lookup, with more (such as WorldCat and CrossRef) on the way. If you know of other services that could be used as a lookup source, please send them along.
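To give a feel for the regular-expression side of this, here is a minimal sketch in Python. It is not the actual Lemon8-XML or ParaTools code: the single pattern, the field names, and the way the confidence score is computed (the fraction of expected fields that were found) are all assumptions made for illustration, and it only handles one Vancouver-like style of the form "Authors. Title. Journal. Year;Volume(Issue):Pages."

```python
import re

# Hypothetical pattern for ONE Vancouver-like citation style:
#   Authors. Title. Journal. Year;Volume(Issue):Pages.
# Volume, issue and pages are optional.
CITATION_RE = re.compile(
    r"^(?P<authors>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<journal>[^.]+)\.\s+"
    r"(?P<year>\d{4})"
    r"(?:;(?P<volume>\d+))?"
    r"(?:\((?P<issue>\d+)\))?"
    r"(?::(?P<pages>\d+(?:-\d+)?))?"
    r"\.?$"
)

# Fields this (assumed) style is expected to carry.
FIELDS = ("authors", "title", "journal", "year", "volume", "issue", "pages")


def parse_citation(text):
    """Return (fields, confidence) for a raw citation string.

    Confidence here is simply the share of expected fields that matched;
    this scoring rule is an illustrative assumption, not the real one.
    """
    match = CITATION_RE.match(text.strip())
    if match is None:
        return {}, 0.0
    fields = {name: value for name, value in match.groupdict().items()
              if value is not None}
    return fields, len(fields) / len(FIELDS)
```

For example, parse_citation("Smith J, Doe A. A study of things. J Test. 2005;12(3):45-67.") fills all seven fields, while a citation that stops after the year scores lower. A real parser would try many such patterns and keep the best-scoring match before handing the result to a lookup service.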

If you feel that your document citations are consistently not being parsed properly, please attempt to document the citation styles that are causing problems and send us the details. While citation matching currently has a high accuracy rate, we are always looking to improve both the parsing and lookup services. Let us know what you think.

== What's Next ==

The next major work on Lemon8-XML will involve a complete refactor of the document parsing algorithms. Not to worry, the current level of quality will still be maintained, but a number of newer articles have given us great data to work with. If your documents have parsed poorly to date -- don't worry! We are aiming to have a far more robust document parser sometime in the new year.

Once the document parser has been improved to work for our new partners (First Monday, this is for you), the main goal will be to improve the quality of export and begin building new export schemas such as the oft-requested ODT and LaTeX.

As always, we want to encourage your feedback and comments during the development process. Lemon8-XML is nearing the end of its "alpha" stage, and as we near public "beta", we want to ensure that nothing major is overlooked. If you have a serious concern or consideration that you'd like addressed, *please* contact us at pkp-support@sfu.ca.

Best wishes, and looking forward to a groundbreaking new year,
