Revisiting Document Conversion

Posts: 45
Joined: Wed Jul 18, 2007 8:03 am
Location: Troy, NY

Revisiting Document Conversion

Postby codonnell » Mon Aug 13, 2007 9:28 am

So I've searched the forums for "document conversion" which seems to be how folks are talking about taking submissions and automatically converting them from .DOC into other formats. While I really do appreciate the technical complexity of this problem (I used to write file parsers for 3D formats), every test user says it's really important. We've also gotten pretty concerned about making submitted reviews anonymous. I don't mind asking submitters to ensure their own anonymity, but since reviewers are doing a service, I hate asking them to jump through hoops.

So, in addition to document conversion, what about PDF field stripping as well? I saw a test reviewer upload a PDF with the author field filled out. I actually don't mind asking people to submit in a non-DOC format, but I wonder if there is a good way to simply ensure that files don't have those extra fields filled out.

Thoughts from anyone? How close is OJS to some sort of file conversion system?

Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm

Re: Revisiting Document Conversion

Postby asmecher » Mon Aug 13, 2007 10:51 am

Hi codonnell,

Watch for developments in the Lemon8 project; it'll allow OJS to use NLM Journal Article XML as a back-end instead of PDF or Word documents. This will allow for a lot more in the way of rich interactions between the system and the documents it contains.

Currently, OJS imposes no requirements for file formats on submission and review documents. An author can submit .doc, .rtf, .odt, .abw, .pdf, or anything else, and reviewers are likewise not constrained with their responses. While this flexibility is good for users, it also makes it difficult to ensure automatically that nothing compromising is being submitted. Each format has its own peculiarities and metadata storage system. That's why it's currently left to the Reviewer to ensure that their review is blind. Comment-based reviews, in which reviewers enter their comments directly into the text box on the review page, aren't affected by hidden metadata complications.

Alec Smecher
Public Knowledge Project Team
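
To illustrate the point above that each format stores metadata in its own way, here is a minimal sketch for just one of the formats mentioned, .odt. It assumes the file is a standard OpenDocument ZIP package whose meta.xml holds author fields such as dc:creator and meta:initial-creator. The function names and the list of tags are hypothetical choices for this example, not part of OJS or Lemon8-XML, and .doc or PDF files would need entirely different handling.

```python
import io
import re
import zipfile

# Elements in an ODF meta.xml that commonly carry personal names.
# This list is an assumption for illustration, not an exhaustive audit.
IDENTIFYING_TAGS = ("dc:creator", "meta:initial-creator", "meta:printed-by")


def scrub_meta_xml(xml_text: str) -> str:
    """Blank the text content of identifying elements in meta.xml."""
    for tag in IDENTIFYING_TAGS:
        name = re.escape(tag)
        pattern = re.compile(
            r"(<%s(?:\s[^>]*)?>).*?(</%s>)" % (name, name), re.DOTALL
        )
        xml_text = pattern.sub(r"\1\2", xml_text)
    return xml_text


def strip_odt_author(data: bytes) -> bytes:
    """Return a copy of an .odt file (as bytes) with author fields emptied."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "meta.xml":
                payload = scrub_meta_xml(payload.decode("utf-8")).encode("utf-8")
            # Reusing the original ZipInfo keeps entry order and compression
            # type, so the "mimetype" entry stays first and uncompressed.
            dst.writestr(item, payload)
    return out.getvalue()
```

Even under these assumptions, this only empties a few metadata fields. Names in the document body, tracked changes, or embedded objects would remain, so it is not a substitute for the reviewer checking the file.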

Site Admin
Posts: 304
Joined: Fri Mar 26, 2004 9:32 am
Location: Toronto, Canada

Re: Revisiting Document Conversion

Postby mj » Mon Aug 13, 2007 12:48 pm

Hi codonnell,

As Alec mentions, converting documents from MS-Word .DOC format to other formats (via the NLM Journal Publishing DTD) is precisely what Lemon8-XML is designed to achieve. One potential "benefit", which we haven't discussed much yet, is the ability to strip identifying author information from an article, although this certainly isn't its primary purpose.

We are starting a mailing list for users interested in following Lemon8-XML developments; more information on subscribing is available on the Lemon8-XML page. We also have a Lemon8-XML forum specifically for discussion of these areas, and I'd be more than happy to start a new thread there so others can contribute and benefit from it.

You should see more information coming out about Lemon8-XML, including a private beta test, in the next week or so.
