PKP Support Forum

digital preservation of OJS content

OJS development discussion, enhancement requests, third-party patches and plug-ins.

Post by kshawkin » Wed Mar 23, 2011 7:38 am

The University of Michigan Library has been publishing journals online for over ten years using DLXS, our home-grown digital library platform. We have considered using OJS for our digital-text (not page-image) publications but have not done so for a number of reasons. One is a desire to avoid a proliferation of platforms. Another is the resource investment that would be required to migrate content.

But the third concern is more abstract, and I'm looking for others' thoughts on it in case I misunderstand OJS's capabilities. DLXS requires that we normalize content (into XML) before publication, requiring an up-front investment in content conversion for ease of long-term preservation. OJS, on the other hand, relies on outside tools to convert author manuscripts (in Word, etc.) into galleys (usually HTML and PDF), which leads to great variability in these HTML and PDF files depending on who does the work, how internally consistent they are, and what software they use. I have long worried about ending up with a mess of content that will be difficult to preserve and migrate in the long run. While I am sympathetic to the approach of John Maxwell (and others) that if you have HTML, you're pretty safe because there are so many tools that can read and manipulate it, I still have the feeling that some day browsers will stop rendering bad HTML the way that today's (and yesterday's) browsers do, leaving us in a situation where content fidelity will be difficult to maintain without painstaking cleanup of markup. Essentially the same problem exists for PDFs, except that they are almost impossible to fix by hand the way you can fix HTML.
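
One rough way to put a number on this concern is to check how many XHTML galleys are at least well-formed XML, since well-formed files can be read by generic XML tooling. The following is a minimal sketch and not part of OJS; the function name `audit_galleys` and the directory layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def audit_galleys(root_dir, pattern="*.html"):
    """Return (well_formed, broken) lists of galley files under root_dir.

    A file counts as well-formed if the standard XML parser accepts it.
    This says nothing about validity against a DTD. Note that files using
    named HTML entities such as &nbsp; are reported as broken, because
    this parser does not load external DTDs.
    """
    well_formed, broken = [], []
    for path in sorted(Path(root_dir).rglob(pattern)):
        try:
            ET.parse(str(path))
            well_formed.append(path)
        except ET.ParseError:
            broken.append(path)
    return well_formed, broken
```

Running this over a copy of a journal's galley files would give a first estimate of how much markup cleanup a later migration might involve.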

I am curious to hear others' thoughts on this issue.
Last edited by kshawkin on Mon Apr 18, 2011 1:19 pm, edited 1 time in total.
kshawkin
 
Posts: 6
Joined: Wed Mar 23, 2011 7:26 am

Re: digital preservation of OJS content

Post by jmacgreg » Sat Apr 02, 2011 9:23 am

Hi kshawkin,

Apologies for the delay in approving your post and getting back to you -- it's been a busy week. I can give you my thoughts as a PKP team member and a former employee at UNB Libraries, which publishes in OJS using XML.

OJS can actually handle XML as a galley file type in a number of different ways. There's an XML Galleys plugin that will convert XML galleys into HTML on the fly if you also provide it with an XSL stylesheet (it comes with an NLM->HTML XSL by default); it can even manage XML->PDF transformation on the fly, if your server has an FO processor such as FOP installed. By using this plugin, you simply upload the one XML file, and it's automatically available in HTML and (optionally) PDF -- these files are generated once, the first time someone clicks on the galley link in the ToC, and are subsequently stored in and served from the journal's galleys directories.

At UNB, they are doing things a little differently: they generate archival-quality XML for all articles, then batch-run their own XSL transformation on all articles on the server but outside of OJS, and finally use the XML import plugin to pull all of these articles into OJS at once.
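
A batch run of this kind could look roughly like the sketch below. It is not UNB's actual script: the use of `xsltproc`, the directory names, and the function name `build_batch_commands` are all assumptions for illustration. The function only builds the command lines, so they can be reviewed before anything is executed.

```python
from pathlib import Path


def build_batch_commands(xml_dir, stylesheet, out_dir, out_ext=".html"):
    """Build one xsltproc command per XML article found in xml_dir.

    Each command has the form:
        xsltproc -o <out_dir>/<name><out_ext> <stylesheet> <article.xml>
    """
    commands = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        target = Path(out_dir) / (xml_file.stem + out_ext)
        commands.append(
            ["xsltproc", "-o", str(target), str(stylesheet), str(xml_file)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a server where `xsltproc` is installed.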

If you have any further questions about this, please let me know.

Cheers,
James
jmacgreg
 
Posts: 4190
Joined: Tue Feb 14, 2006 10:50 am

Re: digital preservation of OJS content

Post by lwang » Fri Apr 15, 2011 1:14 pm

Hi James,

I have been reading your post about the XML Galleys plugin, and I tried to use your test server to see how it works. I am not sure where I should upload the XML file so that it generates the PDF file. I am interested in how to generate a PDF file from XML. Is this what you meant in your note?

Thanks!

Ling
lwang
 
Posts: 45
Joined: Mon Jan 12, 2009 1:03 pm

Re: digital preservation of OJS content

Post by kshawkin » Mon Apr 18, 2011 1:25 pm

I can't find the "XML Galley" plugin -- could you point me to it? Everything I find relating to the NLM DTDs and XML importing and exporting refers to metadata, not full content.
kshawkin
 
Posts: 6
Joined: Wed Mar 23, 2011 7:26 am

Re: digital preservation of OJS content

Post by lwang » Tue Apr 19, 2011 7:08 am

It is under the generic plugins ...

Ling
lwang
 
Posts: 45
Joined: Mon Jan 12, 2009 1:03 pm

Re: digital preservation of OJS content

Post by jmacgreg » Wed Apr 20, 2011 4:16 pm

Hi Ling,

The process is pretty "easy", with a few caveats.

1. XML files can be converted to PDF and/or HTML by using an appropriate XSLT file. By default, OJS comes with an XSLT file that is compatible with NLM 2.3 XML, a fairly popular XML journal publishing format. If you want to use another XML type, you will have to either download or create an XSLT file appropriate to that XML type, and then upload that XSLT to OJS from Journal Management -> System Plugins -> Generic Plugins -> XML Galley Plugin, where it says "Custom XSL Stylesheet". Note that you will need a different XSLT for each conversion to HTML and PDF; and that currently, the XML Galley plugin only allows for one custom stylesheet to be uploaded, which means that if you are not using NLM XML, you can only convert to PDF or HTML, not both.

2. XML->PDF conversion also requires an external application, installed on the server, to work: a Formatting Objects (FO) processor. I'm not sure that we've installed any such processor on our test install server, so you may be out of luck with respect to generating PDFs on that particular install.
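
For reference, invoking an FO processor by hand looks roughly like this. The sketch assumes Apache FOP and its documented `-xml`, `-xsl` and `-pdf` options; the file names are placeholders, and this is not necessarily the exact command the plugin itself issues.

```python
import shutil
import subprocess


def fop_command(xml_path, xsl_fo_path, pdf_path, fop_binary="fop"):
    """Build the FOP command that turns XML into PDF via an XSL-FO stylesheet."""
    return [fop_binary, "-xml", xml_path, "-xsl", xsl_fo_path, "-pdf", pdf_path]


def render_pdf(xml_path, xsl_fo_path, pdf_path, fop_binary="fop"):
    """Run FOP, raising a clear error if it is not installed on this machine."""
    if shutil.which(fop_binary) is None:
        raise FileNotFoundError(f"FO processor not found: {fop_binary}")
    subprocess.run(
        fop_command(xml_path, xsl_fo_path, pdf_path, fop_binary), check=True
    )
```

If `render_pdf` fails with the "not found" error, the server has no FO processor on its path, which is the situation described for the test install.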

With those caveats out of the way: to convert XML to HTML or PDF, all you have to do is enable and configure the XML Galley plugin; and then upload XML files as your galleys during submission editing. They will be converted to the appropriate file type the very first time they are viewed online, and the converted file will be stored in OJS for future access.
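
The convert-on-first-view behaviour described above can be sketched as follows. This illustrates the caching idea only and is not the plugin's code: the `transform` callable stands in for whatever XSLT or FO step is configured, and `cached_render` and the cache directory are hypothetical names.

```python
from pathlib import Path


def cached_render(xml_path, cache_dir, transform, suffix=".html"):
    """Return the path of the rendered galley, converting only on first access.

    transform is any callable that takes the XML text and returns the
    rendered output text (for example, a wrapper around an XSLT engine).
    """
    xml_path = Path(xml_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    rendered = cache_dir / (xml_path.stem + suffix)
    if not rendered.exists():
        # First view: run the conversion and store the result.
        output = transform(xml_path.read_text(encoding="utf-8"))
        rendered.write_text(output, encoding="utf-8")
    # Later views are served from the stored file.
    return rendered
```

In this sketch a stored file is never regenerated, so a changed XML file or stylesheet would only take effect after the cached copy is deleted.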

If you have any further questions, please let me know!

Cheers,
James
jmacgreg
 
Posts: 4190
Joined: Tue Feb 14, 2006 10:50 am

Re: digital preservation of OJS content

Post by kshawkin » Thu May 26, 2011 5:15 pm

I have looked again through the list of plugins at viewforum.php?f=28 and can't find one called anything like "XML galley". I have found mention of an "xmlGalleys" plugin elsewhere, so I know you're not making this up. I feel incredibly dense for not being able to find it.
kshawkin
 
Posts: 6
Joined: Wed Mar 23, 2011 7:26 am

Re: digital preservation of OJS content

Post by ramon » Fri May 27, 2011 5:41 am

Dear Kshawkin,

In OJS 2.3.4, this plugin is located under Generic Plugins:
http://your.domain.com/index.php/journa ... ns/generic

Login as Administrator or Journal Manager and access a hosted journal.
Then, under the Administration options, where Configuration, Language and Emails links are, you will see the link to the System Plugins.
Click there and you will see a list of categories.
Click on Generic Plugins and search the list.

You will definitely need to install a Java application on your server for this plugin to work.
I've never used it myself. I still need to test it.
ramon
 
Posts: 940
Joined: Wed Oct 15, 2003 6:15 am
Location: Brasília/DF - Brasil

Re: digital preservation of OJS content

Post by jmacgreg » Tue May 31, 2011 7:38 am

Hi all,

In addition to Ramón's advice, I just wanted to note that you don't strictly need a Java/external application for this plugin to work -- you only need one if you want to convert XML to PDF (e.g. FOP), or if your version of PHP doesn't have libxslt/Sablotron support compiled in (i.e. you may need Xalan on your server).
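
Put as a small decision rule, the requirement looks like this. The function and its parameter names are hypothetical; it only restates the conditions above and does not inspect a real PHP installation.

```python
def external_tools_needed(want_pdf, php_has_xslt):
    """Return the kinds of external applications the setup would require.

    want_pdf:     True if XML galleys should also be rendered to PDF.
    php_has_xslt: True if PHP was built with XSLT support
                  (libxslt or Sablotron).
    """
    needed = []
    if want_pdf:
        needed.append("FO processor (e.g. FOP)")
    if not php_has_xslt:
        needed.append("external XSLT processor (e.g. Xalan)")
    return needed
```

An HTML-only setup on a PHP build with XSLT support returns an empty list, which is the case where no external application is needed.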

Cheers,
James
jmacgreg
 
Posts: 4190
Joined: Tue Feb 14, 2006 10:50 am

Re: digital preservation of OJS content

Post by BHD » Fri Aug 05, 2011 6:40 am

I don't follow OJS that much, but I wonder if the current approach might be in need of a rethink.

The only way to get high-quality PDF from open source tools is to typeset the document with pdfTeX.

There is a much more widely-used XML document format (OpenDocument) that includes a wider range of robust conversion tools (notably, OpenOffice/LibreOffice, which can be run headless, and writer2latex/writer2xhtml).

So what about a workflow where the user uploads, say, a DOC or DOCX file, OJS converts it using OOo headless to both clean XHTML and LaTeX, and then you use pdflatex to convert to PDF?
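
Such a pipeline might be wired together roughly as follows. This is a sketch under several assumptions: it uses LibreOffice's `soffice --headless --convert-to` for the DOC/DOCX to ODT step and `pdflatex` for the final step, while the ODT to LaTeX step is left as a caller-supplied command because the exact writer2latex invocation depends on how it is installed. The XHTML branch is omitted for brevity, and all file and function names are placeholders.

```python
from pathlib import Path


def conversion_plan(manuscript, work_dir, odt_to_latex_cmd):
    """Return the ordered list of commands for DOCX -> ODT -> LaTeX -> PDF.

    odt_to_latex_cmd is a command prefix (a list of strings) supplied by
    the caller; the ODT input path and .tex output path are appended to it.
    """
    manuscript = Path(manuscript)
    work_dir = Path(work_dir)
    odt = work_dir / (manuscript.stem + ".odt")
    tex = work_dir / (manuscript.stem + ".tex")
    return [
        # 1. Normalize the author's file to OpenDocument with headless LibreOffice.
        ["soffice", "--headless", "--convert-to", "odt",
         "--outdir", str(work_dir), str(manuscript)],
        # 2. Convert OpenDocument to LaTeX (tool chosen by the caller).
        list(odt_to_latex_cmd) + [str(odt), str(tex)],
        # 3. Typeset the LaTeX source to PDF.
        ["pdflatex", "-interaction=nonstopmode",
         "-output-directory=" + str(work_dir), str(tex)],
    ]
```

Keeping the plan as data makes it easy to log each step and to stop at the first command that fails.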

LuaTeX or XeTeX gives you Unicode and easy access to professional fonts, so if you just packaged it in a way that made it easy for journals to choose different output styles, you would get much better results, and an XML source format that should be good for preservation as well.
BHD
 
Posts: 2
Joined: Wed Mar 19, 2008 5:57 am

Re: digital preservation of OJS content

Post by kshawkin » Fri Aug 05, 2011 8:33 am

My concern is that if the DOC or DOCX files are inconsistent in structure and formatting, the XHTML generated by converting to OpenDocument format (using OpenOffice.org/LibreOffice in headless mode) will also be quite inconsistent and not really very preservable in the long run. That is, while OpenDocument and Office Open XML are both standards, both are very messy and not especially usable outside of OpenOffice.org, LibreOffice, and Microsoft Word.
kshawkin
 
Posts: 6
Joined: Wed Mar 23, 2011 7:26 am

Re: digital preservation of OJS content

Post by jmacgreg » Tue Aug 09, 2011 3:28 pm

Hi folks,

I'll add a quick comment to kshawkin's valid concerns. We had actually developed a standalone Word/OO.o->NLM XML conversion tool (see http://pkp.sfu.ca/lemon8) that, IIRC, used a headless OO.o instance, or Google Docs, to help with the conversion, which was not perfect. We're not actively developing Lemon8 any more, but we are planning on taking various components of it and adding them to our PKP library so that they are available to all our applications. You can find a rough roadmap for that here. So it's part of the plan to provide these tools in an integrated fashion down the road; but we'll acknowledge right off the bat that doing so in a stable, scalable, consistent manner is going to be difficult, and will likely rely on a certain amount of diligence from e.g. journal managers.

Cheers,
James
jmacgreg
 
Posts: 4190
Joined: Tue Feb 14, 2006 10:50 am

Re: digital preservation of OJS content

Post by sttis » Mon Feb 10, 2014 9:28 pm

Hi,

I've got some sample NLM DTD-tagged XML and I want to be able to use that to generate an HTML and PDF galley. We're on OJS v 2.4.2. We've enabled the XML Galley Plugin and installed FOP on the server. We're using PHP 5.0.0+. I uploaded an XML file as the galley document, and when I view the published article, it just displays XML. This topic led me to believe it would generate PDF / HTML. Is there more documentation somewhere?

I've probably just stuffed up the Settings. I've chosen:

PHP 5.0.0+ with XSL functions (libxslt)

I haven't specified a path to an XSLT renderer (as I thought it wasn't required?)

I've chosen:

NLM Journal Publishing DTD → XHTML

and checked:

Enable rendering PDF galleys using XSL-FO (eg. FOP)

and specified the FO processor path.

I haven't specified a custom XSL stylesheet (again, I thought this was optional).

Does that seem ok?
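
A quick sanity check of settings like these could be scripted along the following lines. The dictionary keys are invented for this sketch and do not correspond to OJS's internal setting names; the check only covers things that can be verified from the file system.

```python
import os


def check_galley_settings(settings):
    """Return a list of human-readable problems found in the given settings."""
    problems = []
    if settings.get("render_pdf"):
        fo_path = settings.get("fo_processor_path", "")
        if not fo_path:
            problems.append("PDF rendering is enabled but no FO processor path is set")
        elif not (os.path.isfile(fo_path) and os.access(fo_path, os.X_OK)):
            problems.append(f"FO processor is missing or not executable: {fo_path}")
    if settings.get("xslt_renderer") == "external" and not settings.get("xslt_path"):
        problems.append("External XSLT renderer selected but no path is set")
    custom = settings.get("custom_xsl")
    if custom and not os.path.isfile(custom):
        problems.append(f"Custom XSL stylesheet not found: {custom}")
    return problems
```

An empty result only means the paths look plausible; it does not confirm that the plugin itself is rendering galleys correctly.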

Thanks,
Suzy
sttis
 
Posts: 27
Joined: Tue May 14, 2013 3:35 am

