Writing DocBook Documentation for the PKP
This document is currently extremely incomplete, and undergoing revisions as I switch from DocBook 4.5 to 5.0.
Creating a DocBook XML source file is mainly a matter of identifying what kind of document you are working on and which tags you should use to describe it. This meta-document details the steps I use to create book- and article-level source documentation for the Public Knowledge Project. It also describes the steps I take to transform XML source files to HTML and PDF.
Writings, books on DocBook
- DocBook v5.0: The Definitive Guide
- DocBook XSL: The Complete Guide
- Dave Pawson's DocBook site, may be slightly out-of-date
Other interesting stuff
- Writing "Learning PHP 5" -- an article by David Sklar; he talks about writing his book using DocBook Lite and XEmacs, with some neat keybindings.
- a good intro into the whats and whys of DocBook
Writing an Article
First, choose whether you are writing a book or an article. This identifies the root element you'll start with: <book> or <article>. For the rest of this document, we'll assume we're writing an article.
At bare minimum, your source file will look like so:
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0">
  <info>
    <title>Importing and Exporting Data with OJS</title>
    <author>
      <orgname>The Public Knowledge Project</orgname>
      <address>
        <city>Burnaby</city>
        <street>8888 University Drive</street>
        <postcode>V5A 1S6</postcode>
        <country>Canada</country>
      </address>
      <email>email@example.com</email>
    </author>
  </info>
  <sect1 xml:id="preface"><title>Preface</title>
    <para>Open Journal Systems is a research and development initiative of the Public Knowledge Project at the University of British Columbia. Its continuing development is currently overseen by a partnership among UBC's Public Knowledge Project, the Canadian Center for Studies in Publishing, and the Simon Fraser University Library. For more information, see the Public Knowledge Project web site: <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://pkp.sfu.ca">http://pkp.sfu.ca</link>. </para>
    <para>This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Canada License. To view a copy of this license, visit <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://creativecommons.org/licenses/by-sa/2.5/ca/">http://creativecommons.org/licenses/by-sa/2.5/ca/</link> or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. </para>
  </sect1>
</article>
Using <sect1>, <sect2>, <sect3>, and so on, you can create a nicely nested article with main sections, subsections, etc. When you transform your document, sect1 titles become items in the article's Table of Contents and sect2 titles become nested items beneath them; by default, sect3 and below do not appear (the stylesheets' toc.section.depth parameter defaults to 2).
When I start a new document, I typically take the above template and flesh out the overall document structure by section. You can view an example framework here. Keep in mind that you need to add an xml:id attribute to each sect element, that every xml:id must be unique within the document, and that each section needs a title as well. Then, it's just a matter of filling in the para tags with relevant information.
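A minimal sketch of what such a framework might look like, assuming a short article with two top-level sections. The titles and xml:id values are made up for illustration; only the nesting pattern matters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:xl="http://www.w3.org/1999/xlink"
         version="5.0">
  <info>
    <title>Example Article</title>
  </info>
  <!-- sect1: listed in the Table of Contents -->
  <sect1 xml:id="importing">
    <title>Importing Data</title>
    <para>...</para>
    <!-- sect2: nested entry in the Table of Contents -->
    <sect2 xml:id="importing-articles">
      <title>Importing Articles</title>
      <para>...</para>
      <!-- sect3: rendered in the body, but not in the default Table of Contents -->
      <sect3 xml:id="importing-articles-cli">
        <title>From the Command Line</title>
        <para>...</para>
      </sect3>
    </sect2>
  </sect1>
  <sect1 xml:id="exporting">
    <title>Exporting Data</title>
    <para>...</para>
  </sect1>
</article>
```

Prefixing child ids with the parent's id (importing, importing-articles, ...) is just one convention for keeping them unique; any scheme works as long as no two ids collide.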
Writing a Book
Info will come as I convert the OxS in an Hour docs as well as the Technical Reference. But mostly it's about using the following as your root element/namespace identifier:
<?xml version="1.0" encoding="UTF-8"?> <book xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0">
A general outline of a DocBook book would look like so:
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0">
  <chapter xml:id="preface">
    <title>Preface</title>
    <para> ... </para>
  </chapter>
  <chapter xml:id="introduction">
    <title>Introduction</title>
    <para> ... </para>
  </chapter>
  <chapter xml:id="chapter1">
    <title>Chapter 1</title>
    <sect1>
      <title>Section 1</title>
      <para> ... </para>
    </sect1>
  </chapter>
</book>
And so on.
Common DocBook Elements You Should be Using
You can find a list of all elements for DocBook 5.0 here.
- Use <filename>/plugins/importExport/native/native.dtd</filename> when you are referring to file names and locations.
- Use <command>php importExport.php</command> when you reference commands.
- Use <userinput>User Home</userinput> for referencing ... user ... input.
<para>To import a file, you can use <userinput>&lt;embed&gt;</userinput> to place a file directly within your XML document, or use <userinput>&lt;href&gt;</userinput> to link to one.</para>
- Use <![CDATA[<p>all this text up in here</p>]]> for markup that the parser should treat as literal text rather than as tags. This example would tell the parser to ignore those <p> tags. (Let me know if you come up with a clever way to ignore <![CDATA]> itself.)
- Use <element xl:href="http://pkp.sfu.ca">Public Knowledge Project</element> to hyperlink to an external page. "element" can be any inline element. You can also use "link" as a generic element.
- Use <element linkend="sectionId">link text</element> for linking within the document itself. "element" can be any inline element. You can also use "link" as a generic element.
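To show how the inline elements above combine in running text, here is a small sketch. The section titles, ids, and sentence wording are invented for the example; the element names and the xl prefix follow the template earlier in this document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:xl="http://www.w3.org/1999/xlink"
         version="5.0">
  <info>
    <title>Inline Element Sample</title>
  </info>
  <sect1 xml:id="preface">
    <title>Preface</title>
    <!-- External link: xl:href on the generic link element -->
    <para>See the <link xl:href="http://pkp.sfu.ca">Public Knowledge Project</link>
      site for background.</para>
  </sect1>
  <sect1 xml:id="running-import">
    <title>Running an Import</title>
    <!-- command and filename mark up what they contain; linkend points at an xml:id -->
    <para>Run <command>php importExport.php</command> on a file that follows
      <filename>/plugins/importExport/native/native.dtd</filename>, as mentioned in
      <link linkend="preface">the preface</link>.</para>
    <!-- Literal angle brackets inside an element must be escaped -->
    <para>Use <userinput>&lt;embed&gt;</userinput> to place a file directly
      within your XML document.</para>
  </sect1>
</article>
```

Note that a linkend value only resolves if an element with that xml:id exists in the same document, which is why the sample carries both sections.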
Block elements come into play when you have paragraph-level blocks of text that need to be identified and formatted differently than a normal paragraph, for example lists, examples, tips, and so on.
- For code examples, listings, etc., use <example> with a <programlisting> inside for large, multiline code blocks. You can also use <informalexample> and omit the title information.
<example> <title>Example Code Snippet</title> <programlisting>/multiple lines of code/</programlisting> </example>
- For Tips:
<tip> <para>this is a tip</para> </tip>
- For Warnings
<warning> <para>this is a warning</para> </warning>
- For non-numbered lists:
<itemizedlist>
  <listitem> <para>Item one</para> </listitem>
  <listitem> <para>Item two</para> </listitem>
  <listitem> <para>Item three</para> </listitem>
</itemizedlist>
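As a sketch of how the block elements and CDATA fit together: wrapping the contents of a programlisting in a CDATA section lets you paste markup in without escaping every angle bracket. The second listing shows one commonly used trick for displaying a CDATA section itself: split the closing ]]> across two CDATA sections so the parser never sees it whole. The section title and id are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sect1 xmlns="http://docbook.org/ns/docbook" version="5.0" xml:id="listing-samples">
  <title>Listing Samples</title>
  <example>
    <title>Literal Markup Inside a Listing</title>
    <!-- Everything inside the CDATA section is passed through as plain text -->
    <programlisting><![CDATA[<p>all this text up in here</p>]]></programlisting>
  </example>
  <informalexample>
    <!-- Renders as a complete CDATA section around "text": the first CDATA
         section ends after "]]", and the second supplies the final ">" -->
    <programlisting><![CDATA[<![CDATA[text]]]]><![CDATA[>]]></programlisting>
  </informalexample>
</sect1>
```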
Setting up the Tools you need to work with DocBook
DocBook XSL: The Complete Guide has a very good chapter on setting up all the tools you'll need to transform DocBook XML into HTML and PDF. You can download OS/platform-specific packages here, or at a minimum you can make sure you have the following installed:
- DocBook DTD (you can always point to an online DTD, of course)
- DocBook XSL Stylesheets
- XSLT Processor (to transform to HTML and FO)
- XSL-FO Processor (to transform from FO to PDF)
Installing this stuff on Ubuntu
I'm currently handling all of my transformations on Ubuntu using xsltproc for HTML and FO, and FOP for FO->PDF. The following instructions will assume you are using the same tools in the same general environment.
To install the first three items in Ubuntu, install the following packages through apt-get or Synaptic: docbook-xml (installs the DTD), docbook-xsl (the stylesheets), xsltproc (the tool to transform to HTML and FO). Ubuntu installs the stylesheets to /usr/share/xml/docbook/stylesheet/nwalsh/. You can put them anywhere you want because you'll just be pointing to particular ones with xsltproc, but I'll reference that location below. If you're not running Ubuntu, you can also download them separately from here.
You can't use Synaptic or apt-get to install FOP on Ubuntu, but the DocBook XSL guide has a page on installing it here. I had some minor difficulty in getting it to work, but if memory serves that was a Java problem that got fixed by paying attention to the guide.
Installing this stuff on Mac OS X
Rough notes on moving my environment from Ubuntu to OS X:
- For the stylesheets: Download the DocBook XSL stylesheets from SourceForge; install them somewhere reasonable (I extracted the tar file to /Users/jmacgreg/docbook). You'll have to remember this path later.
- For the XSLT Processor: Use xsltproc -- this should already be on your machine and available from the terminal.
- For the XSL-FO Processor: Install MacPorts. With MacPorts, install fop:
sudo port install fop
- Additional tips: Oxygen XML is available for OS X.
Using the tools
You can use xsltproc to transform from DocBook to HTML and FO, and then FOP to transform your FO file to PDF. You can also create rudimentary DocBook files from existing Word documents using OpenOffice.org.
.doc/.odt to DocBook XML
- Download and install OpenOffice.org, or NeoOffice for OS X.
- Open the .doc (Microsoft Word) or .odt (OpenDocument Text) file in OpenOffice.org.
- Use the "Save As ..." function, and choose DocBook XML.
- Clean up big time. I recommend opening in Oxygen, and using Oxygen's built-in DocBook 4->DocBook 5 transformation.
DocBook to HTML
If you have a valid DocBook XML file by the name of example.xml, you should now be able to run the command
xsltproc --output example.html /usr/share/xml/docbook/stylesheet/nwalsh/xhtml/docbook.xsl example.xml
That line is basically saying "take example.xml, and use the XHTML DocBook stylesheet in conjunction with xsltproc to spit out example.html".
To "chunk" your output, that is, split it by section into multiple pages, use
xsltproc /usr/share/xml/docbook/stylesheet/nwalsh/xhtml/chunk.xsl example.xml
I've created a PKP-specific customization layer that adds header image links and a link to the PKP documentation stylesheet, available here. You'll have to add it to the xhtml stylesheet directory, as it references chunk.xsl, and then point xsltproc to it instead of chunk.xsl. This customization layer is still under development.
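For readers who have not seen a customization layer before, the general shape is an XSL file that imports the stock stylesheet and overrides a few parameters. The following is only a sketch of that pattern, not the PKP layer itself; the CSS file name is hypothetical, and it leaves out the header image links.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Relative import: this file has to sit next to chunk.xsl in the xhtml directory -->
  <xsl:import href="chunk.xsl"/>

  <!-- Link every generated page to a CSS file (hypothetical file name) -->
  <xsl:param name="html.stylesheet">pkp-docs.css</xsl:param>

  <!-- Name chunk files after xml:id values rather than generated names -->
  <xsl:param name="use.id.as.filename" select="1"/>

  <!-- Show sect3 entries in the Table of Contents as well (the default depth is 2) -->
  <xsl:param name="toc.section.depth" select="3"/>
</xsl:stylesheet>
```

Saved under a name of your choosing in the xhtml directory, a file like this is passed to xsltproc in place of chunk.xsl.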
DocBook to PDF
To transform to PDF, you'll first have to transform to FO.
xsltproc --output example.fo --stringparam fop1.extensions 1 /usr/share/xml/docbook/stylesheet/nwalsh/fo/docbook.xsl example.xml
This command has xsltproc take example.xml and transform it to example.fo. Using FOP, you can then transform example.fo to example.pdf:
/path/to/fop -fo example.fo -pdf example.pdf
You should now have a set of HTML files, an FO file that you can pitch, and a PDF file.
I've created a PKP-specific customization layer that works around some of the more common PDF transformation issues, available here. You'll have to add it to the fo stylesheet directory, as it references docbook.xsl, and then point xsltproc to it instead of docbook.xsl. This customization layer is still under development.
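The FO side follows the same import-and-override pattern. This sketch is not the PKP layer; it only shows a few stock parameters that often come up when producing PDFs with FOP, and the values chosen are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Relative import: this file has to sit next to docbook.xsl in the fo directory -->
  <xsl:import href="docbook.xsl"/>

  <!-- Same effect as passing the fop1.extensions stringparam on the command line -->
  <xsl:param name="fop1.extensions" select="1"/>

  <!-- Page size (illustrative choice; the stylesheets default to USletter) -->
  <xsl:param name="paper.type">A4</xsl:param>

  <!-- Turn off draft mode so no draft watermark image is requested -->
  <xsl:param name="draft.mode">no</xsl:param>
</xsl:stylesheet>
```

With fop1.extensions set in the layer, the matching --stringparam option on the xsltproc command line becomes redundant.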
DocBook to Drupal
Although Drupal supports DocBook export of a book, there is currently no way to import a DocBook XML file into Drupal. This is an issue for us as we're using Drupal to manage the PKP site, and we'd understandably like to integrate our documentation as best we can. We'll be working on this in one way or another in the near future.
DocBook to OJS
Or am I?