PKP Bugzilla – Bug 5648
Implement NLM-OAI citation output
Last modified: 2012-09-21 16:05:51 PDT
Implement NLM 2.3 compatible citation mark-up output for the OJS NLM-OAI interface.
As you're going to move the NLM output stuff to a filter anyway I won't try to include my reference list output into the class right now.
Here's the simple code snippet that you can insert anywhere in your code to get the reference list for a submission.
$submission =& ...; // any Submission object
// Generate NLM 3.0 XML
$nlmFilter = new PKPSubmissionNlmXmlFilter();
$nlmXml = $nlmFilter->execute($submission);
// Downgrade to an NLM 2.3 ref-list
$downgradeFilter = new XSLTransformationFilter('NLM 3.0 to 2.3 ref-list downgrade', array('xml::*', 'xml::*'));
$nlmXml = $downgradeFilter->execute($nlmXml);
You'll have to call that downgrade internally if you're going to extend the PKPSubmissionNlmXmlFilter for full NLM 2.3 or 3.0 output. I'm going to provide both, 2.3 and 3.0, output via the citation editor once I've got both versions so easily available.
Filter parts of the implementation, see:
Alec, I accidentally closed this bug. It's the one for your OAI filter implementation so it shouldn't be closed. I'm also re-assigning to you as you're currently working on this. Hope that's ok.
Florian, the NLM code needs to be embedded in an OAI response, but the filter is generating an XML header:
...which is causing validation problems. Is it possible to suppress this header? If not, it should be. (IMO the header, when enabled, should also refer to a DTD or schema for validation.)
This appears to be coming from the downgrade XSL.
I'm also getting empty metadata when I make the changes you've suggested in the toXml() function in plugins/oaiMetadataFormats/nlm/OAIMetadataFormat_NLM.inc.php, i.e. the generated XML contains only:
Looking at the XML output from the filter before the XSL, I get roughly the same:
Note that there are no references entered for the submission. A dumb question that might resolve my confusion: is this filter supposed to generate metadata for the article, or just the reference list?
1) Empty metadata: Is it possible that you don't have markup for the citations to be extracted? You'd have to use the citation markup assistant to prepare mark-up for your references first.
2) XML header: You can configure the XSLTransformationFilter to return a DOM rather than a string: $downgradeFilter->setResultType(XSL_TRANSFORMER_DOCTYPE_DOM). Then you can use the $resultDom->saveXML($resultDom->documentElement) trick to get rid of the XML header. I hope I haven't got a typo in this, but it should be easy for you to check in the PHP manual once you've got the general idea.
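For reference, here is a minimal sketch of the saveXML() trick using only PHP's built-in DOM extension. The $resultDom document is built by hand as a stand-in for whatever the transformation would return, and the ref-list content is made up for illustration.

```php
<?php
// Stand-in for the DOM returned by the transformation (hypothetical content).
$resultDom = new DOMDocument();
$resultDom->loadXML('<ref-list><ref id="B1"><citation>Example</citation></ref></ref-list>');

// Serializing the whole document includes the XML declaration.
$withHeader = $resultDom->saveXML();

// Serializing only the root element leaves the declaration out,
// which makes the fragment safe to embed in a surrounding OAI response.
$withoutHeader = $resultDom->saveXML($resultDom->documentElement);
```

Passing a node to saveXML() serializes that node and its children only, so the result starts directly at the root element.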
Thanks, Florian. It boils down to my question at the tail end of comment #4 -- I think I might've misunderstood the scope of this filter.
Oh, I didn't read that question thoroughly enough. Here's the answer: The filter is conceived to generate full NLM metadata (which comprises an article), that's what I mean by "expanding" this filter and including the downgrade internally. IMO it's the right place to move the full XML generation to. The sample code I gave was just to represent the current state of development and to leave the option open to postpone the migration of full NLM generation into a filter.
I think the explanation I gave in my email is a lot more complete. I'll post it here again for future (public) reference:
The above filter is a good example of what is possible with the
meta-data framework. This is also the filter you'd have to expand upon
if you wanted to migrate the NLM-OAI stuff to the filter (or even
Another good example for you to look at is the filter that actually
builds the element-citation tags, see here:
Both filters show the kind of abstract coding that is possible with the
two frameworks. They do not show the filter persistence and
configuration stuff because these filters need no configuration and
users don't (yet) have to be aware of them. But you've already seen an
example of filter persistence and configuration when you looked at the
citation editor setup grids.
If you stick to the implementation standards that you see in the two
filters shown above then configured instances of them can later easily
be persisted when required.
BTW: All the important infrastructural classes (e.g. Filter,
MetadataDescription, etc.) have extensive class and inline documentation
that explains the implemented design concepts quite thoroughly. So it's
often helpful to go back to the base classes and see whether I've
written some doc there.
What needs to be done in the case of the NLM OAI export can be
summarized like this:
1) Find out which parts of the NLM conversion are app-specific and which
can be implemented in the lib.
2) Find out which parts should be implemented separately because they
might have a separate use-case or for better separation of concerns (as
in the element-citation case).
3) Migrate all NLM-code to the filters and keep only OAI-specific code
in the OAI classes with a very lean call to the NLM filter.
1) and 2) give you the filter class hierarchy you'll have to implement.
3) decouples the NLM code from the OAI code which makes it re-usable
across use cases and (as far as the lib part is concerned) applications.
Our usual implementation pattern would be to put all the pkp-lib code in
a common base class and extend from it for the different apps (template
An alternative (and in this case maybe a more suitable one for better
re-usability) would be the strategy pattern. In this case you'd have the
main NLM filter in the pkp-lib and then let it call out to app-specific
modules (i.e. separate classes out of the inheritance hierarchy) that do
the app-specific parts. The app-specific modules will be injected or
configured into the main filter in app-specific configuration code (e.g.
in the calling plug-in). This avoids the ugly cross-app duplication of
class names as we have it for Application, Request, Locale and the like
without having to give up the cross-app re-usability of calling code.
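
The strategy variant described above could look roughly like the following sketch. All names in it (NlmAppStrategy, OjsNlmStrategy, NlmFilterSketch) are hypothetical and do not exist in pkp-lib, and a plain array stands in for a Submission object so that the example is self-contained.

```php
<?php
// Hypothetical interface for the app-specific part (not part of pkp-lib).
interface NlmAppStrategy {
    /** @return string XML fragment with the app-specific front matter */
    public function buildFrontMatter(array $submission);
}

// Hypothetical OJS-side module, outside the filter's inheritance hierarchy.
class OjsNlmStrategy implements NlmAppStrategy {
    public function buildFrontMatter(array $submission) {
        $journal = htmlspecialchars($submission['journalTitle'], ENT_XML1 | ENT_QUOTES);
        return '<journal-meta><journal-title>' . $journal . '</journal-title></journal-meta>';
    }
}

// Hypothetical library-side filter: generic parts live here,
// app-specific parts are delegated to the injected strategy.
class NlmFilterSketch {
    private $strategy;

    public function __construct(NlmAppStrategy $strategy) {
        $this->strategy = $strategy;
    }

    public function execute(array $submission) {
        $title = htmlspecialchars($submission['title'], ENT_XML1 | ENT_QUOTES);
        return '<article><front>'
            . $this->strategy->buildFrontMatter($submission)
            . '<article-meta><title-group><article-title>' . $title
            . '</article-title></title-group></article-meta>'
            . '</front></article>';
    }
}

// App-specific configuration code (e.g. in the calling plug-in) injects the strategy.
$filter = new NlmFilterSketch(new OjsNlmStrategy());
$xml = $filter->execute(array('title' => 'On Filters', 'journalTitle' => 'Test Journal'));
```

The calling code only ever sees NlmFilterSketch, so it can stay in the library while each app contributes its own strategy class.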
As NLM is our central "meta-data hub" it is important that all code that
builds on top of your filter can be kept in the library. The
application-object-to-NLM code that you're about to write is in fact the
only one that really should be (partially) app-specific. Import-export,
OAI, indexing, etc. code (across all apps) can then reside in the lib
from now on based on that single filter implementation. So the classes
we're writing here are really important on our mid-term meta-data and
application consolidation roadmap.
Another important decision will be the choice of the right interface
definition for your filter. I guess you'll have a submission object on
input and full Journal Publishing Tag Set NLM 2.3 (or 3.0?) on output
for OJS and OCS. I'm really unsure about 2.3 vs. 3.0 right now. 3.0 is
the standard currently recommended by NLM, that's why I chose it. I
didn't know so far that we're working with 2.3 for Synergies. So maybe
it's better to stick to 2.3 and I'll later re-factor our meta-data
framework to 2.3 also. I'm not very happy with this "discovery" of
course at this point. But what can we do? This is as much my error
because I didn't ask more explicitly as it is that of those who should
have reviewed the specs I sent around. And this although I'm known to
write the shortest specs ever (like this email)! ;-)
Once you know how to correctly split up the filter classes and have got the
interface right, all the rest should fall into place. Of course I'll be
there to help you with these decisions. I practiced a lot with filters
over the last few weeks. I also don't think that we'll get a very
complicated class hierarchy. Probably two or three classes will do the
trick for full OJS NLM support.
A few more words about the importance of the correct interface definition:
The filter framework has automatic type validation and recognition built
in (currently for primitive types, classes, meta-data schemas and xml).
The input type in your case probably is:
"class::lib.pkp.classes.submission.Submission". The output currently is
"xml::*" but should probably become
"xml::path/to/relax-ng/or/xsd/xyz-schema" once you're done. This will
automatically trigger validation on input and output so that you don't
have to implement that yourself.
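
To illustrate the "kind::name" type descriptions, here is a toy sketch. It is not the framework's implementation: both function names are hypothetical, only the "primitive::string" and "xml::*" cases are handled, and the XML check tests well-formedness only, with no schema validation.

```php
<?php
// Hypothetical helper: split a description such as "xml::*" into kind and name.
function parseTypeDescription($description) {
    $parts = explode('::', $description, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException("Malformed type description: $description");
    }
    return array('kind' => $parts[0], 'name' => $parts[1]);
}

// Hypothetical check covering only two of the kinds mentioned above.
function valueMatchesType($value, $description) {
    $type = parseTypeDescription($description);
    switch ($type['kind']) {
        case 'primitive':
            return $type['name'] === 'string' && is_string($value);
        case 'xml':
            // Wildcard only: accept any non-empty, well-formed XML string.
            if ($type['name'] !== '*' || !is_string($value) || $value === '') {
                return false;
            }
            $dom = new DOMDocument();
            return @$dom->loadXML($value) !== false;
        default:
            return false;
    }
}
```

For example, valueMatchesType('<ref-list/>', 'xml::*') accepts the value, while a string that is not XML is rejected for the same description.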
If you require any installation prerequisites for your filter then you
should also define runtime requirements (see the RuntimeEnvironment
class) which will be automatically checked in the framework.
The type definition is also very important for filter selection once we
work with filter persistence. I've implemented a selection algorithm
(see FilterDAO::getCompatibleObjects()) that can select all applicable
filters in the database for a given input/output combination even if the
type does not match exactly. Example: If you want all filters that
potentially give you NLM output then this method will consider all
filters with "primitive::string", "xml::*", "xml::some/matching/xsd",
"xml::some/matching/relax-or-dtd", etc. for output. On the input side
Article, Paper and Monograph classes will all be matched by
This "polymorphism capability" is one of the most important differences
between filters and plug-ins on one side and filters and normal helper
classes like "String" on the other side. Filters are community-friendly
extension-points like plug-ins but also re-usable like a helper class.
When you implement a filter you don't have to know all use-cases for it.
You just have to make it as re-usable as possible and get its
input/output definition as precise as possible and as generic as
necessary. As plug-ins have to hook into existing joinpoints, you have
to know about all use-cases they are meant for during implementation.
Both concepts are useful in their own right. But meta-data conversion is
a prime example where the filter concept is more adequate than both
plug-ins and helper classes IMO.
A good use case for "filter polymorphism" can be seen in
CitationDAO::_instantiateLookupFilters() where both filters and code
that uses those filters are completely unaware of each other. You can
add a new parser or lookup connector without that class ever having to
know where it's going to be used. And the code that uses these filters
also does not have to know about specific filter implementations. Both
are completely de-coupled.
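
The de-coupling can be sketched with a toy in-memory registry. This is not what FilterDAO::getCompatibleObjects() does internally. The registry entries, the function name and the matching rule are assumptions, loosely based on the output types listed in the example above.

```php
<?php
// Hypothetical registry; the real selection works on persisted filters.
$registry = array(
    array('name' => 'SubmissionToNlm', 'output' => 'xml::*'),
    array('name' => 'SubmissionToPlainText', 'output' => 'primitive::string'),
    array('name' => 'SubmissionToCount', 'output' => 'primitive::integer'),
);

// Simplified rule (an assumption): a filter may yield the requested XML if it
// declares a plain string, the XML wildcard, or exactly the requested schema.
function mayProduceXml($declaredOutput, $requestedSchema) {
    $accepted = array('primitive::string', 'xml::*', 'xml::' . $requestedSchema);
    return in_array($declaredOutput, $accepted, true);
}

// The calling code selects by type only and never names a filter class.
$candidates = array();
foreach ($registry as $entry) {
    if (mayProduceXml($entry['output'], 'some/matching/xsd')) {
        $candidates[] = $entry['name'];
    }
}
```

A new filter added to the registry is picked up by the loop without any change to the calling code, which is the point of selecting by type.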
Thanks, Florian. Are you not currently expecting the PKPSubmissionNlmXmlFilter.inc.php to return only the citation list? If I understand correctly, we'll have to either make sure anything currently using the PKPSubmissionNlmXmlFilter knows that it may receive more than just the reference list, or we should rename the PKPSubmissionNlmXmlFilter to something more indicative of its purpose.
Hi Alec, no, it's absolutely ok if the filter returns full NLM. That would immediately result in people having access to full NLM output (including references) from the citation assistant. I guess that's much more useful than just getting a citation list. At least that's what I gathered from Juan's feedback and from the initial intent of L8X. I just didn't expect that we'd get that far before the release. :-) The ref-list output was just a compromise. I don't think there's any general use for reference-only output, as long as you make sure that you keep the app-specific parts apart in the object model so that we can re-use the ref-list part internally (see my mega-comment before).
And for the renaming: Of course. I'd be happy if you could think of something more descriptive. I usually name my filters like this:
[PKP]<FromType><ToType>Filter (i.e. transforming a Submission to NLM XML).
Using Nlm30 rather than just Nlm would probably not hurt either, as an indication that it's about the full ArticlePublishing element set and not only the citation element.
Deferring the NLM rewrite until later. References should be getting served up (will test).