PKP Bugzilla – Bug 5648
Implement NLM-OAI citation output
Last modified: 2012-09-21 16:05:51 PDT
Implement NLM 2.3 compatible citation mark-up output for the OJS NLM-OAI interface.
Hi Alec! As you're going to move the NLM output stuff to a filter anyway, I won't try to include my reference list output into the class right now. Here's a simple code snippet that you can insert anywhere in your code to get the reference list for a submission:

    $submission =& ... any Submission object ...

    // Generate NLM 3.0 XML
    import('lib.pkp.classes.importexport.nlm.PKPSubmissionNlmXmlFilter');
    $nlmFilter = new PKPSubmissionNlmXmlFilter();
    $nlmXml = $nlmFilter->execute($submission);

    // Downgrade to an NLM 2.3 ref-list
    import('lib.pkp.classes.xslt.XSLTransformationFilter');
    $downgradeFilter = new XSLTransformationFilter('NLM 3.0 to 2.3 ref-list downgrade', array('xml::*', 'xml::*'));
    $downgradeFilter->setXSLFilename('lib/pkp/classes/importexport/nlm/nlm-ref-list-30-to-23.xsl');
    $nlmXml = $downgradeFilter->execute($nlmXml);

You'll have to call that downgrade internally if you're going to extend the PKPSubmissionNlmXmlFilter for full NLM 2.3 or 3.0 output. I'm going to provide both 2.3 and 3.0 output via the citation editor once I've got both versions so easily available.
Filter parts of the implementation, see:
http://github.com/pkp/pkp-lib/commit/97312b7c06075a064090a2cc32883d62293bed1b
http://github.com/pkp/pkp-lib/commit/2970b05235c2fa0cc496b3d238eefcd880b1f6cb
http://github.com/pkp/ocs/commit/cbf42dc8bb7ea2296278ae7bda9c2d555474c655
http://github.com/pkp/omp/commit/99b73e2306e525058855413b5e6b62d370980f01
Alec, I accidentally closed this bug. It's the one for your OAI filter implementation so it shouldn't be closed. I'm also re-assigning to you as you're currently working on this. Hope that's ok.
Florian, the NLM code needs to be embedded in an OAI response, but the filter is generating an XML header:

    <?xml version="1.0"?>

...which is causing validation problems. Is it possible to suppress this header? If not, it should be. (IMO the header, when enabled, should also refer to a DTD or schema for validation.) This appears to be coming from the downgrade XSL.

I'm also getting empty metadata when I make the changes you've suggested in the toXml() function in plugins/oaiMetadataFormats/nlm/OAIMetadataFormat_NLM.inc.php, i.e. the generated XML contains only:

    <?xml version="1.0"?>
    <ref-list>
    </ref-list>

Looking at the XML output from the filter before the XSL, I get roughly the same:

    <ref-list>
    </ref-list>

Note that there are no references entered for the submission.

A dumb question that might resolve my confusion: is this filter supposed to generate metadata for the article, or just the reference list?
1) Empty metadata: Is it possible that you don't have markup for the citations to be extracted? You'd have to use the citation markup assistant to prepare mark-up for your references first.

2) XML header: You can configure the XSLTransformationFilter to return a DOM rather than a string: $downgradeFilter->setResultType(XSL_TRANSFORMER_DOCTYPE_DOM). Then you can use the $resultDom->saveXML($resultDom->documentElement) trick to get rid of the XML header. I hope I've got no typo in this, but it should be easy for you to check in the PHP manual once you've got the general idea.
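For reference, the saveXML() trick can be tried in isolation with plain PHP DOM, without any PKP classes. This is only a minimal sketch: the XML string is made-up stand-in content, and a DOMDocument built with loadXML() takes the place of whatever the transformation filter would actually return.

```php
<?php
// Made-up stand-in for the XML a transformation might return (assumption).
$xml = '<ref-list><ref id="B1"><element-citation>'
    . '<source>Example Journal</source>'
    . '</element-citation></ref></ref-list>';

$resultDom = new DOMDocument();
$resultDom->loadXML($xml);

// Serializing the whole document prepends the XML declaration.
$withHeader = $resultDom->saveXML();

// Serializing only the root element node omits the declaration.
$withoutHeader = $resultDom->saveXML($resultDom->documentElement);

echo $withoutHeader, "\n";
```

The second string is what could be embedded in an enclosing OAI response, since it starts directly with the ref-list root element.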
Thanks, Florian. It boils down to my question at the tail end of comment #4 -- I think I might've misunderstood the scope of this filter.
Oh, I didn't read that question thoroughly enough. Here's the answer: The filter is conceived to generate full NLM metadata (which comprises an article). That's what I mean by "expanding" this filter and including the downgrade internally. IMO it's the right place to move the full XML generation to. The sample code I gave was just to represent the current state of development and to leave the option open to postpone the migration of full NLM generation into a filter.

I think the explanation I gave in my email is a lot more complete. I'll post it here again for future (public) reference:

<http://github.com/pkp/pkp-lib/blob/master/classes/importexport/nlm/PKPSubmissionNlmXmlFilter.inc.php>

The above filter is a good example of what is possible with the meta-data framework. This is also the filter you'd have to expand upon if you wanted to migrate the NLM-OAI stuff to the filter (or even meta-data) framework. Another good example for you to look at is the filter that actually builds the element-citation tags, see here:

<http://github.com/pkp/pkp-lib/tree/master/classes/citation/output/nlm>

Both filters show the kind of abstract coding that is possible with the two frameworks. They do not show the filter persistence and configuration stuff because these filters need no configuration and users don't (yet) have to be aware of them. But you've already seen an example of filter persistence and configuration when you looked at the citation editor setup grids. If you stick to the implementation standards that you see in the two filters shown above, then configured instances of them can later easily be persisted when required.

BTW: All the important infrastructural classes (e.g. Filter, MetadataDescription, etc.) have extensive class and inline documentation that explains the implemented design concepts quite thoroughly. So it's often helpful to go back to the base classes and see whether I've written some doc there.
What needs to be done in the case of the NLM OAI export can be summarized like this:

1) Find out which parts of the NLM conversion are app-specific and which can be implemented in the lib.
2) Find out which parts should be implemented separately because they might have a separate use-case or for better separation of concerns (as in the element-citation case).
3) Migrate all NLM-code to the filters and keep only OAI-specific code in the OAI classes with a very lean call to the NLM filter.

1) and 2) give you the filter class hierarchy you'll have to implement. 3) decouples the NLM code from the OAI code which makes it re-usable across use cases and (as far as the lib part is concerned) applications.

Our usual implementation pattern would be to put all the pkp-lib code in a common base class and extend from it for the different apps (template pattern). An alternative (and maybe in this case more adequate pattern for better re-usability) would be the strategy pattern. In this case you'd have the main NLM filter in the pkp-lib and then let it call out to app-specific modules (i.e. separate classes out of the inheritance hierarchy) that do the app-specific parts. The app-specific modules will be injected or configured into the main filter in app-specific configuration code (e.g. in the calling plug-in). This avoids the ugly cross-app duplication of class names as we have it for Application, Request, Locale and the like without having to give up the cross-app re-usability of calling code.

As NLM is our central "meta-data hub" it is important that all code that builds on top of your filter can be kept in the library. The application-object-to-NLM code that you're about to write is in fact the only one that really should be (partially) app-specific. Import-export, OAI, indexing, etc. code (across all apps) can then reside in the lib from now on based on that single filter implementation.
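To make the strategy-pattern variant a bit more concrete, here is a rough sketch. Every class and method name in it is hypothetical and does not exist in pkp-lib, the submission is reduced to a plain array, and the XML is drastically simplified rather than valid NLM. It only shows the shape: a library-side filter that receives an app-specific module by injection.

```php
<?php
// All names below are hypothetical; none of them exist in pkp-lib.

// Contract for the app-specific part (the "strategy").
interface AppSpecificNlmModule {
    // Returns the app-specific elements as an XML fragment.
    public function buildAppSpecificXml(array $submission): string;
}

// What an OJS-side module might contribute (journal-level data).
class OjsNlmModule implements AppSpecificNlmModule {
    public function buildAppSpecificXml(array $submission): string {
        $journalTitle = htmlspecialchars($submission['journalTitle'], ENT_XML1 | ENT_QUOTES);
        return '<journal-meta><journal-title>' . $journalTitle
            . '</journal-title></journal-meta>';
    }
}

// Library-side filter: owns the shared conversion, delegates the rest.
class SubmissionNlmFilterSketch {
    private $appModule;

    public function __construct(AppSpecificNlmModule $appModule) {
        $this->appModule = $appModule;
    }

    public function execute(array $submission): string {
        $title = htmlspecialchars($submission['title'], ENT_XML1 | ENT_QUOTES);
        return '<article><front>'
            . $this->appModule->buildAppSpecificXml($submission)
            . '<article-meta><title-group><article-title>' . $title
            . '</article-title></title-group></article-meta>'
            . '</front></article>';
    }
}

// App-specific configuration code (e.g. the calling plug-in) injects the module.
$filter = new SubmissionNlmFilterSketch(new OjsNlmModule());
$nlmXml = $filter->execute(array(
    'title' => 'On Filters',
    'journalTitle' => 'Sketch Journal',
));
echo $nlmXml, "\n";
```

With this shape, an OCS or OMP module would be another implementation of the same interface, and the calling code (OAI, import-export, indexing) would only ever see the library-side filter.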
So the classes we're writing here are really important on our mid-term meta-data and application consolidation roadmap.

Another important decision will be the choice of the right interface definition for your filter. I guess you'll have a submission object on input and full Journal Publishing Tag Set NLM 2.3 (or 3.0?) on output for OJS and OCS. I'm really unsure about 2.3 vs. 3.0 right now. 3.0 is the standard currently recommended by NLM, that's why I chose it. I didn't know so far that we're working with 2.3 for Synergies. So maybe it's better to stick to 2.3 and I'll later re-factor our meta-data framework to 2.3 also. I'm not very happy with this "discovery" of course at this point. But what can we do? This is as much my error, because I didn't ask more explicitly, as it is that of those who should have reviewed the specs I sent around. And this although I'm known to write the shortest specs ever (like this email)! ;-)

Once you know how to correctly split up the filter classes and have got the interface right, all the rest should fall into place. Of course I'll be there to help you with these decisions. I practiced a lot with filters over the last few weeks. I also don't think that we'll get a very complicated class hierarchy. Probably two or three classes will do the trick for full OJS NLM support.

A few more words about the importance of the correct interface definition: The filter framework has automatic type validation and recognition built in (currently for primitive types, classes, meta-data schemas and xml). The input type in your case probably is: "class::lib.pkp.classes.submission.Submission". The output currently is "xml::*" but should probably become "xml::path/to/relax-ng/or/xsd/xyz-schema" once you're done. This will automatically trigger validation on input and output so that you don't have to implement that yourself.
If you require any installation prerequisites for your filter then you should also define runtime requirements (see the RuntimeEnvironment class) which will be automatically checked in the framework.

The type definition is also very important for filter selection once we work with filter persistence. I've implemented a selection algorithm (see FilterDAO::getCompatibleObjects()) that can select all applicable filters in the database for a given input/output combination even if the type does not match exactly. Example: If you want all filters that potentially give you NLM output then this method will consider all filters with "primitive::string", "xml::*", "xml::some/matching/xsd", "xml::some/matching/relax-or-dtd", etc. for output. On the input side Article, Paper and Monograph classes will all be matched by "class::lib.pkp.classes.submission.Submission".

This "polymorphism capability" is one of the most important differences between filters and plug-ins on one side and filters and normal helper classes like "String" on the other side. Filters are community-friendly extension-points like plug-ins but also re-usable like a helper class. When you implement a filter you don't have to know all use-cases for it. You just have to make it as re-usable as possible and get its input/output definition as precise as possible and as generic as necessary. As plug-ins have to hook into existing joinpoints, you have to know about all use-cases they are meant for during implementation. Both concepts are useful in their own right. But meta-data conversion is a prime example where the filter concept is more adequate than both plug-ins and helper classes IMO.

A good use case for "filter polymorphism" can be seen in CitationDAO::_instantiateParserFilters() and CitationDAO::_instantiateLookupFilters() where both filters and code that uses those filters are completely unaware of each other.
You can add a new parser or lookup connector without that class ever having to know where it's going to be used. And the code that uses these filters also does not have to know about specific filter implementations. Both are completely de-coupled.
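The matching idea described above can be illustrated with a toy sketch. To be clear, this is not the logic of FilterDAO::getCompatibleObjects(): the two helper functions, the filter names and the simplified rules (an exact match, or a generic "primitive::string" / "xml::*" output for any XML request) are assumptions made purely for illustration, and the Submission and Article classes are empty stand-ins.

```php
<?php
// Hypothetical sketch; this is NOT FilterDAO::getCompatibleObjects().

// Empty stand-ins for the real submission classes.
class Submission {}
class Article extends Submission {}

// Does a concrete object satisfy a "class::..." input type description?
function acceptsInput(string $typeDescription, $input): bool {
    list($namespace, $name) = explode('::', $typeDescription, 2);
    if ($namespace !== 'class') {
        return false;
    }
    // Assumption: the last dot-separated segment is the class name.
    $parts = explode('.', $name);
    $className = end($parts);
    return $input instanceof $className;
}

// Could a filter with this declared output type deliver the requested XML type?
function mayProduce(string $filterOutputType, string $requestedType): bool {
    if ($filterOutputType === $requestedType) {
        return true;
    }
    // Generic output types may carry any XML.
    $generic = array('primitive::string', 'xml::*');
    return in_array($filterOutputType, $generic, true)
        && strpos($requestedType, 'xml::') === 0;
}

// Made-up filters with their declared output types.
$filters = array(
    'refListFilter' => 'xml::*',
    'plainTextFilter' => 'primitive::string',
    'otherSchemaFilter' => 'xml::some/other/schema',
);

$requested = 'xml::some/matching/xsd';
$compatible = array();
foreach ($filters as $filterName => $outputType) {
    if (mayProduce($outputType, $requested)) {
        $compatible[] = $filterName;
    }
}

$inputOk = acceptsInput('class::lib.pkp.classes.submission.Submission', new Article());
```

Here an Article instance satisfies the Submission input type through inheritance, and the two filters with generic output types are kept as candidates while the one bound to a different schema is dropped.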
Thanks, Florian. Are you not currently expecting the PKPSubmissionNlmXmlFilter.inc.php to return only the citation list? If I understand correctly, we'll have to either make sure anything currently using the PKPSubmissionNlmXmlFilter knows that it may receive more than just the reference list, or we should rename the PKPSubmissionNlmXmlFilter to something more indicative of its purpose.
Hi Alec, no, it's absolutely ok if the filter returns full NLM. That would immediately result in people having access to full NLM output (including references) from the citation assistant. I guess that's much more useful than just getting a citation list. At least that's what I gathered from Juan's feedback and from the initial intent of L8X. I just didn't expect that we'd get that far before the release. :-) The ref-list output was just a compromise. I don't think there's any general use case for reference-only output, as long as you keep the app-specific parts separate in the object model so that we can re-use the ref-list part internally (see my mega-comment before).

As for the renaming: Of course. I'd be happy if you could think of something more descriptive. I usually name my filters like this: [PKP]<FromType><ToType>Filter (i.e. transforming a Submission to NLM XML). Using Nlm30 rather than just Nlm probably wouldn't hurt either, as an indication that it's about the full ArticlePublishing element set and not the citation element only.
Deferring the NLM rewrite until later. References should be getting served up (will test).