PKP Bugzilla – Bug 3914
Less redundancy with batch uploads and re-uploaded material.
Last modified: 2013-05-29 15:21:50 PDT
Almost everything we upload at UNB is done on the command line with the Native Import/Export Plugin. Sometimes, however, complications during the process stop an upload halfway through. This leaves a half-published issue online, which is typically deleted once the full issue goes up. However, once that half-uploaded issue is deleted, its articles are shuffled over to the "Unassigned" category under Editor functions.
Since these files are redundant, it would be nice to be able to remove them. Archiving and deleting these files is currently arduous since it involves at least 8 clicks (or more, depending on the depth of your archives) per redundant article.
There are probably a number of ways to solve this: 1) The ability for a system admin or journal manager to blow away an issue entirely (perhaps automatically archiving the articles therein?) 2) An element in native.dtd that denotes an issue overwrite for quick replacement of HTML/PDF/Metadata, without any deletion to begin with (this might be nice regardless, and possibly a solution to my other feature request here: http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=3913 ). 3) A system admin or journal manager function to purge unassigned articles (this probably has limited use for most single journals, but would be handy at the aggregating level).
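As a rough illustration of option 3, here is a minimal Python sketch of a purge that defaults to a dry run. It is not OJS code: the `Article` record, the `issue_id is None` convention for "Unassigned", and the `purge_unassigned` function are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    """Hypothetical article record; not the OJS schema."""
    article_id: int
    title: str
    issue_id: Optional[int]  # assumption: None means the article is "Unassigned"


def purge_unassigned(articles, dry_run=True):
    """Return (kept, purged). With dry_run=True nothing is actually dropped."""
    purged = [a for a in articles if a.issue_id is None]
    if dry_run:
        # Report what would be removed, but keep the full list intact.
        return list(articles), purged
    kept = [a for a in articles if a.issue_id is not None]
    return kept, purged


if __name__ == "__main__":
    sample = [Article(1, "Published piece", 10), Article(2, "Orphan", None)]
    kept, purged = purge_unassigned(sample, dry_run=False)
    print([a.title for a in kept], [a.title for a in purged])
```

Defaulting to a dry run would let a journal manager review the list of orphaned articles before anything is removed.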
I'll second the "XML import as overwrite" option. See http://pkp.sfu.ca/support/forum/viewtopic.php?f=2&t=4081 for further discussion; but in a nutshell, it'd be nice to be able to batch overwrite previous issue data and content; or at the very least, batch delete obsolete information, if updating a large print run.
- what to do if the obsolete content has more information (galleys; metadata) than the new content: should the old content be removed entirely and replaced with the new set, or should the content be merged? Especially an issue if one were to want to add additional new content to an already-published issue (like a previously missing editorial) but not mess with the already-existing content. Maybe we could flag imported content as new/overwrite/merge?
- how to identify whether and which issue to overwrite. Since the Title element is mandatory, I would suggest matching to that.
If it was me, I'd blow away the existing content for the issue. If a journal editor or a content editor wants to batch upload material, they should be aware of the consequences of doing so, and I think that not doing this might leave an issue in a state where there could be a mix of old and new material and that could lead to confusion.
James, I know the title is mandatory, but it's probably entirely possible for more than one issue to have the same title. Perhaps just using the unique issue id would be enough? That value would already exist, since the issue had to have been created already.
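To make the new/overwrite/merge idea concrete, here is a minimal Python sketch that matches on the unique issue id rather than the title. It is only an illustration under assumptions: `ImportMode`, `Issue`, `import_issue` and the in-memory `store` dict are hypothetical and do not correspond to anything in the Native Import/Export Plugin or native.dtd.

```python
from dataclasses import dataclass, field
from enum import Enum


class ImportMode(Enum):
    NEW = "new"              # always create a fresh issue
    OVERWRITE = "overwrite"  # blow away the matched issue's content
    MERGE = "merge"          # keep existing content, add or update per article


@dataclass
class Issue:
    """Hypothetical issue record; articles maps an article key to its metadata."""
    issue_id: int
    title: str
    articles: dict = field(default_factory=dict)


def import_issue(store, incoming, mode):
    """Apply `incoming` to `store` (issue_id -> Issue); return the affected issue."""
    if mode is ImportMode.NEW:
        new_id = max(store, default=0) + 1
        store[new_id] = Issue(new_id, incoming.title, dict(incoming.articles))
        return store[new_id]
    if incoming.issue_id not in store:
        # Match on the unique id, never on the (possibly duplicated) title.
        raise KeyError(f"no existing issue with id {incoming.issue_id}")
    existing = store[incoming.issue_id]
    if mode is ImportMode.OVERWRITE:
        existing.title = incoming.title
        existing.articles = dict(incoming.articles)
    else:  # ImportMode.MERGE
        for key, meta in incoming.articles.items():
            # New fields win; old fields (e.g. galleys) missing from the import survive.
            existing.articles[key] = {**existing.articles.get(key, {}), **meta}
    return existing


if __name__ == "__main__":
    store = {7: Issue(7, "Vol 1", {"a": {"title": "Old", "pdf": "a.pdf"}})}
    result = import_issue(store, Issue(7, "Vol 1", {"a": {"title": "New"}}), ImportMode.MERGE)
    print(result.articles)
```

In this sketch the merge mode keeps galleys or metadata that the new import lacks, while overwrite gives the clean "blow away the existing content" behaviour argued for above.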
(In reply to comment #2)
Thank God for brighter minds than mine:
> If it was me, I'd blow away the existing content for the issue. If a journal
> editor or a content editor wants to batch upload material, they should be aware
> of the consequences of doing so, and I think that not doing this might leave an
> issue in a state where there could be a mix of old and new material and that
> could lead to confusion.
I'd agree with this. Archiving old XML upload files should be the journal's responsibility; it would be far cleaner to update these if necessary in anticipation of a complete deletion/rewrite than to start managing incremental (and subsequently messy) updates.
One general question I have about this: should users see any indication on the website that files have been updated? This probably isn't a huge issue in most cases, but generally speaking, people like to know whether updates have happened. Even a simple line item in the article metadata ("Last updated on: 2009-12-25") might be nice. Probably outside of the scope of this report (and applicable to a host of other situations), but I thought I'd raise the issue.
> James, I know the title is mandatory, but it's probably entirely possible for
> more than one issue to have the same title. Perhaps just using the unique issue
> id would be enough? That value would already exist, since the issue had to
> have been created already.
Good call; I agree.
(In reply to comment #3)
> One general question I have about this: should users see any indication on the
> website that files have been updated? This probably isn't a huge issue in most
> cases, but generally speaking, people like to know whether updates have
> happened. Even a simple line item in the article metadata ("Last updated on:
> 2009-12-25") might be nice. Probably outside of the scope of this report (and
> applicable to a host of other situations), but I thought I'd raise the issue.
We've wrestled with this here, actually. I'm not sure which journal editor wanted it, but they'd like to have the date attached to the HTML version of the journal. If it was me, I'd just modify the .tpl that generates the abstract page with the links to the HTML and PDF versions to indicate the last modified date. Probably a fairly easy thing to implement.
There's code in place (though it obviously needs a review) to track the entities that are created when an import takes place, and delete them if an error occurs during import. I'd suggest extending this somehow (generate a log that can be turned into database deletes?) to allow the content to be removed again. I'm nervous about anything that deletes existing content when importing, because of the effect this might have on e.g. editorial records.
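One way to picture a log that can be turned into database deletes is the Python sketch below. It is not the existing OJS import code: `ImportLog`, `undo_import`, and the `issues`/`articles` table and column names are assumptions for illustration, with SQLite standing in for the real database.

```python
import sqlite3


class ImportLog:
    """Records rows created during an import so they can be removed later."""

    def __init__(self):
        self.entries = []  # (table, id_column, row_id) in creation order

    def record(self, table, id_column, row_id):
        self.entries.append((table, id_column, row_id))

    def delete_statements(self):
        # Reverse creation order so dependent rows go before their parents.
        # Table and column names come from the importer itself, not user input.
        return [
            (f"DELETE FROM {table} WHERE {id_column} = ?", (row_id,))
            for table, id_column, row_id in reversed(self.entries)
        ]


def undo_import(conn, log):
    """Delete only the rows the import created, in a single transaction."""
    with conn:
        for sql, params in log.delete_statements():
            conn.execute(sql, params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE issues (issue_id INTEGER PRIMARY KEY, title TEXT)")
    log = ImportLog()
    cur = conn.execute("INSERT INTO issues (title) VALUES ('half-uploaded')")
    log.record("issues", "issue_id", cur.lastrowid)
    undo_import(conn, log)
    print(conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0])
```

Because only logged rows are deleted, content that existed before the import (editorial records included) is left alone.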
This may be moving beyond the scope of the bug, but what about archiving the material instead of removing it? It would become inaccessible to all except journal editors and content managers. This would provide the benefit of not mixing old content with new, while also providing a bit of security and an audit trail, as Alec seems to suggest.
I would like to support the "XML import as overwrite" idea. In our workflow, the most valid metadata is included in our XML galleys, while the metadata provided by authors during submission is often incomplete, preliminary or faulty. That's why we would like to import the metadata into OJS, overwriting the existing field values of submitted articles. This would be a tremendous improvement for our workflow, as we would not have to manually update all article-level metadata.
Armin, see bug #8076 for a more relevant entry.