PKP Bugzilla – Bug 8414
Duplicate author entries in "browse by author", with identical names/URLs
Last modified: 2014-06-25 08:47:00 PDT
In the "browse by author" list, some authors are displayed twice or more. The duplicate names are identical and generate the same URL, so clicking any of the duplicates takes the user to the same author page.
See http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=&authorsPage=18#authors ("Myrdal, Janken" being listed three times), or http://journals.lub.lu.se/index.php/STK/search/authors ("Andersson, Stefan" being listed twice) for examples.
Some of the articles were imported through the XML import plugin, some entered through the QuickSubmit plugin.
I haven't been able to reproduce this error step by step, but I think it might be related to what dak said in this forum thread: http://lib-pkp2.lib.sfu.ca/support/forum/viewtopic.php?f=8&t=2596#p10162. I believe the affected authors have articles both as single authors and as co-authors.
The first example link should be http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=M.
Another clue might be that for one of my example authors mentioned earlier ("Myrdal, Janken", http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=M), the list displays either two or three entries, depending on which language (Swedish/English) is chosen by the user.
The articles by this author are tagged with either the Swedish language code, or no language code at all.
Martin, since every article authorship is stored as a different record, they need to be disambiguated when displaying the author list. To do that, we use the first name, middle name, last name, affiliation, and country. If all of those values match *exactly* (including affiliation both in the current locale and in the journal's primary locale), then the author is considered the same. If any of them differ, you'll see additional entries.
The query for this is implemented in classes/article/AuthorDAO.inc.php in the getAuthorsAlphabetizedByJournal function.
It's not a very good disambiguation method, and our solution for it will be to address author disambiguation more systematically -- which will also permit us to implement e.g. ORCID support. But that will take a while.
In the meantime, I'd suggest double-checking the author records to make sure they're identical with respect to the 5 fields used to disambiguate.
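The exact-match rule described above can be illustrated with a minimal sketch. This is not the OJS query itself: the field names and sample records are hypothetical, and the per-locale affiliation comparison is simplified to a single affiliation value. It only shows how one differing field, here an empty string versus a missing value, produces an extra entry.

```python
# Hypothetical field names; the real OJS schema and query may differ.
KEY_FIELDS = ("first_name", "middle_name", "last_name", "affiliation", "country")


def distinct_authors(records):
    """Keep one record per exact match on all five key fields."""
    seen = {}
    for rec in records:
        key = tuple(rec.get(field) for field in KEY_FIELDS)
        # Only the first record for each distinct key is kept.
        seen.setdefault(key, rec)
    return list(seen.values())


# Made-up sample data: three authorship records for the same person.
records = [
    {"first_name": "Janken", "middle_name": None, "last_name": "Myrdal",
     "affiliation": "Example University", "country": "SE"},
    # Same person, but middle_name is "" instead of None.
    {"first_name": "Janken", "middle_name": "", "last_name": "Myrdal",
     "affiliation": "Example University", "country": "SE"},
    # Exact duplicate of the first record.
    {"first_name": "Janken", "middle_name": None, "last_name": "Myrdal",
     "affiliation": "Example University", "country": "SE"},
]

print(len(distinct_authors(records)))  # → 2
```

The first and third records collapse into one entry, while the second remains separate because its middle name is an empty string rather than a missing value.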
Thank you Alec for clarifying how the author listing works!
However, we still have a duplicate entry after having made sure that all the fields you mention (first name, middle name, last name, affiliation and country) are identical across all records of the author. (We are paying attention to the difference between NULL fields and fields set to "", so that should no longer be an issue.)
How would you suggest we proceed in trying to solve this issue?
Martin, I would suggest working directly with database queries; it's possible that there are character encoding issues that are invisible to the eye but still affecting the database. Take the query from the getAuthorsAlphabetizedByJournal function in classes/article/AuthorDAO.inc.php and run it directly from the MySQL client and see if you still get duplicates there; if so, try to identify where the SELECT DISTINCT is going wrong by tweaking the query until you can narrow it down to e.g. a single column.
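Once the query has been narrowed down to a single column, two values that look identical can be compared code point by code point. The following is a minimal sketch of that idea, assuming the suspect values have already been copied out of the database as text; the helper names and sample strings are hypothetical, and whether encoding differences are actually the cause here has not been confirmed.

```python
import unicodedata


def describe(value):
    """List each character with its code point and Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in value]


def first_difference(a, b):
    """Return the index where two strings first differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return None


# Hypothetical values that render the same but are stored differently.
composed = "\u00d6st"      # precomposed letter O with diaeresis
decomposed = "O\u0308st"   # plain O followed by a combining diaeresis

print(first_difference(composed, decomposed))  # → 0
print(describe(decomposed[:2]))

# Normalizing both sides to NFC makes the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # → True

# A trailing non-breaking space is another invisible difference.
print(first_difference("Myrdal", "Myrdal\u00a0"))  # → 6
```

If a difference of this kind turns up, the affected records would need to be corrected so that the stored values match exactly.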
Checking for news on this.