Bug 8414 - Duplicate author entries in "browse by author", with identical names/URLs
Status: NEW
Product: OJS
Classification: Unclassified
Component: Readers
Version: 2.4.x
Hardware: PC Linux
Importance: P3 normal
Assigned To: PKP Support
Reported: 2013-09-05 00:50 PDT by Martin Persson
Modified: 2013-09-06 08:44 PDT

Description Martin Persson 2013-09-05 00:50:59 PDT
In the browse by author list, some authors are displayed twice or more. The names are identical and generate the same URL, so clicking any of the duplicates takes the user to the same author page.

See http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=&authorsPage=18#authors ("Myrdal, Janken" being listed three times), or http://journals.lub.lu.se/index.php/STK/search/authors ("Andersson, Stefan" being listed twice) for examples.

Some of the articles were imported through the XML import plugin, some entered through the QuickSubmit plugin.

I haven't been able to reproduce this error step by step, but I think it might be related to what dak said in this forum thread: http://lib-pkp2.lib.sfu.ca/support/forum/viewtopic.php?f=8&t=2596#p10162. I believe the affected authors have articles both as sole authors and as co-authors.
Comment 1 Martin Persson 2013-09-05 01:40:44 PDT
The first example link should be http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=M.
Comment 2 Martin Persson 2013-09-05 02:18:54 PDT
Another clue might be that for one of my example authors mentioned earlier ("Myrdal, Janken", http://journals.lub.lu.se/ojs/index.php/rig/search/authors?searchInitial=M), the list displays either two or three entries, depending on which language (Swedish/English) is chosen by the user.

The articles by this author are tagged with either the Swedish language code, or no language code at all.
Comment 3 Alec Smecher 2013-09-05 09:06:53 PDT
Martin, since every article authorship is stored as a different record, they need to be disambiguated when displaying the author list. To do that, we use the first name, middle name, last name, affiliation, and country. If all of those values match *exactly* (including affiliation both in the current locale and in the journal's primary locale), then the author is considered the same. If any of them differ, you'll see additional entries.

The query for this is implemented in classes/article/AuthorDAO.inc.php in the getAuthorsAlphabetizedByJournal function.

It's not a very good disambiguation method, and our solution for it will be to address author disambiguation more systematically -- which will also permit us to implement e.g. ORCID support. But that will take a while.

In the meantime, I'd suggest double-checking the author records to make sure they're identical with respect to the 5 fields used to disambiguate.
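
As a rough illustration of the disambiguation described above, the following Python sketch collapses author records on the five fields and shows how a single differing value produces an extra entry. The record layout, field names and sample data are assumptions made for illustration; this is not the query in AuthorDAO.inc.php, and Python's exact comparison may not match the database's collation rules.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions, not the OJS schema.
AuthorRecord = namedtuple(
    "AuthorRecord",
    ["first_name", "middle_name", "last_name", "affiliation", "country"],
)


def distinct_authors(records):
    """Collapse records that match exactly on all five fields, keeping first-seen order."""
    seen = set()
    result = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            result.append(rec)
    return result


# Made-up sample data: one authorship record per article.
records = [
    AuthorRecord("Jane", None, "Doe", "Example University", "SE"),
    AuthorRecord("Jane", None, "Doe", "Example University", "SE"),  # exact match, collapses
    AuthorRecord("Jane", "", "Doe", "Example University", "SE"),    # "" instead of None
    AuthorRecord("Jane", None, "Doe", "Example Univ.", "SE"),       # affiliation spelled differently
]

entries = distinct_authors(records)
for rec in entries:
    print(repr(rec))
```

With this sample data the four records collapse to three list entries, all of which would display as "Doe, Jane".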
Comment 4 Martin Persson 2013-09-06 01:45:35 PDT
Thank you Alec for clarifying how the author listing works!

However, we still have a duplicate entry after having made sure that all the fields you mention (first name, middle name, last name, affiliation and country) are identical across all records of the author. (We are paying attention to the difference between NULL fields and fields set to "", so that should no longer be an issue.)

How would you suggest we proceed in trying to solve this issue?
Comment 5 Alec Smecher 2013-09-06 08:44:07 PDT
Martin, I would suggest working directly with database queries; it's possible that there are character encoding issues that are invisible to the eye but still affecting the database. Take the query from the getAuthorsAlphabetizedByJournal function in classes/article/AuthorDAO.inc.php, run it directly from the MySQL client, and see if you still get duplicates there. If so, try to identify where the SELECT DISTINCT is going wrong by tweaking the query until you can narrow it down to e.g. a single column.
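
Once the query has been narrowed down to a single column, two values that look identical can be compared character by character. The Python sketch below is one possible way to do that, assuming the two values have already been copied out of the database as strings; the helper names and the sample values are hypothetical.

```python
import unicodedata


def describe(value):
    """Return (character, code point, Unicode name) for each character in value."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in value]


def first_difference(a, b):
    """Return the index of the first differing position, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


# Hypothetical values: the second uses a no-break space (U+00A0) instead of a plain space.
a = "Example University"
b = "Example\u00a0University"

idx = first_difference(a, b)
if idx is not None:
    print("first difference at index", idx)
    print(describe(a[idx:idx + 1]), describe(b[idx:idx + 1]))
```

Printing the code points this way makes differences such as no-break spaces or other look-alike characters visible, which plain display in a terminal or web form usually hides.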