Bug 8190 - urlencoded DOIs return display issue
Status: RESOLVED FIXED
Product: OJS
Classification: Unclassified
Component: Submissions and Publishing
Version: 2.4.3
Platform: All All
Importance: P3 normal
Assigned To: PKP Support
Depends on:
Blocks:
Reported: 2013-04-09 12:52 PDT by James MacGregor
Modified: 2014-01-09 11:18 PST (History)

See Also:
Version Reported In:
Also Affects:


Attachments
Patch against OJS 2.4.x (800 bytes, patch)
2013-04-16 15:27 PDT, Alec Smecher
Patch to unescape parenthesis (774 bytes, patch)
2014-01-09 06:06 PST, Giovani Pieri

Description James MacGregor 2013-04-09 12:52:33 PDT
DOIs are displaying some strange formatting issues, for example where "/" is urlencoded: 

http://dx.doi.org/10.1234%2Fojsdj.v1i1.29

instead of 

http://dx.doi.org/10.1234/ojsdj.v1i1.216

This is because the getResolvingURL() function in, e.g., plugins/pubIds/doi/DOIPubIdPlugin.inc.php urlencodes the whole pubId, including the "/" between prefix and suffix:

function getResolvingURL($journalId, $pubId) {
	return 'http://dx.doi.org/'.urlencode($pubId);
}

Does this pubId need to be urlencoded?

See also http://pkp.sfu.ca/support/forum/viewtopic.php?f=8&t=9806.
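The behaviour described above can be reproduced outside OJS. The snippet below is only an illustration that mirrors the return statement quoted in the description; the DOI is the example value from this report.

```php
<?php
// Illustrative only: mirrors the return statement of getResolvingURL() quoted above.
$pubId = '10.1234/ojsdj.v1i1.29'; // example DOI from this report

// urlencode() percent-encodes the "/" separating prefix and suffix.
$encoded = 'http://dx.doi.org/' . urlencode($pubId);

echo $encoded, "\n"; // prints http://dx.doi.org/10.1234%2Fojsdj.v1i1.29
```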
Comment 1 Bozana Bokan 2013-04-10 05:11:50 PDT
I think the DOI doesn't have to be encoded when it is displayed to the user, but I am not sure where, if anywhere, it should be encoded, so I can't tell whether we should differentiate between the cases or remove the encoding entirely. Could someone tell me where, if at all, DOIs should be encoded?
Comment 2 Alec Smecher 2013-04-16 15:27:10 PDT
Created attachment 3924 [details]
Patch against OJS 2.4.x

See http://www.niso.org/apps/group_public/download.php/6590/Syntax%20for%20the%20Digital%20Object%20Identifier.pdf appendix E. My take: it's not as simple as removing or keeping the urlencode; when DOIs appear in URLs, the suffix needs to be encoded. Otherwise it should be kept as is (URLs are a special case). James and Bozana, mind reviewing the attached patch? It should address the problem without causing problems for existing DOIs.
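One way to read "encode the suffix only" is sketched below. This is not the attached patch: the function name doiResolvingUrlSketch() is hypothetical, and the choice of rawurlencode() for the suffix is an assumption made for illustration.

```php
<?php
// Hypothetical helper for illustration; this is not the attached patch.
function doiResolvingUrlSketch($pubId) {
	$base = 'http://dx.doi.org/';

	// A DOI has the form "prefix/suffix"; split on the first slash only.
	$parts = explode('/', $pubId, 2);
	if (count($parts) < 2) {
		// No slash found, so encode the whole string defensively.
		return $base . rawurlencode($pubId);
	}

	// Leave the prefix and the separator readable; percent-encode the suffix.
	return $base . $parts[0] . '/' . rawurlencode($parts[1]);
}

echo doiResolvingUrlSketch('10.1234/ojsdj.v1i1.29'), "\n";
// prints http://dx.doi.org/10.1234/ojsdj.v1i1.29
```

With this approach, any further slashes inside the suffix would still be percent-encoded, which may or may not be desirable for a given journal's DOI pattern.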
Comment 3 James MacGregor 2013-04-16 19:48:29 PDT
Hey Alec, that patch looks good to me (against OJS 2.4.2).
Comment 4 Bozana Bokan 2013-04-17 04:21:58 PDT
to me too :-)
thanks!!!
Comment 5 Alec Smecher 2013-04-17 08:55:47 PDT
Thanks! Committed to master and ojs-stable-2_4; adding to recommended patch list.
Comment 6 Alec Smecher 2013-04-17 09:00:03 PDT
Fix urlencoding of DOI
https://github.com/pkp/ojs/commit/b2294f145f3a137d969105baee085abdacfc35cb
Comment 7 Alec Smecher 2013-04-17 09:00:03 PDT
Fix urlencoding of DOI
https://github.com/pkp/ojs/commit/251813911874d416f237ea3177a9fdb100637016
Comment 8 Giovani Pieri 2014-01-09 06:06:43 PST
Created attachment 3982 [details]
Patch to unescape parenthesis

Hi,

Some editors here in Brazil are complaining that parentheses are being escaped in the DOI URL. They use parentheses to identify the article's issue. Ex: http://dx.doi.org/10.xxxx/journal8(29)834 for an article in issue number 29.

The document cited by Alec in comment #2 states that it is mandatory to percent-encode the characters %, #, " and spaces, and that characters that are not allowed or have special meaning in the URI RFC should also be percent-escaped (it recommends escaping <, > and { ).

RFC 3986 (http://tools.ietf.org/html/rfc3986#section-2) states that parentheses are valid characters in a URL, so they do not need to be escaped. Would it be possible to leave parentheses unescaped?

I attached a patch that would solve this issue by unescaping these characters. What are your thoughts on this?


Thanks
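The idea of unescaping parentheses after encoding could look roughly like the sketch below. It is not attachment 3982: the helper name encodeDoiSuffixKeepParens() is hypothetical, and rawurlencode() is assumed as the encoder purely for illustration.

```php
<?php
// Hypothetical helper for illustration; this is not attachment 3982.
function encodeDoiSuffixKeepParens($suffix) {
	// rawurlencode() turns "(" into %28 and ")" into %29.
	$encoded = rawurlencode($suffix);

	// Restore the parentheses and leave every other escape in place.
	return str_replace(array('%28', '%29'), array('(', ')'), $encoded);
}

echo rawurlencode('journal8(29)834'), "\n";              // prints journal8%2829%29834
echo encodeDoiSuffixKeepParens('journal8(29)834'), "\n"; // prints journal8(29)834
```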
Comment 9 Giovani Pieri 2014-01-09 06:07:43 PST
(In reply to comment #8)
I forgot to mention: the patch is against the ojs-stable-2_4 branch.
Comment 10 Jason Nugent 2014-01-09 11:18:14 PST
Hi Giovani,

The patch looks good to me.  I'll let Bozana or James take a quick peek at it since they know DOI better, but I can merge your patch into the code if everyone is on board.

Regards,
Jason