Bug 8190

Summary: urlencoded DOIs return display issue
Product: OJS
Reporter: James MacGregor <jmacgreg>
Component: Submissions and Publishing
Assignee: PKP Support <pkp-support>
Status: RESOLVED FIXED
Severity: normal
CC: abadan, alec, bozana.bokan, giovani, jason.nugent
Priority: P3
Version: 2.4.3
Hardware: All
OS: All
Version Reported In:
Also Affects:
Attachments: Patch against OJS 2.4.x
             Patch to unescape parenthesis

Description James MacGregor 2013-04-09 12:52:33 PDT
DOIs are being displayed with formatting problems, for example with the "/" urlencoded:

http://dx.doi.org/10.1234%2Fojsdj.v1i1.29

instead of 

http://dx.doi.org/10.1234/ojsdj.v1i1.216

This is because the getResolvingURL() function in, e.g., plugins/pubIds/doi/DOIPubIdPlugin.inc.php urlencodes the pubId:

function getResolvingURL($journalId, $pubId) {
	return 'http://dx.doi.org/'.urlencode($pubId);
}

Does this pubId need to be urlencoded?

See also http://pkp.sfu.ca/support/forum/viewtopic.php?f=8&t=9806.
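To illustrate the behaviour described above, here is a minimal standalone sketch (not code from the plugin) showing what PHP's urlencode() does to the "/" separator in a DOI. The DOI string is the example from this report; the variable name is only illustrative.

```php
<?php
// Illustration only: the DOI string is the example from this report.
$pubId = '10.1234/ojsdj.v1i1.29';

// urlencode() percent-encodes "/", so the prefix/suffix separator
// shows up as %2F in the resolver URL.
echo 'http://dx.doi.org/' . urlencode($pubId), "\n";
// prints http://dx.doi.org/10.1234%2Fojsdj.v1i1.29

// Concatenating the raw pubId keeps the separator intact.
echo 'http://dx.doi.org/' . $pubId, "\n";
// prints http://dx.doi.org/10.1234/ojsdj.v1i1.29
```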
Comment 1 Bozana Bokan 2013-04-10 05:11:50 PDT
I think the DOI doesn't have to be encoded when it is displayed to the user, but I am not sure where, if anywhere, it should be encoded, so I can't tell whether we need to differentiate between the two cases or can remove the encoding entirely. Could someone tell me where DOIs should be encoded, if at all?
Comment 2 Alec Smecher 2013-04-16 15:27:10 PDT
Created attachment 3924 [details]
Patch against OJS 2.4.x

See http://www.niso.org/apps/group_public/download.php/6590/Syntax%20for%20the%20Digital%20Object%20Identifier.pdf appendix E. My take: it's not as simple as removing or keeping the urlencode; when DOIs appear in URLs, the suffix needs to be encoded. Otherwise it should be kept as is (URLs are a special case). James and Bozana, mind reviewing the attached patch? It should address the problem without causing problems for existing DOIs.
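One possible reading of the approach described above, sketched under assumptions: keep the DOI prefix and the "/" separator as they are, and percent-encode only the suffix when building the resolver URL. The function name doiResolvingUrlSketch is hypothetical, and this is not the attached patch.

```php
<?php
// Hypothetical helper, not the attached patch: encode only the DOI suffix.
function doiResolvingUrlSketch(string $doi): string {
    // Split at the first "/" into prefix (e.g. 10.1234) and suffix.
    $parts = explode('/', $doi, 2);
    if (count($parts) < 2) {
        // No separator found: fall back to encoding the whole string.
        return 'http://dx.doi.org/' . rawurlencode($doi);
    }
    [$prefix, $suffix] = $parts;
    // rawurlencode() leaves letters, digits, "-", "_", "." and "~" alone
    // and percent-encodes everything else in the suffix.
    return 'http://dx.doi.org/' . $prefix . '/' . rawurlencode($suffix);
}

echo doiResolvingUrlSketch('10.1234/ojsdj.v1i1.29'), "\n";
// prints http://dx.doi.org/10.1234/ojsdj.v1i1.29
```

Note that rawurlencode() is stricter than the DOI syntax document requires: it would also encode any further "/" inside the suffix, as well as characters such as "(" and ")".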
Comment 3 James MacGregor 2013-04-16 19:48:29 PDT
Hey Alec, that patch looks good to me (against OJS 2.4.2).
Comment 4 Bozana Bokan 2013-04-17 04:21:58 PDT
to me too :-)
thanks!!!
Comment 5 Alec Smecher 2013-04-17 08:55:47 PDT
Thanks! Committed to master and ojs-stable-2_4; adding to recommended patch list.
Comment 6 Alec Smecher 2013-04-17 09:00:03 PDT
Fix urlencoding of DOI
https://github.com/pkp/ojs/commit/b2294f145f3a137d969105baee085abdacfc35cb
Comment 7 Alec Smecher 2013-04-17 09:00:03 PDT
Fix urlencoding of DOI
https://github.com/pkp/ojs/commit/251813911874d416f237ea3177a9fdb100637016
Comment 8 Giovani Pieri 2014-01-09 06:06:43 PST
Created attachment 3982 [details]
Patch to unescape parenthesis

Hi,

Some editors here in Brazil are complaining that parentheses are being escaped in the DOI URL. They use the parentheses to identify the article's issue, e.g. http://dx.doi.org/10.xxxx/journal8(29)834 for an article in issue number 29.

The document cited by Alec in comment #2 states that it is mandatory to percent-encode the characters %, #, " and spaces, and that characters which are not allowed or have special meaning in the URI RFC should also be percent-encoded (it recommends escaping <, > and {).

The RFC http://tools.ietf.org/html/rfc3986#section-2 states that parentheses are valid characters in a URL, so they do not need to be escaped. Would it be possible not to escape parentheses?

I attached a patch that would solve this issue by unescaping these characters. What are your thoughts on this?


Thanks
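As a rough sketch of the rule summarised in this comment (and not the attached patch itself), the following encodes only the characters named above: the mandatory set %, #, " and space, plus the recommended <, > and {. Everything else, including parentheses, passes through unchanged. The function name doiUrlEncodeSketch is hypothetical.

```php
<?php
// Hypothetical helper: percent-encode only the characters listed in
// the comment above, leaving "/" and parentheses untouched.
function doiUrlEncodeSketch(string $doi): string {
    // strtr() with an array replaces in a single pass, so the "%"
    // produced by one replacement is never encoded a second time.
    return strtr($doi, [
        '%' => '%25',
        '"' => '%22',
        '#' => '%23',
        ' ' => '%20',
        '<' => '%3C',
        '>' => '%3E',
        '{' => '%7B',
    ]);
}

echo 'http://dx.doi.org/' . doiUrlEncodeSketch('10.xxxx/journal8(29)834'), "\n";
// prints http://dx.doi.org/10.xxxx/journal8(29)834
```

Using a fixed replacement table rather than urlencode() keeps the set of escaped characters explicit, at the cost of having to extend the table if more characters turn out to need escaping.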
Comment 9 Giovani Pieri 2014-01-09 06:07:43 PST
(In reply to comment #8)

I forgot to mention: the patch is against the ojs-stable-2_4 branch.
Comment 10 Jason Nugent 2014-01-09 11:18:14 PST
Hi Giovani,

The patch looks good to me.  I'll let Bozana or James take a quick peek at it since they know DOI better, but I can merge your patch into the code if everyone is on board.

Regards,
Jason
Comment 11 Alec Smecher 2014-07-03 04:44:02 PDT
Pull request opened (not merged):
fix URL encoding for DOIs
https://github.com/pkp/ojs/pull/234
Comment 12 Alec Smecher 2014-07-14 06:34:01 PDT
Pull request synchronize (not merged):
fix URL encoding for DOIs
https://github.com/pkp/ojs/pull/234
Comment 13 Alec Smecher 2014-07-14 06:42:01 PDT
Pull request synchronize (not merged):
fix URL encoding for DOIs
https://github.com/pkp/ojs/pull/234