
Bug: Forms fail to preserve Unicode data

Postby derekp » Wed Oct 10, 2007 2:44 am

OCS 2.0.0 fails to preserve Unicode input to forms, even when the database is properly configured to use UTF-8. For instance, if you create a user whose last name contains the ö (o umlaut) character, it is stored as an 'o' without the umlaut.

I'm using PHP 5.2.0.

The following patch fixes the problem:
--- ocs-2.0.0-1/classes/form/Form.inc.php.unicode       2007-05-10 20:20:45.000000000 -0700
+++ ocs-2.0.0-1/classes/form/Form.inc.php       2007-10-10 02:05:04.675730000 -0700
@@ -97,15 +97,8 @@
                                // utf8_decode to work in latin-1 (information may be lost)
                                $trans =& new Transcoder('CP1252', 'UTF-8');
                                $value = $trans->trans($value);
-                       } elseif ($value !== utf8_decode($value) && $value !== utf8_encode($value)) {
-                               // string is not within utf-8(?)
-                               // normalize to ASCII (lowest common encoding) - information will be lost
-                               import('core.Transcoder');
-                               $trans =& new Transcoder('UTF-8', 'ASCII');
-                               $value = $trans->trans($value);
                $this->_data[$key] = $value;

I believe that any non-ASCII input would trigger the elseif clause, which mangles the data into ASCII. I don't understand how the CP1252 case is ever encountered, since the charset choices during installation are ISO-8859-1 and UTF-8. Perhaps the if-clause should be removed as well.
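To illustrate why the removed elseif condition matches any non-ASCII value, here is a minimal Python sketch that models PHP's utf8_decode() and utf8_encode() on raw bytes. The helper names (php_utf8_decode, php_utf8_encode, triggers_elseif) are hypothetical and are not part of OCS, the model of utf8_decode() is an approximation for malformed input, and the sketch looks at the elseif condition in isolation, ignoring whatever the preceding if-clause tests.

```python
def php_utf8_decode(value: bytes) -> bytes:
    """Approximate PHP's utf8_decode(): UTF-8 bytes -> ISO-8859-1 bytes.

    Assumption: invalid sequences and characters outside Latin-1 become '?'.
    PHP's exact handling of malformed input may differ in detail.
    """
    text = value.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


def php_utf8_encode(value: bytes) -> bytes:
    """Model PHP's utf8_encode(): treat each byte as ISO-8859-1, emit UTF-8."""
    return value.decode("latin-1").encode("utf-8")


def triggers_elseif(value: bytes) -> bool:
    """Mirror the removed condition:
    $value !== utf8_decode($value) && $value !== utf8_encode($value)
    """
    return value != php_utf8_decode(value) and value != php_utf8_encode(value)


if __name__ == "__main__":
    for name in ("Smith", "Schr\u00f6der"):
        raw = name.encode("utf-8")  # what a UTF-8 form submission would send
        print(name, triggers_elseif(raw))
```

For a pure ASCII value both conversions return the input unchanged, so the condition is false. As soon as the value contains a byte of 0x80 or above, utf8_encode() expands that byte to two bytes and utf8_decode() collapses or replaces it, so both comparisons differ and the value is sent through the UTF-8 to ASCII transcoder.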

I have tested this patch. It works whether config.inc.php contains

client_charset = iso-8859-1
connection_charset = Off
database_charset = Off

or

client_charset = utf-8
connection_charset = utf8
database_charset = utf8

Re: Bug: Forms fail to preserve Unicode data

Postby asmecher » Wed Oct 10, 2007 9:03 am

Hi derekp,

See http://pkp.sfu.ca/support/forum/viewtopic.php?f=3&t=1909; there is some buggy character set normalization code in OCS 2.0.0, but you can disable it easily by commenting out several lines of code. This will be fixed in the next release.

Alec Smecher
Public Knowledge Project Team

Re: Bug: Forms fail to preserve Unicode data

Postby derekp » Thu Mar 20, 2008 12:04 pm

A different patch is now available.