Bug: Forms fail to preserve Unicode data

Post by derekp » Wed Oct 10, 2007 2:44 am

OCS 2.0.0 fails to preserve Unicode input to forms, even when the database is properly configured to use UTF-8. For instance, if you create a user whose last name contains the ö (o umlaut) character, it is stored as an 'o' without the umlaut.

I'm using PHP 5.2.0.

The following patch fixes the problem:

--- ocs-2.0.0-1/classes/form/Form.inc.php.unicode       2007-05-10 20:20:45.000000000 -0700
+++ ocs-2.0.0-1/classes/form/Form.inc.php       2007-10-10 02:05:04.675730000 -0700
@@ -97,15 +97,8 @@
                                // utf8_decode to work in latin-1 (information may be lost)
                                import('core.Transcoder');
                                $trans =& new Transcoder('CP1252', 'UTF-8');
                                $value = $trans->trans($value);
-
-                       } elseif ($value !== utf8_decode($value) && $value !== utf8_encode($value)) {
-                               // string is not within utf-8(?)
-                               // normalize to ASCII (lowest common encoding) - information will be lost
-                               import('core.Transcoder');
-                               $trans =& new Transcoder('UTF-8', 'ASCII');
-                               $value = $trans->trans($value);
                        }
                }
                $this->_data[$key] = $value;
        }


I believe that any non-ASCII input would trigger the elseif clause, which mangles the data into ASCII. I don't understand how the CP1252 case is ever encountered, since the charset choices during installation are ISO-8859-1 and UTF-8. Perhaps the if-clause should be removed as well.
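
As a rough illustration of the claim about the elseif clause, the condition can be modelled outside PHP. The sketch below is Python, not OCS code: `utf8_decode` and `utf8_encode` here are approximations of the PHP functions of the same name (UTF-8 to ISO-8859-1 with '?' for anything unrepresentable, and ISO-8859-1 to UTF-8 respectively), and `hits_ascii_branch` is a hypothetical helper name. It only evaluates the elseif condition in isolation; in Form.inc.php that branch is reached only when the preceding if-condition, which the patch context does not show, is false.

```python
def utf8_decode(data: bytes) -> bytes:
    """Approximate PHP's utf8_decode(): UTF-8 -> ISO-8859-1.

    Invalid sequences and characters outside ISO-8859-1 become '?'.
    """
    text = data.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


def utf8_encode(data: bytes) -> bytes:
    """Model PHP's utf8_encode(): read bytes as ISO-8859-1, emit UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def hits_ascii_branch(value: bytes) -> bool:
    """Mirror the removed elseif condition from the patch."""
    return value != utf8_decode(value) and value != utf8_encode(value)


samples = {
    "ASCII": b"Schroder",
    "UTF-8": "Schr\u00f6der".encode("utf-8"),
    "ISO-8859-1": "Schr\u00f6der".encode("latin-1"),
}

for label, raw in samples.items():
    print(f"{label:<11} {raw!r:<24} -> {hits_ascii_branch(raw)}")
```

Under this model, pure ASCII passes through both functions unchanged, so the condition is false. Any byte above 0x7F is altered by both functions, whether the input is UTF-8 or ISO-8859-1, so the condition is true and the value would be transcoded to ASCII.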

I have tested this patch -- it works whether config.inc.php contains

[i18n]
client_charset = iso-8859-1
connection_charset = Off
database_charset = Off

or

[i18n]
client_charset = utf-8
connection_charset = utf8
database_charset = utf8

Post by asmecher » Wed Oct 10, 2007 9:03 am

Hi derekp,

See http://pkp.sfu.ca/support/forum/viewtopic.php?f=3&t=1909; there is some buggy character set normalization code in OCS 2.0.0, but you can disable it easily by commenting out several lines of code. This will be fixed in the next release.

Regards,
Alec Smecher
Public Knowledge Project Team

Post by derekp » Thu Mar 20, 2008 12:04 pm

A different patch is now available.

