Bug: Forms fail to preserve Unicode data


Post by derekp » Wed Oct 10, 2007 2:44 am

OCS 2.0.0 fails to preserve Unicode input to forms, even when the database is properly configured to use UTF-8. For instance, if you create a user whose last name contains the ö (o umlaut) character, it is stored as an 'o' without the umlaut.

I'm using PHP 5.2.0.

The following patch fixes the problem:
--- ocs-2.0.0-1/classes/form/Form.inc.php.unicode       2007-05-10 20:20:45.000000000 -0700
+++ ocs-2.0.0-1/classes/form/Form.inc.php       2007-10-10 02:05:04.675730000 -0700
@@ -97,15 +97,8 @@
                                // utf8_decode to work in latin-1 (information may be lost)
                                import('core.Transcoder');
                                $trans =& new Transcoder('CP1252', 'UTF-8');
                                $value = $trans->trans($value);
-
-                       } elseif ($value !== utf8_decode($value) && $value !== utf8_encode($value)) {
-                               // string is not within utf-8(?)
-                               // normalize to ASCII (lowest common encoding) - information will be lost
-                               import('core.Transcoder');
-                               $trans =& new Transcoder('UTF-8', 'ASCII');
-                               $value = $trans->trans($value);
                        }
                }
                $this->_data[$key] = $value;
        }


I believe that any non-ASCII input triggers the elseif clause, which mangles the data into ASCII: utf8_decode() and utf8_encode() each alter any string containing a byte above 0x7F, so both inequality tests succeed. I don't understand how the CP1252 case is ever encountered, since the only charset choices during installation are ISO-8859-1 and UTF-8. Perhaps the if clause should be removed as well.
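
To make the reasoning about the elseif condition concrete, here is a rough Python model of it. This is a sketch, not OCS code: php_utf8_decode, php_utf8_encode and hits_elseif are hypothetical helpers that approximate PHP's utf8_decode() and utf8_encode(), and the condition of the preceding if branch (not shown in the patch) is ignored.

```python
def php_utf8_decode(data: bytes) -> bytes:
    """Approximate PHP utf8_decode(): UTF-8 -> ISO-8859-1, '?' for anything
    that is invalid or not representable (assumption: close enough here)."""
    return data.decode("utf-8", errors="replace").encode("latin-1", errors="replace")


def php_utf8_encode(data: bytes) -> bytes:
    """Model PHP utf8_encode(): read each byte as ISO-8859-1, emit UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def hits_elseif(value: bytes) -> bool:
    """Mirror of the removed test:
    $value !== utf8_decode($value) && $value !== utf8_encode($value)"""
    return value != php_utf8_decode(value) and value != php_utf8_encode(value)


# Hypothetical sample inputs: one pure ASCII, two containing u-umlaut.
samples = {
    "ascii": b"Mueller",
    "utf-8": "Müller".encode("utf-8"),
    "latin-1": "Müller".encode("latin-1"),
}

for label, raw in samples.items():
    print(label, hits_elseif(raw))
```

Under this model only the pure-ASCII sample leaves the condition false. Any value with a byte above 0x7F, whether UTF-8 or ISO-8859-1, satisfies it and would be passed to the UTF-8 to ASCII Transcoder.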

I have tested this patch -- it works whether config.inc.php contains
[i18n]
client_charset = iso-8859-1
connection_charset = Off
database_charset = Off

or
[i18n]
client_charset = utf-8
connection_charset = utf8
database_charset = utf8

derekp
 
Posts: 16
Joined: Wed Oct 10, 2007 12:45 am
Location: University of British Columbia

Re: Bug: Forms fail to preserve Unicode data

Post by asmecher » Wed Oct 10, 2007 9:03 am

Hi derekp,

See http://pkp.sfu.ca/support/forum/viewtopic.php?f=3&t=1909; there is some buggy character set normalization code in OCS 2.0.0, but you can disable it easily by commenting out several lines of code. This will be fixed in the next release.

Regards,
Alec Smecher
Public Knowledge Project Team
asmecher
 
Posts: 10015
Joined: Wed Aug 10, 2005 12:56 pm

Re: Bug: Forms fail to preserve Unicode data

Post by derekp » Thu Mar 20, 2008 12:04 pm

A different patch is now available.

