Converting from ISO to UTF8

Post by christo » Thu Jan 24, 2008 7:02 am

We have been dealing with converting our data from ISO-8859-1 to UTF-8, which can be quite a mission. Here is a short list of what we did, in the hope that it helps someone else. Note that you will need to adapt these commands to your own setup; this is just a guideline.

1. Create a mysqldump of your database:
   1. mysqldump -K -r/path/to/olddatabase.sql -uusername -ppassword databaseName
   2. (If the import later fails with a max_allowed_packet error, try adding --skip-extended-insert so that each row gets its own INSERT statement; this might resolve it. Note that --quick only changes how rows are read during the dump, not how the INSERTs are batched.)
2. Run iconv over your SQL file to convert it from ISO-8859-1 to UTF-8:
   1. iconv -f ISO-8859-1 -t utf-8 < olddatabase.sql > newdatabase.sql
3. Open newdatabase.sql and do a find and replace:
   1. find: DEFAULT CHARSET=latin1
   2. replace with: DEFAULT CHARSET=utf8
4. Create a database and make sure it uses utf8 encoding (from the mysql client):
   1. create database dbname DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
   2. use this database (\u dbname)
5. Run the SQL file to load the database:
   1. \. newdatabase.sql
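
Steps 2 and 3 above can also be done in one pass by a small script. Here is a minimal sketch in Python, assuming the dump really is ISO-8859-1 throughout. The function names and file names are made up for illustration and are not part of OJS or MySQL. Like the manual find and replace, it rewrites the string wherever it occurs, including inside row data.

```python
def convert_dump_line(raw: bytes) -> bytes:
    """Convert one line of an ISO-8859-1 dump to UTF-8 and fix the table charset."""
    # Every byte value is valid ISO-8859-1, so this decode cannot fail.
    text = raw.decode("iso-8859-1")
    # Same substitution as the manual find/replace in step 3.
    text = text.replace("DEFAULT CHARSET=latin1", "DEFAULT CHARSET=utf8")
    return text.encode("utf-8")


def convert_dump(src_path: str, dst_path: str) -> None:
    """Stream the dump line by line so a large file is not held in memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            dst.write(convert_dump_line(raw))


# Hypothetical usage, with the file names from the steps above:
# convert_dump("olddatabase.sql", "newdatabase.sql")
```

If the dump already contains UTF-8 bytes, running this (or iconv) over it will double-encode them, so it is worth checking a few accented words in the output before loading it.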

Hopefully this helps anyone who is having the same problem!

Re: Converting from ISO to UTF8

Post by mj » Thu Jan 24, 2008 10:41 am

Hi Christo,

Thanks very much for such a detailed and helpful way to convert the database encoding from ISO-8859-1 (Latin-1) to UTF-8. For those intending to use this approach, also note that you may be able (or need) to use PHP's mb_convert_encoding() function in lieu of iconv on your SQL file, depending on your PHP platform and which libraries you have installed.

As well, OJS 2.2 now contains the charset_normalization parameter in config.inc.php to automatically convert all strings to UTF-8 when they are entered into OJS and exported to XML; however, this doesn't help much in the case where the database itself is encoded in Latin-1 or another non-UTF-8 encoding. Your instructions provide a very nice complementary resolution for that problem to help journals move to UTF-8 throughout. Thanks again!

Re: Converting from ISO to UTF8

Post by ramon » Fri Oct 14, 2011 2:09 pm

Hello all,

Searching the forum I bumped into this topic.
However, my situation is a bit more complicated.

I'm trying to help an institution upgrade their installation to OJS 2.3.6.
While checking their current status, I see their page charset is being returned as UTF-8. I'm not sure if it's my browser or something else, as I did not find any Apache or PHP setting defining the default_charset as anything. However, their database is latin1, as are all the table collations.
When I dump the database, the characters are not rendered correctly (PuTTY translation is set to UTF-8), and when I load the dump into another database with UTF-8 as the default, the characters still do not display correctly, even if I try to change the collation of the tables.

I don't think this solution will help me, as the character sets seem to be mixed.
How do I fix this?

Re: Converting from ISO to UTF8

Post by asmecher » Fri Oct 14, 2011 2:19 pm

Hi Ramon,

Mixed character sets are a really tough situation and unfortunately I don't have a comprehensive solution for the problem -- it depends on how they're mixed and that's hard to determine, never mind rectify.

I find the best solution is to remove as many complicating layers as possible. Working through a terminal, it's difficult to be sure that what you're seeing is an accurate reflection of what's happening on the server or whether the terminal emulator is getting involved.
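
One way to take the terminal out of the picture is to look at raw bytes instead of rendered characters. The sketch below (Python; the function name is made up for illustration) shows what a single accented character looks like in hex when it is stored as Latin-1, as UTF-8, and as UTF-8 that was mistaken for Latin-1 and converted a second time. The last case is only one possible way for data to end up mixed, not a diagnosis of any particular installation.

```python
def byte_forms(ch: str) -> dict:
    """Return the hex bytes of one character under three storage scenarios."""
    latin1 = ch.encode("iso-8859-1")
    utf8 = ch.encode("utf-8")
    # UTF-8 bytes misread as Latin-1 and then encoded to UTF-8 again.
    double = utf8.decode("iso-8859-1").encode("utf-8")
    return {
        "latin1": latin1.hex(),
        "utf8": utf8.hex(),
        "double_utf8": double.hex(),
    }


# U+00E9 is "e with acute accent".
print(byte_forms("\u00e9"))
# {'latin1': 'e9', 'utf8': 'c3a9', 'double_utf8': 'c383c2a9'}
```

Comparing these patterns against a hex dump of a known accented word in the SQL file tells you what is actually stored, regardless of what the terminal shows.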

I would suggest working on your local machine with a database dump. (Try using the "--default-character-set=utf8" option to mysqldump; otherwise MySQL may decide to dump UTF8 data in latin1 regardless of your particular database or table's character set configuration.) You can either process the file locally and load the dump back onto the remote machine, or you can script up a set of SQL statements locally to correct the data and then run them against the remote server when you're ready.
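
For the "process the file locally" route, a rough starting point is to survey the dump line by line and see how much of it is already valid UTF-8. The following is a heuristic sketch in Python, not a complete fix. The function names are made up for illustration, and it assumes that anything which is not valid UTF-8 is ISO-8859-1. It will not repair data that has been double-encoded.

```python
from collections import Counter
from typing import Iterable


def classify_line(raw: bytes) -> str:
    """Label a line as 'ascii', 'utf-8', or 'other' (not valid UTF-8)."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def survey(lines: Iterable[bytes]) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify_line(raw) for raw in lines)


def normalise_line(raw: bytes) -> bytes:
    """Leave valid UTF-8 alone; treat anything else as ISO-8859-1 (assumption)."""
    if classify_line(raw) == "other":
        return raw.decode("iso-8859-1").encode("utf-8")
    return raw


# Hypothetical usage on a dump file opened in binary mode:
# with open("olddatabase.sql", "rb") as f:
#     print(survey(f))
```

By default mysqldump writes many rows into a single INSERT line, so a per-line survey is coarse. Dumping with --skip-extended-insert gives one row per line and makes the counts much more useful.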

Alec Smecher
Public Knowledge Project Team