Converting from ISO to UTF8


Post by christo » Thu Jan 24, 2008 7:02 am

We have been dealing with converting our data from ISO-8859-1 to UTF8. This can be quite a mission. Here is a short list of what we have done, in the hope that it might help someone else. Note that you will need to change these commands to match your own setup. This is just a guideline.

1. Create a mysqldump of your database:
   1. mysqldump -K -r/path/to/olddatabase.sql -uusername -ppassword databaseName
   2. (Perhaps use --skip-extended-insert to prevent it from doing batch inserts; this might resolve the max_allowed_packet error. Note that --quick only changes how rows are read from the server.)
2. Run iconv over your SQL file to convert it from ISO to UTF8:
   1. iconv -f ISO-8859-1 -t utf-8 < olddatabase.sql > newdatabase.sql
3. You now need to open newdatabase.sql:
   1. find: DEFAULT CHARSET=latin1
   2. replace with: DEFAULT CHARSET=utf8
4. Create a database, making sure it uses utf8 encoding:
   1. create database dbname DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
   2. use this database (\u dbname)
5. Run the SQL file to install the database:
   1. \. newdatabase.sql
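
For anyone who would rather script steps 2 and 3 than edit the dump by hand, here is a minimal sketch. The function name convert_dump and the temporary demo files are my own placeholders, not part of OJS or MySQL, and the demo runs on a two-line stand-in rather than a real dump. Be aware that sed rewrites every occurrence of the string, including any that happen to sit inside your data rows, and that per-column CHARACTER SET declarations are not touched.

```shell
#!/bin/sh
# Sketch of steps 2 and 3: re-encode a Latin-1 dump as UTF-8 and
# rewrite the table charset declarations in one pass.
# convert_dump is a hypothetical helper name.
convert_dump() {
    # $1 = Latin-1 input dump, $2 = UTF-8 output dump
    iconv -f ISO-8859-1 -t UTF-8 < "$1" |
        sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' > "$2"
}

# Demo on a tiny stand-in for a real dump.
# \351 is the single byte 0xE9, which is "é" in Latin-1; \047 is a single quote.
demo_dir=$(mktemp -d)
printf 'CREATE TABLE t (name text) DEFAULT CHARSET=latin1;\nINSERT INTO t VALUES (\047caf\351\047);\n' \
    > "$demo_dir/olddatabase.sql"

convert_dump "$demo_dir/olddatabase.sql" "$demo_dir/newdatabase.sql"

# Count the rewritten declarations as a quick sanity check.
grep -c 'DEFAULT CHARSET=utf8' "$demo_dir/newdatabase.sql"
```

On a real dump you would call it as convert_dump olddatabase.sql newdatabase.sql and then carry on with steps 4 and 5.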

..hopefully this helps anyone who is having this same problem!

Re: Converting from ISO to UTF8

Post by mj » Thu Jan 24, 2008 10:41 am

Hi Christo,

Thanks very much for such a detailed and helpful way to convert the database encoding from ISO-8859-1 (Latin-1) to UTF-8. For those intending to use this approach, also note that you may be able to (or have to) use PHP's mb_convert_encoding() function in lieu of iconv on your SQL file, depending on your PHP platform and which libraries you have installed.

As well, OJS 2.2 now contains the charset_normalization parameter in config.inc.php to automatically convert all strings to UTF-8 when they are entered into OJS and exported to XML; however, this doesn't help much in the case where the database itself is encoded in Latin-1 or another non-UTF-8 encoding. Your instructions above provide a very nice complementary resolution for that problem to help journals move to UTF-8 throughout. Thanks again!

Re: Converting from ISO to UTF8

Post by ramon » Fri Oct 14, 2011 2:09 pm

Hello all,

Searching the forum, I bumped into this topic.
However, my situation is a bit more complicated.

I'm trying to help an institution upgrade their installation to OJS 2.3.6.
While checking their current status, I found that their page charset is being returned as UTF-8. I'm not sure whether that comes from my browser or something else, as I did not find any Apache or PHP setting that defines default_charset. However, their database is latin1, as are all the table collations.
When I dump the database, the characters are not rendered correctly (PuTTY translation is set to UTF-8). When I insert the dump into another database with UTF-8 as the default, the characters still do not display correctly, even if I change the collation of the tables.

I don't think this solution will help me, as the character sets seem to be mixed.
How do I fix this?

Re: Converting from ISO to UTF8

Post by asmecher » Fri Oct 14, 2011 2:19 pm

Hi Ramon,

Mixed character sets are a really tough situation and unfortunately I don't have a comprehensive solution for the problem -- it depends on how they're mixed and that's hard to determine, never mind rectify.

I find the best solution is to remove as many complicating layers as possible. Working through a terminal, it's difficult to be sure that what you're seeing is an accurate reflection of what's happening on the server or whether the terminal emulator is getting involved.

I would suggest working on your local machine with a database dump. (Try using the "--default-character-set=utf8" option to mysqldump; otherwise MySQL may decide to dump UTF8 data in latin1 regardless of your particular database or table's character set configuration.) You can either process the file locally and load the dump back onto the remote machine, or you can script up a set of SQL statements locally to correct the data and then run them against the remote server when you're ready.
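
As a rough way to take the terminal out of the picture, the dump file can be checked byte by byte instead of by eye. The sketch below is only an illustration: the helper names is_valid_utf8 and invalid_utf8_lines and the demo files are hypothetical, and the line-by-line loop will be slow on a large dump. It relies on iconv exiting non-zero when its input is not valid UTF-8. A file that passes may still contain double-encoded text, since that is valid UTF-8 too, so treat the result as a first check only.

```shell
#!/bin/sh
# A dump for checking might be produced with something like (not run here):
#   mysqldump --default-character-set=utf8 -r dump.sql -uusername -ppassword databaseName

# Succeeds only if the whole file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Prints the number of every line that is NOT valid UTF-8,
# which helps show how the encodings are mixed.
invalid_utf8_lines() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 || echo "$n"
    done < "$1"
}

# Demo files standing in for real dumps.
demo_dir=$(mktemp -d)
printf 'caf\303\251\n' > "$demo_dir/utf8.sql"        # "é" as UTF-8 (0xC3 0xA9)
printf 'ok\ncaf\351\nok\n' > "$demo_dir/mixed.sql"   # line 2 has a bare Latin-1 "é" (0xE9)

is_valid_utf8 "$demo_dir/utf8.sql" && echo "utf8.sql: valid UTF-8"
is_valid_utf8 "$demo_dir/mixed.sql" || echo "mixed.sql: not valid UTF-8"
invalid_utf8_lines "$demo_dir/mixed.sql"
```

If only a handful of lines are reported, it may be practical to correct those rows individually rather than re-encoding the whole dump.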

Regards,
Alec Smecher
Public Knowledge Project Team