Convert from iso-8859-1 encoding to utf-8

Author Message

Moderated by: Nicolas Pastorino

Thursday 18 May 2006 6:45:30 am

Hi,

I have searched the forums and documentation on how to convert my database from iso-8859-1 encoding to utf-8 without luck.

Can someone give me some pointers to where such info is located?

The site is running eZ Publish 3.8 on Apache 2.0.55, PHP 4.4.2 and MySQL 4.1.19

Kristof Coomans

Thursday 18 May 2006 8:46:59 am

Hi Hans-Henry

You probably know about the script I posted in the forum ( http://ez.no/community/forum/install_configuration/update_to_3_8_and_codepage_problems ).

You can save it into bin/php/dbconvertutf8.php and then run it like any other CLI script that comes with eZ publish. Of course run it on a test installation first, do not immediately use it on your live portal(s) ;-)

Then in i18n.ini.append, configure the right charset:

[CharacterSettings]
Charset=utf8

And make sure that in site.ini.append, the Charset setting under the DatabaseSettings group is left empty (then the charset from i18n will be used):

[DatabaseSettings]
Charset=

Let us know if it worked.

Good luck!

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Thursday 18 May 2006 11:12:05 pm

Hi,

I had similar problems as Alexandre Cunha had in the post you suggested.
The Norwegian letters Æ, Ø and Å were similar to ç, Ã, ó so I had to revert to a backup version of the database. I had cleared cache and debug reported SET NAMES 'utf8'.

Any suggestions how to fix this?

Kristof Coomans

Thursday 18 May 2006 11:43:59 pm

First revert the charset configuration. Then run the script. And finally set the charset back to utf-8.

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

george kastritseas

Tuesday 06 June 2006 12:28:24 am

hallo,

how can I write in Greek ( Greece ) letters?

Kristof Coomans

Tuesday 06 June 2006 1:42:09 am

Hello George

Welcome to the eZ community!

What exactly do you mean with "writing in Greece"? Do you want the user to be able to input Greece characters into content attributes? Please start another topic then and try to explain clearly what you want to achieve. This topic is about converting an existing eZ publish database from iso-8859-1 to utf-8.

Thank you!

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

Matthew Carroll

Friday 23 June 2006 12:32:26 pm

Hi Hans-Henry

If you are still having difficulties with this, you might like to try this sql which if applied to a 3.8 site does the entire conversion. It's rather nice in that it does the whole thing with sql. I've run this on a number of installs now to convert to utf-8 and it's worked for me. After some testing, I just ran it on a 4-language site and it has converted everything cleanly. I think it should be easily adaptable to 3.7, just make sure all the tables are listed, although I haven't tried it.

Of course, make sure you keep backups!

ALTER TABLE ezapprove_items CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezbasket CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezbinaryfile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_group CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_group_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_message_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_participant_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_item_status CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_notification_rule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_profile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcollab_simple_message CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentbrowsebookmark CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentbrowserecent CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclassgroup CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentclass_classgroup CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_tree CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontentobject_version CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcontent_language CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezcurrencydata CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountsubrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezdiscountsubrule_value CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezenumobjectvalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezenumvalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezforgot_password CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezgeneral_digest_user_settings CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimagefile CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezimagevariation CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezinfocollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezinfocollection_attribute CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezkeyword CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezkeyword_attribute_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmedia CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmessage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmodule_run CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezmultipricedata CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznode_assignment CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationcollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationcollection_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eznotificationevent CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezoperation_memento CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_status CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezorder_status_history CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpackage CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpaymentobject CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpdf_export CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpending_actions CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy_limitation CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpolicy_limitation_value CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezpreferences CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcategory CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezproductcollection_item_opt CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrole CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_export CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_export_item CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezrss_import CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_object_word_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_return_count CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_search_phrase CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsearch_word CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsection CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsession CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsite_data CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezsubtree_notification_rule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztipafriend_counter CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztipafriend_request CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE eztrigger CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurl CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurlalias CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezurl_object_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuservisit CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_accountkey CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_discountrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_role CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezuser_setting CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvatrule CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvatrule_product_category CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezvattype CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezview_counter CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezwaituntildatevalue CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezwishlist CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_assign CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_event CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_group CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_group_link CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE ezworkflow_process CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
UPDATE ezcontentobject_attribute SET data_text = REPLACE(data_text, 'xml version="1.0" encoding="UTF-8"','xml version="1.0" encoding="UTF-8"');

To use:

1. Copy above code into a file e.g. 'ezutf8.sql' on your server
2. run the command:

mysql -u USERNAME -p DATABASENAME < ezutf8.sql

(replace USERNAME and DATABASE with your database connection settings - you will be prompted for the password)

3. Change your ini settings as described above.

4. Clear all caches.

Alternatively, you could paste the sql into phpmyadmin, although for large operations I prefer to use the command line.

Good luck
Matthew

http://carroll.org.uk

zurgutt -

Thursday 28 December 2006 8:01:43 pm

I have successfully used rather crude approach to convert from latin1 to utf-8. Has worked for me though.
Basicly:

1. make a database dump with mysqldump
2. use iconv to convert the dump
3. use sed (or some other replacement tool) to convert table creation passages in it to charset utf8
4. use sed to change back double-converted characters. This is necessary because ezp (3.6 at least) apparently saves fields in mixed charsets depending on browser settings of user, so even if site is set up to latin, database often also contains utf chars.
5. import the dump.

Here is a portion of a longer script i use to semiautomatically convert 3.6.2 sites to 3.8.3 :

iconv -f LATIN1 -t UTF-8 $SITENAME.sql > $SITENAME.sql.iconv
sed -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8 COLLATE=utf8_bin/g' \
    -e 's/encoding=\\"iso-8859-1\\"/encoding=\\"UTF-8\\"/g' \
    -e 's/\xC3\x83\xC2\(.\)/\xC3\1/g' \
    -e 's/\xC3\x82\xC2\(.\)/\xC2\1/g' \
    -e 's/\xc3\xa2\xc2\x80\xc2\(.\)/\xe2\x80\1/g' \
    $SITENAME.sql.iconv > $SITENAME.sql.iconv.utf8

Certified eZ developer looking for projects.
zurgutt at gg.ee

Christophe Berthelé

Friday 16 February 2007 2:51:04 pm

Hi,

with serialized fields recently introduced in ezp, you also have to take care of the length of serialized strings when you translate from latin1 to utf8 and vice versa. For example, the following serialized field is false after translating from utf8 to latin1 because the length of Abrégé is 6 and not 8 anymore :
a:2:{s:6:"fre-FR";s:8:"Abrégé";s:16:"always-available";s:6:"fre-FR";}
The correct one is
a:2:{s:6:"fre-FR";s:6:"Abrégé";s:16:"always-available";s:6:"fre-FR";}

For me, that was the reason why, after migration, some attributes name had vanished when displaying objects.

The fields concerned by that issue are at least, serialized_name_list in ezcontentclass_attribute and ezcontentclass tables.

Cheers
Berthy

zurgutt -

Sunday 18 February 2007 4:59:48 pm

oh, crap :(

This would be impossible to fix in some dump conversion script.. and massaging the dump seems to be only conceivable way for total charset-upgrade that i can see.. so one just has to light lots of black candles, sprinkle around goat blood and pray that someone has not used accented chars in any names :P

Certified eZ developer looking for projects.
zurgutt at gg.ee

Xavier Dutoit

Monday 19 February 2007 2:20:01 am

Hi zurgutt,

Could you share with us the script you mention or is that too specific ?

X+

http://www.sydesy.com

Christophe Berthelé

Monday 19 February 2007 3:27:40 am

Hi guys,

you'll find here http://cpjc.free.fr/index.php?ezpublish what I did to migrate FROM utf-8 TO iso-8859-1. I know the topic is the contrary but that may provide guidance, well I hope :-)

Cheers
Berthy

zurgutt -

Monday 19 February 2007 3:42:46 pm

Xavier, look a couple of posts up, the bit with iconv and sed, thats about it.

If you mean the full 3.6 to 3.8 script then rest of it is quite setup specific - creates directory for new site, symlinks as much as possible from central location (kernel, base styles etc), copies necessary dirs from 3.6 (see upgrading instructions), runs the database charset conversion and import, then runs official upgrade scripts. If anyone wants it, ask me on #ezpublish.

zrg

Certified eZ developer looking for projects.
zurgutt at gg.ee

Jonny Bergkvist

Tuesday 20 February 2007 12:39:13 am

Hi,

I've upgraded from 3.6-site with iso-8859-1 on apache 1.3 and mysql 4, to 3.8-site with utf-8 on apache 2.2 and mysql 5.

I made a documentation, of what I did, and scripts I used to do the charset conversion etc.

Click and read :-)
http://home.hit.no/~jonnybe/eZ/docs/upgrading-to-3.8-experiences.pdf

-Jonny

Xavier Dutoit

Tuesday 20 February 2007 3:28:01 am

Many thanks jonny.

May I suggest you to create a page about it on http://www.ezpedia.org ?

X+

http://www.sydesy.com

Kristof Coomans

Friday 30 November 2007 5:57:20 am

Hi Jonny

I discovered there's some unneeded SQL in your tutorial:

UPDATE ezcontentobject_name SET name = REPLACE(name, 'xml version="1.0" encoding="iso-8859-1"','xml version="1.0" encoding="UTF-8"');

The name column of the ezcontentobject_name table does not contain any serialized xml.

independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org

laurent le cadet

Tuesday 18 December 2007 8:09:07 am

---

Shawntat Shawntat

Tuesday 24 March 2009 10:51:35 pm

Hi all,
I m a newbie here thought of joining this forum because find it nice, this forum interest me I personally liked it.

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.