Forums / Developer / MySQL: convert utf8_bin to utf8_general_ci

MySQL: convert utf8_bin to utf8_general_ci

Author Message

Piotrek Karaś

Thursday 12 June 2008 9:00:01 pm

Hi,

I've just discovered that, probably by mistake, one of our eZP databases was encoded with utf8_bin. The site works fine, but there's little I can do with the DB. Does anyone know how to convert that DB back to the utf8_general_ci?

Thanks,
Piotrek

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu

Bartek Modzelewski

Friday 13 June 2008 3:04:27 am

Hi Piotrek,

Just try with simple MySQL query:

ALTER DATABASE database_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

You can do it also in phpMyAdmin.

Cheers
Bartek

Baobaz
http://www.baobaz.com

Piotrek Karaś

Friday 13 June 2008 6:19:28 am

Hey Bartek,

I was afraid that binary encoded fields aren't handled the same way. Thanks for the tip anyway! I had to use slightly different method, meaning CONVERT TO CHARACTER SET, which will actually convert values and apply that to all the text fields in a table.

Thanks,
Piotrek

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu

Bruce Morrison

Friday 13 June 2008 5:22:07 pm

Hi Piotrek

There are two things in play here the Character Set and the Collation. I'll assume you know about character sets. A Collation is <i>a set of rules determining how to compare and sort character data. Collation properties include things like case sensitivity, accent sensitivity, trailing spaces sensitivity, and others. In MySQL, every CHAR, VARCHAR, TEXT column have some collation assumption.</i> The main effect of collations is how the ORDER BY sorts data.

utf8_bin and utf8_general_ci are both collations associated with the utf8 character set. The SQL posted by Bartek should have been adequate as all you needed to do was change the collation.

Hope this clears things up!

Cheers
Bruce

My Blog: http://www.stuffandcontent.com/
Follow me on twitter: http://twitter.com/brucemorrison
Consolidated eZ Publish Feed : http://friendfeed.com/rooms/ez-publish

Piotrek Karaś

Friday 13 June 2008 9:28:28 pm

Hi Bruce,

Yes, I do understand the distinction between character set and collation, or at least I thought I have until this problem that I described above. First, I changed the default collation at the database level, but that didn't change anything, of course. Then, I changed default collation for every db table, but unfortunately collation still had its default setting at the column level, and that one remained unchanged. I couldn't find any SQL statement to modify those column-level collations in an easy way, and that's when I found this CONVERT TO CHARACTER SET option, which - as stated in the MySQL doc - would take care of all column settings without having to explicitly address them (which in case of eZ Publish DB would be a long process or would require a little script, that I didn't have time for). So now I realize, that I haven't tried changing column's default collation, and as you and Bartek mention, that might have done the trick. I was also misled by the fact that until that convertion, every SQL dump/export included binary strings instead of plain text.

1) Is there any SQL statement to alter default collation "recursively"? I mean, at all levels: database, each table, each string-like field?
2) Would convert be a risk in this case, or just an unnecessary operation?

Thanks,
Piotrek

--
Company: mediaSELF Sp. z o.o., http://www.mediaself.pl
eZ references: http://ez.no/partners/worldwide_partners/mediaself
eZ certified developer: http://ez.no/certification/verify/272585
eZ blog: http://ez.ryba.eu