Help please on database utf-8 encoding

Jorge estévez

Thursday 29 November 2007 6:56:26 pm

One of my sites under development is running almost right, but once hosted, some texts do not appear. Digging up information here and there, I found that some eZ users are struggling to transfer their databases as UTF-8, and others as iso-8859-2 or latin1.

I suppose I will have to struggle too, as my site is in English and Spanish (some texts do not show in the Spanish version), so my encoding scheme is probably not right.

My database is in UTF-8

Oops… got it wrong!!!!

Can someone please tell me what encoding I should use and what changes I should make (and how to make them) to get my site running, or is there a workaround for this issue?
Are there rules to follow regarding encoding before installing an eZ site?

Pages do not show texts containing accented characters (e.g. á, é, í, ó, ú).

Diseño Web Cuba
Web Design Cuba
www.elfosdesign.com

Jorge estévez

Friday 30 November 2007 3:02:28 am

Is there anybody out there?
Please give me just a hint!

thanks

Jorge estévez

Friday 30 November 2007 3:09:09 am

Hello,

Found this:

http://ez.no/doc/ez_publish/technical_manual/3_8/reference/configuration_files/site_ini/mailsettings/allowedcharsets

Could this solve the text problems?

Steven E. Bailey

Friday 30 November 2007 3:45:13 am

Have you tried iconv? - http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

However, this assumes that all of your data was entered correctly and has a utf-8 encoding indicated in the xml.

If it wasn't, and you have a mixture of different encodings, this will not work, because you'll just end up messing up other data.
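
One rough way to check for such a mixture before running a blanket conversion is to test each line of the dump separately. This is only a sketch in Python, not anything eZ Publish ships: `classify` and `scan_dump` are made-up helper names, and the sample lines are invented. It relies on the fact that ISO-8859-1 text with accented characters is almost never valid UTF-8.

```python
def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'other' for one line of raw bytes."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: probably iso-8859-1 / cp1252 or similar.
        return "other"


def scan_dump(lines):
    """Count lines per class; both 'utf-8' and 'other' > 0 suggests a mixed dump."""
    counts = {"ascii": 0, "utf-8": 0, "other": 0}
    for raw in lines:
        counts[classify(raw)] += 1
    return counts


# Invented sample: one plain line, one UTF-8 line, one ISO-8859-1 line.
sample = [
    b"INSERT INTO t VALUES ('plain');",
    "INSERT INTO t VALUES ('Dise\u00f1o');".encode("utf-8"),
    "INSERT INTO t VALUES ('Dise\u00f1o');".encode("iso-8859-1"),
]
result = scan_dump(sample)
print(result)
```

If both the 'utf-8' and 'other' counts are non-zero, a single iconv pass over the whole file would damage one group or the other.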

The only way I've been able to solve this problem (data entered or copied into the database with different encodings) is by editing the dumped database, using a combination of sed scripts and the replace command of Notepad++ or TextEdit to convert the incorrectly encoded characters to UTF-8 and to change the XML headers to utf-8.
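
For the XML part, the same idea can be sketched in Python instead of sed. This is only an illustration: `xml_block_to_utf8` is a hypothetical helper, it trusts the encoding named in each block's XML header (which may itself be wrong), and it accepts both plain and backslash-escaped quotes as they appear in a dump.

```python
import re

# Matches encoding="..." and the backslash-escaped form found in SQL dumps.
DECL = re.compile(rb'(encoding=\\?")([A-Za-z0-9._-]+)(\\?")')


def xml_block_to_utf8(raw: bytes) -> bytes:
    """Re-encode one XML block to UTF-8 and rewrite its encoding declaration."""
    m = DECL.search(raw)
    declared = m.group(2).decode("ascii").lower() if m else "utf-8"
    if declared in ("utf-8", "utf8"):
        return raw  # already declared as UTF-8, leave untouched
    # Decode with the declared charset, then re-encode as UTF-8.
    fixed = raw.decode(declared).encode("utf-8")
    # Rewrite only the first declaration, keeping the original quoting.
    return DECL.sub(rb"\g<1>utf-8\g<3>", fixed, count=1)


block = (
    '<?xml version=\\"1.0\\" encoding=\\"iso-8859-1\\"?>'
    "<section>Dise\u00f1o</section>"
).encode("iso-8859-1")
converted = xml_block_to_utf8(block)
print(converted.decode("utf-8"))
```

A block whose header says iso-8859-1 but whose bytes are really UTF-8 would come out double-encoded, so this only helps where the headers are accurate.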

If someone has an easier way of doing this I'd like to hear it.

Certified eZPublish developer
http://ez.no/certification/verify/396111

Available for ezpublish troubleshooting, hosting and custom extension development: http://www.leidentech.com

Gaetano Giunta

Friday 30 November 2007 4:05:49 am

"My database is in UTF-8

Oops… got it wrong!!!!"

In fact, both English and Spanish can be represented correctly in the UTF-8 and ISO-8859-1 charsets, so you are not forced to pick one over the other.

If you plan to move to eZP 4 later, going for UTF8 for the db is surely the best choice.

Just take care that:
- if you are going to deploy on a hosted server, the production db uses the same charset as the dev one
- all data you enter in "text files" in eZ has the correct encoding or is tagged as such, eg. ini files, translation files, templates, etc...

Principal Consultant International Business
Member of the Community Project Board

Jorge estévez

Friday 30 November 2007 7:23:13 am

Hello again,

What a mess! Please help me out of this!

I am not planning to update to a higher version of eZ; I will still be using version 3.8.
My hosting service at www.siteground.com has a tutorial that encourages clients to “make it” UTF-8 when building the database for ezpublish, so that seemed right to me. My database already used that scheme too, so compatibility should not be a problem.

I recently rebuilt my site from zero with a new 3.8 ezsite in a week or so.

You wrote “all data you enter in "text files" in eZ has the correct encoding or is tagged as such, e.g. ini files, translation files, templates, etc...”

That made me investigate my whole site further:

My ini files have <?php /* #?ini charset="iso-8859-1"? at the beginning… I must have copied them without noticing. Must I change them?

---------

I also found <?php /* #?ini charset="iso-8859-1"? at the beginning of the .INI files of the “Updatecache” extension. Do I have to change it?

---------

I also found it in the “ezdhtml” extension, in
\extension\ezdhtml\design\standard\templates\content\datatype\edit\ezxmltext_ezdhtml at line 90:

    {run-once}
    <script type="text/javascript" src={"javascripts/ezdhtml/ezeditor.js"|ezdesign} charset="iso-8859-1"></script>
    <link rel="stylesheet" type="text/css" href={"stylesheets/ezdhtml/toolbar.css"|ezdesign}>
    {/run-once}

And also at line 106

---------
Now, while rebuilding the site, I exported all the node information from an earlier site: I extracted the data from the database as a .SQL script file to upload later to the new site. The previous site installation was in iso-8859-1. I just went over the .SQL file looking for the text iso-8859-1 and found 11 instances in the ezcontentobject_attribute table.

Some of the occurrences look like this:

'<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<section xmlns:image=\"http://ez...... etc. etc.

And others like this:

'<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<section xmlns:image=\"http://ez.no/nam.... etc. etc.

Is this wrong? What should I do to correct the errors?

Thanks so much!!

Jorge estévez

Saturday 01 December 2007 4:42:40 pm

Hi,

Regarding the database, all tables end up like the following:

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Gaetano Giunta

Sunday 02 December 2007 4:51:36 am

Sorry, I think I was a bit too all-encompassing in my previous message.

The first thing to understand is that much of the configuration really depends on how those files are edited and saved: as UTF-8, as ISO, or as cp1252 (e.g. if you edit them on Windows and then upload them to prod via FTP).
As long as the files have a way to indicate their internal charset, and that indication is correct, you should not have any problems, since eZ Publish will recognize the encoding declarations and do the needed conversion for you.
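
A quick way to test whether "that indication is correct" for an ini file could look like the Python sketch below. It is a heuristic only: `check_ini` is a made-up name, the header pattern is taken from the `#?ini charset="..."` line quoted in this thread, and the verdict simply compares the declared charset against whether the bytes happen to be valid UTF-8.

```python
import re

# Header as quoted in this thread: <?php /* #?ini charset="iso-8859-1"?
INI_DECL = re.compile(rb'#\?ini\s+charset="([^"]+)"\?')


def check_ini(data: bytes) -> str:
    """Return 'ascii', 'undeclared', 'match', or 'mismatch' for an ini file's bytes."""
    if data.isascii():
        return "ascii"  # pure ASCII reads the same under UTF-8 and ISO-8859-1
    m = INI_DECL.search(data)
    if m is None:
        return "undeclared"
    declared = m.group(1).decode("ascii", "replace").lower()
    try:
        data.decode("utf-8")
        looks_utf8 = True
    except UnicodeDecodeError:
        looks_utf8 = False
    declared_utf8 = declared in ("utf-8", "utf8")
    return "match" if looks_utf8 == declared_utf8 else "mismatch"


header = b'<?php /* #?ini charset="iso-8859-1"?\n'
print(check_ini(header + b"[Section]\nName=abc\n"))
print(check_ini(header + "Name=Dise\u00f1o\n".encode("utf-8")))
print(check_ini(header + "Name=Dise\u00f1o\n".encode("iso-8859-1")))
```

A "mismatch" here means the file was probably saved in a different charset than the one its header claims.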

“I have my ini files that has <?php /* #?ini charset="iso-8859-1"? at the beginning of the files… I must have copied them without noticing, Must I change them?”
Not unless you have some non-ASCII characters inside them (which is unusual anyway, unless you really like native-language names for your folders and files) AND you have saved those files as UTF-8.

The same applies for the updatecache extension.

“<script type="text/javascript" src={"javascripts/ezdhtml/ezeditor.js"|ezdesign} charset="iso-8859-1"></script>”
IMHO you should not change this, unless you have edited the file ezeditor.js and saved it as UTF-8.

For the database, it is a bit trickier.

If there are ISO-encoded XML blocks inside, you should definitely not only change the XML encoding declaration, but also make sure that the entire XML block is properly converted.
The caveat is that you cannot simply run a charset conversion script on your SQL dump, because there are some fields where eZ stores data in PHP-serialized format. Those are the nastier ones to fix: if you run a plain charset conversion and the PHP-serialized data contains non-ASCII characters, the data can no longer be unserialized (the string length tags will be wrong!).
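
To make the length-tag problem concrete: PHP serializes a string as s:<byte length>:"<bytes>"; and "ñ" is one byte in ISO-8859-1 but two in UTF-8. The Python sketch below recomputes those tags after a conversion. It is a rough illustration, not a complete parser: `fix_serialized_lengths` is a made-up helper, it works on the unescaped field value rather than on the raw SQL dump, and the simple regex would be fooled by a string that itself contains `";`.

```python
import re

# One serialized PHP string: s:<length>:"<bytes>";
SER_STR = re.compile(rb's:(\d+):"(.*?)";', re.DOTALL)


def fix_serialized_lengths(data: bytes) -> bytes:
    """Recompute the byte-length tag of every serialized string in data."""
    def repl(m):
        body = m.group(2)
        return b's:' + str(len(body)).encode("ascii") + b':"' + body + b'";'
    return SER_STR.sub(repl, data)


# 'Diseño' is 6 bytes in ISO-8859-1, so the original tag is s:6.
original = 'a:1:{s:4:"name";s:6:"Dise\u00f1o";}'.encode("iso-8859-1")
# A plain charset conversion leaves s:6 in place, but the string is now 7 bytes.
converted = original.decode("iso-8859-1").encode("utf-8")
repaired = fix_serialized_lengths(converted)
print(repaired.decode("utf-8"))
```

After the plain conversion the value still says s:6 while the string occupies 7 bytes, which is exactly what makes unserialize fail.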

Hope it helps
Gaetano

Jorge estévez

Sunday 02 December 2007 6:24:27 am

Hi Gaetano,

I have changed my .INI headers to <?php /* #?ini charset="utf-8"?, uploaded the files, cleared the caches, and I still get those truncated texts. It seems that my database has to be changed. The curious thing is that if I edit the texts (words with áéíóú and so on) using the admin interface and save them again, they display without problems in the public view of the site. So I guess one way of fixing everything would be to edit all affected nodes and save them again.

Anyway, it would be nice (and easier) if I could just change all words with accented vowels to unaccented ones and make that a starting point. Is there any way I could export the database, change those words and upload it again, or is there a workaround of any kind?

Any other hint?

Thanks, jorge