iso-8859-1 to UTF-8 conversion.

laurent le cadet

Tuesday 23 August 2005 2:05:09 am

Hi,

Here's one more message about DB charset conversion, but what I have read so far raises many questions for me ( http://ez.no/products/ez_publish_cms/documentation/configuration/configuration/language_and_charset/unicode_with_ez_publish ).

I'm using eZp 3.5.2 revision 10972 with PHP 4.3.8 / MySQL 3.23.58, DB internal charset iso-8859-1.

Currently there is a lot of content on the site (fre-FR + en-GB), and I want to add Chinese and Japanese.

As I understand it, I need to change the charset to UTF-8.

The previous threads about this referred to MySQL version problems or to many different things to do...

Is there a step-by-step document for performing this?

Regards.

Laurent

Georg Franz

Tuesday 23 August 2005 6:27:51 am

Hi Laurent,

First of all, you need a newer MySQL (version 4.1.11 or later).

Then you need to change the charset of the eZ tables. I've rewritten a small script for that purpose, which I found on the MySQL forums:

<?php
// put in your username, password
$conn = mysql_connect("localhost", "root", "mypassword");

//change this to false to alter on the fly
$printonly=true; 

$charset="utf8";
$collate="utf8_general_ci";

$altertablecharset=true;
$alterdatabasecharser=true;

// put here your databases ...
$currentDBArray = array();
$currentDBArray[] = "mydb";


function PMA_getDbCollation($db)
{
	$sq='SHOW CREATE DATABASE `'.$db.'`;';
	$res = mysql_query($sq);
	if(!$res) echo "\n\n".$sq."\n".mysql_error()."\n\n"; else
	if($row = mysql_fetch_row($res)) // numeric index needed: $row[1] holds the CREATE DATABASE text
	{
		$tokenized = explode(' ', $row[1]);
		unset($row, $res, $sql_query);
		for ($i = 1; $i + 3 < count($tokenized); $i++)
		{
			if ($tokenized[$i] == 'DEFAULT' && $tokenized[$i + 1] == 'CHARACTER' && $tokenized[$i + 2] == 'SET')
			{
				if (isset($tokenized[$i + 5]) && $tokenized[$i + 4] == 'COLLATE')
				{
					 return array($tokenized [$i + 3],$tokenized[$i + 5]); // We found the collation!
				}
				else
				{
					return array($tokenized [$i + 3]);
				}
			}
		} 
	}
	return array(''); // no charset found; callers read index 0
}

$rs2 = mysql_query("SHOW DATABASES"); 
if(!$rs2)
	echo "\n\nSHOW DATABASES\n".mysql_error()."\n\n"; // $sq is not set yet at this point
else
	while ($data2 = mysql_fetch_row($rs2))
	{
		$db=$data2[0];
		$db_cha=PMA_getDbCollation($db);
		if ( in_array ( $db, $currentDBArray ) )
			if ( substr($db_cha[0],0,4)!='utf8' ) // limit to charset
			{
				mysql_select_db($db);
				$rs = mysql_query("SHOW TABLES"); 
				if(!$rs)
					echo "\n\nSHOW TABLES\n".mysql_error()."\n\n";
				else
					while ($data = mysql_fetch_row($rs))
					{
						if ( substr ( $data[0], 0,2 ) == "ez" )
						{
							$rs1 = mysql_query("show FULL columns from $data[0]");
							
							if(!$rs1)
								echo "\n\nSHOW FULL COLUMNS FROM $data[0]\n".mysql_error()."\n\n";
							else
								while ($data1 = mysql_fetch_assoc($rs1))
								{
									if(in_array(array_shift(split("\\(",$data1['Type'],2)),array(
																				//'national char',
																				//'nchar',
																				//'national varchar',
																				//'nvarchar',
																				'char',
																				'varchar',
																				'tinytext',
																				'text',
																				'mediumtext',
																				'longtext',
																				'enum',
																				'set'
																				  ))) 
									 {
										if(substr($data1['Collation'],0,4)!='utf8') // limit to charset
										{
											$sq="ALTER TABLE `$data[0]` CHANGE `".$data1['Field'].'` `'.$data1['Field'].'` '.$data1['Type'].' CHARACTER SET binary '.($data1['Default']==''?'':($data1['Default']=='NULL'?' DEFAULT NULL':' DEFAULT \''.mysql_escape_string($data1['Default']).'\'')).($data1['Null']=='YES'?' NULL ':' NOT NULL').';';
											if(!$printonly&&!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n"; 
											else
											{
												echo ($sq."\n") ; 
												$sq="ALTER TABLE `$data[0]` CHANGE `".$data1['Field'].'` `'.$data1['Field'].'` '.$data1['Type']." CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate").($data1['Default']==''?'':($data1['Default']=='NULL'?' DEFAULT NULL':' DEFAULT \''.mysql_escape_string($data1['Default']).'\'')).($data1['Null']=='YES'?' NULL ':' NOT NULL').($data1['Comment']==''?'':' COMMENT \''.mysql_escape_string($data1['Comment']).'\'').';';
												if(!$printonly&&!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n"; 
												else echo ($sq."\n") ; 
											}
										}
									}
								}
								if($altertablecharset)
								{
									/*
									  $sq='ALTER TABLE `'.$data[0]."` DEFAULT CHARACTER SET binary";
									  echo ($sq."\n") ; 
									  if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
									*/
									$sq='ALTER TABLE `'.$data[0]."` DEFAULT CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate");
									echo ($sq."\n") ; 
									if(!$printonly)
										if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
								}
						}
						else
							echo $data[0] . " not changed.\n";
					}
				// Alter the database default once per database, after all of its tables,
				// instead of once per table.
				if( $alterdatabasecharser )
				{
					/*
					$sq='ALTER DATABASE `'.$data2[0]."` DEFAULT CHARACTER SET binary";
					echo ($sq."\n") ;
					if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
					*/
					$sq='ALTER DATABASE `'.$data2[0]."` DEFAULT CHARACTER SET $charset ".($collate==''?'':"COLLATE $collate");
					echo ($sq."\n") ;
					if(!$printonly)
						if(!mysql_query($sq)) echo "\n\n".$sq."\n".mysql_error()."\n\n";
				}
			}
	}
?>

Then you need to change the INI settings of eZ Publish:

-> site.ini.append: charset at db entry
-> i18n.ini.append: charset-setting
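
As a rough sketch of those two settings: the group and key names below are written from memory of eZ Publish 3.x, and the override directory is an assumption, so check both against the site.ini and i18n.ini shipped with your version.

```ini
# settings/override/site.ini.append (assumed location)
[DatabaseSettings]
Charset=utf-8

# settings/override/i18n.ini.append (assumed location)
[CharacterSettings]
Charset=utf-8
```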

After that, don't forget to clear the eZ Publish cache completely.

HTH.

Best wishes,
Georg.

--
http://www.schicksal.com Horoskop website which uses eZ Publish since 2004

laurent le cadet

Tuesday 23 August 2005 8:42:30 am

Hi Georg,

Thanks for your reply.
First step: upgrade MySQL...

After that, is there any risk for the content?
Is the content of each table re-encoded, or is that not needed?

About the script, I presume I just have to run it once, from the site root for example?

Regards

Laurent.

Georg Franz

Tuesday 23 August 2005 9:56:46 am

Hi Laurent,

backup - backup - backup ... of course :-))

The script converts the tables first to a binary format and then to utf8, so no data should be lost.
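
To make the two-step idea concrete, here is a minimal sketch of the pair of statements produced for a single column. The helper name build_column_conversion and the example table and column are illustrative assumptions, not part of the script above. As far as MySQL's documented behaviour goes, passing through binary leaves the stored bytes untouched and only changes how they are labelled, so it is worth checking a few accented rows on a copy of the database first.

```php
<?php
// Hypothetical helper (not part of the script above): builds the two
// statements that are emitted for one text column.
function build_column_conversion($table, $column, $type, $charset = 'utf8', $collate = 'utf8_general_ci')
{
    $head = "ALTER TABLE `$table` CHANGE `$column` `$column` $type";
    return array(
        // Step 1: drop the character set label; the stored bytes are kept.
        "$head CHARACTER SET binary;",
        // Step 2: label the same bytes with the target character set.
        "$head CHARACTER SET $charset" . ($collate == '' ? '' : " COLLATE $collate") . ';'
    );
}

// Example table and column chosen for illustration only.
foreach (build_column_conversion('ezcontentobject_name', 'name', 'varchar(255)') as $sql) {
    echo $sql, "\n";
}
```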

The script simply produces SQL statements for the conversion. If you run it with the variable $printonly set to true (at the beginning of the script), the SQL statements are only written to the screen and nothing else happens.

If you really want to do the conversion, set $printonly to false.
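
The $printonly switch amounts to a dry-run pattern, which could be sketched roughly like this. The function run_statement and its $execute parameter are hypothetical names introduced for illustration; the script itself calls mysql_query directly.

```php
<?php
// Hypothetical wrapper illustrating the $printonly switch. $execute is any
// callable that runs one statement and returns true on success.
function run_statement($sql, $printonly, $execute)
{
    echo $sql, "\n";      // always show the statement
    if ($printonly) {
        return true;      // dry run: nothing is sent to the server
    }
    return (bool) call_user_func($execute, $sql);
}
```

With $printonly set to true the output can be reviewed, or saved and replayed by hand, before anything touches the database.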

Best wishes,
Georg.

