Thursday 04 May 2006 9:10:45 am
Some time ago I've written a command line script which we used to convert all our eZ databases to UTF8. It is made for running on a latin-1 (default) encoded database, so you will need to run it on the initial database. We've used it with success on 5 sites (2 rather big and 3 small ones). First it goes looking for all attributes who use utf8 encoded XML in the db. It will fetch them and reinsert their content into the db with latin-1 encoding. Then it uses the SQL command "CONVERT TO CHARACTER SET utf8" on all tables, which will convert all data to UTF8. As a final step, it again goes through all attributes looking for latin-1 encoded XML (the xml encoding attribute is not right now, since we converted everything to utf8 in the previous step). It will replace the iso-8859-1 string with utf-8 and then reinserts the data. I know the way this step is done now is not 100% safe, because it replaces every instance of "iso-8859-1" with "utf-8". But that was no issue for us.
#!/usr/bin/env php
<?php
include_once( 'lib/ezutils/classes/ezcli.php' );
include_once( 'kernel/classes/ezscript.php' );
include_once( 'lib/ezdb/classes/ezdb.php' );
$cli =& eZCLI::instance();
$script =& eZScript::instance( array( 'description' => ( "Changes your eZ publish database tables to use UTF8" ),
'use-session' => false,
'use-modules' => false,
'use-extensions' => true ) );
$script->startup();
$options = $script->getOptions();
$db =& eZDB::instance();
// first store XML saved as utf-8 back as iso-8859-1
$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );
while ( count( $result ) > 0 )
{
include_once( 'lib/ezxml/classes/ezxml.php' );
foreach ( array_keys( $result ) as $i )
{
$cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
$value = $result[$i]['data_text'];
$xml = new eZXML();
$doc =& $xml->domTree( $value );
if ( $doc )
{
$value = $doc->toString( 'iso-8859-1' );
$success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );
if ( !$success )
{
$cli->output( 'unable to update the db.' );
}
}
else
{
$cli->output( 'error while creating DOM document' );
}
}
unset( $result );
$result = $db->arrayQuery( $selectSQL );
}
$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%utf-8%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );
while ( count( $result ) > 0 )
{
include_once( 'lib/ezxml/classes/ezxml.php' );
foreach ( array_keys( $result ) as $i )
{
$cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
$value = $result[$i]['data_text5'];
$xml = new eZXML();
$doc =& $xml->domTree( $value );
if ( $doc )
{
$value = $doc->toString( 'iso-8859-1' );
$success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );
if ( !$success )
{
$cli->output( 'unable to update the db.' );
}
}
else
{
$cli->output( 'error while creating DOM document' );
}
}
unset( $result );
$result = $db->arrayQuery( $selectSQL );
}
$tables = $db->arrayQuery( 'SHOW tables' );
foreach ( $tables as $table )
{
$tableName = false;
foreach( array_keys( $table ) as $i )
{
$tableName = $table[$i];
break;
}
if ( $tableName )
{
$cli->output( 'Changing table: ' . $tableName );
$db->query( 'ALTER TABLE ' . $db->escapeString( $tableName ) . ' CONVERT TO CHARACTER SET utf8' );
}
}
$selectSQL = "SELECT id, version, data_text FROM ezcontentobject_attribute where data_text LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );
while ( count( $result ) > 0 )
{
foreach ( array_keys( $result ) as $i )
{
$cli->output( 'converting attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
$value = $result[$i]['data_text'];
$value = str_replace( 'iso-8859-1', 'utf-8', $value );
$success = $db->query( "UPDATE ezcontentobject_attribute SET data_text='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );
if ( !$success )
{
$cli->output( 'unable to update the db.' );
}
}
unset( $result );
$result = $db->arrayQuery( $selectSQL );
}
$selectSQL = "SELECT id, version, data_text5 FROM ezcontentclass_attribute where data_text5 LIKE '<?xml%iso-8859-1%' LIMIT 10";
$result = $db->arrayQuery( $selectSQL );
while ( count( $result ) > 0 )
{
foreach ( array_keys( $result ) as $i )
{
$cli->output( 'converting class attribute ' . $result[$i]['id'] . ' version ' . $result[$i]['version'] );
$value = $result[$i]['data_text5'];
$value = str_replace( 'iso-8859-1', 'utf-8', $value );
$success = $db->query( "UPDATE ezcontentclass_attribute SET data_text5='" . $db->escapeString( $value ) . "' WHERE id=" . $result[$i]['id'] . " AND version=" . $result[$i]['version'] );
if ( !$success )
{
$cli->output( 'unable to update the db.' );
}
}
unset( $result );
$result = $db->arrayQuery( $selectSQL );
}
$script->shutdown();
?>
If you want to run something similar on your database as it is now, you can try to use only the last step.
independent eZ Publish developer and service provider | http://blog.coomanskristof.be | http://ezpedia.org
|