Clean URL for Vietnamese pages

Guillaume Marty

Thursday 09 June 2011 9:22:19 am

I saw a topic and a bug related to this issue, but they date back to 2009.

The problem is that clean URLs are not generated for pages written in Vietnamese; they fall back to /content/view/full/ style URLs.

I installed the transformation file attached to the bug report and overrode transform.ini this way:

[Transformation]
Charsets[]=utf-8;vietnamese

[vietnamese]
Files[]=vietnamese.tr
Extensions[]

 

That almost works, but some characters are not caught by the transformation rules and are replaced by a hyphen.

Character: ệ
Rule in transformation file: U+1EC7 = "e"
Result: -
Expected result: e

Any ideas why not all characters are transformed?
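
One thing worth checking is whether ệ really arrives as the single precomposed code point U+1EC7, or as a base letter followed by combining marks, since a rule keyed on U+1EC7 can only match the former. This is only one possible cause. A minimal sketch for inspecting a string, assuming the mbstring extension is available; codePoints is a hypothetical helper name, not an eZ Publish function:

```php
<?php
// Hypothetical helper: list the Unicode code points of a UTF-8 string.
function codePoints( $text )
{
    $points = array();
    // Convert to fixed-width UCS-4 (big endian) so each character is 4 bytes.
    $ucs4 = mb_convert_encoding( $text, 'UCS-4BE', 'UTF-8' );
    foreach ( unpack( 'N*', $ucs4 ) as $codePoint )
    {
        $points[] = sprintf( 'U+%04X', $codePoint );
    }
    return $points;
}

// Precomposed form: one code point.
echo implode( ' ', codePoints( "\xE1\xBB\x87" ) ), "\n";      // prints U+1EC7
// Decomposed form: "e" + combining dot below + combining circumflex.
echo implode( ' ', codePoints( "e\xCC\xA3\xCC\x82" ) ), "\n"; // prints U+0065 U+0323 U+0302
```

If the page name shows the second pattern, the rule for U+1EC7 never sees a matching character.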

Ivo Lukac

Thursday 09 June 2011 10:28:12 am

Hi

Try this custom URL translator: place the file at "urlfilters/ngvietnamesefilter.php" in your extension, with the following content:

<?php
class nGVietnameseFilter extends eZURLAliasFilter
{
static $mappingArray = array('\u00C0' => 'A', '\u1EA2' => 'A', '\u00C3' => 'A', '\u00C1' => 'A', '\u1EA0' => 'A', '\u1EB0' => 'A','\u1EB2' => 'A', '\u1EB4' => 'A', '\u1EAE' => 'A', '\u1EB6' => 'A', '\u1EA6' => 'A', '\u1EA8' => 'A','\u1EAA' => 'A', '\u1EA4' => 'A', '\u1EAC' => 'A', '\u00C8' => 'E', '\u1EBA' => 'E', '\u1EBC' => 'E','\u00C9' => 'E', '\u1EB8' => 'E', '\u1EC0' => 'E', '\u1EC2' => 'E', '\u1EC4' => 'E', '\u1EBE' => 'E','\u1EC6' => 'E', '\u00CC' => 'I', '\u1EC8' => 'I', '\u0128' => 'I', '\u00CD' => 'I', '\u1ECA' => 'I','\u00D2' => 'O', '\u1ECE' => 'O', '\u00D5' => 'O', '\u00D3' => 'O', '\u1ECC' => 'O', '\u1ED2' => 'O','\u1ED4' => 'O', '\u1ED6' => 'O', '\u1ED0' => 'O', '\u1ED8' => 'O', '\u1EDC' => 'O', '\u1EDE' => 'O','\u1EE0' => 'O', '\u1EDA' => 'O', '\u1EE2' => 'O', '\u00D9' => 'U', '\u1EE6' => 'U', '\u0168' => 'U','\u00DA' => 'U', '\u1EE4' => 'U', '\u1EEA' => 'U', '\u1EEC' => 'U', '\u1EEE' => 'U', '\u1EE8' => 'U','\u1EF0' => 'U', '\u1EF2' => 'Y', '\u1EF6' => 'Y', '\u1EF8' => 'Y', '\u00DD' => 'Y', '\u1EF4' => 'Y','\u00E0' => 'a', '\u1EA3' => 'a', '\u00E3' => 'a', '\u00E1' => 'a', '\u1EA1' => 'a', '\u1EB1' => 'a','\u1EB3' => 'a', '\u1EB5' => 'a', '\u1EAF' => 'a', '\u1EB7' => 'a', '\u1EA7' => 'a', '\u1EA9' => 'a','\u1EAB' => 'a', '\u1EA5' => 'a', '\u1EAD' => 'a', '\u00E8' => 'e', '\u1EBB' => 'e', '\u1EBD' => 'e','\u00E9' => 'e', '\u1EB9' => 'e', '\u1EC1' => 'e', '\u1EC3' => 'e', '\u1EC5' => 'e', '\u1EBF' => 'e','\u1EC7' => 'e', '\u00EC' => 'i', '\u1EC9' => 'i', '\u0129' => 'i', '\u00ED' => 'i', '\u1ECB' => 'i','\u00F2' => 'o', '\u1ECF' => 'o', '\u00F5' => 'o', '\u00F3' => 'o', '\u1ECD' => 'o', '\u1ED3' => 'o','\u1ED5' => 'o', '\u1ED7' => 'o', '\u1ED1' => 'o', '\u1ED9' => 'o', '\u1EDD' => 'o', '\u1EDF' => 'o','\u1EE1' => 'o', '\u1EDB' => 'o', '\u1EE3' => 'o', '\u00F9' => 'u', '\u1EE7' => 'u', '\u0169' => 'u','\u00FA' => 'u', '\u1EE5' => 'u', '\u1EEB' => 'u', '\u1EED' => 'u', '\u1EEF' => 'u', '\u1EE9' => 'u','\u1EF1' => 'u', '\u1EF3' => 'y', '\u1EF7' => 'y', 
'\u1EF9' => 'y', '\u00FD' => 'y', '\u1EF5' => 'y','\uFB00' => 'ff', '\uFB01' => 'fi', '\uFB02' => 'fl', '\uFB03' => 'ffi', '\uFB04' => 'ffl', '\uFB05' => 'ft', '\uFB06' => 'st','\u00C2' => 'A', '\u00CA' => 'E', '\u00CE' => 'I', '\u00D4' => 'O', '\u00DB' => 'U','\u00E2' => 'a', '\u00EA' => 'e', '\u00EE' => 'i', '\u00F4' => 'o', '\u00FB' => 'u','\u01A0' => 'O', '\u01A1' => 'o', '\u01AF' => 'U', '\u01B0' => 'u');

    // Converts a UTF-8 string into the '\uXXXX' notation used as keys in $mappingArray.
    static function utf8ToUnicode( $str ) {
        $unicode = array();
        $values = array();
        $lookingFor = 1;
        for ( $i = 0; $i < strlen( $str ); $i++ ) {
            $thisValue = ord( $str[$i] );
            if ( $thisValue < ord( 'A' ) ) {
                // Below 'A': keep digits, percent-encode everything else.
                if ( $thisValue >= ord( '0' ) && $thisValue <= ord( '9' ) ) {
                    $unicode[] = chr( $thisValue );
                } else {
                    $unicode[] = '%' . dechex( $thisValue );
                }
            } elseif ( $thisValue < 128 ) {
                // Remaining ASCII passes through unchanged.
                $unicode[] = $str[$i];
            } else {
                // Multi-byte sequence: a lead byte below 224 starts a 2-byte
                // sequence; anything else is treated as a 3-byte sequence.
                if ( count( $values ) == 0 ) {
                    $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
                }
                $values[] = $thisValue;
                if ( count( $values ) == $lookingFor ) {
                    $number = ( $lookingFor == 3 )
                        ? ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 )
                        : ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                    $number = dechex( $number );
                    $unicode[] = '\u' . strtoupper( str_pad( $number, 4, '0', STR_PAD_LEFT ) );
                    $values = array();
                    $lookingFor = 1;
                }
            }
        }
        return implode( "", $unicode );
    }

    // Replaces each character found in $mappingArray; other characters pass through.
    function process( $text, &$languageObject, &$caller ) {
        $outputText = '';
        // Split the UTF-8 string into individual characters.
        $textArray = preg_split( '/(?<!^)(?!$)/u', $text );
        foreach ( $textArray as $char ) {
            $unicodeChar = nGVietnameseFilter::utf8ToUnicode( $char );
            $outputText .= array_key_exists( $unicodeChar, nGVietnameseFilter::$mappingArray )
                ? nGVietnameseFilter::$mappingArray[$unicodeChar]
                : $char;
        }
        return $outputText;
    }
}
?>
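
As a worked example of the decoding arithmetic in utf8ToUnicode: the three UTF-8 bytes of ệ (0xE1 0xBB 0x87) should come out as the key '\u1EC7', which is present in $mappingArray. A minimal stand-alone sketch of just that calculation, runnable outside eZ Publish:

```php
<?php
// UTF-8 bytes of the character U+1EC7.
$bytes = array( 0xE1, 0xBB, 0x87 );

// Same arithmetic as the 3-byte branch of utf8ToUnicode:
// 1 * 4096 + 59 * 64 + 7 = 7879 = 0x1EC7
$codePoint = ( $bytes[0] % 16 ) * 4096 + ( $bytes[1] % 64 ) * 64 + ( $bytes[2] % 64 );

// Build the lookup key; in single quotes '\u' is a literal backslash and "u".
$key = '\u' . strtoupper( str_pad( dechex( $codePoint ), 4, '0', STR_PAD_LEFT ) );
echo $key, "\n"; // prints \u1EC7
```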

Add the following lines to your site.ini:

[URLTranslator]
Extensions[]={YOUR EXTENSION NAME}
Filters[]=nGVietnameseFilter

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Guillaume Marty

Tuesday 14 June 2011 5:35:45 am

Thanks for your reply, but it didn't work for me.

First, I tried to do what you described.

Then I regenerated the autoloads array and tried:

[URLTranslator]
FilterClasses[]=nGVietnameseFilter

(Extensions & Filters are deprecated now)

But that didn't work either. It looks like the characters are already transformed incorrectly before they reach the filter. I'm still investigating.
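
To tell a filter problem apart from an upstream transformation problem, the mapping idea can be exercised outside eZ Publish. The sketch below is a hypothetical stand-alone check, not the filter posted above: it uses a four-entry subset of the mapping, keyed by raw UTF-8 bytes and applied with PHP's strtr, and the helper name asciiFold is made up for illustration.

```php
<?php
// Hypothetical helper: replace a few Vietnamese letters using a small
// subset of the mapping, keyed by raw UTF-8 bytes.
function asciiFold( $text )
{
    $map = array(
        "\xE1\xBB\x87" => 'e', // U+1EC7
        "\xC3\xAA"     => 'e', // U+00EA
        "\xE1\xBA\xA1" => 'a', // U+1EA1
        "\xC6\xB0"     => 'u', // U+01B0
    );
    return strtr( $text, $map );
}

// Precomposed input is folded to plain ASCII.
echo asciiFold( "Vi\xE1\xBB\x87t" ), "\n"; // prints Viet

// Decomposed input (e + U+0323 + U+0302) matches no key and is returned unchanged.
var_dump( asciiFold( "Vie\xCC\xA3\xCC\x82t" ) === "Vie\xCC\xA3\xCC\x82t" ); // prints bool(true)
```

If a check like this behaves as expected but the generated URLs are still wrong, the text reaching the filter inside eZ Publish is the next thing to inspect.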

Ivo Lukac

Tuesday 14 June 2011 5:50:21 am

Hi,

Send me your email address via the "Direct contact" form (http://share.ez.no/authorcontact/form/9504) and I'll send you the files. Maybe copying and pasting the code from the post did not work well.