Clean URL for Vietnamese pages

Guillaume Marty	Thursday 09 June 2011 9:22:19 am

I found a forum topic and a bug report about this issue, but both date back to 2009. The problem is that clean URLs are not generated for pages written in Vietnamese; they fall back to /content/view/full/-style URLs.

I installed the transformation file attached to the bug report and overrode transform.ini like this:

    [Transformation]
    Charsets[]=utf-8;vietnamese

    [vietnamese]
    Files[]=vietnamese.tr
    Extensions[]

That almost works, but some characters are not caught by the transformation rules and are replaced by a hyphen:

    Character: ệ
    Rule in transformation file: U+1EC7 = "e"
    Result: -
    Expected result: e

Any ideas why not all characters are transformed?
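One way to investigate is to look at the code points that actually reach the transformation. The letter ệ can be stored either as the single precomposed character U+1EC7 or as "e" followed by the combining marks U+0323 (dot below) and U+0302 (circumflex), and a rule written for U+1EC7 may not match the second form. The sketch below is a hypothetical helper (codePoints() is not an eZ Publish function) that lists the code points of a UTF-8 string; it assumes PHP 7 or later for the \u{...} escapes used in the usage lines.

```php
<?php
// Hypothetical diagnostic helper (not part of eZ Publish): returns the code points
// of a UTF-8 string as "U+XXXX" strings, one entry per character.
function codePoints( $text )
{
    $result = array();
    // Split into single UTF-8 characters (code points, not grapheme clusters)
    foreach ( preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY ) as $char )
    {
        $bytes = array_values( unpack( 'C*', $char ) );
        $count = count( $bytes );
        // Lead byte: ASCII keeps all 7 bits, an n-byte sequence carries 7 - n payload bits
        $cp = ( $count == 1 ) ? $bytes[0] : $bytes[0] & ( 0xFF >> ( $count + 1 ) );
        // Each continuation byte contributes 6 more payload bits
        for ( $i = 1; $i < $count; $i++ )
        {
            $cp = ( $cp << 6 ) | ( $bytes[$i] & 0x3F );
        }
        $result[] = sprintf( 'U+%04X', $cp );
    }
    return $result;
}

// Precomposed form: U+0056 U+0069 U+1EC7 U+0074
echo implode( ' ', codePoints( "Vi\u{1EC7}t" ) ), "\n";
// Decomposed form: U+0056 U+0069 U+0065 U+0323 U+0302 U+0074
echo implode( ' ', codePoints( "Vie\u{0323}\u{0302}t" ) ), "\n";
```

Running the same helper on the node name as it arrives in the URL generation code would show which of the two forms is involved.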
Ivo Lukac	Thursday 09 June 2011 10:28:12 am

Hi,

Try this custom URL translator filter. Place the file in "urlfilters/ngvietnamesefilter.php" in your extension, with this content:

    <?php
    class nGVietnameseFilter extends eZURLAliasFilter
    {
        // Maps literal "\uXXXX" strings (as produced by utf8ToUnicode() below) to ASCII
        static $mappingArray = array(
            // Uppercase vowels with tone marks (including breve, circumflex and horn forms)
            '\u00C0' => 'A', '\u1EA2' => 'A', '\u00C3' => 'A', '\u00C1' => 'A', '\u1EA0' => 'A',
            '\u1EB0' => 'A', '\u1EB2' => 'A', '\u1EB4' => 'A', '\u1EAE' => 'A', '\u1EB6' => 'A',
            '\u1EA6' => 'A', '\u1EA8' => 'A', '\u1EAA' => 'A', '\u1EA4' => 'A', '\u1EAC' => 'A',
            '\u00C8' => 'E', '\u1EBA' => 'E', '\u1EBC' => 'E', '\u00C9' => 'E', '\u1EB8' => 'E',
            '\u1EC0' => 'E', '\u1EC2' => 'E', '\u1EC4' => 'E', '\u1EBE' => 'E', '\u1EC6' => 'E',
            '\u00CC' => 'I', '\u1EC8' => 'I', '\u0128' => 'I', '\u00CD' => 'I', '\u1ECA' => 'I',
            '\u00D2' => 'O', '\u1ECE' => 'O', '\u00D5' => 'O', '\u00D3' => 'O', '\u1ECC' => 'O',
            '\u1ED2' => 'O', '\u1ED4' => 'O', '\u1ED6' => 'O', '\u1ED0' => 'O', '\u1ED8' => 'O',
            '\u1EDC' => 'O', '\u1EDE' => 'O', '\u1EE0' => 'O', '\u1EDA' => 'O', '\u1EE2' => 'O',
            '\u00D9' => 'U', '\u1EE6' => 'U', '\u0168' => 'U', '\u00DA' => 'U', '\u1EE4' => 'U',
            '\u1EEA' => 'U', '\u1EEC' => 'U', '\u1EEE' => 'U', '\u1EE8' => 'U', '\u1EF0' => 'U',
            '\u1EF2' => 'Y', '\u1EF6' => 'Y', '\u1EF8' => 'Y', '\u00DD' => 'Y', '\u1EF4' => 'Y',
            // Lowercase counterparts
            '\u00E0' => 'a', '\u1EA3' => 'a', '\u00E3' => 'a', '\u00E1' => 'a', '\u1EA1' => 'a',
            '\u1EB1' => 'a', '\u1EB3' => 'a', '\u1EB5' => 'a', '\u1EAF' => 'a', '\u1EB7' => 'a',
            '\u1EA7' => 'a', '\u1EA9' => 'a', '\u1EAB' => 'a', '\u1EA5' => 'a', '\u1EAD' => 'a',
            '\u00E8' => 'e', '\u1EBB' => 'e', '\u1EBD' => 'e', '\u00E9' => 'e', '\u1EB9' => 'e',
            '\u1EC1' => 'e', '\u1EC3' => 'e', '\u1EC5' => 'e', '\u1EBF' => 'e', '\u1EC7' => 'e',
            '\u00EC' => 'i', '\u1EC9' => 'i', '\u0129' => 'i', '\u00ED' => 'i', '\u1ECB' => 'i',
            '\u00F2' => 'o', '\u1ECF' => 'o', '\u00F5' => 'o', '\u00F3' => 'o', '\u1ECD' => 'o',
            '\u1ED3' => 'o', '\u1ED5' => 'o', '\u1ED7' => 'o', '\u1ED1' => 'o', '\u1ED9' => 'o',
            '\u1EDD' => 'o', '\u1EDF' => 'o', '\u1EE1' => 'o', '\u1EDB' => 'o', '\u1EE3' => 'o',
            '\u00F9' => 'u', '\u1EE7' => 'u', '\u0169' => 'u', '\u00FA' => 'u', '\u1EE5' => 'u',
            '\u1EEB' => 'u', '\u1EED' => 'u', '\u1EEF' => 'u', '\u1EE9' => 'u', '\u1EF1' => 'u',
            '\u1EF3' => 'y', '\u1EF7' => 'y', '\u1EF9' => 'y', '\u00FD' => 'y', '\u1EF5' => 'y',
            // Latin ligatures
            '\uFB00' => 'ff', '\uFB01' => 'fi', '\uFB02' => 'fl', '\uFB03' => 'ffi',
            '\uFB04' => 'ffl', '\uFB05' => 'ft', '\uFB06' => 'st',
            // Circumflex and horn letters without a tone mark
            '\u00C2' => 'A', '\u00CA' => 'E', '\u00CE' => 'I', '\u00D4' => 'O', '\u00DB' => 'U',
            '\u00E2' => 'a', '\u00EA' => 'e', '\u00EE' => 'i', '\u00F4' => 'o', '\u00FB' => 'u',
            '\u01A0' => 'O', '\u01A1' => 'o', '\u01AF' => 'U', '\u01B0' => 'u',
            // Breve A and D with stroke
            '\u0102' => 'A', '\u0103' => 'a', '\u0110' => 'D', '\u0111' => 'd'
        );

        // Converts a UTF-8 string to literal "\uXXXX" strings (2- and 3-byte sequences);
        // digits are kept and other bytes below 'A' are percent-encoded
        static function utf8ToUnicode( $str ) {
            $unicode = array();
            $values = array();
            $lookingFor = 1;
            for ( $i = 0; $i < strlen( $str ); $i++ ) {
                $thisValue = ord( $str[ $i ] );
                if ( $thisValue < ord( 'A' ) ) {
                    if ( $thisValue >= ord( '0' ) && $thisValue <= ord( '9' ) ) {
                        $unicode[] = chr( $thisValue );
                    } else {
                        $unicode[] = '%' . dechex( $thisValue );
                    }
                } else if ( $thisValue < 128 ) {
                    $unicode[] = $str[ $i ];
                } else {
                    // A lead byte below 0xE0 starts a 2-byte sequence, otherwise 3 bytes are assumed
                    if ( count( $values ) == 0 ) {
                        $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
                    }
                    $values[] = $thisValue;
                    if ( count( $values ) == $lookingFor ) {
                        $number = ( $lookingFor == 3 )
                            ? ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 )
                            : ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                        $unicode[] = '\u' . strtoupper( str_pad( dechex( $number ), 4, '0', STR_PAD_LEFT ) );
                        $values = array();
                        $lookingFor = 1;
                    }
                }
            }
            return implode( '', $unicode );
        }

        // Filter entry point: replaces every mapped character with its ASCII
        // equivalent and leaves all other characters unchanged
        function process( $text, &$languageObject, &$caller ) {
            $outputText = '';
            // Split the text into single UTF-8 characters
            $textArray = preg_split( '/(?<!^)(?!$)/u', $text );
            foreach ( $textArray as $char ) {
                $unicodeChar = nGVietnameseFilter::utf8ToUnicode( $char );
                $outputText .= array_key_exists( $unicodeChar, nGVietnameseFilter::$mappingArray )
                    ? nGVietnameseFilter::$mappingArray[$unicodeChar]
                    : $char;
            }
            return $outputText;
        }
    }
    ?>

Add the following lines to your site.ini:

    [URLTranslator]
    Extensions[]={YOUR EXTENSION NAME}
    Filters[]=nGVietnameseFilter

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac
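The filter above looks characters up by first turning each one into a literal \uXXXX string. A shorter way to express the same mapping idea is to key the table directly on UTF-8 characters and let PHP's strtr() do the replacement. The sketch below is a hypothetical standalone function (vietnameseToAscii() is an illustrative name, not an eZ Publish API). It assumes the file is saved as UTF-8, and only if the intl extension happens to be installed does it first normalize the text to NFC, so that decomposed input such as "e" + U+0323 + U+0302 matches the precomposed entries. Its groups also cover Ă/ă and Đ/đ.

```php
<?php
// Hypothetical standalone sketch: maps Vietnamese letters to ASCII with strtr().
// Assumes this source file is saved as UTF-8, so the literal letters below are UTF-8.
function vietnameseToAscii( $text )
{
    static $map = null;
    if ( $map === null )
    {
        // One ASCII letter per group of accented letters (tones, breve, circumflex, horn)
        $groups = array(
            'A' => 'ÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ', 'a' => 'àảãáạăằẳẵắặâầẩẫấậ',
            'D' => 'Đ',                 'd' => 'đ',
            'E' => 'ÈẺẼÉẸÊỀỂỄẾỆ',       'e' => 'èẻẽéẹêềểễếệ',
            'I' => 'ÌỈĨÍỊ',             'i' => 'ìỉĩíị',
            'O' => 'ÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ', 'o' => 'òỏõóọôồổỗốộơờởỡớợ',
            'U' => 'ÙỦŨÚỤƯỪỬỮỨỰ',       'u' => 'ùủũúụưừửữứự',
            'Y' => 'ỲỶỸÝỴ',             'y' => 'ỳỷỹýỵ',
        );
        $map = array();
        foreach ( $groups as $ascii => $letters )
        {
            foreach ( preg_split( '//u', $letters, -1, PREG_SPLIT_NO_EMPTY ) as $letter )
            {
                $map[$letter] = $ascii;
            }
        }
    }
    // Optional: compose decomposed sequences into precomposed characters (intl extension)
    if ( class_exists( 'Normalizer' ) )
    {
        $normalized = Normalizer::normalize( $text, Normalizer::FORM_C );
        if ( $normalized !== false )
        {
            $text = $normalized;
        }
    }
    // strtr() replaces whole UTF-8 byte sequences; UTF-8 is self-synchronizing,
    // so a multibyte key can never match inside another character
    return strtr( $text, $map );
}

echo vietnameseToAscii( 'Tiếng Việt ở Đà Nẵng' ), "\n"; // Tieng Viet o Da Nang
```

Inside an extension, a body like this could serve as a filter's process() implementation; whether it removes the hyphens depends on what the text looks like by the time the filter runs.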
Guillaume Marty	Tuesday 14 June 2011 5:35:45 am

Thanks for your reply, but it didn't work for me.

First, I did exactly what you described. Then I regenerated the autoloads array and tried:

    [URLTranslator]
    FilterClasses[]=nGVietnameseFilter

(Extensions[] and Filters[] are deprecated now.) That didn't work either. It looks like the characters are already transformed incorrectly before the filter runs. I'm still investigating.
Ivo Lukac	Tuesday 14 June 2011 5:50:21 am

Hi,

Send me your email address via the "Direct contact" form (http://share.ez.no/authorcontact/form/9504) and I'll send you the files. Maybe copying and pasting the code from the post didn't work.