Thursday 10 April 2003 9:00:40 am
Hi Jan,

Thanks for your response. I understand the issues with PHP support for multibyte character encodings and have been researching them myself.

Re: regexp support for UTF-8, there are two areas:

1. ereg
2. preg_match (the PCRE libraries)

For 1, the solution is simple: use mb_ereg, or even enable the mbstring function overloading mode (mbstring.func_overload), which replaces the standard ereg functions with their mb_ereg equivalents. I would hope that this is being done throughout the eZ3 code.

For 2, I have made further enquiries. Please see the following post, which I made to the php.i18n group:
http://news.php.net/article.php?group=php.i18n&article=530

And the response from Wez Furlong, the PCRE maintainer:
http://news.php.net/article.php?group=php.i18n&article=531

In other words, it sounds like PCRE should support UTF-8 fine from PHP 4.3 onwards.

Are there other specific problems with Unicode support which you are aware of? Obviously case handling is a difficult one, as it requires locale awareness, and I'm not sure whether any library supports that (possibly ICU? http://oss.software.ibm.com/icu/).

We are currently investigating the possibility of supporting UTF-8 in eZ 2.x.

Thanks for your help.

Dave