PHP Search in nodes escaping special characters

Author Message

Damien MARTIN

Tuesday 01 February 2011 1:26:55 am

Hi there,

I want to do a search using PHP in a tree.

So I use :

$entreprises = eZContentObjectTreeNode::subTreeByNodeID(
    array(
        'ClassFilterType' => 'include',
        'ClassFilterArray' => array('etablissement'),
        'AttributeFilter' => $criteres,
        'SortBy' => array('name', true)
    ),
    2283
);

and where

$criteres[] = array('etablissement/ville', 'like', "*$commune*");

It works very well but I have a problem with accents.

When I'm looking for "Chateau-Thierry" I can't find "Château-Thierry".

But when I search in the back office, I find the same results while typing "chateau" or "château". So I suppose There is something I can do.

But what ?

Could you help me with this ?

Adrien LOCHON

Tuesday 01 February 2011 6:05:16 am

Hi,

I would have used a ID for that. Instead of using a filter on a name, with spaces, accents and things like that, a ID is much better for accuracy.

Or, if you don't want to use any ID or you can't, use a string converter function like this :

function tvReplayImportFiltreBadChars($sContent) {
    $sContent = str_replace(" ", " ", $sContent);
    $sContent = trim($sContent);
    
    $sContent = str_replace("«", "\"", $sContent);
    $sContent = str_replace("»", '"', $sContent);
    $sContent = str_replace("“", "\"", $sContent);
    $sContent = str_replace("”", "\"", $sContent);
    $sContent = str_replace("…", "...", $sContent);
    $sContent = str_replace("´", "'", $sContent);
    $sContent = str_replace("‘", "'", $sContent);
    $sContent = str_replace("’", "'", $sContent);
    $sContent = str_replace("œ", "oe", $sContent);
    $sContent = str_replace(chr(197).chr(34), "oe", $sContent);
    $sContent = str_replace(chr(226) . chr(128) . chr(147), "-", $sContent);
    $sContent = str_replace("Œ", "OE", $sContent);
    $sContent = str_replace("—", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("–", "-", $sContent);
    $sContent = str_replace("•", "-", $sContent);
    $sContent = str_replace("Ÿ", "Y", $sContent);
    $sContent = str_replace("ž", "z", $sContent);
    $sContent = str_replace("Ž", "Z", $sContent);
    $sContent = str_replace("Š", "S", $sContent);
    $sContent = str_replace("š", "s", $sContent);
    $sContent = str_replace("›", ">", $sContent);
    $sContent = str_replace("‹", "<", $sContent);
    $sContent = str_replace("€", "E", $sContent);
    $sContent = str_replace(chr(226).chr(130).chr(172) , 'euros', $sContent );
    $sContent = str_replace(chr(195).chr(169), "é", $sContent);
    $sContent = str_replace(chr(195).chr(34), "û", $sContent);
    $sContent = str_replace(chr(195).chr(170), "ê", $sContent);
    $sContent = str_replace(chr(195).chr(168), "è", $sContent);
    $sContent = str_replace(chr(195).chr(32), "à" . chr(32), $sContent);
    $sContent = str_replace(chr(195).chr(162), "â", $sContent);
    $sContent = str_replace(chr(195).chr(69), "à", $sContent);
    $sContent = str_replace(chr(194).chr(34), '"', $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(153), "'", $sContent);
    $sContent = str_replace(chr(226).chr(69).chr(166), "...", $sContent);
    $sContent = str_replace(chr(195).chr(167), "ç", $sContent);
    
    $sContent = trim($sContent);

    $sContent = strtoupper($sContent);
    $aLettres = array(",", ";", ".", ":", "°", "-", "_", "'", '"', "&", " ", "/", "\\", "@", "$", "%", "£", "¤", "µ", "*", "!", "§");
    $sContent = str_replace($aLettres, " ", $sContent);
    $sContent = str_replace(" ", "", $sContent);
    
    $sContent = strtr(utf8_decode($sContent), utf8_decode("ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÇçÌÍÎÏìíîïÙÚÛÜùúûüÿÑñ"), "aaaaaaaaaaaaooooooooooooeeeeeeeecciiiiiiiiuuuuuuuuynn");
    $sContent = strtolower($sContent);
    $sContent = utf8_encode($sContent);
    
    return $sContent;
}

This will change something like "île-de-fortûné" into "iledefortune", avoiding search errors, at least most of them i suppose... I use it a lot to match movies titles between our portal and our partners.

My opinion is : use ids :)

Damien MARTIN

Tuesday 01 February 2011 6:35:02 am

Thank you very much Adrien, but it is not what i'm looking for.

Since this morning I wrote a little piece of code to do what I want :

if($commune != "tous")
{
    // Conversion de la chaine de caractere pour la passer en ASCII
    $commune = iconv("UTF-8", "ASCII//TRANSLIT", $commune);

    $resultat = array();
    foreach($entreprises as $e)
    {
        $c = eZContentObject::fetchByNodeID($e->NodeID);
        $dm = $c->DataMap();

        $ville = $dm['ville']->DataText;
        $ville = iconv("UTF-8", "ASCII//TRANSLIT", $ville);

        $commune = strtoupper($commune);
        $ville = strtoupper($ville);

        $commune = str_replace("-", " ", $commune);
        $ville = str_replace("-", " ", $ville);

        //echo "<!-- $commune / $ville -->\n";

        unset($dm);
        unset($c);

        if($commune == $ville)
        {
            $resultat[] = $e;
        }
    }

    $entreprises = $resultat;

}

It works well, but I have 1800+ entries in $entreprises, so the server stop serving the page after a few seconds.

This is why I want to use an embed mecanism in ezpublish to search through a tree without having to take care of accents as it does in the administration panel.

@Adrien : You should tke a look at iconv. I discovered it a few weeks ago and it is very usefull.

Thanks.

Franck Magnan

Tuesday 01 February 2011 11:59:52 am

Hello Damien,
I don't think it's an eZ Publish issue but a MySQL issue. Do you use MySQL as database?
I took your code and it's working on my eZ Publish instance. When I search for "Chateau-Thierry", I find my "Château-Thierry" content and my sql query looks like:

SELECT DISTINCT
ezcontentobject.*,
ezcontentobject_tree.*,
ezcontentclass.serialized_name_list as class_serialized_name_list,
ezcontentclass.identifier as class_identifier,
ezcontentclass.is_container as is_container

, ezcontentobject_name.name as name,  ezcontentobject_name.real_translation 


FROM
ezcontentobject_tree,
ezcontentobject,ezcontentclass
, ezcontentobject_name 

, ezcontentobject_attribute a1 


WHERE
ezcontentobject_tree.path_string like '/1/2/%' and  



a1.contentobject_id = ezcontentobject.id AND
a1.contentclassattribute_id = 343 AND
a1.version = ezcontentobject_name.content_version AND 
( a1.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & a1.language_id ) ) & 2 ) )
<
( a1.language_id & 1 )
+ ( ( a1.language_id & 2 ) )
) 
AND                             ( a1.sort_key_string LIKE '%Chateau-Thierry%'  ) AND 
ezcontentclass.version=0 AND
ezcontentobject_tree.node_id != 2 AND
ezcontentobject_tree.contentobject_id = ezcontentobject.id  AND
ezcontentclass.id = ezcontentobject.contentclass_id AND

ezcontentobject.contentclass_id  IN  ( 45 ) AND
ezcontentobject_tree.contentobject_id = ezcontentobject_name.contentobject_id and
ezcontentobject_tree.contentobject_version = ezcontentobject_name.content_version and 
( ezcontentobject_name.language_id & ezcontentobject.language_mask > 0 AND
( (   ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 1 )
+ ( ( ( ezcontentobject.language_mask - ( ezcontentobject.language_mask & ezcontentobject_name.language_id ) ) & 2 ) )
<
( ezcontentobject_name.language_id & 1 )
+ ( ( ezcontentobject_name.language_id & 2 ) )
) 

AND ezcontentobject_tree.is_invisible = 0
AND ((ezcontentobject.section_id in (1))) 

AND 
ezcontentobject.language_mask & 3 > 0 

ORDER BY ezcontentobject_name.name ASC

You can enable output debug and sql debug to compare your sql query with mine.
If it does not work in your environment, I suppose your MySQL server is not configured like mine, especially the collation. You may be take a look to that post

--
Developer at Open Wide

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.