Search for 'seb' instead of 'sebastiaan'


Sebastiaan van der Vliet

Wednesday 29 September 2010 2:31:14 am

I'm using the 2.1.0-final version of eZ Find. I want to search for part of a word or name instead of the complete word; e.g., when I search for 'seb', I also want to find 'sebastiaan'. This is possible using the wildcard (*), but I want to keep things simple for end users.

In ezfind/java/solr/conf/schema.xml I made a copy of the field type definition for:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

and called it:

<fieldType name="text_staff" class="solr.TextField" positionIncrementGap="100">

In this new field type, 'text_staff', I then changed:

<analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

to

<analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
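With minGramSize="3", the shortest grams indexed for the name 'sebastiaan' are its eight 3-character substrings, and the query term 'seb' is one of them, which is why the partial search now matches. This worked example considers a single lowercase token and ignores any filters copied over from the 'text' type:

```latex
$$G_3(\text{sebastiaan}) = \{\text{seb},\ \text{eba},\ \text{bas},\ \text{ast},\ \text{sti},\ \text{tia},\ \text{iaa},\ \text{aan}\} \ni \text{seb}$$
```

Longer grams up to maxGramSize="15" ('seba', 'sebas', and so on) are indexed as well, while a two-letter query such as 'se' matches no indexed gram. Infix grams such as 'tia' match too, not just prefixes.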

I defined a new field in the <fields> section:

<field name="ezf_staff_text" type="text_staff"  indexed="true"  stored="true" multiValued="true" termVectors="true"/>

I then added the following lines to the copyField section:

<copyField source="attr_firstname_s" dest="ezf_staff_text"/>
<copyField source="attr_lastname_s" dest="ezf_staff_text"/>

Finally, in the file ezfind\classes\ezfezpsolrquerybuilder.php, I changed the following code (around line 262) from:

$highLightFields = $queryFields;
$queryFields[] = eZSolr::getMetaFieldName( 'name' );
$queryFields[] = eZSolr::getMetaFieldName( 'owner_name');

to

$highLightFields = $queryFields;
$queryFields[] = eZSolr::getMetaFieldName( 'name' );
$queryFields[] = eZSolr::getMetaFieldName( 'owner_name' );
$queryFields[] = eZSolr::getFieldName( 'ezf_staff_text' );

And it works. However, it seems like a lot of changes just to get partial-word search working. Am I overdoing it? Did I miss something easier?

Thanks,
Sebastiaan

Certified eZ publish developer with over 9 years of eZ publish experience. Available for challenging eZ publish projects as a technical consultant, project manager, trouble shooter or strategic advisor.

Matthieu Sévère

Wednesday 29 September 2010 4:32:19 am

This is a smart workaround, but as you said, it's quite complicated. I'm also interested to see whether there is a simpler solution.

--
eZ certified developer: http://ez.no/certification/verify/346216

Ivo Lukac

Wednesday 29 September 2010 6:38:51 am

Why not just copy into the "ezf_df_text" Solr field? Then you don't need to hack the eZ Find code...

Changing schema.xml is normal ;)
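One reading of this suggestion, sketched under assumptions: keep the copyField lines but point them at the existing ezf_df_text field, and give that field the n-gram type in schema.xml. The attributes shown for ezf_df_text are illustrative only; apart from the type, the definition should stay as it is in the stock eZ Find schema.xml. Because other text also ends up in ezf_df_text, the n-gram analysis would then apply to all of it, which is the index-size concern raised in the next post.

```xml
<!-- Hypothetical sketch: keep the other attributes of ezf_df_text exactly as
     they are in the stock eZ Find schema.xml; only the type is changed here. -->
<field name="ezf_df_text" type="text_staff" indexed="true" multiValued="true"/>

<!-- Copy the name attributes into ezf_df_text instead of a separate new field -->
<copyField source="attr_firstname_s" dest="ezf_df_text"/>
<copyField source="attr_lastname_s" dest="ezf_df_text"/>
```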

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

Sebastiaan van der Vliet

Wednesday 29 September 2010 7:05:56 am

Hi Ivo, I don't like changing the default/standard Solr field settings. One reason is that I am not sure what it would do to the size of the entire index if all text fields were indexed the way staff_text is. I also think that using the ezf_df_text Solr field would still require adding the line below to ezfezpsolrquerybuilder.php.

$queryFields[] = eZSolr::getFieldName( 'ezf_df_text' );


Paul Borgermans

Wednesday 29 September 2010 11:21:53 am


Hi all

N-gram tokenisation can inflate your index quite badly.
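A rough count shows why. If a value of length L is split into every gram of size n for a ≤ n ≤ b (here a = minGramSize = 3 and b = maxGramSize = 15), each size contributes L − n + 1 grams. For the 10-letter name 'sebastiaan' that gives 36 indexed terms instead of the single term a whitespace tokenizer would produce. This is a per-value estimate that ignores any filters in the analyzer chain:

```latex
$$N(L) = \sum_{n=a}^{\min(b,\,L)} (L - n + 1), \qquad N(10) = \sum_{n=3}^{10} (11 - n) = 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 36$$
```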

@Sebastiaan: I'll patch eZ Find soon so that you have better control over which field types are used (along with some more improvements in this area).

Take care to do the n-gram tokenisation only at index time, not at query time.
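A minimal sketch of what this looks like in schema.xml, assuming the text_staff type from the first post: the n-gram tokenizer appears only in the index analyzer, while the query analyzer keeps whole words. The filters of the stock 'text' type are left out here, and the LowerCaseFilterFactory lines are an assumption added so that 'Seb' and 'seb' hit the same grams.

```xml
<!-- Sketch only: filters copied from the stock "text" type are omitted;
     LowerCaseFilterFactory is an added assumption on both sides. -->
<fieldType name="text_staff" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: break each value into grams of 3 to 15 characters -->
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query time: keep whole words, so 'seb' is looked up as one term -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```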

Another approach that is sometimes taken is to use a synonym filter (at query time) for fine-grained control. N-grams, well... they are sometimes very useful, but they can also lead to many confusing search results.
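A sketch of the synonym approach, assuming a plain whole-word field type (no n-grams) and a synonyms.txt file in the Solr conf directory next to schema.xml. The type name text_staff_syn and the synonym entry are hypothetical. Because the filter runs only in the query analyzer, every short form has to be listed explicitly; this gives exact control over which partial names match, at the cost of maintaining the list by hand.

```xml
<!-- Hypothetical field type: whole words at index time, synonyms applied at query time -->
<fieldType name="text_staff_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt is loaded from the Solr conf directory and could contain
         a line such as: seb => sebastiaan -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```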

hth

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans
