eZFind - Adding extra information in SolR


Maxime Thomas

Sunday 09 January 2011 8:27:07 am

Hi eZ People,

I'll try to be as clear as possible about what I want to do.

I've made an extension that handles extra data in separate tables in the eZPublish database.

I would like to find a simple and coherent way to index the data and to be able to do some queries in SolR.

There are two scenarios for indexing:

1 - I find a way to use eZFind cronjob to index my data.

2 - I do my own indexer using ezcSearch (which is already in eZPublish).

For the query part, I also have two options :

1 - I use eZFind to get the data back and benefit from the cool features already developed.

2 - I do my own queries and don't have the cool features (or I have to build them myself).

As far as I understand how the whole thing works, I've come to a few conclusions:

1 - The data schema of the SolR instance delivered with eZFind does not fit my data. That's expected, since eZFind's schema is built around returning node IDs and nothing else.

2 - The best strategy is to write my own indexer and use eZFind to get my data back. That way, I control how the data is indexed (useful for performance and data updates) and I leave the query work to eZFind.

So the question is: can I extend the current schema to fit my needs without interfering with eZFind?

Is this the best approach?

Have you guys already done that?

Thank you for any kind of help!

Maxime Thomas
[email protected] | www.wascou.org | http://twitter.com/wascou

Company Blog : http://www.wascou.org/eng/Company/Blog
Technical Blog : http://share.ez.no/blogs/maxime-thomas

gilles guirand

Sunday 09 January 2011 8:59:46 am

Hi maxime,

  • Do you want eZ results and extra data results combined in the same result list?
  • Will the full view of your extra data (the link destination) be displayed inside eZ? In a module?

Option 1: if you want to use eZFind (but with poor performance)

  • Create a custom datatype
  • Create a class, with 1 attribute / using your custom datatype
  • Create a custom PHP Class to map your datatype and eZFind
  • Create one content object / node for each external data record
  • Enjoy!

Option 2: without eZ Find, but with better performance

  • Index your data with PHP / Solr (you could copy / paste some parts of the ezfind code), or use ezc if you want
  • Create a custom fetch, like eZFind does (you could copy / paste some parts of the ezfind code, and replace the ezcontent queries with queries on your extra data)
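A minimal sketch of option 2, written in Python for illustration rather than PHP: it builds a Solr XML update message for a dedicated core, plus a select URL that returns only the ids of matching records, which a custom fetch could then resolve against the extension's own tables. The core URL `http://localhost:8983/solr/extradata` and the field names are hypothetical; this is not eZ Find's own code.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Hypothetical base URL of a dedicated Solr core for the extension's data;
# adjust host, port and core name to your own installation.
SOLR_CORE_URL = "http://localhost:8983/solr/extradata"


def build_add_xml(records):
    """Build a Solr XML <add> message from plain dict records."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(text, rows=10):
    """Build a Solr select URL that returns only the ids of matching records."""
    params = {"q": text, "fl": "id", "rows": rows, "wt": "json"}
    return SOLR_CORE_URL + "/select?" + urlencode(params)
```

To actually index, the XML would be POSTed to `SOLR_CORE_URL + "/update"` with `Content-Type: text/xml`, followed by a `<commit/>` message; the custom fetch would then load each returned id from the extension's tables.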

--
Gilles Guirand
eZ Community Board Member
http://twitter.com/gandbox
http://www.gandbox.fr

Maxime Thomas

Sunday 09 January 2011 1:33:56 pm

The first option is not a real option because it means duplicating data, and that is exactly what I want to avoid.

As mentioned in the SolR documentation, the best option is to define a new "core" and index my information separately there.
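As a rough illustration of registering that separate core, here is a Python sketch that adds a `<core>` entry to a legacy (Solr 1.x style) multicore `solr.xml`. The sample file content and the core name `extradata` are assumptions; check the actual `solr.xml` shipped with your eZ Find version before editing it.

```python
from xml.etree import ElementTree as ET

# Minimal legacy multicore solr.xml; the existing core name is illustrative only.
SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="eng-GB" instanceDir="eng-GB" />
  </cores>
</solr>"""


def add_core(solr_xml, name, instance_dir):
    """Return solr.xml text with an extra <core> entry, unless it already exists."""
    root = ET.fromstring(solr_xml)
    cores = root.find("cores")
    if cores is None:
        raise ValueError("solr.xml has no <cores> section")
    if any(core.get("name") == name for core in cores.findall("core")):
        return solr_xml  # already registered: leave the file untouched
    ET.SubElement(cores, "core", name=name, instanceDir=instance_dir)
    return ET.tostring(root, encoding="unicode")
```

The new core's instanceDir still needs its own conf/schema.xml and conf/solrconfig.xml, and Solr has to reload its configuration before the core is available.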

It also seems that eZFind 2.2 comes with a multicore configuration, one core per language. I'm going to dig in to see whether I can use one of these cores for my own data.

Another good point: the ezcSearch component simplifies my job.


gilles guirand

Sunday 09 January 2011 2:48:05 pm

No, you don't duplicate data. Your attribute (using your custom datatype) just has to store a foreign key to your external tables, and nothing else. You can't use eZ Find without this link.

If you don't want to create ezcontentobjects, you could try option 2:

  • Index your data with PHP / Solr (you could copy / paste some parts of the ezfind code), or use ezc if you want
  • Create a custom fetch, like eZFind does (you could copy / paste some parts of the ezfind code, and replace the ezcontent queries with queries on your extra data)


Maxime Thomas

Monday 10 January 2011 5:12:24 am

Again on the first solution: if I make a custom datatype that stores what I want to index, I duplicate data (if I update my external data, I have to publish the content again on the eZ side). It's definitely not a good way to do what I want. Another drawback is that I would create new nodes in eZPublish that are not meant to be shown on the website.

I've followed the second track and succeeded in indexing external data in another core, separate from the standard eZFind cores, without hacking the whole thing.

It works well, but if you need to index heterogeneous, linked data, you need to define a shared schema for this core.

I was a bit worried about ezcSearch's ability to handle multicore, but it's possible (in a simple way, without sharding) and it fits my needs.

Thank you anyway for the suggestions; they're feeding the discussion.
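One possible way to build such a shared schema, sketched in Python: every document gets the same fixed fields (`id`, `type_s`), ids are prefixed by type so they stay unique across tables, and the remaining values go into suffix-typed dynamic fields. The suffixes (`_s` string, `_t` text, `_i` int, `_ms` multivalued string) and the record layout are assumptions; they only work if the core's schema.xml declares matching `<dynamicField>` entries.

```python
def to_solr_doc(doc_type, record, links=()):
    """Flatten a record of any type into one shared document layout."""
    # Fixed fields shared by every document type in the core.
    doc = {"id": f"{doc_type}-{record['id']}", "type_s": doc_type}
    for key, value in record.items():
        if key == "id":
            continue  # already folded into the type-prefixed unique id
        if isinstance(value, int) and not isinstance(value, bool):
            doc[f"{key}_i"] = value  # assumes a *_i int dynamic field
        else:
            doc[f"{key}_t"] = str(value)  # assumes a *_t text dynamic field
    if links:
        # Ids of linked documents; assumes a multiValued *_ms dynamic field.
        doc["links_ms"] = list(links)
    return doc
```

Storing links as the other documents' prefixed ids keeps linked records queryable in the same core without duplicating their content.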


Paul Borgermans

Monday 10 January 2011 1:43:04 pm

Hello Maxime

A few bug fixes/enhancements and docs are keeping you from the desired outcome.

Unfortunately, I am very busy at the moment... including adding more of those capabilities to eZ Find for searching native eZ Publish and "foreign" data at the same time in a flexible way.

Once this is finished (in about 10 days or so), I'll post this enhanced version of eZ Find for general consumption.

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Maxime Thomas

Monday 10 January 2011 3:22:16 pm

Hi Paul,

I'm trying to make this flexible as my extension is an addon to eZPublish.

By the way, I got some errors when enabling all the cores (eZ languages + mine): the search tries to apply the eZ language core schema and not mine.

I've got an error : "unknown field 'ezcsearch_type_s'" and this field is of course not in my schema.

Any idea how I can move forward?

EDIT :

Actually it comes from the ezcSearch component which adds a field called "ezcsearch_type" during the query. Very mysterious.

I will try to get some answers on the mailing list.
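Solr rejects any document field that is neither declared in schema.xml nor matched by a `<dynamicField>` pattern, which is why a field added behind the scenes triggers "unknown field". The Python sketch below models that check (it is not Solr's code); the field names in the usage come from this thread and the patterns are assumptions.

```python
def matches_dynamic(name, pattern):
    """Solr dynamicField patterns have a single '*' at the start or the end."""
    if pattern.startswith("*"):
        return name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return name == pattern


def unknown_fields(doc_fields, schema_fields, dynamic_patterns):
    """Return the document field names a schema would reject as unknown."""
    return [
        name
        for name in doc_fields
        if name not in schema_fields
        and not any(matches_dynamic(name, p) for p in dynamic_patterns)
    ]
```

With no `*_s` dynamic field in the custom schema, `ezcsearch_type_s` is the only rejected name; declaring that field explicitly, or a `*_s` pattern, makes the list empty.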


Bertrand Dunogier

Tuesday 11 January 2011 1:31:18 am

Maxime,

the ezcsearch_type field is automatically added by ezcSearch (you can easily see it by grepping for _type in ezc/Search). In the Solr handler, the index() method indeed adds an attribute named ezcsearch_type_s (handlers/solr.php:873 in my copy).

I suggest you just add this field as a string one in your schema.xml for the core you index on, and see what's in there.
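Following that suggestion, a Python sketch that adds `ezcsearch_type_s` as an indexed, stored string field to a core's schema.xml when it is missing. It assumes the schema has a `<fields>` section and a `string` fieldType (typically `solr.StrField`); the helper name is hypothetical.

```python
from xml.etree import ElementTree as ET


def add_string_field(schema_xml, name):
    """Return schema.xml text with a string <field> named `name`, added if missing."""
    root = ET.fromstring(schema_xml)
    fields = root.find("fields")
    if fields is None:
        raise ValueError("schema.xml has no <fields> section")
    if fields.find(f"field[@name='{name}']") is None:
        # Assumes a fieldType named "string" is declared in <types>.
        ET.SubElement(fields, "field", name=name, type="string",
                      indexed="true", stored="true")
    return ET.tostring(root, encoding="unicode")
```

Calling it twice is harmless: the second call finds the field and leaves the schema as it is. Solr must reload the core for the schema change to take effect.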

Bertrand Dunogier
eZ Systems Engineering, Lyon
http://twitter.com/bdunogier
http://gplus.to/BertrandDunogier

Maxime Thomas

Sunday 06 March 2011 4:04:22 pm

I finally found the answer to my question.

The ezcsearch_type_s field holds the class you want your results to be instantiated as.

For example, if I'm indexing Articles, ezcsearch_type_s is stored in my index along with the data, set to the Article class.

Then, when searching for some text in my Articles, ezcsearch_type_s is automatically added to the query (as Bertrand said).
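The round trip described above can be modelled in a few lines of Python: the type field stores the class name at index time, a type clause is added to the query, and results are turned back into instances by looking that name up. This only mimics the idea; the registry, the `Article` fields and the exact query syntax are assumptions, not ezcSearch's implementation.

```python
from dataclasses import dataclass


@dataclass
class Article:
    id: str
    title: str = ""


# Hypothetical registry mapping stored type names back to classes.
REGISTRY = {"Article": Article}


def index_doc(obj):
    """Copy the object's fields and tag the document with its class name."""
    fields = vars(obj).copy()
    fields["ezcsearch_type_s"] = type(obj).__name__
    return fields


def build_query(text, cls):
    """Restrict a free-text query to documents of the requested class."""
    return f"({text}) AND ezcsearch_type_s:{cls.__name__}"


def hydrate(doc):
    """Instantiate a search hit as the class named in its type field."""
    cls = REGISTRY[doc["ezcsearch_type_s"]]
    data = {k: v for k, v in doc.items() if k != "ezcsearch_type_s"}
    return cls(**data)
```

Because every indexed document carries its type, several classes can share one core and each search only returns, and instantiates, the class it asked for.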

