eZ Find & cluster


Andreas Kaiser

Sunday 20 April 2008 3:58:08 pm

Just curious if anyone has the eZ Find extension installed in an eZ Publish cluster environment...

I'm sure it should work, because the eZ Find overview states that "High scalability and performance ensures that the eZ Find search engine can support enterprise-level sites", but comments are welcome...

Thanks...

eZ Partner in Madrid (Spain)
Web: http://www.atela.net/

Ivo Lukac

Monday 28 April 2008 6:57:13 am

I'm also interested in this issue.

Can someone please confirm it is working?

http://www.linkedin.com/in/ivolukac
http://www.netgen.hr/eng/blog
http://twitter.com/ilukac

A Fowler

Saturday 19 June 2010 7:48:55 pm

I am using it (eZ Find) in a clustered environment. It works. However, I'm not sure it's working completely properly. First of all, I know the indexing is happening on all nodes in the cluster independently. Sometimes a search, when re-run against the cluster, returns different results because the indexes are not all up to date on all the cluster nodes.

Thus, I also have questions. How should it ideally be set up? Should eZ Find run on its own dedicated node instead of multiple copies (and multiple indexes) on multiple nodes? If so, how do I redirect all search requests to that dedicated node? Is that something I can do within eZ Publish, or do I have to do it using special rules in my load balancer?

Gaetano Giunta

Sunday 20 June 2010 12:51:46 pm

The recommended way is to treat Solr as if it were a database:

- a single instance used, even in eZP cluster configs

- on a dedicated server

It makes everything much simpler, except for high availability. Since the index can be rebuilt at any time, rebuilding it after a crash of the Solr server might be acceptable for sites without a huge amount of content and where a short search downtime is OK. For real HA, you'll have to dig into Solr master/slave modes.

About how to set it up: simply put the hostname of the server where Solr is running in the solr.ini file. No load balancing needed.
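As a rough illustration of that setup, the sketch below parses a solr.ini-style fragment and extracts the host that every eZ Publish node would talk to. The hostname `solr.example.internal` is invented, and the `[SolrBase]` block and `SearchServerURI` setting are assumptions based on the solr.ini shipped with eZ Find, so check them against your own version before copying.

```python
import configparser
from urllib.parse import urlsplit

# Hypothetical solr.ini override, identical on every eZ Publish node.
# The hostname is invented; the block and setting names should be checked
# against the solr.ini shipped with your eZ Find version.
SOLR_INI = """
[SolrBase]
SearchServerURI=http://solr.example.internal:8983/solr
"""


def solr_endpoint(ini_text):
    """Return (host, port) of the Solr server named in the ini fragment."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    uri = urlsplit(parser["SolrBase"]["SearchServerURI"])
    return uri.hostname, uri.port


host, port = solr_endpoint(SOLR_INI)
print(f"all nodes send search traffic to {host}:{port}")
```

Because every node carries the same setting, the web servers need no knowledge of each other and no load balancer sits between eZ Publish and Solr.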

Principal Consultant International Business
Member of the Community Project Board

Paul Borgermans

Sunday 20 June 2010 2:16:31 pm

Should the single instance Gaetano mentioned not be enough (which I really doubt), Solr has a native master-slave cluster mode which is easy to set up.

See http://wiki.apache.org/solr/SolrReplication

In case of replication, to direct eZ Find/Solr backend writes, you can use a reverse proxy for eZ Find up to 2.2. The next version (2.3) may have simple directives to discriminate between the master (for writes) and the slaves (for reads) too ... however, work is underway in Solr to make even that unnecessary.

For those who attend the barcamp at the eZ Conference in Berlin next week and want to know more, shout about it and I'll give you some insight.
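The routing decision such a reverse proxy has to make can be sketched as follows. This is only a minimal illustration, not a proxy implementation: the `SolrRouter` class and all hostnames are hypothetical, and treating every path under `/solr/update` as a write is an assumption of the sketch.

```python
from itertools import cycle


class SolrRouter:
    """Send writes to the master, spread reads over the slaves."""

    # Request paths treated as writes; an assumption for this sketch.
    WRITE_PREFIXES = ("/solr/update",)

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master if no slave is configured.
        self._readers = cycle(slaves or [master])

    def backend_for(self, path):
        """Pick the backend that should receive a request for this path."""
        if path.startswith(self.WRITE_PREFIXES):
            return self.master
        return next(self._readers)


router = SolrRouter(
    master="http://solr-master.example.internal:8983",
    slaves=[
        "http://solr-slave1.example.internal:8983",
        "http://solr-slave2.example.internal:8983",
    ],
)
for path in ("/solr/update", "/solr/select", "/solr/select"):
    print(path, "->", router.backend_for(path))
```

Reads rotate over the slaves in round-robin order, while index updates always reach the single master, which then replicates to the slaves.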

Cheers

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

A Fowler

Monday 21 June 2010 11:18:04 am

Thank you both, Giunta and Borgermans! This helps clarify things for me.

One question remains. Which node is responsible for updating the index? Is the dedicated node expected to have enough of the eZ Publish infrastructure and scripts in place so that it can run the updates in a cron job by itself? Or should one of the web server nodes in the cluster run the update scripts (and will they connect to the dedicated solr instance on the remote host)?

Gaetano Giunta

Tuesday 22 June 2010 1:30:20 am

The indexing of data is always done in a "push" way, i.e. one of the servers with eZP installed will make HTTP requests to Solr (port 8983 by default).

Which server makes the requests depends on the indexing mode:

  • real-time indexing: the server where the content is being edited will do the push
  • delayed indexing: one server where eZP is fully installed and the indexing cronjob is run
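To make the direction of the traffic concrete, here is a minimal sketch of what such a push looks like on the wire. eZ Find does this itself in PHP; the snippet only builds (and does not send) a Solr XML update request. The hostname and the field names are invented for illustration and are not eZ Find's real schema.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Hypothetical Solr location; the host is invented for this sketch.
SOLR_UPDATE_URL = "http://solr.example.internal:8983/solr/update"


def build_push_request(doc):
    """Build (but do not send) an HTTP POST that adds one document to Solr."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    body = f"<add><doc>{fields}</doc></add>".encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Field names are illustrative, not eZ Find's real schema.
request = build_push_request({"id": "42", "title": "Search & cluster"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Whichever eZ Publish node runs this kind of request, the Solr server itself never has to reach back into eZ Publish, so it needs no eZ Publish installation of its own.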

The node that pushes content to the indexing server does not need to be dedicated to that task, but in common scenarios one node is dedicated to:

  • serving the editing interface
  • running eZP cronjobs

The advantage of keeping the editing interface separate is that even under peak traffic conditions editors will still have a snappy interface. Also, while heavy cronjobs are running, visitors of the site will not be impacted.

Principal Consultant International Business
Member of the Community Project Board
