This article is based on the cluster setup currently running on ez.no. There are two back-end servers running in this setup. Both are Dell PowerEdge 1U dual 2.8GHz Xeon with 2GB RAM and SCSI hard discs.
The software running on the servers is:
The setup contains two servers and two directors, as shown in the image below. The directors route incoming requests to the correct server. We use Keepalived, which was originally written for the Linux Virtual Server project, for our directors. Keepalived periodically polls the two servers to check their health.
Every eZ Publish installation comes with a URL which returns the text 'eZ publish is alive' if everything is working fine. The health check URL for ez.no is http://ez.no/ezinfo/is_alive.
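The idea behind such a health check can be sketched in a few lines of Python. This is only an illustration of polling the URL and looking for the expected text; it is not how Keepalived itself implements its checks, and the function names and the five second timeout are assumptions made for this example.

```python
import http.client
import urllib.request

# Marker text returned by the health check URL, as quoted in this article.
ALIVE_TEXT = "eZ publish is alive"


def is_alive_response(body: str) -> bool:
    """Return True if the response body contains the expected marker text."""
    return ALIVE_TEXT in body


def check_server(url: str, timeout: float = 5.0) -> bool:
    """Fetch the health check URL and report whether the server looks healthy.

    The timeout value is an assumption for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                return False
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection failures, timeouts and HTTP errors all count as "not alive".
        return False
    return is_alive_response(body)


# Example call (performs a real HTTP request):
#   check_server("http://ez.no/ezinfo/is_alive")
```

A director would call such a check at a fixed interval and stop routing requests to a server that fails it.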
The second director is a backup for the main director. It will automatically take over if the main director fails. This setup provides good failover and load balancing. If you want a simpler way to get load balancing, you can use a round-robin DNS setup; however, this does not provide proper failover functionality.
The servers do the actual work of serving the pages. Each server is identically configured, with one exception: only one server runs a MySQL® server. Alternatively, you can use MySQL's master/slave replication or the new cluster functionality.
When we were planning our cluster, we considered using MySQL Cluster for this setup, but it required too much memory: the entire cluster database, complete with indexes and all, needs to stay in RAM. We would have required more than 3GB of RAM per server just for the ez.no database. Because the cluster needs to have redundant data in memory, we would need to keep one copy per server. If you have more servers this becomes more efficient, since the database can be spread across more servers. So if you have a smaller database, lots of RAM or more cluster nodes, MySQL Cluster could be a good alternative for you.
To make eZ Publish run well in a cluster we need to ensure that all data is properly synchronized between the servers. In eZ Publish most data is stored in the database, but some data is also stored on the filesystem. Data stored on the filesystem includes binary files, images, compiled templates and cache files.
Important data like files and images is synchronized via rsync when changed. Cache files and other easily regenerated files are deleted from the other server when they change. To make this work we patched PHP to automatically log every file created or modified.
To make ez.no sync files between the nodes in the cluster we have made some patches to PHP and eZ Publish that automatically log any new, modified or removed files. Since eZ Publish uses ImageMagick to scale and convert images, there are also patches to apply to eZ Publish in order to log the changes to images. The log file created by these patches is then parsed and used by rsync to synchronize the servers.
The script used to rsync the changed files from one server to another is available together with the patches. If you want more servers in this cluster setup then you need to alter the rsync script to sync the files to all other servers in the setup instead of just one. Currently this script is written with a master / slave setup in mind.
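As a rough illustration of what syncing to several servers could look like, the sketch below turns a change log into one rsync command per target server. It is not the script shipped with the patches. The log format (one changed path per line, relative to the eZ Publish root) and all function names are assumptions made for this example, and removed files would need separate handling. Only the standard rsync options -a and --files-from are used.

```python
from typing import Iterable, List


def changed_paths(log_lines: Iterable[str]) -> List[str]:
    """Return the unique, non-empty paths from the log in first-seen order.

    Assumes one path per line, which is a hypothetical log format.
    """
    seen = set()
    paths = []
    for line in log_lines:
        path = line.strip()
        if path and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


def rsync_commands(list_file: str, source_root: str,
                   targets: List[str]) -> List[List[str]]:
    """Build one rsync command per target server.

    list_file holds the changed paths (one per line, relative to source_root);
    each target is an rsync destination such as 'serverb:/path/to/ezpublish/'.
    """
    return [
        ["rsync", "-a", f"--files-from={list_file}", source_root, target]
        for target in targets
    ]
```

Each returned command can be passed to subprocess.run. Building the list first keeps the sketch free of side effects and makes it easy to add more targets as the cluster grows.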
This cluster setup is not ideal. Suppose a page is visited on server A and this causes an image variation to be created, e.g. the first time an article is shown with an embedded image. If server B gets a hit on the same page before the image variation has been synced from server A to B, the image shown on server B will be broken. The reason for this is that the image is not found on server B at the time the page is rendered, and hence the cached version of the page will be wrong. The fix for this problem is simply to clear the cache for the page which has the problem image, or to re-publish the object. This problem will be solved for eZ Publish in the future.
In order to make the MySQL server run a bit faster we have enabled the MySQL query cache and created a larger key buffer. The configuration for this is:
Once the cluster was up and running we ran some tests using the Apache Bench (ab) benchmarking tool which comes with Apache. Some of the tests performed are shown below. These tests only measure the maximum number of requests the cluster can handle in an ideal situation. The diagram below shows the number of pages the cluster can serve per second with different combinations of servers enabled and with different numbers of concurrent users.
The detailed output from the benchmark tests is shown below.
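When repeating tests like these, the ab runs can be scripted. The following sketch builds an ab command line for a given number of requests and concurrent users, and extracts the "Requests per second" figure from ab's text output. The function names are assumptions for this example; -n and -c are ab's standard options for the total number of requests and the concurrency level.

```python
import re
from typing import List, Optional

# Matches the summary line in ab's text output, e.g. "Requests per second: ...".
_RPS_PATTERN = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def ab_command(url: str, requests: int, concurrency: int) -> List[str]:
    """Build an ab command line; ab expects a URL with a path, e.g. 'http://ez.no/'."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def requests_per_second(ab_output: str) -> Optional[float]:
    """Return the mean requests per second from ab's output, or None if not found."""
    match = _RPS_PATTERN.search(ab_output)
    return float(match.group(1)) if match else None
```

Running ab_command for a range of concurrency values, with different servers enabled, and collecting requests_per_second for each run gives the kind of data summarized in the diagram.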
By running eZ Publish in a clustered environment you get both failover and load balancing. The numbers in these tests show that this cluster, with two back-end servers, can serve a maximum of 64 pages per second. With the ez.no setup this means 753Kb/sec plus the bandwidth needed for images and CSS files.
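The two figures above also imply an average page size, which can be useful when estimating bandwidth for a different page rate. The short calculation below only divides the numbers quoted in this article; the function name is an assumption, and the result is in the same unit as the bandwidth figure.

```python
def average_page_size(bandwidth_per_second: float, pages_per_second: float) -> float:
    """Average size of one page, in the same unit as the bandwidth figure."""
    return bandwidth_per_second / pages_per_second


# Figures quoted in the article: 753Kb/sec at 64 pages per second.
size = average_page_size(753, 64)
print(f"{size:.1f}")  # prints 11.8
```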