This article explains how to tune a MySQL database for use with eZ Publish. MySQL is a database engine used by eZ Publish to store content. While other databases are supported by eZ Publish (such as PostgreSQL and Oracle), for performance reasons we recommend MySQL. This article has suggestions for optimizing for either read or write operations, and also discusses some other optimization techniques such as DNS and filesystem configuration.
This article is based on MySQL version 4.1. At the time of writing, MySQL version 5 is available; however, eZ Publish does not require the new functionality in MySQL 5 and, being optimized for 4.1, will run slightly faster with this version.
While MySQL is bundled with most Linux distributions, we recommend that you obtain and install an optimized distribution from MySQL. Distributions are available from the MySQL download page.
If you are running on Intel-based hardware, we recommend that you use the distribution of MySQL that is compiled with the Intel ICC compiler, which can be downloaded from the URL shown above. This optmizes MySQL specifically for the Intel platform, which results in a performance gain (as shown below). Our tests show up to a 20% performance increase when using the MySQL ICC-compiled version compared to the standard GCC-compiled version.
When testing MySQL compiled with the ICC compiler, we tested with eZ Publish operations that use the database heavily. We wrote 15 articles to a database containing 35.000 articles and enabled delayed search indexing. The test was run 5 times; we recorded the total time it took the script to run and also the time it took specifically for the database operations.
GCC | ICC | |
---|---|---|
Total time (avg) | 7.702 sec | 6.788 sec (11.87% faster) |
SQL time (avg) | 4.406 sec | 3.15 sec (20.21% faster) |
The results from this test show that the ICC compiler boosts the performance of database transactions up to 20.2% on the most database-intensive eZ Publish operations. This is especially significant in a clustered environment where multiple webserver nodes are using the same database server for inserts. (See the article Clustering in eZ Publish 3.8 for more information about this configuration.)
You can also compile MySQL manually with the ICC compiler. This is a more complicated process and requires an ICC compiler license (which currently costs $399 USD). Instructions on how to compile MySQL can be found in the MySQL documentation.
When running MySQL (or any other software), performance is limited by the hardware on which the software is running. Therefore, it is important that you have the best possible hardware configuration for your site. The article Server Architecture for eZ Publish Hosting contains information about hardware configuration and scalability.
Generally speaking, the faster the hardware the better the MySQL performance (obviously). However, MySQL uses server resources differently than webserver nodes. The two most important factors (assuming that the server has a reasonably recent CPU and decent CPU speed) are:
With webserver nodes, an additional CPU will almost double page-serving capacity, and increasing the CPU speed by 10% will result in a performance increase of almost 10%. However, with a MySQL server the limitations are the performance of the disk and the available memory.
If you have a read-intensive site (that is, an eZ Publish site that serves a lot of pages) you should increase memory so it can be used for caching. If you have a write-intensive site (that is, an eZ Publish site where a lot of content gets added) you should increase the hard disk speed to handle more database inserts. The rotation speed of the hard drive is especially important. Faster disk rotation is better; at the time of writing, 15.000 RPM is the fastest rotation speed available. You could also install a RAID ("Redundant Array of Independent / Inexpensive Disks") to increase disk performance. When we performed mass object insertion on an eZ Publish site (several articles published per second), we were limited by the speed of the hard drive on the database server; there were both free memory and free CPU cycles.
Before you tune your MySQL installation, it is useful to know the current settings. Use the show variables command in the MySQL command line interface to display the current settings.
mysql> show variables +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | back_log | 50 | | basedir | /usr/ | | binlog_cache_size | 32768 | | bulk_insert_buffer_size | 8388608 | .... ....
You can view specific variables by using the like operator in the query. For example:
mysql> show variables like "innodb%"; +---------------------------------+------------------------+ | Variable_name | Value | +---------------------------------+------------------------+ | innodb_additional_mem_pool_size | 104857600 | | innodb_autoextend_increment | 8 | | innodb_buffer_pool_awe_mem_mb | 0 | | innodb_buffer_pool_size | 734003200 | | innodb_data_file_path | ibdata1:10M:autoextend | | innodb_data_home_dir | | | innodb_fast_shutdown | ON | | innodb_file_io_threads | 4 | | innodb_file_per_table | OFF | | innodb_flush_log_at_trx_commit | 0 | | innodb_flush_method | | | innodb_force_recovery | 0 | | innodb_lock_wait_timeout | 40 | | innodb_locks_unsafe_for_binlog | OFF | | innodb_log_arch_dir | | | innodb_log_archive | OFF | | innodb_log_buffer_size | 8388608 | | innodb_log_file_size | 183500800 | | innodb_log_files_in_group | 2 | | innodb_log_group_home_dir | ./ | | innodb_max_dirty_pages_pct | 90 | | innodb_max_purge_lag | 0 | | innodb_mirrored_log_groups | 1 | | innodb_open_files | 300 | | innodb_table_locks | ON | | innodb_thread_concurrency | 8 | +---------------------------------+------------------------+ 26 rows in set (0.00 sec)
To monitor the health of your MySQL server and test the effect of configuration changes, you can use the show status command. This lists information about caches, threads, queries, cached queries and more. The like operator can also be used to limit the results. For example:
mysql> show status like "qcache%"; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 34 | | Qcache_free_memory | 16466312 | | Qcache_hits | 1313227 | | Qcache_inserts | 78096 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 3328 | | Qcache_queries_in_cache | 140 | | Qcache_total_blocks | 346 | +-------------------------+----------+ 8 rows in set (0.00 sec)
The MySQL Administrator GUI application can also be used to view the settings and monitor the status of your MySQL server. The MySQL Administrator is distributed under the GPL license.
The screenshot below shows the connection health screen in the MySQL Administrator application. It shows the number of connections, the traffic and the number of SQL queries at any given time. You can also add you own custom graphs to monitor specific values of your MySQL server.
MySQL Administrator - Connection Health
The screenshot below shows the MySQL status variables. It is equivalent to the show status command in the command-line interface described above.
MySQL Administrator - Status Variables
To change MySQL settings via the command line, edit the my.ini configuration file. MySQL must be restarted after changing the configuration.
To change MySQL settings via the MySQL Administrator GUI application, edit the settings on the Options page.
MySQL Administrator - Options
For an eZ Publish site, there are basically two different optimization scenarios: optimizing for either read performance or write performance. This section describes optimizing for read performance; the next section focuses on write performance.
Optimizing read performance is a matter of changing buffer sizes to make use of the memory available on the server. There are basically two different types of buffers: those dedicated to individual connections and those used globally by all connections.
The key buffer stores database indexes in memory. This buffer should be large enough to hold all indexes used by eZ Publish. This should be in the range of hundreds of megabytes. Sites with large amounts of data require larger key buffers. To allocate a buffer of 500MB:
key_buffer = 500M
To find a suitable value for the key buffer, investigate the status variables key_read_requests and key_reads. The key_read_requests is the total number of key requests served from the cache while the key_reads shows the number of times MySQL had to access the filesystem to fetch the keys.
The lower the number of key_reads the better. The more memory you allocate to the key buffer the more requests will be served from the cache. There will always be some keys that need to be read from disk (for example when data changes), so the value will never be zero. By comparing the two values you see the hit ratio of your key buffer. The key_read_requests should be much larger than the key_reads. 99% cached requests is a good number to aim for in a read-intensive environment.
The table cache tells MySQL how many tables it can have open at any one time. In SQL queries, several tables are typically joined. The rule of thumb is that you should multiply the maximum number of connections (described below) by the maximum number of tables used in joins. For example, if the maximum number of connections is set to 400, the table cache should be at least 400 * 10. The configuration setting below shows a table cache of 4000:
table_cache = 4000
MySQL sorts query results before they are returned. The sort buffer is per connection, so you must multiply the size of the sort buffer by the maximum number of connections to predict the server memory requirements. In our case we use a 3MB sort buffer with 400 max connections, which can use a total of 1.2GB of memory.
sort_buffer_size = 3M
MySQL has a limitation on the number of concurrent connections it can keep open. If you are using persistent connections in PHP, each process in Apache will keep a connection to MySQL open. This means that you need to set the number of max connections in MySQL to equal or greater than the number of Apache processes that can connect to the database. In a clustered environment, you must add up the processess on each webserver to determine the maximum. Setting sufficient max connections also ensures that users do not get errors about connecting to the MySQL database. The setting for 400 connections is shown below.
max_connections = 400
MySQL is capable of caching the results of a query. The next time the same query is executed the result is immediately returned, as it is read from the cache rather than the database. For a read-intensive site, this can provide a significant performance improvement.
To enable the query cache, set the type to "1":
query_cache_type = 1
You can set the maximim size of each query that can be cached. If the query result is larger than the query cache limit, the results will not be cached. This is normally set to 1M:
<span>query_cache_limit = 1M</span>
The amount of memory globally available for query caches is set with the query cache size setting. This should be fairly large, and should be increased in size for large databases.
<span>query_cache_size = 100M</span>
To tune the query cache, use the show status command. This can be used to determine which settings need to be altered and to see the effect of alterations. The show status command will show you if the query cache is heavily in use and if you have free memory, which indicates whether the query cache buffer settings should be increased or decreased.
mysql> show status like "qcache%"; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 34 | | Qcache_free_memory | 16466312 | | Qcache_hits | 1313227 | | Qcache_inserts | 78096 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 3328 | | Qcache_queries_in_cache | 140 | | Qcache_total_blocks | 346 | +-------------------------+----------+ 8 rows in set (0.00 sec)
Based on the test case described in the section "Intel ICC-compiled MySQL", the table below shows the changes in the test execution time when various configuration settings were applied. Each test was run 5 times and the averate execution time is shown. We started with the default MySQL configuration and one by one applied the settings shown below. Finally, we compared the end result with the GCC version of MySQL with the same settings.
Configuration | Execution time |
---|---|
Default configuration | 9.18 sec |
innodb_flush_log_at_trx_commit = 0 | 8.45 sec |
innodb_buffer_pool_size = 700M | 7.63 sec |
innodb_additional_mem_pool_size=50M | 6.54 sec |
key_buffer = 1000M | 6.59 sec |
innodb_log_file_size = 175 | 6.50 sec |
innodb_log_buffer_size=8M | 6.53 sec |
GCC version of MySQL | 7.70 sec |
With the above settings, the performance improved from 9.18 seconds to 6.50 seconds, an increase of 2.68 seconds (29%). Therefore we can see that these optimization settings are good for write-intensive operations in MySQL.
When you are running eZ Publish in an environment where you want to use transactions, you must use the InnoDB storage engine in MySQL. The MyISAM storage engine is the default, but does not support transactions. (A "transaction" means that you can group a number of queries to the database and revert all of them if one fails, thus ensuring the integrity of the data.) InnoDB is more efficient than MyISAM when you have a write-intensive environment. Pure read operations are slower than with MyISAM, but when you have combinations of reads and writes InnoDB performs well.
MySQL has recently added a new storage engine type (called Falcon) that is supposed to be better than InnoDB for transaction-based web applications. We have not tested this engine and therefore cannot confirm the performance enhancement.
When using InnoDB, by default MySQL flushes data to disk when transactions are commited. This means that each transaction is flushed to disk when it occurs. This provides data security in case the database server crashes.
The default behaviour can be overridden with the following setting:
innodb_flush_log_at_trx_commit = 0
This setting makes MySQL flush the transaction cache every second instead of after each commit. This means transactions are not flushed to disk the moment they happen. While this improves performance, you must decide whether the risk of losing data due to a server crash is acceptable.
The InnoDB buffer pool caches table data and indexes. The larger the size of the buffer pool, the more data can be cached and the less disk I/O used. The InnoDB memory buffer pool in MySQL is by default quite low and should be made as large as 70% of the available memory. ("Available memory" means the memory not used by any other application or by another buffer in MySQL.) We increase this to 700MB to increase performance.
innodb_buffer_pool_size = 700M
The InnoDB additional mem pool is the buffer used to store internal data structures. The more tables in the database, the more memory is required. If the additional mem pool size is not large enough to store data about the InnoDB tables, MySQL will use system memory and will write warnings to the error log.
innodb_additional_mem_pool_size = 50M
The key buffer is a memory cache of the indexes in a MySQL database. A large key buffer means that more indexes fit in memory and thus there is a faster execution of queries using indexes. We increase this to 500MB; the default is 16MB.
key_buffer = 500M
The log buffer stores the transactions in memory before they are flushed to disk. By making the log buffer size larger, MySQL can wait longer before flushing the transaction log to disk and therefore use less disk I/O. The size recommended by MySQL is between 1MB and 8MB. We used 8MB for our test, which actually made MySQL a bit slower compared to the 1MB default. Therefore, we recommend somewhere in between, for example 4MB.
innodb_log_buffer_size = 4M
When running MySQL in a clustered environment with several webservers accessing the database server(s), we have noticed that the time used to contact the DNS server can be significant. We found that every 100 requests or so produced a significant slowdown of approximately 5 seconds when benchmarking read performance.
After closer inspection we found that it is the MySQL database initalization in eZ Publish that takes time. Periodically there is a delay of about 5 seconds in the MySQL login. The solution is to enter DNS information in the /etc/hosts file on the database server. We add entries for all associated machines in the cluster:
192.168.0.1 giza giza.ezcluster.ez.no 192.168.0.2 babylon babylon.ezcluster.ez.no 192.168.0.3 olympia olympia.ezcluster.ez.no 192.168.0.4 ephesus ephesus.ezcluster.ez.no 192.168.0.5 rhodes rhodes.ezcluster.ez.no 192.168.0.6 alexandria alexandria.ezcluster.ez.no
After the file is saved, there is no longer any slowdown due to DNS requests and we get optimal speed for any number of requests.
Mounting a filesystem with the noatime option disables the storage of information about file access. This information is normally not needed. This increases the performance of the filesystem as fewer disk writes need to be done.
We tested this method with an ext3 filesystem on SATA disks, generating 15 articles in one folder of an eZ Publish installation with 900.000 objects. We ran the test 3 times and recorded the average time:
While this setting does not seem to generate a significant performance improvement (about a 5.3% increase in MySQL processing time), we only did the test with concurrency of 1. If you increase the number of concurrent users the improvement should be more noticeable, as there will be fewer writes to the filesystem.
If you are running ReiserFS you can try applying the notail and data=writeback mount options. These options were not tested by us (because we were testing with an ext3 filesystem) but they should increase performance.
If you are running Linux kernel 2.6 or newer you can try changing the disk I/O schedulers. The deadline or cfg scheduling should provide best performance. This was not tested by us since our test environment was running on Debian Linux with version 2.4 kernel.
As this article has shown, there are numerous MySQL parameters that affect performance. Tuning the database for maximum performance is a matter of ensuring you have sufficient hardware resources, deciding which kind of operations you wish to optimize, and then testing various settings to see which yield the best results.
While this article has described the most common settings that affect performance and provided suggestions based on our testing and benchmarking, other systems might respond well to different values or the modification of other settings, due to variations in the database, the hardware and the environment.
Please add your own experiences and comments to this article. We will update the article in the future as new optimization technicques are discovered and tested.