Thursday 17 June 2010 9:58:28 am
Sorry for my long post, but if these questions get some good answers I think it will be a benefit for many of us. I would appreciate to get elaborated answers from the folks that know very well the internal structure of eZP, but also any other good ideas and suggestions are welcome. So... For last few days I was looking for a way to optimze performance of a high-volume, high-traffic news portal. The main problem is that there are many places on this site where the latest news should be displayed, and should be refreshed very often (every 1-2 minutes). I use all of the eZ caching mechanisms, but in some moments (during the daily visit peak) even this is not enough. I should mention that there are also many different registered users browsing the site. The problem is that fetching 10 latests articles (sorted by the published time) produces big SQL queries (in MySQL) with joins on the ezcontentobject table and ezcontentobject_attribute table. This two tables are the biggest in the database, having aproximately: ezcontentobject ~ 200.000 rows (and growing rapidly) ezcontentobject_attribute ~ 2.000.000 rows (and growing rapidly) and most of this rows (~90%) belogn to the news objects. The queries that eZ produces with this two tables create quite a large temporary table on the disk (I have investigated and due to the structure of the tables there's no way to avoid it) which slows down the database server pretty much. Also, the concept of doing such big operations to get only 10 latest news is really inefficient. So in my quest to eliminate this big queries I came to several ideas and questions that I would like to discuss with the eZP developers... Do an SQL query "only" on ezcontentobject One idea is to make an SQL query (using a custom made eZ operator) that works on ezcontenobject table, without joins on ezcontentobject_attribute table, and which would get only the object/node IDs of the latest news, and then: a) Fetch (using eZ fetch operator) only the objects with these IDs. b) Fetch the objects with these IDs, one by one (using foreach with object/node fetch) Q: Would one of this sollutions give some improvement, and how much? Would the fetching in sollution a) still produce a join on ezcontentobject and ezcontentobject_attribute? P.S. I am aware that this query should have also joins to ezcontentobject_version, and other tables, so it cannot be only on ezcontentobject, but that doesn't matter because the main goal is to eliminate joins with ezcontentobject_attribute table. Possiblites of eZP APIs Q: Are there some eZP APIs that can be used to fetch only the object IDs, and that would not produce joins on ezcontentobject and ezcontentobject_attribute tables? If you get the idea, this is en attempt to make the same thing as in question 1), but using eZ DB abstraction layer instead of raw SQL queries. Use eZ Find for fetching Q: Could eZ Find be used to fetch the latest news? Would this make some improvement? Of course, with the assumption that the SOLR index is updated frequently. Which brings some other questions: Q: Can SOLR index be updated upon publishing of every object? What implication for performance would this bring? What does load_data_map parameter actually do? In Lukas Serwatka's eZP 4.2 hidden gems (http://serwatka.net/blog/ez_publish_4_1_and_4_2_hidden_gems) I found this: The load_data_map parameter for tree and list fetch functions is now only enabled by default if 15 nodes or less are fetched. It can still be specifically enabled or disabled per fetch. Q: So how does this parameter affect the database queries? If it's disabled, does it create joins on ezcontentobject and ezcontentobject_attribute tables or not? Optimizing indexes Q: Is there a way to optimize database indexes on ezcontentobject and ezcontentobject_attribute tables so that this queries run faster? Has anyone had some experience with this?
--
Nothing is impossible. Not if you can imagine it!
Hubert Farnsworth
|