Benchmarks don't look too great, more efficient caching is needed

Gabriel Ambuehl

Wednesday 16 April 2003 8:45:38 am

I was doing some benchmarking today and I must say, things don't look too good (you'll certainly NOT survive the Slashdot effect without massive hardware).

Basically, assembling the pages is TOO expensive.

Most pages without much user interaction (forums being the obvious exception) are rather static, so how about an "export everything to static HTML" button?

Of course it's more elegant to have some sort of AI figure out what to cache, but for raw performance on mostly static sites, rendering the whole site down to HTML would increase performance by orders of magnitude. Efficiently caching a highly dynamic site is hard, so an override for those of us who know we don't need this feature would be very nice. (I know some other CMS can do it, but there's no way I'm gonna mess with the weird philosophy of OpenACS or OpenCMS.)

eZ themselves reach 30 pageviews/s with a hardware investment of ~6000 EUR, which is a bit low for my taste...

Visit http://triligon.org

Bård Farstad

Wednesday 16 April 2003 9:22:36 am

We are working on more efficient caching. Future versions of eZ publish will have noticeably better performance when the pages are cached.

When you consider ez.no, you must keep in mind that the pages are generated personalized. This means that permission checking is done. But there are many places we can still optimize, so expect performance improvements.

We've done setups with eZ publish where you generate pure HTML files and use Apache rewrite rules to serve them. This means that pages that are cached will be served as static HTML, but dynamic pages will still work. This is a rather simple setup and will handle very high load, but the cost is that you don't get permission checking on the served content.

--bård

Documentation: http://ez.no/doc

Gabriel Ambuehl

Wednesday 16 April 2003 10:48:45 am

What about a how-to on exporting static pages? I mean it would be simple enough to just have a public section that gets exported. The backend is hopefully not going to get slashdotted ;-).

I thought you guys might have some use for my benchmark results.

Hardware:
Shuttle XPC SK41G (VIA KM266 chipset)
AMD Athlon XP 2000+
80 GB Maxtor DiamondMax D740X-6L, 2 MB cache, 7200 RPM
512 MB Infineon DDR RAM.
100 Mbit Realtek 8100C LAN

Software:
FreeBSD 4.8 RELEASE
Apache/ModSSL 1.3.27/2.8.whatever
PHP 4.3.1, GD, MySQL 3.23.56 client lib, DOMXML, iconv, gettext
ionCube PHP Accelerator (30MB cache, uses about 9.5MB)
MySQL 4.0.12 server running on the same machine.

Siege box is on the same 100Mbit LAN segment and accesses each of the frontpages of the sections in the demo site.

Transactions: 1663 hits
Availability: 100.00 %
Elapsed time: 300.12 secs
Data transferred: 13025435 bytes
Response time: 0.90 secs
Transaction rate: 5.54 trans/sec
Throughput: 43401.16 bytes/sec
Concurrency: 4.99
Successful transactions: 1663
Failed transactions: 0

Hitting a nonexistent page on the server, thus generating 404 replies of about 300 bytes each:
Transactions: 30415 hits
Availability: 100.00 %
Elapsed time: 60.21 secs
Data transferred: 9289480 bytes
Response time: 0.01 secs
Transaction rate: 505.14 trans/sec
Throughput: 154282.27 bytes/sec
Concurrency: 4.69
Successful transactions: 0
Failed transactions: 30415
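
As a rough sanity check on the two runs above, the reported rates can be recomputed from the raw counts. This is only a sketch in Python; the helper names are made up for illustration, and the input numbers are copied from the siege output above.

```python
def transaction_rate(transactions, elapsed_secs):
    """Transactions per second over the whole run."""
    return transactions / elapsed_secs


def implied_concurrency(rate, response_secs):
    """Little's law: requests in flight = throughput * time per request."""
    return rate * response_secs


# Figures copied from the two siege runs above.
ezpublish_rate = transaction_rate(1663, 300.12)
notfound_rate = transaction_rate(30415, 60.21)
ratio = notfound_rate / ezpublish_rate

print(f"eZ publish pages: {ezpublish_rate:.2f} trans/sec")
print(f"404 replies:      {notfound_rate:.2f} trans/sec")
print(f"ratio:            {ratio:.0f}x")
print(f"concurrency:      {implied_concurrency(ezpublish_rate, 0.90):.2f}")
```

The ratio comes out at roughly 91, which is one way to read the gap between what Apache alone can deliver on this box and what page assembly costs. The concurrency figure is only approximate because siege reports the response time rounded to two decimals.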

Visit http://triligon.org

Gabriel Ambuehl

Wednesday 16 April 2003 12:01:05 pm

I was looking thru the class hierarchy for obvious places to optimize and stumbled across ezXML. Although it's not online anymore on ez.no, Google Cache says that ezXML is 100% compatible with libxml. So can I use libxml instead of ezXML where available? Or is that already being done?

I guess the other place to improve would be ez template. How do you guys stand with regard to reimplementations of parts of ezpublish in C?

Visit http://triligon.org

Jan Borsodi

Wednesday 16 April 2003 12:14:26 pm

We're currently working on the process cache for the template engine. When this is finished it will compile template code into pure PHP code.

The process cache system will try to generate as optimized code as possible with help from the operators and functions (This means that custom operators and functions will have to give hints to the system for optimized code). For instance the true() and false() will be turned into the builtin boolean type in PHP, the section function will be turned into foreach (or for) loops cutting out code which is not used for the specific loop.
This means that the amount of function calls, newing of objects and other data creations will be reduced significantly.
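
The general idea of compiling a template once into plain code can be illustrated with a toy sketch. It is written in Python rather than PHP, handles only a made-up {$name} placeholder syntax, and is not how the eZ publish process cache is implemented; compile_template and render are hypothetical names.

```python
import re


def compile_template(source):
    """Turn a template string into a Python function, parsing it only once."""
    # re.split with one capture group yields: literal, name, literal, name, ...
    parts = re.split(r"\{\$(\w+)\}", source)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            if part:
                pieces.append(repr(part))  # literal text, emitted as-is
        else:
            pieces.append(f"str(ctx[{part!r}])")  # variable lookup
    code = "def render(ctx):\n    return ''.join([" + ", ".join(pieces) + "])\n"
    namespace = {}
    exec(code, namespace)  # the generated source is what would be cached
    return namespace["render"]


render = compile_template("<h1>{$title}</h1><p>{$body}</p>")
print(render({"title": "News", "body": "Hello"}))
```

After compilation, each request only runs the generated function; no template parsing and no per-node objects are involved, which is where the savings described above would come from.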

As for doing the template engine in C, maybe in the future; at the moment we are doing as much optimizing as we can in pure PHP. Creating a PHP extension means that people have to compile in more options in their PHP module than they normally have, and a lot of people don't have that option.

--
Amos

Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq

Paul Borgermans

Wednesday 16 April 2003 1:57:09 pm

Could you give us 4 words on how to give these hints for custom operators? We're currently building a glossary operator that will traverse dedicated node-id's for fetching (php code) the glossary terms, id's and definitions. Maybe with local caching (in php, not the template cacheblocks) if the speed impact is too high.

Tx

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Gabriel Ambuehl

Thursday 17 April 2003 3:56:49 am

I activated the process template cache line in the settings but as it stands, the system actually got slower!

Visit http://triligon.org

Paul Borgermans

Thursday 17 April 2003 4:45:50 am

Yes, it is not finished yet so leave it off.

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Scot Wilcoxon

Thursday 17 April 2003 8:38:53 pm

If page content doesn't change often, can't mod_rewrite file cache help out? (Writing HTML to file so it can be presented from there until the file is removed -- with rules which select desired static pages)

There are various PHP cache tools, but I'm sure whoever is working on the eZ cache knows the subject.

Gabriel Ambuehl

Friday 18 April 2003 5:02:22 am

If your page is mostly static, squid or some other reverse proxy is probably the way to go.

But Squid doesn't have a clue at all if your site got updated. So IMHO, you'd have to tell Squid to dump its cache when you make changes.

And for forums, all bets are off. Then again, it's probably better to use a real forum app instead of ezpublish ATM.

WRT the template engine, did you guys consider smarty.php.net? That one already compiles its templates into PHP code for speed.

I have used Gnome libxml/libxslt in a few C++ projects and they really proved to be performant (I had 50k XSL stylesheets processing 100K of data in no time), especially when you cache the XSL tree in memory. Personally, I always found XSLT to be one of the most intuitive ways to transform XML into something else (save for the fact that you can't redefine variables, GRR). For the backend, eztemplate is fast enough (that one won't be hammered, that's for sure), but for the frontend, I might look into simply assembling all objects into an XML stream and piping it thru XSL stylesheets...

Visit http://triligon.org

paco montoro

Friday 18 April 2003 6:55:06 pm

Bård Farstad wrote:
> We've done setups with eZ publish where you generate pure HTML files and use Apache rewrite rules to serve them. This means that pages that are cached will be served as static HTML, but dynamic pages will still work. This is a rather simple setup and will handle very high load, but the cost is that you don't get permission checking on the served content.
> --bård

Do you have a document describing how to implement this, or can you give a description? I'm new to ezPublish (a few days), and this is the type of solution I'm looking for. In fact, I was musing about ezPublish as a "website compiler" where most pages are static and are regenerated only when a dynamic action (e.g. an admin action or a new forum message) alters the content they represent.

The node system and class system of ezpublish are cool.

Thanks,
pacoit

Bård Farstad

Saturday 19 April 2003 2:38:45 am

The setup that we've done with eZ publish to generate static pages is very simple. The basic principle is that we crawl the eZ publish site and store the results in .html files. To do the crawling we used a tool called httrack. This would crawl the eZ publish site on a regular basis, e.g. we would crawl the frontpage and n levels/clicks down every 15 minutes. The rest would be generated nightly.

This would store the contents of eZ publish into a directory structure matching the URL requests. E.g. /your/html/cache/dir/content/view/42.html
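
The mapping from URL to cache file can be sketched roughly as follows. This is not the httrack configuration we used, just an illustration in Python; cache_path_for and store_page are hypothetical names, and using "index" for the front page is an assumption.

```python
import os
import tempfile


def cache_path_for(cache_root, url_path):
    """Map a request path like /content/view/42 to <cache_root>/content/view/42.html."""
    relative = url_path.strip("/") or "index"  # "index" for "/" is an assumption
    root = os.path.normpath(cache_root)
    target = os.path.normpath(os.path.join(root, relative + ".html"))
    # Refuse paths that would escape the cache directory (e.g. via "..").
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"unsafe URL path: {url_path!r}")
    return target


def store_page(cache_root, url_path, html):
    """Write one crawled page into the cache tree, creating directories as needed."""
    target = cache_path_for(cache_root, url_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(html)
    return target


with tempfile.TemporaryDirectory() as root:
    saved = store_page(root, "/content/view/42", "<html>cached copy</html>")
    print(os.path.relpath(saved, root))
```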

For Apache to serve the static files you need to configure some Apache rewrite rules which will check if a cached file exists in:
/your/html/cache/dir/[url].html
If it exists, Apache is set to serve that page; if not, the request is sent to eZ publish.
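
The decision the rewrite rules make can be written out as a small Python sketch. It only mirrors the logic (file exists: serve it, otherwise hand over to eZ publish) and is not the rewrite configuration itself; resolve_request is a hypothetical name.

```python
import os
import tempfile


def resolve_request(cache_root, url_path):
    """Return ("static", file_path) if a cached copy exists, else ("dynamic", None)."""
    candidate = os.path.join(cache_root, url_path.strip("/") + ".html")
    if os.path.isfile(candidate):
        return ("static", candidate)
    # No cached file: the request falls through to eZ publish.
    return ("dynamic", None)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "content", "view"))
    with open(os.path.join(root, "content", "view", "42.html"), "w") as handle:
        handle.write("<html>cached copy</html>")

    print(resolve_request(root, "/content/view/42")[0])
    print(resolve_request(root, "/user/login")[0])
```

Because the check is a plain file lookup, removing a cached file is enough to make that URL dynamic again.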

This is a very simple setup but it will give you very high scalability on low-end hardware. You can configure httrack to only take parts of your website and cache them. The drawback with this setup is that you will get an n-minute publishing delay and login-restricted pages will not work, so this setup is for "all is public" sites.

After the vacation I will try to write a tutorial with the details of the setup, rewrite rules, httrack configuration etc.

--bård

Documentation: http://ez.no/doc

Paul Borgermans

Saturday 19 April 2003 4:07:53 am

Great

I was already playing with wget to do the same, but for an entirely different goal:

building a CMS at home where part of it would be a personal website to be uploaded to my ISP (who does not offer MySQL/PHP for home users), as the frequency of updates would be low.

That's the "lower" application end for ez publish, the higher end applications are for my professional life.

Oops, ez publish everywhere ;-)

Paul

eZ Publish, eZ Find, Solr expert consulting and training
http://twitter.com/paulborgermans

Gabriel Ambuehl

Saturday 19 April 2003 5:41:28 am

Instead of exporting pages, why not give every page a lifetime so Squid et al. can cache it for as long as is appropriate for that page?
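
What I have in mind is plain HTTP caching headers on each response. A minimal sketch in Python (not eZ publish code; cache_headers and the lifetimes used below are made up for illustration):

```python
import time
from email.utils import formatdate


def cache_headers(lifetime_secs, now):
    """Build response headers that let a shared cache keep a page for lifetime_secs."""
    if lifetime_secs <= 0:
        # Pages that must be regenerated for every visitor, e.g. personalized ones.
        return {"Cache-Control": "private, no-cache"}
    return {
        "Cache-Control": f"public, max-age={lifetime_secs}",
        # Expires is the older HTTP/1.0 equivalent, given as an absolute GMT date.
        "Expires": formatdate(now + lifetime_secs, usegmt=True),
    }


# Hypothetical per-page lifetimes: a news front page vs. a rarely edited article.
for label, lifetime in [("frontpage", 900), ("article", 86400), ("admin", 0)]:
    print(label, cache_headers(lifetime, time.time()))
```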

Visit http://triligon.org

Alessandro Ranellucci

Tuesday 22 April 2003 12:36:45 am

Gabriel, what lifetime do you apply on a forum message? ;-)
BTW, I'd love to get that howto working.

Gabriel Ambuehl

Tuesday 22 April 2003 2:19:51 am

I think I've pointed out elsewhere that I wouldn't go for ezpublish when it comes to forums right now (unless you really, really need the integration into your site). There are other, more performant alternatives: phpBB springs to mind (although I will always hate it for the fact that it lacks real threads with depth and all), or Phorum, and I'm sure there are many others.

Visit http://triligon.org

Scot Wilcoxon

Tuesday 29 April 2003 12:03:29 am

Here's a silly idea that might help Forum caching:
I think most people see the "user" display of forums, and everyone sees the same stuff unless they're within an edit or post screen. So several cache methods could be used for the common "show a forum page" pages.

I think the difference in what most people see is the "Edit" button on their own articles. But how to cache a Forum page which can have the same content for many people, except those few who have articles showing on that page?

How about putting an "image" on every message, where the "Edit" button goes? So if the rest of the page is cached (through one of several techniques), the eZp server only has to be given the requests for those images and can decide whether to emit an "Edit" button or something else (background color image, 1x1 image, user type icon, avatar...).

So at least we could eliminate all the template and other page display processing and only have to emit the original page with icon URLs which tell eZp to check what icon to emit. Obviously the caching technique would have to be configured to not cache those icon URLs.

If the eZp server still has to deal with a flurry of such icon requests, it would be nice if it could quickly check if a particular user is the creator of an article and thus should get the "Edit" icon. The icon URL could contain the author's name, ID, or a hash value ("author is in user bucket 3481 of 4096 buckets"). It should be something which is in session info or the user environment (if logged in, there are user identifiers), so eZp doesn't need to check the database and only has to do a quick check to decide "not-the-Edit-icon" (as most users would get). Only those few who are the author, or have the same hash value as the author, would require more detailed examination by eZp. So most "icon" queries would have quick replies from the server.
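
A rough sketch of the bucket check, in Python for brevity. Everything here is hypothetical: the URL layout, the function names, and the choice of CRC32 as the hash are just for illustration.

```python
import zlib

BUCKETS = 4096


def user_bucket(user_id):
    """Stable bucket number for a user ID (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(user_id.encode("utf-8")) % BUCKETS


def icon_url(author_id):
    """URL embedded in the cached page; it reveals only the author's bucket."""
    return f"/forum/icon/{user_bucket(author_id)}.png"


def might_be_author(session_user_id, url_bucket):
    """Cheap pre-check from session data alone, without a database hit."""
    if session_user_id is None:  # not logged in
        return False
    return user_bucket(session_user_id) == url_bucket


author_bucket = user_bucket("john_henry")
print(icon_url("john_henry"))
print(might_be_author("john_henry", author_bucket))
print(might_be_author(None, author_bucket))
```

Only requests where might_be_author returns True would need the detailed check; everyone else gets the "not-the-Edit-icon" reply straight away.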

Perhaps it would be simplest to put a user ID in each icon URL, as then only the author would be a match. It depends upon privacy design issues, but that should not be much of a concern for a forum design which makes author identities visible anyway.

Code values in icon URLs are not a security issue if someone requests a URL directly, bypassing what is actually in forums, as this only indicates if the calling user is, or might be, the author. If an "Edit" icon is actually selected, then the normal authorization checks would take place. Only the display of the icon is what could be done without a DB hit.

Note that if an author ID is in a URL, and is in info which is available to the caching mechanism (such as mod_rewrite cache), the might-be-author test could actually be done within the cache. The rule would have the meaning "Show cached icon content/show/user_icon/john_henry.png if user is not john_henry, else show uncached icon (the latter causing a query to the eZp server)".

Yes, I know a rule such as the above could be used to show other users a different icon for each user (such as logo or avatar).

Is my concept misguided in some way?

J T

Saturday 30 August 2003 11:55:16 am

Bård Farstad wrote:
> After the vacation I will try to write a tutorial with the details
> of the setup, rewrite rules and httrack configuration etc..

I would really appreciate this documentation. I would like to implement eZ publish in a shared server environment for one of my clients. I got the demo site set up in less than an hour, and everything I see looks promising.

My one concern is that many eZ sites seem to run slowly. If all goes well and the client gets into the news, we could get into trouble with our web host for using too many resources. Or worse, the site would start generating errors.

This site will contain no more than 100 "pages" with no forums, so I would use the software mainly to facilitate content updates. We have no plans for login functionality, except in the admin area. So creating a 100% static version of the site would suit the client's needs.

Question: Why use httrack instead of wget? Wget is installed almost everywhere. What additional features does httrack have?

Thanks,
John
