Strange cache timeout with ezFS2FileHandler...

Author Message

Nicklas Lundgren

Sunday 27 September 2009 1:32:58 am

Hi folks,

I could need some help with a problem I have with quite big site, running 4.1.1.

Yesterday I noticed that the site perfomance had degraded severely over night. Checking with Top in Linux (Ubuntu 8.04) I noticed a load average of 20-40 (!). Normal load average is around 1...

While doing some tests I noticed that it is only full views that is slow to generate, like articles. Meaning that it is only what is generated in $module_result that pulls performance down.

In addition to that I found this in the cluster.log file:

[ Sep 27 2009 10:14:43 ] Generation of 'var/xxx/cache/content/xxx/1/100-3235a020a1be5aa5d5f30ff885349e94.cache' timed out
[ Sep 27 2009 10:14:50 ] Generation of 'var/xxx/cache/content/xxx/1/100-3235a020a1be5aa5d5f30ff885349e94.cache' timed out

and so on... all the time.

I am using the ezFS2FileHandler to make use of the stale cache advantages and it has been working good for months...

When checking in the var/content cache directory I also noticed that many of the file generation dates when doing ls -la in the system was '1977-05-25'...

I have now removed the whole content cache, and all new cache files are generated with a correct filedate. However, I still get loads of timeout warnings from the cache generation.

I really need to sort this problem out, cant run the site on such a high load.

If anyone has got an idea, ever so vague, you are welcome to share it with me, as I am running out of my own ones....

Otherwise, I guess next step is do dive in to the eZFS2FileHandler code....

Best regards,
/Nicklas

Nicklas Lundgren, Managing Director
Novitell AB, Sweden

Nicklas Lundgren

Sunday 27 September 2009 5:18:20 am

Just to let you all know,

I fell back to the eZ FSFileHandler after extensive testing. Then everything came up fully working again.

I guess I can live without the stale cache in the eZFS2FileHandler, but it is indeed worrying that this could happen.

I haven´t had time to check things more thoroughly, but I think this must be some kind of trouble arising when a combination of errors occur.

I mentioned the file dates on the server being 1977-05-25, and I guess this started the whole thing. Maybe something in eZFS2FileHandler is more sensitive than the eZFSFileHandler...

Nicklas Lundgren, Managing Director
Novitell AB, Sweden

Marko Žmak

Sunday 04 October 2009 5:47:16 pm

Just to inform you, I have also experienced the same problem. I have launched a big eZ site that has a lot of complex and consuming sql queries, so it takes some tome to generate the cache. And the same problem occured. The perfomance degraded severely and mysql even had to be restarted. I found the same entries in the cluster.log as you did.

It seems to me like the stale cache mechanism isn't working properly in cases of heavy server load.

I'm using eZ 4.1.3, is this maybe fixed in eZ 4.2?

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Jérôme Renard

Monday 05 October 2009 2:57:50 am

Please check there is a "noatime" attribute in your /etc/fstab for the partition which stores all caches.

Nicklas Lundgren

Monday 05 October 2009 3:48:03 am

Hi Jérôme,

we DO have the noatime attribute set in the Fstab for the partition. Are you saying that the ezFS2FileHandler makes use of the accesstime attribute for the files?

We have the noatime attribute set to improve performance on the server. I have not seen any information in the eZ docs stating that this should be an issue.

Would you please explain this more thorouglhy?

Best regards,
/Nicklas

Nicklas Lundgren, Managing Director
Novitell AB, Sweden

Jérôme Renard

Monday 05 October 2009 4:55:19 am

Hi Nicklas,

we DO have the noatime attribute set in the Fstab for the partition. Are you saying that the ezFS2FileHandler makes use of the accesstime attribute for the files?

Yes eZFS2 uses touch() in quite some places in ezfs2filehandler.php [1], especially in eZFS2FileHandler::checkCacheGenerationTimeout() and eZFS2FileHandler::startCacheGeneration(). So you need to remove the "noatime" attribute in your /etc/fstab for the partition that stores the var/*/cache folder.

We have the noatime attribute set to improve performance on the server. I have not seen any information in the eZ docs stating that this should be an issue.

This is eZ Cluster specific, actually you can set everyhing with noatime except for cache files if you want to.

1. http://pubsvn.ez.no/nextgen/trunk/kernel/private/classes/clusterfilehandlers/ezfs2filehandler.php

Cheers :)

Marko Žmak

Thursday 08 October 2009 3:02:11 am

I found out what is the problem! At least in my case where we DON'T have the noatime attribute set for the partition.

There is an ini setting in site.ini under ContentSettings:

# maximum cache generation time. If a file stays in generation mode for more than
# this value in seconds, it is considered timed out (process died, for instance),
# and generation is taken over by the requesting process
CacheGenerationTimeout=60

So with stale cache, if the cache is not generated in 60 seconds, the cache generation process starts over again. It seems that for heavy sites 60 seconds is not enough, so the regeneration of cache repeats and repeats all over again.

Now I have set this limit to 120 sec for my site and it's working OK, and there are no timeout messages in the log.

But be careful not to set this setting to high, because it could generate a lot of web server's processes just waiting for cache generation. You should set this value to the maximum time a cache needs to be generated.
One way to detect the right value is by increasing the value by 10 sec, testing the site and looking at the log for cache generation timeout messages. And then repeating this until you find the value which doesn't generate timeout messages.

P.S. The REAL problem is that CacheGenerationTimeout setting is not mentioned anywhere on ez.no. It's only described in site.ini. This setting is very important and should be mentioned in the docs. Please do it.

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Marko Žmak

Friday 09 October 2009 3:55:39 am

Well... it's not over yet. At least for me.

After I've setup the limit higher, everything started working OK, but when the first bigger load came, the server got crowded with a lot of apache processes that were just sleeping and doing nothing. And eventually the server's memory got exhausted, and apache had to be restarted.
And when we tested the web under the same heavy load, and without stale cache, everything was working OK.

So it seems to me that there's some problem with stale cache mchanism in eZ 4.1.

Did anyone have similar experiences?

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Marko Žmak

Saturday 10 October 2009 10:45:28 am

I think that this behaviour is due to the following bug:

http://issues.ez.no/14850

I'll give it a try with CacheTweaks workaround.

--
Nothing is impossible. Not if you can imagine it!

Hubert Farnsworth

Powered by eZ Publish™ CMS Open Source Web Content Management. Copyright © 1999-2014 eZ Systems AS (except where otherwise noted). All rights reserved.