How big is really big?

Hugo Sandoval

Monday 29 November 2010 7:31:28 am

Good morning.

This question is about object counts and attributes; it is perhaps more of a survey than a support question. I have an eZ Publish implementation with more than 900K objects, one of which has more than 40 attributes defined and is growing with every implementation. The ezcontentobject_attribute table has more than 45,000,000 records (I know that searches are done in another table). The database engine is PostgreSQL 8.3 and the server has 8 GB of memory. This configuration is working very well with 40 operators editing objects every minute, some custom reports being generated, etc. I am wondering how much the implementation can grow.

What is your experience about it?

.·. .·. .·. .·. .·. .·. .·.
http://www.softwarelibre.com.ve/

Jérôme Vieilledent

Monday 29 November 2010 8:01:04 am

Hi Hugo

I guess that as long as your hosting architecture supports it, it is fine. However, you should be careful with database load, which may grow as queries get longer and longer, especially if you use the default search engine and filtered fetches.

My advice would be to use eZ Find for regular fetches and search, as this will lighten your database a lot (no more complex DB queries, only fetches by node ID from Solr-filtered results).
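
For the record, once eZ Find is installed and Solr is running, switching the engine is mostly a one-setting override, something like this (a sketch; check the exact values against the eZ Find documentation):

# settings/override/site.ini.append.php (sketch)
[SearchSettings]
SearchEngine=eZSolr
# optional: index via the cronjob instead of at publish time
DelayedIndexing=enabled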

You should also make extensive use of caching (view cache, cache blocks, static cache) and consider using a reverse proxy cache such as Varnish.

With that, your system will be more scalable :)
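
To illustrate the cache-block part (the template name and cache keys below are made up for this example):

{* pagelayout.tpl: cache the rendered menu across requests *}
{cache-block keys=array( 'top_menu', $uri_string )}
    {include uri='design:menu/top_menu.tpl'}
{/cache-block}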

Hugo Sandoval

Monday 29 November 2010 8:17:52 am

"

Hi Hugo

I guess that as long as your hosting architecture supports it, it is fine. However, you should be careful with database load, which may grow as queries get longer and longer, especially if you use the default search engine and filtered fetches.

My advice would be to use eZ Find for regular fetches and search, as this will lighten your database a lot (no more complex DB queries, only fetches by node ID from Solr-filtered results).

You should also make extensive use of caching (view cache, cache blocks, static cache) and consider using a reverse proxy cache such as Varnish.

With that, your system will be more scalable :)

"

Thank you very much for your answer.

Indeed, I have a lot of work to do. My use of caching is minimal, because I still don't understand it deeply, and sometimes, with dynamic reports, the cache must be disabled.

About eZ Search vs. eZ Find: plain eZ Search is working fine, because I marked only a few attributes as searchable, just the strictly necessary ones, and I still need to disable many more. VACUUM is executed every 3 hours and ANALYZE every 6 hours in PostgreSQL. As I said, I am really curious about experiences (or success stories) with big eZ Publish implementations... maybe I should call them "eZperiences" (experiences in eZ Publish ;-)
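
The maintenance jobs are plain cron entries, roughly like this (the paths and database name here are illustrative, not the exact production lines):

# /etc/crontab: VACUUM every 3 hours, ANALYZE every 6 hours
0 */3 * * * postgres /usr/bin/vacuumdb --all --quiet
0 */6 * * * postgres /usr/bin/psql -d ezpublish -c "ANALYZE;"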

.·. .·. .·. .·. .·. .·. .·.
http://www.softwarelibre.com.ve/

Gaetano Giunta

Monday 29 November 2010 8:40:35 am

45M rows is not bad, by any standard. And you claim it works without excessive caching? Congrats!

As for improving performance of the all-encompassing ezcontentobject_attribute table, I think a partitioned setup would be a great way to scale. We just need to figure out the best set of keys to partition on...
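
Just to sketch the idea (the ranges and names below are hypothetical; PostgreSQL 8.3 only offers inheritance-based partitioning, so a routing trigger for inserts would also be needed, omitted here):

-- one child table per contentobject_id range, repeated for each range
CREATE TABLE ezcontentobject_attribute_part1 (
    CHECK ( contentobject_id >= 1 AND contentobject_id < 500000 )
) INHERITS ( ezcontentobject_attribute );

CREATE INDEX ezcoa_part1_co_id ON ezcontentobject_attribute_part1 ( contentobject_id );

-- let the planner skip children whose CHECK excludes the query
SET constraint_exclusion = on;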

Principal Consultant International Business
Member of the Community Project Board

Nicolas Pastorino

Monday 29 November 2010 8:53:02 am

Thanks for sharing, Hugo. This is the kind of real-life field experience our Community should hear more about!

Cheers,

--
Nicolas Pastorino
Director Community - eZ
Member of the Community Project Board

eZ Publish Community on twitter: http://twitter.com/ezcommunity

t : http://twitter.com/jeanvoye
G+ : http://plus.tl/jeanvoye

Hugo Sandoval

Monday 29 November 2010 9:32:38 am

"

45M rows is not bad, by any standard. And you claim it works without excessive caching? Congrats!

As for improving performance of the all-encompassing ezcontentobject_attribute table, I think a partitioned setup would be a great way to scale. We just need to figure out the best set of keys to partition on...

"

:-( I didn't pay attention to the partition configuration. The partition is standard ext3; I thought that JFS or XFS might be better, but I had to deploy the app ASAP a few weeks ago. In the following links you can see two snapshots of the tables, FYI; if you see anything weird, please advise...

http://servicios.solventar.com.ve/images/phppgadmin1.png

http://servicios.solventar.com.ve/images/phppgadmin2.png

As you can see, the ezsearch_object_word_link table isn't big.

Partition table:

centauroXX:/var/www # df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p2     15480832   3897120  10797332  27% /
udev                   4024084       148   4023936   1% /dev
/dev/cciss/c0d1p2     17220220   3399540  12945944  21% /usr
/dev/cciss/c0d0p5      2063504    127924   1830760   7% /tmp
/dev/cciss/c0d0p6     46781732  15956952  28448368  36% /var/lib/bbdd
/dev/cciss/c0d1p1    123854812  24510660  93052700  21% /var/lib/pgsql
/dev/sda1            488148160  74179552 413968608  16% /media

The PostgreSQL data partition is:

/dev/cciss/c0d1p1    123854812  24510660  93052700  21% /var/lib/pgsql

and fdisk -l:

Disk /dev/cciss/c0d1: 146.7 GB, 146778685440 bytes
255 heads, 63 sectors/track, 17844 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000d5c0b

        Device Boot         Start         End      Blocks   Id  System
/dev/cciss/c0d1p1               2       15666   125829112+  83  Linux
/dev/cciss/c0d1p2           15667       17844    17494785   83  Linux

top output with online users working:

top - 12:57:55 up 6 days, 50 min,  1 user,  load average: 6.39, 6.47, 6.52
Tasks: 137 total,   3 running, 133 sleeping,   0 stopped,   1 zombie
Cpu0  : 15.6%us,  3.7%sy,  0.0%ni, 51.8%id, 28.9%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 23.1%us,  1.9%sy,  0.0%ni,  1.9%id, 73.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 13.6%us,  3.3%sy,  0.0%ni, 23.9%id, 59.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 29.1%us,  5.0%sy,  0.0%ni,  2.6%id, 62.9%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   8048172k total,  7970412k used,    77760k free,    75972k buffers
Swap:  6289436k total,      912k used,  6288524k free,  7351988k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                              
 1324 postgres  20   0 2161m 1.9g 1.9g D   22 25.3  11:18.78 postmaster                                                           
23746 postgres  20   0 2161m 2.0g 2.0g D   18 25.6  19:41.02 postmaster                                                           
17119 postgres  20   0 2161m 1.8g 1.7g R   16 22.8   3:30.40 postmaster                                                           
19443 wwwrun    20   0  292m  30m 4424 S   15  0.4   0:06.90 httpd2-prefork                                                       
19599 postgres  20   0 2169m 1.7g 1.7g D   13 22.2   0:29.56 postmaster                                                           
19699 wwwrun    20   0  284m  23m 4440 S    2  0.3   0:02.68 httpd2-prefork                                                       
19774 wwwrun    20   0  277m  16m 4288 S    2  0.2   0:01.04 httpd2-prefork                                                       
19902 wwwrun    20   0  277m  15m 3780 S    2  0.2   0:00.40 httpd2-prefork                                                       
17103 root      20   0  234m  27m 5644 S    1  0.4   0:01.40 php                                                                  
 1562 root      39  19     0    0    0 S    0  0.0  10:37.76 kipmi0                                                               
 1758 root      15  -5     0    0    0 D    0  0.0   7:59.93 kjournald                                                            
    1 root      20   0  1064  412  348 S    0  0.0   0:03.80 init

I am posting this as background information about the implementation.

.·. .·. .·. .·. .·. .·. .·.
http://www.softwarelibre.com.ve/

Hugo Sandoval

Monday 29 November 2010 9:51:49 am

"

Thanks for sharing, Hugo. This is the kind of real-life field experience our Community should hear more about!

Cheers,

"

Yes, that is the point: to share experiences with very large tables and performance. I have read debates about whether eZ Publish copes well with certain implementations, and maybe this kind of post can help. Anyone else want to share an eZperience? :-)

.·. .·. .·. .·. .·. .·. .·.
http://www.softwarelibre.com.ve/

Gaetano Giunta

Monday 29 November 2010 10:08:24 am

I was thinking about 'database table partitioning' rather than 'partition table'.

ezsearch_object_word_link is, IIRC, only used by the plain search engine, and not when eZ Find is installed. If that is confirmed, and since you said that eZ Find is in use, you might just delete all rows in that table (see the sketch below).

As for "standard" server tuning, I am sure you can find plenty of tips both in this forum and elsewhere. Random ideas:

- connect to the DB via a pipe instead of using TCP if it's on the same machine

- mount disk partitions with the noatime option (also sketched below)

- disable all unused services (avahi, cups and the like)

- disable all unused Apache modules and PHP extensions
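
To make the search-table and noatime ideas concrete, two sketches (the fstab line reuses the device and mount point from your df output; double-check everything before applying):

-- empty the plain-search index table (only if it is truly unused!)
TRUNCATE TABLE ezsearch_object_word_link;

# /etc/fstab: remount the PostgreSQL partition without access-time updates
/dev/cciss/c0d1p1  /var/lib/pgsql  ext3  defaults,noatime  1 2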

Principal Consultant International Business
Member of the Community Project Board

Hugo Sandoval

Tuesday 30 November 2010 3:55:11 am

"

I was thinking about 'database table partitioning' rather than 'partition table'.

ezsearch_object_word_link is, IIRC, only used by the plain search engine, and not when eZ Find is installed. If that is confirmed, and since you said that eZ Find is in use, you might just delete all rows in that table.

As for "standard" server tuning, I am sure you can find plenty of tips both in this forum and elsewhere. Random ideas:

- connect to the DB via a pipe instead of using TCP if it's on the same machine

- mount disk partitions with the noatime option

- disable all unused services (avahi, cups and the like)

- disable all unused Apache modules and PHP extensions

"

Sorry; about database table partitioning, I haven't changed anything except memory settings, max_fsm_pages in postgresql.conf, and a few others. Search is plain (no eZ Find).
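
For context, these are the kinds of knobs I mean in postgresql.conf (the values below are illustrative, not my exact production settings):

# postgresql.conf (PostgreSQL 8.3) - memory-related settings
shared_buffers = 2GB              # main shared buffer cache
max_fsm_pages = 2000000           # free space map, sized for heavy update/vacuum load
effective_cache_size = 6GB        # planner hint about the OS page cache, not an allocation
work_mem = 32MB                   # per-sort / per-hash memory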

What do you mean by "connect to the DB via a pipe"?

.·. .·. .·. .·. .·. .·. .·.
http://www.softwarelibre.com.ve/

André R.

Friday 10 December 2010 2:24:09 am

Thanks for sharing; it shows why people should strongly consider Postgres, I guess... (I don't think MySQL would handle that size well with the eZ Publish data structure.)

eZ Online Editor 5: http://projects.ez.no/ezoe || eZJSCore (Ajax): http://projects.ez.no/ezjscore || eZ Publish EE http://ez.no/eZPublish/eZ-Publish-Enterprise-Subscription
@: http://twitter.com/andrerom