
Sharding ez publish database


Remigijus Kiminas

Tuesday 02 June 2009 10:01:38 pm

Hello,

I'm wondering whether it is possible to implement sharding in the eZ Publish database model. I mean splitting the main tables into smaller ones, as YouTube, Facebook and many others do. The idea is simple: instead of storing all records in one monolithic database, split the records across smaller tables or databases.

Why is this needed?
It would give practically unlimited scalability. Currently, I really don't see how a single MySQL server could handle a database with millions of content object attribute records...

How can this be achieved?
Some ideas here. Actually, in one of my extensions I implemented range sharding; it's quite easy.
http://blog.maxindelicato.com/2008/12/scalability-strategies-primer-database-sharding.html
If it were implemented, I think eZ Publish would become just perfect :)
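The range-sharding idea above can be sketched as a small router that maps an object id to one of several databases. This is only an illustration of the technique, not eZ Publish code — the shard boundaries and connection strings are invented:

```python
# Minimal range-sharding router: pick a database based on id ranges.
# Boundaries and connection strings are hypothetical, for illustration only.

SHARDS = [
    # (inclusive upper bound of id range, connection string)
    (1_000_000, "mysql://shard0/ezpublish"),
    (2_000_000, "mysql://shard1/ezpublish"),
    (float("inf"), "mysql://shard2/ezpublish"),
]

def shard_for(object_id: int) -> str:
    """Return the connection string of the shard holding object_id."""
    for upper_bound, dsn in SHARDS:
        if object_id <= upper_bound:
            return dsn
    raise ValueError("no shard configured for id %d" % object_id)

print(shard_for(42))         # falls in the first range
print(shard_for(1_500_000))  # falls in the second range
```

The hard parts in a real system are not the lookup itself but rebalancing when a shard fills up, and cross-shard queries (sorting, joins) that a single-database model gets for free.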

Any ideas ?

---------------------------------------------
Remigijus Kiminas

Christian Rößler

Wednesday 03 June 2009 12:37:02 am

Hi,

It would be a simple thing to try out partitioning:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

Instead of storing millions of ezcontentobject_attribute rows in one physical table, partitioning lets you split the table into 'virtually' multiple ones, each holding a subset of the data.
Nothing has to be changed on the eZ Publish side, as the partitioned table looks like any other table; the DBMS (e.g. MySQL) takes care of managing the data. It's a pretty complex topic, so do read up on it. But it will surely make things a bit faster :)

Partitioning: store ezcontentobject_attribute rows with ids from a to c in partition A, from d to e in partition B, and so on.
It's like partitioning a hard disk...
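As a concrete sketch, a MySQL 5.1 range-partitioning declaration looks roughly like this. The column list and boundary values are illustrative only — this is not the real ezcontentobject_attribute schema:

```sql
-- Illustrative only: range-partition a table by id.
-- The real ezcontentobject_attribute table has more columns.
CREATE TABLE ezcontentobject_attribute_demo (
    id INT NOT NULL,
    version INT NOT NULL,
    data_text LONGTEXT,
    PRIMARY KEY (id, version)
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);
```

Note that MySQL requires the partitioning column to be part of every unique key on the table, which is why id is in the primary key here.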

----------

The second thing you could try is the sharding you mentioned. Sharding has to be implemented in the model part of MVC, so eZ Publish itself needs to be modified. This is the more complex option and nearly impossible, as it breaks a lot of code/logic...

----------

A third solution would be to use the clustering feature or the master-slave feature.
The master-slave feature is simple to activate, as the code for it already exists in eZ Publish: one DB server handles the write operations, which are then replicated to the 'read-only' server that handles the reads.
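The read/write split described above can be pictured as a tiny dispatcher that sends writes to the master and reads to the slave. This is a generic sketch of the pattern, not eZ Publish's actual database layer; the server names are invented:

```python
# Hypothetical read/write splitter: writes go to the master,
# reads go to the replicated read-only slave.

MASTER = "mysql://master/ezpublish"
SLAVE = "mysql://slave/ezpublish"

WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER", "CREATE", "DROP")

def server_for(query: str) -> str:
    """Pick a server based on the first keyword of the SQL statement."""
    verb = query.lstrip().split(None, 1)[0].upper()
    return MASTER if verb in WRITE_VERBS else SLAVE

print(server_for("SELECT * FROM ezcontentobject"))  # routed to the slave
print(server_for("UPDATE ezcontentobject SET ..."))  # routed to the master
```

One caveat the sketch glosses over: replication lag means a read issued right after a write may not see the new data on the slave, so real implementations pin some reads to the master.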

But you seemed to be interested in sharding - so solution A (partitioning) is an option. The sharding feature itself is nice, but very complicated to implement on a system as complex as eZ Publish. Also remember that eZ Publish already exists: sharding is much easier to implement when starting a new project, since you don't have to take care of any upgrade/downgrade issues...

One thing that just came to my mind: memcached. That is something that would significantly improve performance, but it also requires altering the eZ Publish models (the persistent DB layer)...
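The memcached idea is the classic cache-aside pattern: try the cache first, and fall back to the database only on a miss. A minimal sketch, with a plain dict standing in for a memcached client (a real setup would use a memcached client library) and a hypothetical fetch function for the DB call:

```python
# Cache-aside sketch: a dict stands in for a memcached client.
cache = {}

def load_object(object_id, fetch_from_db):
    """Return the object from cache, loading it from the DB on a miss."""
    key = "ezco:%d" % object_id
    value = cache.get(key)
    if value is None:
        value = fetch_from_db(object_id)  # expensive DB round-trip
        cache[key] = value                # populate cache for next time
    return value

# Demo with a fake DB fetcher that records how often it is called.
calls = []
def fake_db_fetch(object_id):
    calls.append(object_id)
    return {"id": object_id, "name": "article-%d" % object_id}

load_object(7, fake_db_fetch)  # miss: hits the DB
load_object(7, fake_db_fetch)  # hit: served from cache
print(len(calls))  # the DB was queried only once
```

The part that needs changes in the persistence layer is invalidation: every write path has to delete or update the corresponding cache key, otherwise readers see stale objects.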

just my 2 cents.

If you intend to write such modifications, let me know. I'm interested (not needing it, but extremely interested in how you'll solve it).

Christian

Hannover, Germany
eZ-Certified http://auth.ez.no/certification/verify/395613

Gaetano Giunta

Wednesday 03 June 2009 12:48:47 am

Maybe not as cheap as MySQL to install, configure or maintain, but Oracle has had table partitioning and server clustering (RAC) for ages.
They also claim that their handling of BLOBs is excellent, which should make it a good platform for eZ Publish "cluster mode".
It might be worth a try if you're going to run a huge eZ Publish installation and money is not a problem.

I think in general it is a good idea to let the DB do the scaling instead of pushing more complexity into the web layer...

Principal Consultant International Business
Member of the Community Project Board