Forums / Developer / What's the best way to do remote backups?

What's the best way to do remote backups?


Xavier Dutoit

Sunday 09 October 2005 3:11:16 am

Hi,

I want to remotely back up an eZ site, and I want to be as efficient as possible with bandwidth and storage (i.e. not back up anything generated by eZ, only the original content).

I have two problems:

1) rsyncing var/cache is going to copy the image variations (_large, _small, ...).
Do you know a pattern to exclude them?

2) mysqldump swallows a lot of resources, and the backup contains a lot of data I don't need to back up (e.g. the records for the search engine).
Do you know a better alternative?

My goal is to be able to use it with rsnapshot.

Any ideas? How do you do it?

X+

P.S. Running a clearcache before the backup isn't a good option, as I want to run the backup often.

http://www.sydesy.com

Kristian Hole

Wednesday 12 October 2005 7:42:30 am

1) You of course do not need var/cache or var/myvar/cache. Otherwise you need everything, except maybe the image variations, as you suggest. You should be able to figure out which files you don't need with some pattern (not really helping here, am I?)
2) You can skip the ezsession table. On a big site that will be a big table, with information that is not needed in your dump. I don't remember the syntax for skipping a table with mysqldump off the top of my head...
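For reference, the flag is --ignore-table, one per table. A sketch; "ezdb", the user and the file names below are placeholders, and the command is printed rather than executed so it can be inspected first:

```shell
#!/bin/sh
# Build one --ignore-table flag per table we want to leave out of the dump.
# "ezdb" and "backupuser" are placeholders for your own database and user.
DB=ezdb
SKIP="ezsession ezsearch_word ezsearch_object_word_link"
FLAGS=""
for t in $SKIP; do
  FLAGS="$FLAGS --ignore-table=$DB.$t"
done
CMD="mysqldump -u backupuser -p $DB$FLAGS"
# Printed, not run, so the sketch is safe to inspect:
echo "$CMD > $DB-backup.sql"
```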

Kristian

http://ez.no/ez_publish/documenta...tricks/show_which_templates_are_used
http://ez.no/doc/ez_publish/techn...te_operators/miscellaneous/attribute

Łukasz Serwatka

Wednesday 12 October 2005 11:06:06 am

mysqldump has some disadvantages. The first is that it locks tables while creating the dump, so access to the data is blocked for seconds or minutes, depending on how big the database is. So timing matters when you do backups: don't run it during heavy traffic. The second is that mysqldump works through the MySQL server, so doing a dump is slower than using mysqlhotcopy.

If you have enough disk space you can use the more efficient mysqlhotcopy, which copies the database files directly, so it is faster since it doesn't go through a connection to the MySQL server.

You can use it like:

mysqlhotcopy -u user database destination

You can create a shell script which uses standard copy and compression (create tar.gz files) programs together with mysqlhotcopy.
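For reference, a sketch of such a script (database name, credentials and paths are placeholders; the commands are printed rather than executed so the sketch can be inspected safely):

```shell
#!/bin/sh
# Sketch: hot-copy the database files, then pack them into a dated tar.gz.
# DB, user, password and DEST are placeholders.
DB=ezdb
DEST=/var/backups/mysql
STAMP=$(date +%Y-%m-%d)
HOTCOPY="mysqlhotcopy -u backupuser -p secret $DB $DEST"
PACK="tar czf $DEST/$DB-$STAMP.tar.gz -C $DEST $DB"
# Printed, not run:
echo "$HOTCOPY"
echo "$PACK"
```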

Personal website -> http://serwatka.net
Blog (about eZ Publish) -> http://serwatka.net/blog

Xavier Dutoit

Wednesday 12 October 2005 12:07:13 pm

OK, so far I'm backing up:

var/<siteaccess>/storage/original
var/<siteaccess>/images

I have the image variations in /images:
toto.jpg
toto_reference.jpg
toto_small.jpg
toto_large.jpg

So I'm not quite sure what the purpose of /images-versioned is,

and even with Kristian's useful tips ;) I can't come up with any good criterion for excluding them (I could have an original image named xxx_large.jpg, for instance).

As for the tables, it's a good idea to do a hot copy to avoid locking the "main" tables, but as a matter of principle I'm not that keen on backing up a binary format.

These are the tables I think I could skip without any problem:
ezsearch_object_word_link
ezsearch_word
ezsession

But I'm not sure that's a big saving so far ;)

What do you think? I'd really like to find a pattern to exclude the image variations.

X+

http://www.sydesy.com

Gabriel Ambuehl

Wednesday 12 October 2005 12:14:13 pm

rsync --exclude="*imagevariationname.jpg" for all imagevariations maybe?

Edit: this obviously won't back up the odd original file that actually is NAMED *imagevariationname ;)

However, this means that upon restore, eZ Publish will have to regenerate ALL the images, likely resulting in pretty horrible performance on the initial page loads.
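For reference, a sketch along those lines, with one --exclude per image alias (alias names, paths and host are placeholders; check your own image alias settings for the real list, and mind the caveat above about originals that happen to match the pattern). The rsync command is printed rather than executed:

```shell
#!/bin/sh
# Build one rsync --exclude per image-variation alias.
# Alias names, source path and destination are placeholders.
SRC="var/ezwebin_site/storage/images/"
DST="backup@host:/backups/images/"
EXCLUDES=""
for a in reference small medium large; do
  EXCLUDES="$EXCLUDES --exclude=*_$a.jpg"
done
# Printed, not run (quoting keeps the * patterns from globbing):
echo "rsync -az$EXCLUDES $SRC $DST"
```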

I'd just bloody do the rsync. The mysqldump compresses *very well* (rsync --compress even does a good job there), so I'm not sure I'd even bother with filtering it.

Around here, we just rsync complete servers at least daily (some much more often). In many cases that means in excess of 1 MILLION files getting synced. eZ Publish is a somewhat minor offender in that picture ;).

As for the MySQL backup: if you can use it (realistically only on a LAN), MySQL replication can provide you with a damn near zero-lag "backup" (it also means you'll lose all the data if you run a bad query, though!).
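For reference, a minimal sketch of what that setup involves (server ids and log name are placeholders; the slave is then pointed at the master with CHANGE MASTER TO ... and started with START SLAVE):

```ini
# Master my.cnf fragment
[mysqld]
server-id = 1
log-bin   = mysql-bin

# Slave my.cnf fragment
[mysqld]
server-id = 2
```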

Visit http://triligon.org

Björn [email protected]

Wednesday 12 October 2005 7:21:31 pm

2) mysqldump swallows a lot of resources, and the backup contains a lot of data I don't need to back up (e.g. the records for the search engine).
Do you know a better alternative?

Have you ever tried MySQL Administrator?

You can define backup projects with it; they are executed on a remote machine. It will let you back up just the stuff you want.

Looking for a new job? http://www.xrow.com/xrow-GmbH/Jobs
Looking for hosting? http://hostingezpublish.com
-----------------------------------------------------------------------------
GMT +01:00 Hannover, Germany
Web: http://www.xrow.com/

Brendan Pike

Wednesday 12 October 2005 8:10:10 pm

This is an interesting topic that we have been looking into as well.

Can you tell me: is it safe to simply rsync live MySQL databases, or should you do nightly dumps, gzip them and rsync those?

www.dbinformatics.com.au

We are always interested in hearing from experienced eZ PHP programmers and eZ template designers interested in contract work.

Gabriel Ambuehl

Wednesday 12 October 2005 11:34:05 pm

rsyncing live database files is a rather bad idea. They aren't at all guaranteed to be in a consistent state while you read them.

If your site isn't huge, mysqldump won't take that long. Otherwise, do research mysqlhotcopy, but it still seems to be in BETA.

Visit http://triligon.org

Ole Morten Halvorsen

Thursday 13 October 2005 12:18:06 am

Gabriel Ambuehl: If your site isn't huge, mysqldump won't take that long.

You are right. Dumping ez.no (which results in a 2.3 GB .sql file) takes just under 3 minutes.

Senior Software Engineer - Vision with Technology

http://www.visionwt.com
http://www.omh.cc
http://www.twitter.com/omh

eZ Certified Developer
http://ez.no/certification/verify/358441
http://ez.no/certification/verify/272578

Alexandre Abric

Thursday 13 October 2005 12:25:55 am

I would also go for MySQL replication + mysqldump + a tgz of the dump.

Gabriel Ambuehl

Thursday 13 October 2005 12:26:36 am

Most would consider that huge, even ;)

That comes down to somewhere around 13 MB/s, which is pretty fast (unless you have mighty fast SCSI drives ;), especially on disks that are doing other things.

If you leave the dumps in plain text (you could of course also just feed them through gzip to save space in the first place), rsync's delta algorithm will save you LOTS of transfer time.
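For reference, a sketch of that combination (names and host are placeholders, and the commands are printed rather than executed). Note that --rsyncable is a gzip patch shipped by some distributions (e.g. Debian), not by stock gzip; it resets the compressor periodically so a small change in the dump only changes a small part of the .gz, which is what lets rsync's delta transfer do its job:

```shell
#!/bin/sh
# Sketch: compressed dumps that still rsync well.
# Database, user and host are placeholders.
DUMP="mysqldump -u backupuser -p ezdb | gzip --rsyncable > ezdb.sql.gz"
SYNC="rsync -az ezdb.sql.gz backup@host:/backups/"
# Printed, not run:
echo "$DUMP"
echo "$SYNC"
```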

Visit http://triligon.org

Lazaro Ferreira

Thursday 13 October 2005 9:47:49 am

Hi,

I would recommend rdiff-backup; it is based on the rsync synchronisation algorithm, but it has some advantages, like incremental backups.

http://www.nongnu.org/rdiff-backup/
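For reference, a sketch of typical rdiff-backup usage, a mirror plus reverse increments on the remote side (host and paths are placeholders; the commands are printed rather than executed):

```shell
#!/bin/sh
# Sketch: back up a site tree to a remote host, then prune increments
# older than four weeks. Host and paths are placeholders.
RUN="rdiff-backup /var/www/ezsite backup@host::/backups/ezsite"
PRUNE="rdiff-backup --remove-older-than 4W backup@host::/backups/ezsite"
# Printed, not run:
echo "$RUN"
echo "$PRUNE"
```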

For live MySQL database backups you can go with mysqlhotcopy; we have been using it for a while without problems.

If you are interested in an automation tool, check out backupninja, a nice backup tool that automates backups by calling the tools mentioned above to do the real job:

http://dev.riseup.net/backupninja/

Note: I'm assuming this is for a *nix environment.

Lazaro
http://www.mzbusiness.com

Xavier Dutoit

Thursday 13 October 2005 12:02:25 pm

Hi,

I didn't know these tools. What I usually use is rsnapshot, which sits on top of rsync + ssh + hardlinks.

Highly appreciated; and for the devs, you no longer have to back up your files before making changes: you just try, and if you screw things up too badly, you retrieve a previous version and that's it.

I'm still looking for a pattern that excludes the image variations and works every time. I'm afraid it implies changing things in the kernel.
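For reference, an rsnapshot.conf fragment along those lines (fields must be separated by real TAB characters; paths, host and retention counts are placeholders, and the exclude lines are handed to rsync, so a per-alias pattern like the one below has the usual caveat about originals whose names happen to match):

```
# rsnapshot.conf fragment -- fields are TAB-separated
snapshot_root	/backups/snapshots/
interval	daily	7
interval	weekly	4
exclude	var/cache/
exclude	*_large.jpg
backup	backup@host:/var/www/ezsite/	ezsite/
```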

X+

http://www.sydesy.com