Quick Amazon S3 Backups with Duplicity
I’ve been doing online backups of my Gentoo box to Amazon S3 for a while, but I switched to Duplicity a few months ago and have been very happy with the results. Setting this up took some trial and error, so I figured I’d share my config in case others find it useful. But first, here’s why I switched…
Duplicity is based on librsync and is designed as a simple command-line tool. Thanks to librsync, the incremental backups it generates are very small: if you change one byte in a 2 GB file, it sends only a few bytes, just like rsync. It supports encryption and S3 upload (among many other backends) natively, which means I can ditch a bunch of shell scripting. It also stores encrypted manifests on the backup destination, so if your local index gets out of sync it can quickly resynchronize and continue sending incrementals.
My configuration makes use of an alternate-filenames patch, which you can find in the duplicity bugzilla or obtain from the rich0 Gentoo overlay. Duplicity v0.6.24 will actually implement this in a slightly different manner, so if you use this patch, be aware that you’ll need to do new full backups after you upgrade. You can simply drop the --alternate-filenames option and skip the patch, but if you do, you won’t be able to configure Amazon S3 to archive your difftar files to Glacier.
Once you install duplicity, either from Gentoo portage or using my overlay, you’ll need your AWS credentials. Then, to execute a backup, run:
AWS_ACCESS_KEY_ID=foo AWS_SECRET_ACCESS_KEY=bar TMPDIR=/lots-of-space/tmp \
duplicity --encrypt-key ABCDEF01 --archive-dir /var/cache/duplicity \
  --exclude-if-present .noonlinebackup \
  --exclude-filelist /etc/duplicity-configs/dup-online1-exclude \
  --include /etc --include /home --include /root \
  --exclude '**' --full-if-older-than 60D --volsize 500 \
  --asynchronous-upload --alternate-filenames / s3+http://bucket/path/
You should substitute your own AWS credentials. Setting TMPDIR is optional, but if you’re running tmpfs you might not have space for everything you’re backing up. Setting --archive-dir is also optional, but the archives are expendable and I don’t want them in my home directory getting backed up. If they get deleted, duplicity will automatically fetch them back from S3 (which will cost you money, so don’t delete them needlessly).
--exclude-if-present means that if you touch .noonlinebackup in a directory, that directory won’t be backed up. The exclude list is just a file with one path per line to be excluded. You can have as many --include options as you like. The exclude/include/exclude sequence, followed by a backup of /, was the only way I could get it to back up paths relative to the root while excluding files that fell inside the include paths. The default include/exclude ordering wasn’t terribly helpful otherwise, though I might not be grokking the intent.
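To make the marker file and the exclude filelist concrete, here’s a minimal sketch. A temporary directory stands in for the real filesystem, and the filelist contents are just examples, not my actual config:

```shell
# Stand-in for the real filesystem; all paths below are illustrative.
demo=$(mktemp -d)

# Touching .noonlinebackup in a directory makes
# --exclude-if-present skip that whole directory.
mkdir -p "$demo/home/cache"
touch "$demo/home/cache/.noonlinebackup"

# The exclude filelist is just one path to exclude per line.
cat > "$demo/dup-online1-exclude" <<'EOF'
/home/*/tmp
/var/tmp
EOF
```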
--full-if-older-than tells duplicity to create a new full backup every 60 days. You can separately run a variation of the command to do cleanup:
AWS_ACCESS_KEY_ID=foo AWS_SECRET_ACCESS_KEY=bar TMPDIR=/lots-of-space/tmp \
duplicity --encrypt-key ABCDEF01 --archive-dir /var/cache/duplicity \
  remove-all-but-n-full 2 --force s3+http://bucket/path/
This will delete all but the last two full backups, and any incrementals that depend on them.
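Both commands lend themselves to cron. A sketch of example crontab lines, written to a temp file here rather than installed; the wrapper script names and the schedule are my assumptions, not part of the setup described above:

```shell
# Example crontab: nightly backup at 03:00, monthly cleanup on the 1st.
# duplicity-backup.sh and duplicity-cleanup.sh are assumed wrappers
# around the two duplicity commands shown above.
cronfile=$(mktemp)
cat > "$cronfile" <<'EOF'
0 3 * * * /usr/local/bin/duplicity-backup.sh
0 5 1 * * /usr/local/bin/duplicity-cleanup.sh
EOF
```

From there, `crontab "$cronfile"` (or a drop-in under /etc/cron.d) would install it.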
As far as how it performs, this log speaks for itself:
Reading filelist /etc/duplicity-configs/dup-online1-exclude
Sorting filelist /etc/duplicity-configs/dup-online1-exclude
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Jan 21 03:11:45 2014
--------------[ Backup Statistics ]--------------
StartTime 1391501490.67 (Tue Feb  4 03:11:30 2014)
EndTime 1391502084.80 (Tue Feb  4 03:21:24 2014)
ElapsedTime 594.13 (9 minutes 54.13 seconds)
SourceFiles 332755
SourceFileSize 32503585361 (30.3 GB)
NewFiles 456
NewFileSize 22332746 (21.3 MB)
DeletedFiles 11
ChangedFiles 76
ChangedFileSize 2181064980 (2.03 GB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 543
RawDeltaSize 110980087 (106 MB)
TotalDestinationSizeChange 38473179 (36.7 MB)
Errors 0
-------------------------------------------------
That’s 30 GB of source data with 2 GB of changed files, protected by transferring only 36 MB, with the whole operation completed in under 10 minutes.
By using alternate filenames you can set a prefix rule in S3 to archive your difftar files to Glacier. I wouldn’t archive anything else: the manifests don’t account for much space, and if you lose your local copy duplicity will re-download them (which is very expensive from Glacier if not carefully managed). A future Duplicity release will allow you to specify separate prefixes for each file type it creates, so that you can simply put your difftars in a separate directory.
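As a sketch of what that prefix rule might look like with the AWS CLI: the bucket name, rule ID, and prefix below are all assumptions (in particular, the prefix you match depends on the exact filenames the alternate-filenames patch produces, so check your bucket listing first):

```shell
# Illustrative S3 lifecycle rule: transition objects under an assumed
# difftar prefix to Glacier as soon as possible (Days 0).
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "difftars-to-glacier",
      "Filter": { "Prefix": "path/difftar" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
# Applying it needs AWS credentials, so it is shown here but not run:
# aws s3api put-bucket-lifecycle-configuration --bucket bucket \
#   --lifecycle-configuration file://lifecycle.json
```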