Incremental backups to untrusted hosts
There’s no point in encryption, passphrases, frequent updates, system hardening and retinal scans if all the data can be snapped up from the backup server. I’ve been looking for a proper backup system that can safely handle incremental backups to insecure locations, either my personal server or someone else’s.
This excludes a few of the common solutions:
- Unencrypted backups with rsync. Prevents eavesdropping when done over ssh, but nothing else.
- Rsync to encrypted partitions/images on the server. Protects against eavesdropping and theft, but not against admins and rootkits. Plus it requires root access on the server.
- Uploading an encrypted tarball of all my stuff. Protects against everything, but since it’s not incremental, it’ll take forever.
My current best solution: An encrypted disk image on the server, mounted locally via sshfs and loop.
This protects data against anything that could happen on the server, while still allowing incremental backups. But is it efficient? No.
The table below shows the actual traffic when rsync uploads 120MB of changed files, out of 40GB in total, to a 400GB partition.
|Setup||Downloaded (MB)||Uploaded (MB)|
Backups take about 15-20 minutes on my 10 Mbps connection, which is acceptable, even though it's only about a minute and a half's worth of actual data. To a box on my wired LAN, it takes about 3 minutes.
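To put the 15-20 minutes in perspective, here is a back-of-the-envelope sketch of how long the changed data alone would take to send. The function name `transfer_seconds` is made up for this illustration, and it ignores protocol overhead entirely, so treat the result as a lower bound rather than a measurement.

```shell
#!/bin/sh
# Seconds needed to move SIZE_MB megabytes over a RATE_MBPS megabit/s link.
# Assumption: no protocol overhead and a fully saturated link.
transfer_seconds() {
    size_mb=$1
    rate_mbps=$2
    # 1 byte = 8 bits, so megabytes * 8 = megabits.
    echo $(( size_mb * 8 / rate_mbps ))
}

# The 120MB of changed files from this post over a 10 Mbps link.
transfer_seconds 120 10
```

Compared with that figure, a 15-20 minute backup means the file system and encryption layers cost roughly nine to twelve times the raw transfer time.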
Somewhat surprisingly, these numbers didn’t vary more than ±10MB with mount options like noatime,nodiratime,data=writeback,commit=3600. Even with the terrible fsck overhead, which is sure to grow worse over time as the fs fills up, ext2 seems to be the way to go, especially if your connection is asymmetric.
As for rsync/ssh compression, encryption kills it (unless you use ECB, which you don't). File system compression would alleviate this, but ext2/ext3 unfortunately don't have this implemented in vanilla Linux. And while restoring backups was 1:1 in transfer cost, which you've seen is comparatively excellent, compression would have cut several hours off the restoration time.
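A quick way to see why compression does nothing for encrypted blocks is to compress a megabyte of zeros next to a megabyte of random bytes. This is only a sketch: `/dev/urandom` stands in for ciphertext here (an assumption, since good ciphertext should look random), and `compressed_size` is a helper name invented for the example.

```shell
#!/bin/sh
# Print the gzip-compressed size in bytes of the first 1 MiB read from FILE.
compressed_size() {
    head -c 1048576 "$1" | gzip -c | wc -c | tr -d ' '
}

# Highly redundant input shrinks to almost nothing.
echo "zeros:  $(compressed_size /dev/zero) bytes"
# Random input (our stand-in for encrypted data) does not shrink at all.
echo "random: $(compressed_size /dev/urandom) bytes"
```

The same thing happens to `sshfs -C` and `rsync -z` when the bytes on the wire are encrypted disk blocks: the compressor has nothing to work with.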
It would be very interesting to try this on other file systems, but there aren't a lot of realistic choices. Reiser4 supports both encryption and compression. From the little I've gathered though, it encrypts on a file-by-file basis so all the file names are still there, which could leak information. And honestly, I've never trusted reiserfs with anything, neither before nor after you-know-what.
ZFS supposedly compresses for read/write speed to disk rather than for our obscure network scenario, and if I had to guess from the array of awesome features, the overhead is probably higher than ext2/3.
However, neither of these two file systems has ubiquitous Linux support, which is a huge drawback when it comes to restoring.
So a bit more about how specifically you go about this:
To set it up:
    # Create dirs and a 400GB image. It's non-sparse since we really
    # don't want to run out of host disk space while writing.
    mkdir -p ~/backup/sshfs ~/backup/crypto
    ssh vidar@host mkdir -p /home/vidar/backup
    ssh vidar@host dd of=/home/vidar/backup/diskimage \
        if=/dev/zero bs=1M count=400000

    # We now have a blank disk image. Encrypt and format it.
    sshfs -C vidar@host:/home/vidar/backup ~/backup/sshfs
    losetup /dev/loop7 ~/backup/sshfs/diskimage
    cryptsetup luksFormat /dev/loop7
    cryptsetup luksOpen /dev/loop7 backup
    mke2fs /dev/mapper/backup

    # We now have a formatted disk image. Sew it up.
    cryptsetup luksClose backup
    losetup -d /dev/loop7
    umount ~/backup/sshfs
To back up:
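    # Mount the remote image locally and open the encrypted volume.
    sshfs -C vidar@host:/home/vidar/backup ~/backup/sshfs
    losetup /dev/loop7 ~/backup/sshfs/diskimage
    cryptsetup luksOpen /dev/loop7 backup
    mount /dev/mapper/backup ~/backup/crypto

    # Find the most recent snapshot: after the loop, THEN holds the last match.
    NOW=$(date +%Y%m%d-%H%M)
    for THEN in ~/backup/crypto/2*; do true; done  # beware y3k!
    echo "Starting Incremental backup from $THEN to $NOW..."
    # Unchanged files are hard-linked against the previous snapshot.
    rsync -xav --whole-file --link-dest="$THEN" ~ ~/backup/crypto/"$NOW"

    # Tear everything down in reverse order.
    umount ~/backup/crypto
    cryptsetup luksClose backup
    losetup -d /dev/loop7
    umount ~/backup/sshfs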
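The `for THEN in ...; do true; done` line in the backup script is terse, so here is a small stand-alone sketch of the same idea. It relies on the shell expanding globs in sorted order, which puts the newest `YYYYmmdd-HHMM` name last. The function `latest_snapshot` and the temporary demo directory are made up for illustration. Unlike the one-liner, this variant returns an empty string when nothing matches, whereas the original leaves the literal pattern in `THEN` on the very first run.

```shell
#!/bin/sh
# Print the last directory matching DIR/2*, i.e. the newest snapshot when
# snapshots are named YYYYmmdd-HHMM. Prints an empty line if there is none.
latest_snapshot() {
    latest=""
    for candidate in "$1"/2*; do
        # With no matches the shell hands us the unexpanded pattern; skip it.
        if [ -d "$candidate" ]; then
            latest=$candidate
        fi
    done
    echo "$latest"
}

# Demo in a throwaway directory with three fake snapshots.
demo=$(mktemp -d)
mkdir "$demo/20090101-1200" "$demo/20090315-0830" "$demo/20081224-1800"
latest_snapshot "$demo"
```

In the demo, the function prints the path ending in `20090315-0830`, which is the directory the backup script would pass to `--link-dest`.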
If you know of a way to do secure backups with less overhead, feel free to post a comment!