Incremental backups to untrusted hosts

There’s no point in encryption, passphrases, frequent updates, system hardening and retinal scans if all the data can be snapped up from the backup server. I’ve been looking for a proper backup system that can safely handle incremental backups to insecure locations, either my personal server or someone else’s.

This excludes a few of the common solutions:

  • Unencrypted backups with rsync. Prevents eavesdropping when done over ssh, but nothing else.
  • Rsync to encrypted partitions/images on the server. Protects against eavesdropping and theft, but not against admins and rootkits. Plus it requires root access on the server.
  • Uploading an encrypted tarball of all my stuff (roughly the one-liner sketched after this list). Protects against everything, but since it’s not incremental, it’ll take forever.
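
For reference, that last approach amounts to something like this one-liner (a rough sketch; the cipher choice and file naming are arbitrary placeholders):

tar czf - ~ | gpg --symmetric --cipher-algo AES256 -o - | \
        ssh vidar@host 'cat > backup-$(date +%Y%m%d).tar.gz.gpg'

Simple, but every run re-uploads everything, which is exactly the problem.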

My current best solution: An encrypted disk image on the server, mounted locally via sshfs and loop.

This protects data against anything that could happen on the server, while still allowing incremental backups. But is it efficient? No.

Here is a table of the actual traffic when rsync uploads 120 MB out of 40 GB of files to a 400 GB partition.

Setup    Downloaded (MB)    Uploaded (MB)
ext2     580                580
ext3     540                1000
fsck     9000               300
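
(One way to capture numbers like these is to sample the interface byte counters before and after a run; a rough sketch, assuming eth0 is the interface in use:)

RX0=$(cat /sys/class/net/eth0/statistics/rx_bytes)
TX0=$(cat /sys/class/net/eth0/statistics/tx_bytes)
# ...run the backup...
RX1=$(cat /sys/class/net/eth0/statistics/rx_bytes)
TX1=$(cat /sys/class/net/eth0/statistics/tx_bytes)
echo "Downloaded $(( (RX1-RX0)/1024/1024 )) MB, uploaded $(( (TX1-TX0)/1024/1024 )) MB"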

Backups take about 15-20 minutes on my 10 Mbps connection, which is acceptable, even though it’s only a minute’s worth of actual data. To a box on my wired LAN, it takes about 3 minutes.

Somewhat surprisingly, these numbers didn’t vary more than ±10MB with mount options like noatime,nodiratime,data=writeback,commit=3600. Even with the terrible fsck overhead, which is sure to grow worse over time as the fs fills up, ext2 seems to be the way to go, especially if your connection is asymmetric.
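
For reference, those options go on the mount of the decrypted image; a sketch, using the mount point from the setup below (data= and commit= only apply to ext3):

mount -o noatime,nodiratime,data=writeback,commit=3600 /dev/mapper/backup ~/backup/crypto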

As for rsync/ssh compression, encryption kills it (unless you use ECB, which you don’t). File system compression would alleviate this, but ext2/ext3 unfortunately don’t have it implemented in vanilla Linux. And while restoring backups was 1:1 in transfer cost, which as you’ve seen is comparatively excellent, compression would have cut several hours off the restoration time.
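
To see why compression doesn’t help here: ciphertext is indistinguishable from random data, and random data doesn’t compress. A quick illustration, using /dev/urandom as a stand-in for encrypted output:

dd if=/dev/zero bs=1M count=10 2>/dev/null | gzip -c | wc -c      #compresses to a few KB
dd if=/dev/urandom bs=1M count=10 2>/dev/null | gzip -c | wc -c   #stays at roughly 10MB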

It would be very interesting to try this on other FS, but there aren’t a lot of realistic choices. Reiser4 supports both encryption and compression. From the little I’ve gathered though, it encrypts on a file-by-file basis so all the file names are still there, which could leak information. And honestly, I’ve never trusted reiserfs with anything, neither before nor after you-know-what.

ZFS supposedly compresses for read/write speed to disk rather than for our obscure network scenario, and if I had to guess from the array of awesome features, the overhead is probably higher than ext2/3.

However, neither of these two filesystems has ubiquitous Linux support, which is a huge drawback when it comes to restoring.

So, a bit more detail on how you actually go about this:

To set it up:

#Create dirs and a 400GB image. It's non-sparse since we really
#don't want to run out of host disk space while writing.
mkdir -p ~/backup/sshfs ~/backup/crypto
ssh vidar@host mkdir -p /home/vidar/backup
ssh vidar@host dd of=/home/vidar/backup/diskimage \
        if=/dev/zero bs=1M count=400000

#We now have a blank disk image. Encrypt and format it.
sshfs -C vidar@host:/home/vidar/backup ~/backup/sshfs
losetup /dev/loop7 ~/backup/sshfs/diskimage
cryptsetup luksFormat /dev/loop7
cryptsetup luksOpen /dev/loop7 backup
mke2fs /dev/mapper/backup

#We now have a formatted disk image. Sew it up.
cryptsetup luksClose backup
losetup -d /dev/loop7
umount ~/backup/sshfs
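
If you’d rather gamble on a sparse image (see the sparse-file discussion in the comments below), it can be created instantly instead of writing out 400GB of zeros. A sketch:

ssh vidar@host dd of=/home/vidar/backup/diskimage \
        if=/dev/zero bs=1M count=0 seek=400000

Just be aware that running the host out of disk space mid-backup then becomes a real risk again.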

To back up:

sshfs -C vidar@host:/home/vidar/backup ~/backup/sshfs
losetup /dev/loop7 ~/backup/sshfs/diskimage
cryptsetup luksOpen /dev/loop7 backup
mount /dev/mapper/backup ~/backup/crypto

NOW=$(date +%Y%m%d-%H%M)
for THEN in ~/backup/crypto/2*; do true; done #find the most recent previous backup (beware y3k!)
echo "Starting incremental backup from $THEN to $NOW..."
rsync -xav --whole-file --link-dest="$THEN" ~ ~/backup/crypto/"$NOW"

umount ~/backup/crypto
cryptsetup luksClose backup
losetup -d /dev/loop7
umount ~/backup/sshfs
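
Restoring is just the same mounting dance plus an rsync in the other direction. A sketch, pulling the newest snapshot into a hypothetical ~/restored:

sshfs -C vidar@host:/home/vidar/backup ~/backup/sshfs
losetup /dev/loop7 ~/backup/sshfs/diskimage
cryptsetup luksOpen /dev/loop7 backup
mount /dev/mapper/backup ~/backup/crypto

for LAST in ~/backup/crypto/2*; do true; done #newest snapshot
mkdir -p ~/restored
rsync -av "$LAST"/ ~/restored/

umount ~/backup/crypto
cryptsetup luksClose backup
losetup -d /dev/loop7
umount ~/backup/sshfs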

If you know of a way to do secure backups with less overhead, feel free to post a comment!

14 thoughts on “Incremental backups to untrusted hosts”

  1. @ Anonymous
    From now on, so am I! Bah! It looked really promising, but for some reason it randomly died while uploading. Maybe the next version.

  2. I am using almost the same approach as yours (too bad I found your site after setting up my solution) :(

    Anyway have you thought instead of
    losetup /dev/loop7 ~/backup/sshfs/diskimage

    to use
    losetup -e aes /dev/loop7 ~/backup/sshfs/diskimage (or any other encryption like lofish or whatever)

    and skip the
    cryptsetup luksOpen /dev/loop7 backup
    and
    cryptsetup luksClose backup

    I don’t know but I think it will produce better performance :)
    Although I haven’t done any tests, it seems to be performing very well on my system.

    Try it and let us know; hope it helped.
    Great description by the way, simple and clean.

  3. agggrr lofish=blowfish
    and also the -e switch on losetup adds encryption to the block device
    :)

  4. @ sassm

    losetup apparently has known security issues ( http://lkml.indiana.edu/hypermail/linux/kernel/0402.2/1137.html ), which can’t be easily patched because changing the defaults will break everyone’s losetup-encrypted fs.

    LUKS stores all the encryption settings used, so they can safely create more secure defaults in the future (as they have done previously). It’s also a plus that you don’t have to know anything other than the passphrase when the box with the decryption script dies in a fire and you have to restore the data from a livecd.

    CPU performance isn’t the bottleneck over the network anyways, and I doubt that it’d be faster given the same encryption algorithm. They most likely use the same crypto implementations from the kernel.

    Given those, I think LUKS is the better choice :P

  5. @Vidar
    Hmm, what is said in the link you provided looks interesting :O

    In my case I just wanted an encrypted container file for taking backups to a remote shared hosting data center (as a backup of the backup), so I didn’t care about watermarking being detectable, since there is no illegal content in there anyway :)

    So if I understood correctly (I am using loop-aes anyway), someone could not break it after it has been written to disk, BUT could find watermarks in the encrypted file? Could someone verify this and explain it to me?

    Also, regarding performance: what I meant is not the CPU performance, BUT whether it will increase the throughput?

    My biggest problem with all these container files (mine is 350 GB) is the sparse space: it’s not easy to transfer over the network. I mean, the file is 350 GB in ls, but it’s sparse, so only ~40 GB of real du data (the rest is just empty).
    Trying to rsync with --sparse is a nightmare.
    Trying cp over sshfs (always with the sparse switch) is a nightmare too.
    Trying tar -Scf - filename | gzip (or tar -Szcf), then netcat or sshfs or whatever transfer protocol you prefer, | untar on the other side,
    is also a nightmare: if the transfer stops you have to start all over again.
    Trying tar -Scf (-S takes sparseness into consideration) and even gzip is also a solution, BUT it needs more space, BUT it has the advantage that after a long wait you have a small zipped file that can reproduce the sparseness on the other side.

    rsync takes a long time too.
    SO till now, no perfect solution. What I do at the moment is: once in a while, a month or so, take the whole file (tarred with -S and gzipped), and every day or every few hours rsync from within the encrypted file to another location :)
    Or you can use a very inefficient network transfer, like plain FTP with resume, to get the file to your place and move all 350 GB (of course you can do it over sshfs or ssh and have some kind of compression).
    You could even dd the file into chunks, creating many smaller files, and re-glue them on the other side.

    IF someone has a better idea, please reply!
    Also, regarding the question earlier...
    Ooops, sorry that was a lot of typing :(
    Once again, great discussion and a nice, clean tutorial.
    Waiting to hear your thoughts!

  6. @sassm

    How about rsync --inplace? If the file you’re updating is already sparse, it will probably remain sparse. With delta transfers, only the changed portions will be transferred. Plus, you can stop and continue whenever you want.

    I don’t use a sparse file though. I’d hate to run out of space while doing a backup and corrupt the fs :O

    There’s no nice way of compressing encrypted data, but that’s a separate problem.
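
    Something along these lines, just as a rough sketch (paths are hypothetical):
    rsync -av --inplace ~/container.img vidar@host:backup/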

  7. @Vidar
    It’s an interesting dialogue we started here :)
    Yes, even with rsync --inplace it has to read the whole fake 350 GB file and do the delta calculation to transfer it... at least that’s how I interpreted it; if I am wrong please let me know!
    Although I will retry it!

    But could you tell me about

    “So if I understood correctly (I am using loop-aes anyway), someone could not break it after it has been written to disk, BUT could find watermarks in the encrypted file? Could someone verify this and explain it to me?”

    once again thanks for the interesting dialogue.

  8. Oh, regarding sparse files, I agree with you, but in my very specific case it’s better to use one.
    I am backing up the backup (a backup of the backup of the data) to a dirt cheap host that provides “unlimited” space and “unlimited” bandwidth for just a few bucks per month :D

    It’s done for 2 reasons: if I have a failed server, I of course have a backup, but I don’t have huge upload bandwidth, SO I use the dirt cheap hosting to upload and restore faster!

    Although till now I’ve never needed to restore, it’s better to have 2 backups and 2 ways to restore, and to get about 10 Mbyte/sec restores for under $10 :)
    I have tested it, did a fake restore, and it averaged 10.02 Mbytes/sec.

  9. rsync --inplace does have to read the whole fake file and do the delta calculations on it, but does that matter much? It’s probably just a dozen extra CPU-seconds and 50 KB of upload.

    With --sparse (meaning no --inplace), it has to create a temporary copy of the file to work on, which probably takes 5 minutes with 40 GB of actual data.

    I hadn’t even heard about the loop-aes problem before, so I really don’t know. I use LUKS mainly for the settings management anyways.

    The sparsity of the file only affects disk usage though, not upload/download speed. And you can have a non-sparse file locally, and still use rsync --sparse to create a sparse equivalent on the remote.

  10. On post #7 you mention:
    “losetup apparently has known security issues ( http://lkml.indiana.edu/hypermail/linux/kernel/0402.2/1137.html ), which can’t be easily patched because changing the defaults will break everyone’s losetup-encrypted fs.”
    but probably not loop-aes.

    Regarding post #12:
    rsync --inplace, if you connect through sshfs as below.
    We have computer A which connects to computer B through sshfs.

    IF I am not wrong (please correct me if I am), rsync has to get the parts of the file over sshfs (from B) to computer A so that rsync can do the calculations (locally, since sshfs emulates a local fs), so actually you transfer the whole file locally and then process it, or at least into memory, isn’t it?

    Please tell me your thoughts!

  11. Freenode user ‘consolers’ suggests using good old dump.

    This is fs-specific, requires the device to be unmounted, and will not incrementally transmit file modifications (changed files are sent whole), but it is still likely to be much quicker than my rsync approach.
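
    For the record, a run might look something like this (a rough sketch; it assumes the data lives on /dev/sda1, and pipes through gpg to keep the host untrusted):
    dump -0u -f - /dev/sda1 | gpg --symmetric --cipher-algo AES256 -o - | ssh vidar@host 'cat > backup/full.dump.gpg'
    dump -1u -f - /dev/sda1 | gpg --symmetric --cipher-algo AES256 -o - | ssh vidar@host 'cat > backup/incr1.dump.gpg'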
