I'm trying to work up a backup strategy for some clients, and am leaning towards duplicity for remote backups (I already use rdiff-backup for internal/on-site backups).
Is it reasonable to want a full backup every so often? Since duplicity increments forward, each incremental backup relies on the previous increment, and all of them ultimately rely on the last full backup. Should any link in that chain become corrupt, bad things happen. A related question: does duplicity test the incremental backups for consistency?
Assuming I do want a full backup every so often, how efficiently does duplicity create that full backup? Can it check file signatures and copy unchanged data from previous full backups/increments, essentially creating a new 'full' archive by transferring only new/changed data and merging in existing unchanged data?
Right now my concern is that periodic full backups are needed, but the large amount of bandwidth they consume each time will make them unreasonable for some clients.
I think it's reasonable to want a full backup every so often: most of my machines are configured to do one every few months. There's nothing magic about that interval: the right value is going to depend on how much data you have, how fast it changes, how likely you are to want to restore from anything other than the most recent snapshot, how much traffic and storage cost you, and how paranoid you are. Other people might want a full backup every week.
Unless you do a full backup from time to time, the archive size and recovery time will continue to grow.
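For instance, duplicity's --full-if-older-than option lets a scheduled job decide on its own when a new full is due. A rough sketch (the paths, the scp:// target and the three-month threshold are only placeholders):

    # Hypothetical nightly cron job: incremental backups, but start a
    # new full chain whenever the last full is older than three months.
    duplicity --full-if-older-than 3M /home/data scp://backup@example.com//backups/data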
I don't think duplicity specifically has a "check" command (http://pad.lv/660895), but it would be nice if it did. It is very prudent to do a test restore every so often.
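For the test-restore part, duplicity's verify action (which compares the archive against the local tree) and an occasional restore into a scratch directory are the closest things available. A sketch, with hypothetical paths and target:

    # Compare what is in the archive with what is currently on disk
    duplicity verify scp://backup@example.com//backups/data /home/data

    # Restore a single file into a scratch directory as a spot check
    duplicity restore --file-to-restore projects/report.txt \
        scp://backup@example.com//backups/data /tmp/restore-test/report.txt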
A related question is whether you should keep more than one backup chain. Again, it depends on the cost. One reason to keep an older chain is that you could restore from it if the current chain is corrupt, whether because of hardware failure, OS failure, or a duplicity bug. Of course if the old chain is very old, restoring from it may be of limited value.
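If you do keep an extra chain, duplicity can prune everything older than the chains you want to retain. A sketch (the retention count and target URL are arbitrary):

    # Keep the two most recent full chains, delete anything older
    duplicity remove-all-but-n-full 2 --force scp://backup@example.com//backups/data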
Making a full backup always uploads a full copy of the data.
If the client's concern is the fraction of bandwidth used, rather than traffic charges, you might want to run it under something like trickle.
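A sketch of what that could look like (the 256 KB/s upload cap and the backup target are arbitrary placeholders):

    # Run duplicity in trickle's standalone mode, capping upload at 256 KB/s
    trickle -s -u 256 duplicity full /home/data scp://backup@example.com//backups/data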
What you are asking for is called a synthetic full backup: getting a full backup by merging an incremental backup with a previous full backup on the destination side (i.e. the backup server).
I'm not familiar with Duplicity, but from their website it appears not to do synthetic full backups. You must keep all of the incrementals back to the full on which they're based. If that is the case, you will probably want to force a full backup every so often, because long incremental chains make restores slower and a single corrupt increment can break everything that follows it.
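To see how long the current chain has grown, and when the last full was taken, duplicity's collection-status action lists the fulls and the incrementals hanging off each one. A sketch with a placeholder URL:

    # Show each full backup and the incremental chain built on top of it
    duplicity collection-status scp://backup@example.com//backups/data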
One interesting way to achieve synthetic fulls is to use rsync with the --link-dest=DIR option, or to use rsnapshot. Each snapshot stores only the files that changed since the previous one, yet every snapshot appears to be a full backup. When you delete any of them, the remaining snapshots stay intact because unchanged files are shared through hard links, so the diffs are file-based (either a file changed and is stored again, or it didn't and is hard-linked).
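A minimal sketch of the --link-dest approach (the dated directory names and paths are made up; rsnapshot automates essentially this rotation for you):

    # Build today's snapshot against yesterday's: unchanged files become
    # hard links into the previous snapshot, changed files get fresh copies.
    rsync -a --delete --link-dest=/backups/2024-06-01/ /home/data/ /backups/2024-06-02/

Deleting /backups/2024-06-01 afterwards leaves /backups/2024-06-02 complete, since the hard-linked files survive as long as at least one snapshot still references them.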