I have two 300 GB files on different volumes:
- an encrypted local backup
- an encrypted ‘remote’ backup (on a NAS, that is)
By design, these two files are identical in size and also mostly (>90%) identical in content...
Is there an efficient tool to "rsync" these files and only copy over the differing sections, so the target file becomes identical to the source?
Perhaps something that builds checksums of blocks to figure that out, I don't know... (anything more efficient than cp -f, that is; rsync would AFAIK also grab the entire source file to overwrite the target).
rsync can be used to do this. The --no-whole-file (or --no-W) option makes rsync use its block-level delta-transfer algorithm instead of whole-file syncing; by default rsync switches the delta algorithm off when both paths are local.
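Applied to the question's two 300 GB container files, a minimal sketch could look like this (the paths are placeholders for the local volume and the NAS mount, and --inplace is added on the assumption that you want the existing target file patched block by block rather than rebuilt as a temporary copy):

```
# Force the block/checksum (delta-transfer) algorithm, which rsync
# normally disables when source and destination are both local:
#   --no-whole-file  compare block checksums and send only changed blocks
#   --inplace        write those blocks directly into the existing target
#   --stats          report how much data was matched vs. literally sent
rsync -av --no-whole-file --inplace --stats \
    /mnt/local/backup.img /mnt/nas/backup.img
```

Note that rsync still reads both files in full to compute the checksums; what it avoids is rewriting the unchanged blocks on the target.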
Test case

Generated random test files using /dev/random and large chunks of text taken from websites. These four files all differ in content; tf_2.dat is our target file.
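A sketch of how such a set of test files might be generated (the names other than tf_2.dat, the sizes and the URLs are illustrative; /dev/urandom is used here instead of /dev/random so the reads don't block):

```
mkdir -p /data/src
# Random binary content
dd if=/dev/urandom of=/data/src/tf_1.dat bs=1M count=100
dd if=/dev/urandom of=/data/src/tf_2.dat bs=1M count=100   # the target file
# Large chunks of text fetched from websites (placeholder URLs)
curl -s https://example.org/some-large-page    > /data/src/tf_3.dat
curl -s https://example.org/another-large-page > /data/src/tf_4.dat
```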
Then copied them to a different hard disk using rsync (the destination is empty) and recorded the transfer statistics.
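A sketch of that first copy (paths illustrative). Since the destination is empty, nothing can be matched yet, so the statistics show essentially no matched data and a speedup of about 1:

```
rsync -av --stats /data/src/ /data/dst/
# With an empty destination, expect roughly:
#   Literal data:  ~ the total size of the four files
#   Matched data:  0 bytes
# and "speedup is 1.00" (or close to it) in the summary line.
```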
Now I merge the files to build a new target file that consists of roughly 60% old data.
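One way to build such a file (a sketch; the 60/40 split and the block sizes are assumptions based on the 100 MiB test files above):

```
# Keep the first ~60% of the old target and append chunks of the other
# test files, so roughly 60% of the new tf_2.dat still matches the copy
# that is already on the destination.
dd if=/data/src/tf_2.dat of=/tmp/tf_2.new bs=1M count=60
cat /data/src/tf_3.dat /data/src/tf_4.dat | head -c 40M >> /tmp/tf_2.new
mv /tmp/tf_2.new /data/src/tf_2.dat
```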
Now I sync the two folders again, this time using the --no-W option. You can see that a large amount of data is matched, which gives a correspondingly large speedup.
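The sync itself is just the earlier command with --no-W (short for --no-whole-file); the lines worth watching in the --stats output are Matched data (blocks the destination already had) versus Literal data (blocks that actually had to be sent), plus the final speedup figure:

```
rsync -av --no-W --stats /data/src/ /data/dst/
# With ~60% of tf_2.dat unchanged, Matched data should cover roughly that
# 60%, Literal data the rest, and the speedup in the summary line should
# be noticeably above 1.
```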
Next, I try again; this time I merge several shell files into the target (tf_2.dat) so that the change is only about 2%, and sync once more with rsync in the same way. We see a large amount of matched data and a high speedup, i.e. a fast sync.
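The last step is the same pattern with a much smaller change (which shell files get spliced in, and at what offset, is illustrative; in the test the change amounted to roughly 2%):

```
# Splice a few shell scripts into tf_2.dat in place (conv=notrunc keeps
# the file size unchanged), so only a small portion of the file differs...
cat ~/bin/*.sh | dd of=/data/src/tf_2.dat bs=1M seek=10 conv=notrunc
# ...then sync once more; nearly the whole file is now reported as matched.
rsync -av --no-W --stats /data/src/ /data/dst/
```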
You can also try https://bitbucket.org/ppershing/blocksync (disclaimer: I am the author of this particular fork). An advantage over rsync is that it reads the file only once (as far as I know, rsync can't be convinced to assume two files are different without computing the checksum before it starts the delta transfer; needless to say, reading 160 GB hard drives twice isn't a good strategy). A note of caution: the current version of blocksync works well over short-RTT connections (e.g. localhost, LAN and local Wi-Fi) but isn't particularly useful for syncing over long distances.