I have two 300 GB files on different volumes:
- an encrypted local backup
- an encrypted ‘remote’ backup (on a NAS, that is)
By design, these two files are identical in size and also mostly (>90%) identical in content...
Is there an efficient tool to "rsync" these files and only copy over the differing sections, so the target file becomes identical to the source?
Perhaps something that builds checksums of blocks to figure that out, I don't know... (anything more efficient than cp -f, that is; rsync would AFAIK also grab the entire source file to overwrite the target).
rsync can be used to do this. The --no-whole-file (or --no-W) option makes rsync use its block-level delta-transfer algorithm instead of whole-file syncing; by default rsync switches the delta algorithm off when both paths are local.
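Applied to the question's two 300 GB container files, a minimal sketch could look like this (the paths are placeholders for the local volume and the NAS mount, and --inplace is added on the assumption that you want the existing target file patched block by block rather than rebuilt as a temporary copy):

```
# Force the block/checksum (delta-transfer) algorithm, which rsync
# normally disables when source and destination are both local:
#   --no-whole-file  compare block checksums and send only changed blocks
#   --inplace        write those blocks directly into the existing target
#   --stats          report how much data was matched vs. literally sent
rsync -av --no-whole-file --inplace --stats \
    /mnt/local/backup.img /mnt/nas/backup.img
```

Note that rsync still reads both files in full to compute the checksums; what it avoids is rewriting the unchanged blocks on the target.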
Test case

Generated random test files using /dev/random and large chunks of text taken from websites. These four files all differ in content; tf_2.dat is our target file.
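A sketch of how such a set of test files might be generated (the names other than tf_2.dat, the sizes and the URLs are illustrative; /dev/urandom is used here instead of /dev/random so the reads don't block):

```
mkdir -p /data/src
# Random binary content
dd if=/dev/urandom of=/data/src/tf_1.dat bs=1M count=100
dd if=/dev/urandom of=/data/src/tf_2.dat bs=1M count=100   # the target file
# Large chunks of text fetched from websites (placeholder URLs)
curl -s https://example.org/some-large-page    > /data/src/tf_3.dat
curl -s https://example.org/another-large-page > /data/src/tf_4.dat
```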
Then copied them to a different hard disk using rsync (the destination is empty) and recorded the transfer statistics.
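A sketch of that first copy (paths illustrative). Since the destination is empty, nothing can be matched yet, so the statistics show essentially no matched data and a speedup of about 1:

```
rsync -av --stats /data/src/ /data/dst/
# With an empty destination, expect roughly:
#   Literal data:  ~ the total size of the four files
#   Matched data:  0 bytes
# and "speedup is 1.00" (or close to it) in the summary line.
```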
Now I merge the files to build a new target file that consists of roughly 60% old data.
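One way to build such a file (a sketch; the 60/40 split and the block sizes are assumptions based on the 100 MiB test files above):

```
# Keep the first ~60% of the old target and append chunks of the other
# test files, so roughly 60% of the new tf_2.dat still matches the copy
# that is already on the destination.
dd if=/data/src/tf_2.dat of=/tmp/tf_2.new bs=1M count=60
cat /data/src/tf_3.dat /data/src/tf_4.dat | head -c 40M >> /tmp/tf_2.new
mv /tmp/tf_2.new /data/src/tf_2.dat
```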
Now I sync the two folders again, this time using the --no-W option. You can see that a large amount of data is matched, which gives a correspondingly large speedup.
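The sync itself is just the earlier command with --no-W (short for --no-whole-file); the lines worth watching in the --stats output are Matched data (blocks the destination already had) versus Literal data (blocks that actually had to be sent), plus the final speedup figure:

```
rsync -av --no-W --stats /data/src/ /data/dst/
# With ~60% of tf_2.dat unchanged, Matched data should cover roughly that
# 60%, Literal data the rest, and the speedup in the summary line should
# be noticeably above 1.
```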
Next, I try again; this time I merge several shell files into the target (tf_2.dat) so that the change is only about 2%, and sync once more with rsync in the same way. We see a large amount of matched data and a high speedup, i.e. a fast sync.
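The last step is the same pattern with a much smaller change (which shell files get spliced in, and at what offset, is illustrative; in the test the change amounted to roughly 2%):

```
# Splice a few shell scripts into tf_2.dat in place (conv=notrunc keeps
# the file size unchanged), so only a small portion of the file differs...
cat ~/bin/*.sh | dd of=/data/src/tf_2.dat bs=1M seek=10 conv=notrunc
# ...then sync once more; nearly the whole file is now reported as matched.
rsync -av --no-W --stats /data/src/ /data/dst/
```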
You can also try https://bitbucket.org/ppershing/blocksync (disclaimer: I am the author of this particular fork). An advantage over rsync is that it reads the file only once (as far as I know, rsync can't be convinced to assume two files are different without computing the checksum before it starts the delta transfer; needless to say, reading 160 GB hard drives twice isn't a good strategy). A note of caution: the current version of blocksync works well over short-RTT connections (e.g. localhost, LAN and local Wi-Fi) but isn't particularly useful for syncing over long distances.