Is there a command, such as rsync, which can synchronise huge, sparse files from one Linux server to another?
It is very important that the destination file remains sparse. Its apparent length may exceed the capacity of the drive that contains it, even though the space it actually allocates does not. Only changed blocks should be sent across the wire.
I have tried rsync, but got no joy. https://groups.google.com/forum/#!topic/mailing.unix.rsync/lPOScZgFE9M
If I write a programme to do this, am I just reinventing the wheel? http://www.finalcog.com/synchronise-block-devices
Thanks,
Chris.
Run rsync in two passes: first to create new files in sparse mode, followed by a second run to update all existing files (including the previously created sparse ones) in place.
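A sketch of the two passes, with placeholder paths (older rsync releases reportedly refuse to combine --sparse with --inplace, hence two separate runs):

    # Pass 1: create files that are missing on the destination, writing them
    # sparsely; files that already exist there are skipped.
    rsync --ignore-existing --sparse /data/huge.img remote.example.com:/data/

    # Pass 2: update existing files in place, rewriting only changed blocks.
    rsync --inplace /data/huge.img remote.example.com:/data/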
Rsync only transfers the changes within each file, and with --inplace it should rewrite only the blocks that changed without recreating the file (see the features page on the rsync site).
Using --inplace should work for you. The invocation sketched below shows progress, compresses the transfer (at the default compression level), transfers the contents of the local storage directory recursively (that first trailing slash matters), makes the changes to the files in place, and uses ssh for the transport.
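A sketch of such an invocation, with placeholder host and paths:

    # -z compresses at the default level, -r recurses into the directory,
    # --inplace updates destination files in place, -e ssh sets the transport.
    rsync --progress -z -r --inplace -e ssh /local/storage/ user@remote.example.com:/remote/storage/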
I often use the -a flag as well, which does a few more things; it's equivalent to -rlptgoD. I'll leave the exact behavior for you to look up in the man page.
To sync huge files or block devices with low to moderate differences you can either do a plain copy or use bdsync; rsync is absolutely not fit for this particular case*.
bdsync worked for me and seems mature enough; its history of bugs is encouraging (small issues, prompt resolution). In my tests its speed was close to the theoretical maximum you could get** (that is, you can sync in about the time you need to read the file). Finally, it's open source and costs nothing.

bdsync reads the files on both hosts and exchanges checksums to compare them and detect differences, all at the same time. It then creates a compressed patch file on the source host. You move that file to the destination host and run bdsync a second time to patch the destination file.

When using it over a rather fast link (e.g. 100Mbit Ethernet) and for files with small differences (as is most often the case with VM disks), it reduces the sync time to the time needed to read the file. Over a slow link you need a bit more time, because you have to copy the compressed changes from one host to the other (it seems you can save time using a nice trick, but I haven't tested it). For files with many changes, the time to write the patch file to disk should also be taken into account (and you need enough free space on both hosts to hold it).
Here's how I typically use bdsync. These commands are run on $LOCAL_HOST to "copy" $LOCAL_FILE to $REMOTE_FILE on $REMOTE_HOST. I use pigz (a faster gzip) to compress the changes, ssh to run bdsync on the remote host, and rsync/ssh to copy the changes across. Note that the script checks whether the patch has been applied successfully and prints "Update successful" only when it has; you may wish to do something more clever in case of failure.
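A sketch along those lines; host and file names are placeholders, and it assumes bdsync's --server and --patch modes plus pigz/unpigz on both hosts:

    LOCAL_FILE=/data/huge.img
    REMOTE_FILE=/data/huge.img
    REMOTE_HOST=remote.example.com
    PATCH=/tmp/huge.img.bdsync.gz

    # Compare the two files over ssh and write a compressed patch locally.
    bdsync "ssh $REMOTE_HOST bdsync --server" "$LOCAL_FILE" "$REMOTE_FILE" \
        | pigz > "$PATCH"

    # Ship the patch to the remote host.
    rsync "$PATCH" "$REMOTE_HOST:$PATCH"

    # Apply it remotely; print "Update successful" only if patching succeeds.
    ssh "$REMOTE_HOST" "unpigz < $PATCH | bdsync --patch=$REMOTE_FILE" \
        && echo "Update successful"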
*: rsync is hugely inefficient with huge files. Even with --inplace it will first read the whole file on the destination host, only AFTERWARDS begin reading the file on the source host, and finally transfer the differences (just run dstat or similar while rsync is running and observe). The result is that even for files with small differences, it takes about double the time needed to read the file in order to sync it.
**: Under the assumption that you have no other way to tell which parts of the files have changed. LVM snapshots use bitmaps to record the changed blocks, so they can be dramatically faster (the lvmsync README has more info).
Take a look at the Zumastor Linux Storage Project; it implements "snapshot" backup using binary "rsync" via the ddsnap tool. From the man page:
ddsnap provides block device replication given a block level snapshot facility capable of holding multiple simultaneous snapshots efficiently. ddsnap can generate a list of snapshot chunks that differ between two snapshots, then send that difference over the wire. On a downstream server, write the updated data to a snapshotted block device.
I ended up writing software to do this:
http://www.virtsync.com
This is commercial software costing $49 per physical server.
I can now replicate a 50GB sparse file (which has 3GB of content) in under 3 minutes across residential broadband.
lvmsync does this.
Here's the usage pattern: create an LVM snapshot on the source and transfer the logical volume once; after that you can transfer incremental updates of the changes made since snapshot creation as often as you like.
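A sketch of that workflow; the volume group, LV names and host are placeholders, and the lvmsync invocation follows its README:

    # Snapshot the source LV; changes to the origin are tracked from now on.
    lvcreate --snapshot -n vm0-snap -L 10G /dev/vg0/vm0

    # Initial full copy of the frozen snapshot to the destination device.
    dd if=/dev/vg0/vm0-snap bs=1M | ssh dest.example.com "dd of=/dev/vg0/vm0 bs=1M"

    # Later: send only the blocks changed since the snapshot was taken.
    lvmsync /dev/vg0/vm0-snap dest.example.com:/dev/vg0/vm0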
Could replicating the whole file system be a solution? DRBD? http://www.drbd.org/
Maybe a bit strange here, but I found out recently that NFS handles this fine. You export a directory on one machine, mount it on the other, and just copy the files with basic utilities like cp (some old/ancient utilities can have problems with sparse files). I found rsync especially inefficient at transferring sparse files.

I'm not aware of a dedicated utility, only of the system calls that can handle it, so if you write such a utility, it might be rather helpful.
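A minimal sketch of that setup, with placeholder hosts and paths (cp --sparse=always is a GNU coreutils option):

    # On the server: export the directory.
    echo "/srv/images client.example.com(rw,no_subtree_check)" >> /etc/exports
    exportfs -ra

    # On the client: mount the export and copy, keeping the copy sparse.
    mount -t nfs server.example.com:/srv/images /mnt/images
    cp --sparse=always /mnt/images/huge.img /data/huge.img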
What you actually can do is use qemu-img convert to copy the files, but it will only work if the destination filesystem supports sparse files.
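For example (file names are placeholders; a raw-to-raw copy is shown, and the output stays sparse because qemu-img skips runs of zeros when writing):

    qemu-img convert -O raw /data/huge-sparse.img /mnt/dest/huge-sparse.img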