Ping a Specific Port

Question

Amandasaurus

Asked: 2009-07-21 06:36:34 +0800 CST2009-07-21 06:36:34 +0800 CST 2009-07-21 06:36:34 +0800 CST

Copying a large directory tree locally? cp or rsync?

772

I have to copy a large directory tree, about 1.8 TB. It's all local. Out of habit I'd use rsync, however I wonder if there's much point, and if I should rather use cp.

I'm worried about permissions and uid/gid, since they have to be preserved in the copy (I know rsync does this). As well as things like symlinks.

The destination is empty, so I don't have to worry about conditionally updating some files. It's all local disk, so I don't have to worry about ssh or network.

The reason I'd be tempted away from rsync, is because rsync might do more than I need. rsync checksums files. I don't need that, and am concerned that it might take longer than cp.

So what do you reckon, rsync or cp?

18 Answers

Voted

Hamish Downer · Answer 1 · 2009-07-21T06:40:58+08:00

Best Answer

Hamish Downer

2009-07-21T06:40:58+08:002009-07-21T06:40:58+08:00

I would use rsync as it means that if it is interrupted for any reason, then you can restart it easily with very little cost. And being rsync, it can even restart part way through a large file. As others mention, it can exclude files easily. The simplest way to preserve most things is to use the -a flag – ‘archive.’ So:

rsync -a source dest

Although UID/GID and symlinks are preserved by -a (see -lpgo), your question implies you might want a full copy of the filesystem information; and -a doesn't include hard-links, extended attributes, or ACLs (on Linux) or the above nor resource forks (on OS X.) Thus, for a robust copy of a filesystem, you'll need to include those flags:

rsync -aHAX source dest # Linux
rsync -aHE source dest  # OS X

The default cp will start again, though the -u flag will "copy only when the SOURCE file is newer than the destination file or when the destination file is missing". And the -a (archive) flag will be recursive, not recopy files if you have to restart and preserve permissions. So:

cp -au source dest

258

Ellis Percival · Answer 2 · 2013-05-08T11:09:48+08:00

Ellis Percival

2013-05-08T11:09:48+08:002013-05-08T11:09:48+08:00

When copying to the local file system I tend to use rsync with the following options:

# rsync -avhW --no-compress --progress /src/ /dst/

Here's my reasoning:

-a is for archive, which preserves ownership, permissions etc.
-v is for verbose, so I can see what's happening (optional)
-h is for human-readable, so the transfer rate and file sizes are easier to read (optional)
-W is for copying whole files only, without delta-xfer algorithm which should reduce CPU load
--no-compress as there's no lack of bandwidth between local devices
--progress so I can see the progress of large files (optional)

I've seen 17% faster transfers using the above rsync settings over the following tar command as suggested by another answer:

# (cd /src; tar cf - .) | (cd /dst; tar xpf -)

181

Chad Huneycutt · Answer 3 · 2009-07-21T07:15:34+08:00

Chad Huneycutt

2009-07-21T07:15:34+08:002009-07-21T07:15:34+08:00

When I have to copy a large amount of data, I usually use a combination of tar and rsync. The first pass is to tar it, something like this:

# (cd /src; tar cf - .) | (cd /dst; tar xpf -)

Usually with a large amount of files, there will be some that tar can't handle for whatever reason. Or maybe the process will get interrupted, or if it is a filesystem migration, the you might want to do the initial copy before the actual migration step. At any rate, after the initial copy, I do an rsync step to sync it all up:

# cd /dst; rsync -avPHSx --delete /src/ .

Note that the trailing slash on /src/ is important.

85

AskApache · Answer 4 · 2012-02-27T09:06:33+08:00

AskApache

2012-02-27T09:06:33+08:002012-02-27T09:06:33+08:00

rsync

Here is the rsync I use, I prefer cp for simple commands, not this.

$ rsync -ahSD --ignore-errors --force --delete --stats $SRC/ $DIR/

cpio

Here is a way that is even safer, cpio. It's about as fast as tar, maybe a little quicker.

$ cd $SRC && find . -mount -depth -print0 2>/dev/null | cpio -0admp $DEST &>/dev/null

tar

This is also good, and continues on read-failures.

$ tar --ignore-failed-read -C $SRC -cf - . | tar --ignore-failed-read -C $DEST -xf -

Note those are all just for local copies.

16

arjones · Answer 5 · 2017-05-12T11:14:58+08:00

arjones

2017-05-12T11:14:58+08:002017-05-12T11:14:58+08:00

This thread was very useful and because there were so many options to achieve the result, I decided to benchmark few of them. I believe my results can be helpful to others have a sense of what worked faster.

To move 532Gb of data distributed among 1,753,200 files we had those times:

rsync took 232 minutes
tar took 206 minutes
cpio took 225 minutes
rsync + parallel took 209 minutes

On my case I preferred to use rsync + parallel. I hope this information helps more people to decide among these alternatives.

The complete benchmark are published here

14

innaM · Answer 6 · 2009-07-21T06:40:59+08:00

innaM

2009-07-21T06:40:59+08:002009-07-21T06:40:59+08:00

Whatever you prefer. Just don't forget the -a switch when you decide to use cp.

If you really need an answer: I'd use rsync because it's much more flexible. Need to shutdown before copying is complete? Just ctrl-c and resume as soon as your back. Need to exclude some files? Just use --exclude-from. Need to change ownership or permissions? rsync will do that for you.

7

John · Answer 7 · 2012-11-28T17:20:32+08:00

John

2012-11-28T17:20:32+08:002012-11-28T17:20:32+08:00

The rsync command always computes checksums on every byte it transfers.

The command line option --checksum only relates to whether checksums of files are used to determine which files to transfer or not, ie:

-c, --checksum skip based on checksum, not mod-time & size"

The manpage also says this:

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking its whole-file checksum, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.

So rsync also, always, calculates a checksum of the whole file on the receiving side, even when -c/ --checksum option is "off".

7

oneguynick · Answer 8 · 2009-07-21T08:24:51+08:00

oneguynick

2009-07-21T08:24:51+08:002009-07-21T08:24:51+08:00

rsync -aPhW --protocol=28 helps speed up those large copies with RSYNC. I always go rsync because the thought of being midway through 90GiB and it breaking scares me away from CP

6

n3bulous · Answer 9 · 2009-07-21T08:14:37+08:00

n3bulous

2009-07-21T08:14:37+08:002009-07-21T08:14:37+08:00

rsync is great, but has issues with really large directory trees because it stores the trees in memory. I was just looking to see if they'd fix this problem when I found this thread.

I also found:

http://matthew.mceachen.us/geek/gigasync/

You could also manually break up the tree and run multiple rsyncs.

5

Frédéric N. · Answer 10 · 2019-07-16T04:50:13+08:00

Frédéric N.

2019-07-16T04:50:13+08:002019-07-16T04:50:13+08:00

You definitely want to give rclone a try. This thing is crazy fast :

sudo rclone sync /usr /home/fred/temp -P -L --transfers 64

Transferred:       17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors:                75 (retrying may help)
Checks:            691078 / 691078, 100%
Transferred:       345539 / 345539, 100%
Elapsed time:     1m50.8s

This is a local copy from and to a LITEONIT LCS-256 (256GB) SSD.

You can add --ignore-checksum on the first run to make it even faster.

4

Copying a large directory tree locally? cp or rsync?

rsync

cpio

tar

Ping a Specific Port

What port does SFTP use?

Resolve host name from IP address

How can I sort du -h output by size

Command line to list users in a Windows Active Directory group?

What's the command-line utility in Windows to do a reverse DNS look-up?

How to check if a port is blocked on a Windows machine?

What port should I open to allow remote desktop?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?