Question

Daniel W.

Asked: 2014-07-01 04:10:17 +0800 CST

rsync changing already existing files

772

I have a simple rsync line in my crontab that pulls backup files from the prod server to another server.

It looks like it is touching the already existing files in the destination folder. This way, the backup would take incrementally longer each interval.

Please take a look at the dates and times at which the files below were changed.

How do I make rsync not touch (and not download?) the files it already has? I don't need any checksums calculated either; once the backups are created, they won't change anymore.

rsync -vzre 'ssh' stor@server:/backup/system/ /storage/share/Backup/Server

The files to be fetched:

-rw-r-x--- 1 root stor 896K Jun 22 05:02 giant-140622-etc.zip
-rw-r-x--- 1 root stor 620K Jun 22 05:02 giant-140622-sql.zip
-rw-r-x--- 1 root stor  84M Jun 22 05:02 giant-140622-www.zip
-rw-r-x--- 1 root stor 899K Jun 25 05:00 giant-140625-etc.zip
-rw-r-x--- 1 root stor 603K Jun 25 05:00 giant-140625-sql.zip
-rw-r-x--- 1 root stor  84M Jun 25 05:00 giant-140625-www.zip
-rw-r-x--- 1 root stor 899K Jun 28 05:00 giant-140628-etc.zip
-rw-r-x--- 1 root stor 620K Jun 28 05:00 giant-140628-sql.zip
-rw-r-x--- 1 root stor  86M Jun 28 05:00 giant-140628-www.zip
-rw-r-x--- 1 root stor 899K Jun 30 05:00 giant-140630-etc.zip
-rw-r-x--- 1 root stor 617K Jun 30 05:00 giant-140630-sql.zip
-rw-r-x--- 1 root stor  86M Jun 30 05:00 giant-140630-www.zip

The destination:

-rw-r-x--- 1 stor stor 896K Jun 30 06:06 giant-140622-etc.zip
-rw-r-x--- 1 stor stor 620K Jun 30 06:06 giant-140622-sql.zip
-rw-r-x--- 1 stor stor  84M Jun 30 06:06 giant-140622-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:06 giant-140625-etc.zip
-rw-r-x--- 1 stor stor 603K Jun 30 06:06 giant-140625-sql.zip
-rw-r-x--- 1 stor stor  84M Jun 30 06:06 giant-140625-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:06 giant-140628-etc.zip
-rw-r-x--- 1 stor stor 620K Jun 30 06:06 giant-140628-sql.zip
-rw-r-x--- 1 stor stor  86M Jun 30 06:06 giant-140628-www.zip
-rw-r-x--- 1 stor stor 899K Jun 30 06:07 giant-140630-etc.zip
-rw-r-x--- 1 stor stor 617K Jun 30 06:08 giant-140630-sql.zip
-rw-r-x--- 1 stor stor  86M Jun 30 06:10 giant-140630-www.zip

Update:

When I run the rsync command (with the --ignore-existing option) from the shell, it only downloads new files that do not exist yet and skips the files it already has.

When investigating the behaviour of the exact same command run by a cronjob, the already existing files do change every cycle and the whole job takes incrementally longer each cycle (compare the times above, cronjob starting at 06:00, 2 minutes per file even if they already exist).

rsync -vzr --ignore-existing -e 'ssh -i /path/id_rsa -l backup' backup@flowl.info:/backup/system/ /nfs/share-private/Backup/Server

Update:

Here are the files from July, with an extra line inserted after each group. Please look at the times, which start at 06:01 and increase with each new file.

-rw-r-x--- 1 stor stor 899K Jul  4 06:01 giant-140702-etc.zip
-rw-r-x--- 1 stor stor 621K Jul  4 06:01 giant-140702-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  4 06:03 giant-140702-www.zip
                                       ^-- 01 to 03
-rw-r-x--- 1 stor stor 899K Jul  4 06:04 giant-140704-etc.zip
-rw-r-x--- 1 stor stor 634K Jul  4 06:05 giant-140704-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:02 giant-140704-www.zip
                                       ^-- ???
-rw-r-x--- 1 stor stor 899K Jul  8 06:03 giant-140706-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:03 giant-140706-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:06 giant-140706-www.zip
                                       ^-- 03 - 06
-rw-r-x--- 1 stor stor 899K Jul  8 06:07 giant-140708-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:07 giant-140708-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:10 giant-140708-www.zip
                                       ^-- 07 - 10

If I imagine this going on for another month, the times would look like:

-rw-r-x--- 1 stor stor 899K Jul  8 06:32 giant-140808-etc.zip
-rw-r-x--- 1 stor stor 629K Jul  8 06:32 giant-140808-sql.zip
-rw-r-x--- 1 stor stor  86M Jul  8 06:35 giant-140808-www.zip
                                       ^-- what I imagine to happen

4 Answers

kasperd · Answer 1 · 2014-07-10T00:40:03+08:00

When rsync cannot skip a file based on its size and modification time, it reads the entire file on both source and destination to verify that they are identical. This consumes very little network bandwidth, as only checksums are compared. But it does spend time reading from the disk.

In one usage scenario, I found this to be terribly inefficient because the source files were only being appended to. I used the --size-only option, which worked well for me.

There are a few other options that look like they may be applicable, --append and --append-verify, but I haven't tested those myself.

It does not look like you have a directory with a lot of small files, so the time to read the directory listing from disk and stat each file shouldn't be much of a problem.
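The effect of --size-only can be tried on throwaway local directories before touching the real backups. The following is a minimal sketch under stated assumptions: the temporary directories and the sample file content are made up for illustration, and -t is deliberately omitted to mimic the command from the question.

```shell
#!/bin/sh
# Sketch: compare a plain re-run with a --size-only re-run (illustrative data).
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
printf 'backup data\n' > "$work/src/giant-140622-etc.zip"
# Give the source file an old modification time, like a finished backup.
touch -t 201406220502 "$work/src/giant-140622-etc.zip"

# First pull without -t: the destination copy gets the transfer time as mtime.
rsync -r "$work/src/" "$work/dst/"

# Plain re-run: the size matches but the mtime differs, so the file is redone.
plain=$(rsync -r --itemize-changes "$work/src/" "$work/dst/")

# Re-run with --size-only: an equal size is enough to skip the file.
size_only=$(rsync -r --size-only --itemize-changes "$work/src/" "$work/dst/")

printf 'plain re-run:    %s\n' "${plain:-<nothing>}"
printf '--size-only run: %s\n' "${size_only:-<nothing>}"
rm -rf "$work"
```

In this sketch the plain re-run should list the file again, while the --size-only run should list nothing. Keep in mind that --size-only would also skip a file whose content changed without changing its size, which seems acceptable here only because the backups never change once written.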

4

Daniel W. · Answer 2 · 2014-07-01T04:17:17+08:00

Best Answer

I added the --ignore-existing option, and it looks like rsync no longer changes the existing files and only downloads new ones.

rsync -vzr --ignore-existing -e

Edit: When there are new files it still takes longer each cycle.
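What --ignore-existing does can be checked locally with a small experiment. This is only a sketch: the temporary directories and file contents are invented for illustration, and the file names are borrowed from the listing in the question.

```shell
#!/bin/sh
# Sketch: --ignore-existing transfers only names missing at the destination.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
printf 'old backup\n' > "$work/src/giant-140622-etc.zip"
touch -t 201406220502 "$work/src/giant-140622-etc.zip"

# First pull, without -t as in the question.
rsync -r "$work/src/" "$work/dst/"

# A new backup appears on the source side.
printf 'new backup\n' > "$work/src/giant-140625-etc.zip"

# Files that already exist at the destination are skipped by name alone.
transferred=$(rsync -r --ignore-existing --itemize-changes "$work/src/" "$work/dst/")
printf '%s\n' "$transferred"
rm -rf "$work"
```

Only the new file should show up in the itemized list. Because the check is by name only, a file that was partially transferred in an earlier, interrupted run would also be skipped, so this option is probably best combined with some other way of noticing incomplete files.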

2

krissi · Answer 3 · 2014-07-14T08:26:22+08:00

I think adding -t to your argument list will help.

To verify this, you could add --itemize-changes to the arguments (without -t). If I understood you correctly, this would show the T flag on every line.

man 1 rsync:

A t means the modification time is different and is being updated to the sender’s value (requires --times). An alternate value of T means that the modification time will be set to the transfer time, which happens when a file/symlink/device is updated without --times and when a symlink is changed and the receiver can’t set its time. (Note: when using an rsync 3.0.0 client, you might see the s flag combined with t instead of the proper T flag for this time-setting failure.)

After this add -t to the command (keep --itemize-changes) and you will receive a t-flag on every line. In the next run the list will only contain the new files.

This is my example run:

krissi@host ~/tmp/rsync % l *
dst:
total 0

src:
total 0
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 bar
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 foo
-rw-r--r-- 1 krissi users 0 Jul 13 18:19 later
krissi@host ~/tmp/rsync % rsync -vzr --itemize-changes src/ dst/
sending incremental file list
>f+++++++++ bar
>f+++++++++ foo
>f+++++++++ later

sent 174 bytes  received 69 bytes  486.00 bytes/sec
total size is 0  speedup is 0.00
krissi@host ~/tmp/rsync % l *
dst:
total 0
-rw-r--r-- 1 krissi users 0 Jul 13 18:21 bar
-rw-r--r-- 1 krissi users 0 Jul 13 18:21 foo
-rw-r--r-- 1 krissi users 0 Jul 13 18:21 later

src:
total 0
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 bar
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 foo
-rw-r--r-- 1 krissi users 0 Jul 13 18:19 later
krissi@host ~/tmp/rsync % rsync -vzr --itemize-changes src/ dst/
sending incremental file list
>f..T...... bar
>f..T...... foo
>f..T...... later

sent 174 bytes  received 69 bytes  486.00 bytes/sec
total size is 0  speedup is 0.00
krissi@host ~/tmp/rsync % rsync -vzr --itemize-changes src/ dst/
sending incremental file list
>f..T...... bar
>f..T...... foo
>f..T...... later

sent 174 bytes  received 69 bytes  486.00 bytes/sec
total size is 0  speedup is 0.00
krissi@host ~/tmp/rsync % rsync -vzrt --itemize-changes src/ dst/
sending incremental file list
.d..t...... ./
>f..t...... bar
>f..t...... foo
>f..t...... later

sent 177 bytes  received 72 bytes  498.00 bytes/sec
total size is 0  speedup is 0.00
krissi@host ~/tmp/rsync % rsync -vzrt --itemize-changes src/ dst/
sending incremental file list

sent 66 bytes  received 12 bytes  156.00 bytes/sec
total size is 0  speedup is 0.00
krissi@host ~/tmp/rsync % l *
dst:
total 0
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 bar
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 foo
-rw-r--r-- 1 krissi users 0 Jul 13 18:19 later

src:
total 0
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 bar
-rw-r--r-- 1 krissi users 0 Jul 13 18:05 foo
-rw-r--r-- 1 krissi users 0 Jul 13 18:19 later
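
The same check could be carried over to the cron job by logging the itemized output of every run. The helper below is a hypothetical sketch, not something from the thread: the function name sync_backups and the log path in the example call are assumptions, and RSYNC_RSH (rsync's environment variable for the remote shell) stands in for the -e option used in the question.

```shell
#!/bin/sh
# sync_backups SRC DST LOG (hypothetical helper):
# pull SRC into DST, preserving modification times (-t) so that later runs
# can skip unchanged files, and append the itemized change list to LOG.
sync_backups() {
    rsync -rtz --itemize-changes "$1" "$2" >> "$3" 2>&1
}

# Example call for the cron job (the log path is an assumption):
#   RSYNC_RSH='ssh -i /path/id_rsa -l backup' \
#   sync_backups backup@flowl.info:/backup/system/ \
#       /nfs/share-private/Backup/Server /tmp/rsync-backup.log
```

If the log of a later run still lists files that were already transferred, the flags on those lines (T versus t) should indicate whether the cron job is really running with -t.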

caesarsol · Answer 4 · 2014-07-10T00:37:03+08:00

Why do you say it takes longer each time? How is that possible?

Maybe it's the program generating the files that is touching them?

Try --checksum (skip based on checksum, not mod-time and size) and see if that changes anything. I wouldn't keep this option, because it reads every file from the disk every time, which is too expensive; I'm only suggesting it to find the problem.

Also maybe try debugging with the -t option, which preserves modification times.
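
As a diagnostic, --checksum can be combined with --dry-run so that nothing is modified while rsync reports which files differ in content. The sketch below runs on throwaway local directories; the file contents are invented, and one destination file is altered on purpose to simulate a real change.

```shell
#!/bin/sh
# Sketch: list files whose content differs, without changing anything.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"
printf 'etc backup\n' > "$work/src/giant-140622-etc.zip"
printf 'sql backup\n' > "$work/src/giant-140622-sql.zip"
touch -t 201406220502 "$work/src/giant-140622-etc.zip" \
    "$work/src/giant-140622-sql.zip"

# Pull without -t, so the destination copies carry the transfer time.
rsync -r "$work/src/" "$work/dst/"

# Simulate a destination file that really changed: same size, other content.
printf 'SQL backup\n' > "$work/dst/giant-140622-sql.zip"

# --dry-run changes nothing; --checksum compares content instead of mtime.
differing=$(rsync -r --checksum --dry-run --itemize-changes "$work/src/" "$work/dst/")
printf '%s\n' "$differing"
rm -rf "$work"
```

Here only the altered sql file should be listed. If the same kind of run against the real backups lists nothing, the content is identical and only the modification times differ, which would point towards the missing -t rather than towards files being rewritten on the source.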

1
