Here is the scenario: on the source server, we added new disk arrays because the old ones were running out of space. I copied the contents from the old disk arrays to the new ones using "cp". Then I unmounted the old disk arrays and mounted the new arrays while preserving the partition names.
The next day, our rsync process ran, and for some reason it did not just copy the incremental/changed files; it seems to be going through every file. I am not sure what it is doing: the process is using a lot of CPU but not much IO. So I guess it is running some checksum process to compare the data between the source and destination, but not actually copying the files?
Anyway, has anyone seen this rsync behavior before, and what triggered it? Is it because I used "cp" to copy the files, so the files now look different? Is there a file where rsync keeps a list of the files it has scanned before, so that it knows to copy only the incremental changes?
If you did not use the -p option to preserve "modification time, access time, file flags, file mode, user ID, and group ID" (per the man page; ACLs as well) when you ran cp, then it's highly likely that the modification and access times were changed.
If your rsync command includes either the -a or -t option, then it's trying to update all those new modification times. I'm not sure what rsync's actual algorithm is, but I believe that even if your rsync command weren't trying to update the times, it would probably still have to compare blocks, or checksums of blocks, for every file that has a new modification time to see whether it had actually changed.
You can use -u or --update to skip files that are newer on the receiver, although that only helps if the receiver's copies are the ones carrying the newer timestamps. You could also use the --size-only option, but that might miss changes where the file size stayed the same.
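To check whether the copy really did reset the timestamps, something like the following sketch may help. It is not the original poster's setup: the directory and file names are throwaway placeholders under a temporary directory, and the final rsync dry run (only attempted if rsync is installed) treats the backdated directory as a stand-in for the existing backup.

```shell
#!/bin/sh
# Sketch: plain cp gives the copy a fresh modification time, cp -p keeps it.
# All names below are hypothetical and live in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new_plain" "$workdir/new_preserved"

echo "payload" > "$workdir/old/data.txt"
# Backdate the original so the difference is easy to see.
touch -t 202001010000 "$workdir/old/data.txt"

cp    "$workdir/old/data.txt" "$workdir/new_plain/data.txt"
cp -p "$workdir/old/data.txt" "$workdir/new_preserved/data.txt"

# [ A -nt B ] is true when A has a newer modification time than B.
if [ "$workdir/new_plain/data.txt" -nt "$workdir/old/data.txt" ]; then
    plain_newer=yes
else
    plain_newer=no
fi

if [ "$workdir/new_preserved/data.txt" -nt "$workdir/old/data.txt" ]; then
    preserved_newer=yes
else
    preserved_newer=no
fi

echo "plain cp copy newer than original: $plain_newer"
echo "cp -p copy newer than original:    $preserved_newer"

# Optional: a dry run (-n) with itemized changes (-i) lists what rsync
# would update and why, without modifying anything.
if command -v rsync >/dev/null 2>&1; then
    rsync -a -n -i "$workdir/new_plain/" "$workdir/old/"
fi

rm -rf "$workdir"
```

Running the same kind of dry run against the real source and destination, before the scheduled job, should show whether timestamps are the only thing rsync thinks has changed.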