We've been migrating about 220 GB of data from a Windows 2003 Server to a Windows 2008 Server, and because of the time it would take to copy that data and the necessity of keeping it available for users, I came up with the idea of using rsync
on an Ubuntu server to broker the migration. (I might have gone for a proper Windows solution, but the applications I found were a bit pricey for a one-shot like this, and permissions are not a problem.)
All well and good. Today I'm making the last sync and confirming that the new server is up-to-date using diff, but I've noticed an odd thing with Excel spreadsheets (.xls).
Every Excel spreadsheet that was already copied in a previous synchronisation is being marked as "already up-to-date" by rsync. However, when I then run a diff, I'm told that the files differ. I'm copying them manually, as there are only a handful, but I was wondering what might be causing this.
No other filetype in the entire 220 GB tree has had any problem like this - just the Excel/xls files. It'd be great if someone could come up with an explanation.
I agree with @Zoredache, Robocopy is all you need.
Try this from your 2008 server, to copy one directory to another; including security, attributes and time stamps...
To copy everything as above, plus owner and audit information...
Further information...
If you wanted to continue using rsync, try the --checksum switch. According to the rsync man page, "Rsync finds files that need to be transferred using a 'quick check' algorithm (by default) that looks for files that have changed in size or in last-modified time." Though I am not sure why rsync would not notice the time stamp changing on your files, it is entirely likely that the size of an Excel file would stay the same between edits.
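If you want to see which files are affected before re-syncing, something like the following might help. It is only a rough sketch in Python, not rsync's actual code: quick_check_same is my own approximation of the "quick check" (same size, same modification time in whole seconds), and silently_stale is a hypothetical helper that walks two directory trees and reports files that pass that check yet differ byte for byte.

```python
import filecmp
import os


def quick_check_same(src, dst):
    """Approximate rsync's default quick check: equal size and equal mtime.

    Assumption: mtimes are compared in whole seconds; real rsync behaviour
    can vary with version and options such as --modify-window.
    """
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def silently_stale(src_root, dst_root):
    """Yield relative paths that pass the quick check but differ in content."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                continue  # files missing on the destination are not our concern here
            # shallow=False forces a byte-by-byte comparison instead of a stat check
            if quick_check_same(src, dst) and not filecmp.cmp(src, dst, shallow=False):
                yield rel
```

Calling list(silently_stale(source_dir, dest_dir)) on the two mounted trees should give the same handful of files that diff complains about, if the quick check really is the cause.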
rsync makes use of the timestamp. Maybe Excel doesn't update the last modification time?
In that case, the better option would probably be to use the "--checksum" flag in rsync. This way, every file will be scanned. It doesn't mean that all the files will be transferred again and again; only changes will be transferred, but every file must be scanned on each run.
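There are also other options: "--ignore-times" and "--size-only". With --ignore-times, rsync stops skipping files whose size and timestamp match, so every file is processed again. With --size-only, contrary to --checksum, files are skipped without checking their content whenever the file size matches, so it would not catch an edit that leaves the size unchanged.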
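To make the difference between these switches concrete, here is a small Python model of the skip decision as I understand it from the man page. It is an illustration under assumptions, not rsync's implementation: would_skip and file_digest are hypothetical names, SHA-256 stands in for whatever checksum rsync really uses, and timestamps are compared in whole seconds.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file in 1 MiB chunks (SHA-256 is a stand-in for rsync's checksum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def would_skip(src, dst, mode="default"):
    """Return True if a file pair would be treated as already up-to-date."""
    s, d = os.stat(src), os.stat(dst)
    same_size = s.st_size == d.st_size
    same_mtime = int(s.st_mtime) == int(d.st_mtime)
    if mode == "default":       # quick check: size and timestamp
        return same_size and same_mtime
    if mode == "size-only":     # timestamp ignored, content never read
        return same_size
    if mode == "ignore-times":  # never skip on the basis of size and time
        return False
    if mode == "checksum":      # content compared when the sizes match
        return same_size and file_digest(src) == file_digest(dst)
    raise ValueError("unknown mode: " + mode)
```

For a spreadsheet that changed without its size or timestamp changing, "default" and "size-only" both report it as up-to-date, while "checksum" and "ignore-times" do not, which is why only the latter two would pick up the Excel files.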