We run a number of web apps that store a lot of local data in small xml files. One part of our backup / recovery strategy is to produce a local mirror of the file system via a VPN to the hosting centre.
The VPN connection is only a 12Mbps ADSL line, and whilst there are a lot of files and directories, the actual number of files that change is quite small.
Although the bandwidth is probably an issue, I'm seeing results such as the output below. The robocopy /MIR took 5 hours to run, yet only around 36 minutes of that was spent actually copying data.
Does anyone have any suggestions for improving this? Five hours is now bordering on too slow, and if we can't find a way to speed it up then we're going to have to come up with a completely different solution.
                 Total    Copied   Skipped  Mismatch    FAILED    Extras
     Dirs :      17625      6618     11007         0         0         0
    Files :    1112430      1223   1111207         0         0         0
    Bytes :   57.451 g  192.25 m  57.263 g         0         0         0
    Times :    5:01:23   0:35:55                       0:00:00   4:25:27
    Speed :               93509 Bytes/sec.
    Speed :               5.350 MegaBytes/min.
    Ended : Fri Apr 16 05:54:23 2010
I use rsync for Windows to copy over the broadband connection. It is supposedly a delta-copy system, which only transfers the changed parts of each file, whereas robocopy copies the whole file if even one bit has changed (tbh I wonder sometimes whether it actually does this).
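For reference, a pull of that kind might look something like the following (host name, module name and paths are invented, and this assumes an rsync daemon is running on the remote end); only the changed blocks of changed files travel over the link:

rsync -av --delete rsync://backupuser@remoteserver/webdata/ /cygdrive/d/mirror/webdata/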
You could also use robocopy's /MON:x switch and leave it running permanently. Robocopy will then re-run whenever it sees x changes in the file system. If it runs very frequently, each pass only has a small number of changes to deal with.
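A sketch of that mode (paths are placeholders): /MON:1 keeps robocopy resident and re-runs the mirror once it sees more than one change, and /MOT:5 makes it wait at least 5 minutes between passes:

robocopy D:\WebData \\mirror\WebData /MIR /MON:1 /MOT:5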
You could use the file replication feature in Windows Server: create a DFS path for each folder and set the local and remote folders as targets.
Robocopy will have to enumerate all local and remote files first, to determine which ones need to be transferred. This is most likely what is taking the time.
What if you reset the Archive file attribute following a successful backup?
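Something along these lines (a sketch; the path is a placeholder):

attrib -a D:\WebData\*.* /s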
Then, every time a file is written to, the Archive bit will be set again automatically. On the next run you can tell Robocopy to copy only the files that have the A flag set:
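For example (the destination is a placeholder); /M copies only files with the Archive attribute set and clears it afterwards, while /A does the same without clearing it:

robocopy D:\WebData \\mirror\WebData /E /M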
I haven't tested this, but I believe it should be quicker as Robocopy will have a lot fewer files to process.
Another idea would be to run a scheduled job on the remote server (if that is possible) to zip up the entire directory structure, and then just copy the resulting zip file over the VPN. XML compresses nicely, and copying a single file will be much more efficient over a high-latency link.
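As a rough sketch, assuming 7-Zip is installed on the remote server (paths are placeholders), the scheduled job could run something like:

7z a -tzip D:\Staging\webdata.zip D:\WebData

and webdata.zip then becomes the only file pulled across the VPN.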
I second Charles Gargent's recommendation for rsync. I use rsync over SSH with Cygwin. If I recall correctly, there is a non-Cygwin-dependent executable available.
One huge benefit that rsync has over robocopy is that an rsync agent is spawned on the remote side to do the processing on that end. The remote agent can inspect the remote filesystem without having to bring all the file details back to your local machine, which is much, much faster than robocopy and is probably what is behind your 5-hour run time.
You can also use compression with rsync over ssh, which can speed things up further.
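For instance (user, host and paths are invented), something along the lines of:

rsync -avz --delete -e ssh backupuser@remoteserver:/cygdrive/d/webdata/ /cygdrive/d/mirror/webdata/

where -z compresses the stream and -e ssh tunnels it over SSH.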
Beware, however, that Cygwin filesystem ACLs and Windows ACLs do not play nicely together. If you require a perfect copy of ACLs, rsync might not be for you. I had to write a script to run xcacls to "clean up" permissions on my files after copying them.
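A sketch of that kind of cleanup, using icacls rather than xcacls (the path is a placeholder); /reset replaces the copied ACLs with permissions inherited from the parent folder, /T recurses and /C carries on past errors:

icacls D:\Mirror\WebData /reset /T /C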
Just some notes about using attrib -a /s to work around robocopy's shortfall. If you are going to use this approach, run it BEFORE you run your full backup. A full backup generally takes a lot of time, and some files could change between when they are backed up and when you get around to running attrib afterwards, which would mean missing those changes in later copies.
The second note about this solution is that it only works well if your copies are not filtered. If you filter your backup or robocopy processes to avoid temporary files and similar garbage, there is no easy way to make sure that attrib looks only at the files that the copy process looks at. That is, you end up changing the attribute on files you are not actually copying, which is not really a good idea.
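To make the ordering concrete (paths are placeholders), the sequence would be attrib first, then the full copy:

attrib -a D:\WebData\*.* /s
robocopy D:\WebData \\mirror\WebData /E

Anything modified during the copy window has its Archive bit set again by the OS, so the later /M passes still pick it up.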
It is fascinating that robocopy seems incapable of creating a traditional full backup ... or at least not in a single run. You can do it by running it twice: once to copy everything, and then again with /M to copy and, this time, actually reset the archive bits. What a PITA.
XYZ's comments regarding the downside of using ATTRIB are helpful; however, it is not sufficient to simply follow a robocopy /MIR command with a robocopy /COPY /M command to reset archive bits selectively. Robocopy will not reset the bit unless it actually copies the file, and (by default) it will not copy 'Same' files. Therefore,
ROBOCOPY source destination /MIR
ROBOCOPY source destination /COPY /M
will leave the archive bit of many files on the source unchanged. (I wish it were not true.)
It is unlikely that the Robocopy source code will be tweaked further, but I wish the authors had provided an option for /MIR to reset archive bits in a single pass (e.g., a hypothetical /MIR:A). This mostly matters when initiating backups on a new system, but in any event it demonstrates that robocopy /MIR is not a 'full' backup solution.