I'm in the process of decommissioning an old 2003 server, which acts as a file server, and am just attempting a dry-run of migrating the file repository over to a new Windows Storage Server 2012 box. I'm using robocopy to copy over the files, and currently just doing some test runs to see how long it takes, before we make the final change over.
The first time I ran robocopy I supplied the following switches:

Options : *.* /S /E /COPYALL /PURGE /MIR /MT:128 /R:100000 /W:30

It ran fine (although I wouldn't recommend those /R and /W values, as it'll take forever to complete!).

The second time I ran it with the following switches (the destination directory already contained a copy of the source from the first run, so /MIR should just bring it up to date):

Options : *.* /S /E /COPYALL /PURGE /MIR /MT:128 /R:0 /W:0
This caused the server to hang about 5 minutes after the job started. It completely hung and I had to manually power cycle it to restart it. The logs aren't giving me much indication of what went wrong. My first thought was that /MT:128 had caused issues, but I supplied that switch the first time and that was fine.
The second time I changed a couple of switches to /R:0 and /W:0, although I wouldn't imagine that they would cause it to hang.
Finally, is the fact that I've chosen /MIR problematic, given that the destination has already been copied over from the source once before? I wouldn't have thought so, as I understood the only potential downside of mirroring to be that it deletes files in the destination which are no longer in the source. If anyone could shed any light on what went wrong, it'll help ensure that it doesn't go wrong the next time I try it.
EDIT: the switches I mentioned above are taken from the robocopy log file, and in a sense they are an interpretation of the switches I specified, which were: /MIR /COPY:DATSOU /MT:128 /R /W
2nd Edit: The server in question has a dual NIC, teamed using the NIC teaming built into Windows Server. I feel this is important information, which I did not share when I originally posted the question, and I would like to investigate it. The NIC in question is an Intel(R) 82574L Gigabit Network Connection. The NIC team is 'Microsoft Network Adapter Multiplexor Driver'.
It appears to me that Robocopy A) is buggy, and B) hooks into the kernel in some way that can make the entire system incredibly unstable when it bugs out. We've seen this happen quite often (especially with the /MT option) when syncing over reasonably high-speed WAN links (20Mbps - 100Mbps). So I'm pretty sure it's not a NIC driver having traffic volume issues: we do things in production that abuse them far more badly than this, and we see this even with 10Gbps LAN connections on Cisco UCS / VMware 5.5, with everything patched current and Robocopy v6.3.9600.17415 dated 10/28/2014.
I'd love it if somebody can definitively prove we're all doing something stupid, but it looks like Microsoft is just putting out some unbelievably dangerous code.
It sounds like it's a network card driver issue for sure. To see if this is a bug with your dual-NIC setup, set the /IPG parameter to about 20 milliseconds and remove your /MT:128 parameter (since /IPG and /MT are not compatible). Using your "switches I specified" line in your original post it would look like this.
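As a rough sketch of such a run (not necessarily the exact command line), here it is assembled as an argument list in Python so it can be inspected before anything is executed. The UNC path, the destination path and the helper name are placeholders, and /R:0 /W:0 are assumed from the log of the run that hung.

```python
import subprocess


def build_diagnostic_command(source, dest, ipg_ms=20):
    """Assemble a single-threaded, throttled robocopy test run (hypothetical helper)."""
    return [
        "robocopy", source, dest,
        "/MIR",            # mirror the tree (equivalent to /E plus /PURGE)
        "/COPY:DATSOU",    # copy all file info, same as /COPYALL
        "/Z",              # restartable mode
        f"/IPG:{ipg_ms}",  # inter-packet gap in milliseconds
        "/R:0", "/W:0",    # assumed: no retries, as in the second run's log
        # /MT is deliberately left out because it cannot be combined with /IPG
    ]


# Placeholder paths; substitute the real source share and destination folder.
cmd = build_diagnostic_command(r"\\oldserver\share", r"D:\share")
print(subprocess.list2cmdline(cmd))
```

On the Windows box the list could then be passed to `subprocess.run(cmd)`. Robocopy returns non-zero exit codes even for successful copies (values below 8 are not failures), so treating any non-zero code as an error would be misleading.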
The /IPG:20 (inter-packet gap) will slow down the transmission considerably, but provides stability.
The /Z (restartable mode) is important for copies over the network, in case of network disruptions (caused by bad cards, drivers, or by actual network issues) because it will allow the copy to pick up where it left off.
If this completes successfully, you've got an issue with your network driver. The issue would be that whatever driver you're using can't handle the throughput of /IPG:0.
The final nail in the coffin for the NIC driver being the root cause of your server hanging would be to replace the card and rerun the command that caused it to hang. Apart from that you could probably also unplug one of the connections so the multiplexing doesn't occur, and run the command that produced the error.
Suggestion came from cb42 on technet.
http://social.technet.microsoft.com/Forums/en-US/itprovistaapps/thread/9555a996-1301-4f68-b9d3-82a87fc6ba46/
...and ss64 rocks (just sayin!) http://ss64.com/nt/robocopy.html
Why do you use /S with /E? They seem to be opposites. And /E + /PURGE is equal to /MIR. I also think /MT:128 is too high; you should reduce it. Try:
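One way to read this suggestion, as a sketch only: pass /MIR on its own (it already implies /E and /PURGE) and lower the thread count. The value 8 below is robocopy's default for /MT and is an assumption here, not a tuned figure; the paths and the helper name are hypothetical, and /R:0 /W:0 are carried over from the question's second run.

```python
import subprocess


def build_mirror_command(source, dest, threads=8):
    """Assemble a mirror run without the redundant /S, /E and /PURGE switches."""
    if not 1 <= threads <= 128:
        raise ValueError("robocopy accepts /MT values from 1 to 128")
    return [
        "robocopy", source, dest,
        "/MIR",            # already covers /E and /PURGE
        "/COPY:DATSOU",    # copy all file info, same as /COPYALL
        f"/MT:{threads}",  # far fewer threads than the original 128
        "/R:0", "/W:0",    # assumed: no retries, no wait
    ]


# Placeholder paths; substitute the real source share and destination folder.
mirror_cmd = build_mirror_command(r"\\oldserver\share", r"D:\share")
print(subprocess.list2cmdline(mirror_cmd))
```

If this run is stable, the thread count can be raised step by step to find the point where problems start.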