I have a strange problem (aren't they all!). I have a DFS root, \\domain\files\vms, with a single target on a different server than the namespace.
I can copy a test file set directly from the target via \\server\vms$\testfiles and all is well; the files copy fine. I have repeated this test many times.
If I try to copy the files from the DFS root, I get big pauses in the network traffic: about 50 seconds every couple of minutes, during which all traffic for the copy just stops. If I start another copy between the same two machines during a pause, it copies fine, so I know it's not an issue with the disks on the server.
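One way to pin down when the pauses happen is to copy a single large file in chunks and log every chunk that takes unusually long. The following is only a minimal Python sketch: the function name, the 1 MiB chunk size and the 5 second threshold are illustrative assumptions, not part of the tests described above, and OS caching can blur the timings.

```python
import time


def copy_with_stall_log(src, dst, chunk_size=1024 * 1024, stall_seconds=5.0,
                        clock=time.monotonic):
    """Copy src to dst in chunks and return a list of (offset, seconds) stalls.

    A "stall" here is any single chunk whose read plus write took longer
    than stall_seconds. Chunk size and threshold are arbitrary choices.
    """
    stalls = []
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            started = clock()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            elapsed = clock() - started
            if elapsed > stall_seconds:
                # Record where in the file the pause occurred and how long it was.
                stalls.append((offset, elapsed))
            offset += len(chunk)
    return stalls
```

Running it once with the source given as a path under the DFS root and once with the direct share path should show whether the long gaps appear only on the DFS path.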
Every once in a while the copy fails with no error: the progress bar just zips all the way to 100% and the copy dialog closes. Checking the target folder shows that the copy is incomplete.
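Because the failure is silent, it helps to check the result mechanically rather than by eye. A rough Python sketch, assuming both trees are reachable as ordinary paths; the function name is made up for illustration, and comparing sizes only is a shortcut that will not catch corrupted content:

```python
import os


def missing_or_short(src_root, dst_root):
    """Return (relative_path, reason) for files absent from dst_root or of a different size."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                problems.append((rel, "missing"))
            elif os.path.getsize(dst) != os.path.getsize(src):
                problems.append((rel, "size differs"))
    return sorted(problems)
```

An empty list means every source file exists at the destination with the same size; anything else shows where the copy stopped.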
I've moved the LUN to another server and had the same problem.
The servers are all 2008 R2; the clients are Vista x64, Windows 7 x64 and 2008 R2, and all have the same problem.
Anyone got any ideas?
Cheers,
Stephen
More Information:
I've been running a NetMon trace on the connection when the file copy fails. What stands out is that, when opening a file on which the copy completes, the SMB command looks like this:
SMB2: C CREATE (0x5), Name=Training\PDC2008\BB34 Live Services Notifications, Awareness, and Communications.wmv@#422082, Context=DHnQ, Context=MxAc, Context=QFid, Context=RqLs, Mid = 245376
SMB2: R CREATE (0x5), Context=MxAc, Context=RqLs, Context=DHnQ, Context=QFid, FID=0xFFFFFFFF00000015, Mid = 245376
But for the last file, when the copy dialog closes, it looks like this:
SMB2: C CREATE (0x5), Name=gt\files\Media\Training\PDC2008\BB36 FAST Building Search-Driven Portals with Microsoft Office SharePoint Server 2007 and Microsoft Silverlight.wmv@#859374, Context=DHnQ, Context=MxAc, Context=QFid, Context=RqLs, Mid = 77
SMB2: R , Mid = 77 - NT Status: System - Error, Code = (58) STATUS_OBJECT_PATH_NOT_FOUND
The main difference seems to be in the name: one is relative to the open file share, while the other has gained the gt\files\media prefix, which is the name of the DFS target.
These failures are always preceded by a logoff from, and a new logon to, the SMB target.
Might have to bump this one to PSS.
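To check whether every failing open carries the extra prefix, the frame summaries can be exported from NetMon as text and scanned with a script. This is only a sketch under assumptions: it expects summary text shaped like the frames quoted above, pairs requests and responses by Mid alone (which ignores multiple sessions), and the function and pattern names are invented for illustration.

```python
import re

# Split the text wherever a new SMB2 client (C) or server (R) frame begins.
FRAME_START = re.compile(r"(?=SMB2: [CR] )")
CREATE_REQ = re.compile(
    r"SMB2: C CREATE \(0x5\), Name=(?P<name>.*?), Context=.*?Mid = (?P<mid>\d+)",
    re.S,
)
RESPONSE = re.compile(r"SMB2: R .*?Mid = (?P<mid>\d+)(?P<rest>.*)", re.S)
STATUS = re.compile(r"STATUS_\w+")


def failed_creates(trace_text, target_prefix):
    """Return (name, status, has_prefix) for each CREATE answered with an error status."""
    pending = {}   # Mid -> file name from the CREATE request
    failures = []
    for frame in FRAME_START.split(trace_text):
        req = CREATE_REQ.match(frame)
        if req:
            pending[req.group("mid")] = req.group("name")
            continue
        resp = RESPONSE.match(frame)
        if resp and resp.group("mid") in pending:
            name = pending.pop(resp.group("mid"))
            status = STATUS.search(resp.group("rest"))
            if status:
                has_prefix = name.lower().startswith(target_prefix.lower())
                failures.append((name, status.group(0), has_prefix))
    return failures
```

Calling `failed_creates(text, r"gt\files\media")` on the exported text would list each failed open and whether its name starts with the DFS target prefix.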
As long as you have more than one root server, delete the problem DFS root server and check that the folder and share were removed. Then recreate that root so that it sets the share back up. I've seen the 50-second delay preceded by a logoff before, with exactly your symptoms. I traced it to two root servers that had had their OS rebuilt, but whose corresponding config in DFS was never cleared out and reconfigured. Agreeing with Eric here: scrutinize the config and health of just the root link resolver servers.
-Greg
Any DFS events showing up in Event Viewer on the DFS server?
A shot in the dark, but are you running anti-malware software on the server/clients? If so, have you tried temporarily disabling any network-related features for the purpose of troubleshooting?
What are the root and link timeouts set to on your DFS namespace? You may want to experiment with lengthening them. This will make clients slower to pick up changes to the namespace. If your namespace is static, then it's fine for clients to run using cached referrals rather than checking with the namespace server for a fresh referral.
I ran into an issue with large delays when accessing file shares connected through DFS. We had a ton of stale (orphaned) DFS roots. Have you checked there yet?