We've recently migrated our Windows network to use DFS for shared files. DFS is working well, except for one annoying problem: users experience a significant delay when they try to access a DFS namespace that they have not accessed for some time. I have tried to troubleshoot the issue but have not had any success so far, and I was hoping someone here may have some pointers to help resolve the problem.
Firstly, some background on our network:
The network uses a Windows 2008 functional level Active Directory domain with two Windows 2008 DCs and two DNS servers (one on each of the DCs). The network is DNS only - no WINS. All computers are located at the same site and connected by Gigabit Ethernet. We have approximately 20 Domain-based DFS namespaces in Windows 2008 mode, and each DFS namespace has two Windows 2008 DFS namespace servers (the same two servers for all namespaces). All namespace servers are in FQDN mode and all folder targets are specified using their FQDN. All computers are up-to-date with Service Packs and patches.
The actual folder targets (i.e. the SMB shares our DFS folders point to) are scattered across several file and application servers, all running Windows 2008 except for two application servers that run Windows 2003 R2. There is no replication set up at all (i.e. every DFS folder currently has only one folder target).
Some more detail on the problem:
The namespace access delay is generally 1 - 10 seconds long and seems to occur when a particular computer has not accessed the requested namespace for approximately five minutes or more.
For example, if the user has not accessed \\domain.name\namespace1\ for more than five minutes and attempts to access \\domain.name\namespace1\ via Windows Explorer, the Explorer window will freeze for 1 - 10 seconds before finally resuming and displaying the folders that exist in \\domain.name\namespace1. If they then close the Explorer window and attempt to access \\domain.name\namespace1\ again within five minutes the contents will be displayed almost instantly - if they wait longer than five minutes it will go through the 1 - 10 second pause again.
Once "inside" the namespace everything is nice and snappy, it's just the initial connection to the namespace that is slow.
The browsing delays seem to affect every version of Windows that we use (Windows 2008 x64 SP2, Windows 2003 R2 x86 SP2, Windows XP Pro x86 SP3). It is possibly a bit worse on Windows XP / 2003 than on Windows 2008, but the difference may just be psychological.
Accessing the underlying folder targets directly exhibits no delay at all - i.e. if the SMB shares pointed to by DFS are accessed directly (bypassing DFS) then there is no pause.
During trouble-shooting I noticed that the "Cache duration" for all of our DFS roots is set to 300 seconds - 5 minutes. Given that this is the same amount of time required to trigger the pause I assume that this caching is somehow related, although I am unsure exactly what is cached on the client and hence what needs to be looked up again after 5 minutes have elapsed.
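To make the suspected mechanism concrete, here is a toy model of a client-side referral cache with a fixed 300-second TTL. It is a sketch under stated assumptions, not how the Windows DFS client is actually implemented: the resolver function, the host name `nsserver1.domain.name` and the injectable clock are all hypothetical, and this model does not extend an entry's lifetime on a cache hit (the real client may behave differently). It only shows why the first access after the TTL lapses pays for a fresh, potentially slow, referral lookup.

```python
import time
from typing import Callable, Dict, Tuple


class ReferralCache:
    """Toy model of a client-side DFS referral cache with a fixed TTL.

    Hypothetical illustration only; NOT the real DFS client.
    """

    def __init__(self, ttl_seconds: float, resolver: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.resolver = resolver  # stands in for the slow DC / root-server referral
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # namespace -> (target, expiry)

    def lookup(self, namespace: str) -> Tuple[str, bool]:
        """Return (target, cache_hit); cache_hit is False when the resolver ran."""
        now = self.clock()
        cached = self._entries.get(namespace)
        if cached is not None and now < cached[1]:
            return cached[0], True  # still fresh: fast path
        target = self.resolver(namespace)  # expired or missing: slow path
        self._entries[namespace] = (target, now + self.ttl)
        return target, False


if __name__ == "__main__":
    fake_now = [0.0]  # simulated clock, in seconds
    cache = ReferralCache(300, lambda ns: "nsserver1.domain.name",
                          clock=lambda: fake_now[0])
    for t in (0, 120, 420):  # first access, re-access within 5 min, re-access after
        fake_now[0] = t
        print(t, cache.lookup(r"\\domain.name\namespace1"))
```

Run as a script, the access at 120 s is a cache hit, while the accesses at 0 s and 420 s both go through the (simulated) slow resolver, matching the "pause again after five minutes" pattern described above.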
In trying to resolve the problem I have already tried / checked the following (without success):
- Ran dcdiag on both domain controllers - no problems found
- Did some basic DNS server checks without finding any problems. I don't know how to check the DNS servers in detail, but I would add that the network is not exhibiting any other strange behavior that might point to a DNS problem
- Disabled anti-virus on clients and servers
- Removed one of the namespace servers from a couple of namespaces - no difference
So that's where I'm up to - and I'm out of ideas. Can anyone suggest what may be causing the delays and/or what I should be trying next?
Well, we finally appear to have resolved this issue in our environment. For the benefit of others, here's what we discovered and how we fixed the problem:
To try and gain further insight into what was occurring before/during/after the delays we used Wireshark on a client machine to capture/analyse network traffic whilst that client attempted to access a DFS share.
These captures showed something strange: whenever the delay occurred, in between the DFS request being sent from the client to a DC, and the referral to a DFS root server coming back from the DC to the client, the DC was sending out several broadcast name lookups to the network.
Firstly, the DC would broadcast a NetBIOS lookup for DOMAIN (where DOMAIN is our pre-Windows 2000 Active Directory domain name). A few seconds later, it would broadcast an LLMNR lookup for DOMAIN. This would be followed by yet another broadcast NetBIOS lookup for DOMAIN. After these three lookups had been broadcast (and, I assume, timed out) the DC would finally respond to the client with a (correct) referral to a DFS root server.
These broadcast name lookups for DOMAIN were only being sent when the long delay opening a DFS share occurred, and we could clearly see from the Wireshark capture that the DC wasn't returning a referral to a DFS root server until all three lookups had been sent (and ~7 seconds had passed). So these broadcast name lookups were pretty obviously the cause of our delays.
Now that we knew what the problem was, we started trying to figure out why these broadcast name lookups were occurring. After a bit more Googling and some trial-and-error, we found our answer: we hadn't set the DfsDnsConfig registry key on our domain controllers to 1, as is required when using DFS in a DNS-only environment.
When we originally set up DFS in our environment we did read the various articles about how to configure DFS for a DNS-only environment (e.g. Microsoft KB244380 and others) and were aware of this registry key, but had misinterpreted the instructions on when/how to use it.
KB244380 says:
We thought this meant that the registry key has to be set on the DFS namespace servers only, not realising that it was also required on the domain controllers. After we set DfsDnsConfig to 1 on our domain controllers (and restarted the "DFS Namespace" service), the problem vanished.
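A minimal sketch of that change, assuming the value lives under `HKLM\SYSTEM\CurrentControlSet\Services\Dfs` as described in KB244380 and that the DFS Namespace service's short name is `Dfs`; verify both against the KB and your own servers before running it. It only prints the commands unless you call it with `dry_run=False` on Windows from an elevated prompt, and it would need to be run on each domain controller as well as each namespace server.

```python
import subprocess
import sys
from typing import List

# Assumed key path (per KB244380); double-check before applying.
DFS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Dfs"


def dfsdnsconfig_commands() -> List[List[str]]:
    """Commands to set DfsDnsConfig=1 (REG_DWORD) and restart the DFS Namespace service."""
    return [
        ["reg", "add", DFS_KEY, "/v", "DfsDnsConfig", "/t", "REG_DWORD", "/d", "1", "/f"],
        ["net", "stop", "Dfs"],   # assumed short name of the "DFS Namespace" service
        ["net", "start", "Dfs"],
    ]


def apply(dry_run: bool = True) -> None:
    """Print the commands, or run them (Windows only) when dry_run is False."""
    for cmd in dfsdnsconfig_commands():
        if dry_run or sys.platform != "win32":
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise if any step fails


if __name__ == "__main__":
    apply(dry_run=True)
```

Keeping the commands in a list separate from the code that executes them makes it easy to review exactly what will be changed on a DC before committing to it.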
Obviously we're happy with this outcome, but I'm still not 100% convinced that this was our only problem - I wonder if adding DfsDnsConfig=1 to our DCs has only worked around the problem rather than solved it. I can't figure out why the DCs would be trying to look up DOMAIN (the domain name itself, rather than a server in the domain) during the DFS referral process, even in a non-DNS-only environment. I also know I haven't set DfsDnsConfig=1 on the domain controllers in other (admittedly much smaller / simpler) DNS-only environments and haven't had the same issue there. Still, we've solved our problem, so we are happy.
I hope this is helpful to the others who are experiencing a similar issue - and thanks again to those that offered suggestions along the way.
This could be caused by the DNS server netmask ordering. We came across this recently in Server 2003. This depends on your current subnetting.
Example.
Site 1: IP subnet 10.0.0.0/24
Site 2: IP subnet 10.0.1.0/24
A client in site 2 makes a DNS query for your domain-based namespace and, by default, will be given the DFS server in site 1, because the DNS server is not aware of the site IP boundaries. You need to tell your DNS servers which subnet mask to use when deciding which IP addresses to respond with first.
See http://support.microsoft.com/kb/842197
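A simplified sketch of what netmask ordering does, using Python's `ipaddress` module: A records that fall in the same network as the client (as defined by the mask) are moved to the front of the answer. The function name, addresses and the `prefix_len` parameter are hypothetical; the real DNS server also combines this with round robin, and the KB above describes how its actual mask is configured.

```python
import ipaddress
from typing import List


def netmask_order(client_ip: str, records: List[str], prefix_len: int = 24) -> List[str]:
    """Return A records with those in the client's network first (order otherwise kept)."""
    client_net = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    local = [r for r in records if ipaddress.ip_address(r) in client_net]
    remote = [r for r in records if ipaddress.ip_address(r) not in client_net]
    return local + remote


if __name__ == "__main__":
    answers = ["10.0.0.10", "10.0.1.10"]  # hypothetical servers in site 1 and site 2
    print(netmask_order("10.0.1.50", answers))                 # site 2 server first
    print(netmask_order("10.0.1.50", answers, prefix_len=16))  # mask too wide: no reordering
```

The second call shows why the mask matters: with a /16 mask both servers look "local" to the client, so nothing is reordered and the site 1 server can still come back first.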
The Active Directory Team Blog has a three-part article all about DFS delays:
https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-1-3/ba-p/397167 (https://archive.is/OeRqo)
https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-2-3/ba-p/397171 (https://archive.is/cojW4)
https://techcommunity.microsoft.com/t5/ask-the-directory-services-team/o-8217-dfs-shares-where-art-thou-8211-part-3-3/ba-p/397175 (https://archive.is/E9Dov)
It covers the basics of the referral process, and then shows how to use various tools, including dfsUtil and dfsDiag, to discover the actual cause of the delays.
It helped me find my problem, which turned out to be missing Read permissions on the share directory for Domain Users.
HTH, Daniel
Smells like a DNS problem, but anything goes. I much preferred the old FRS because diagnostic tools like Ultrasound were so useful.
Do you get anything in the DFS Replication Event Log on the targets? (the DFS Health report will draw its warnings from the event log)
Running without WINS is a nice and admirable goal, though I'm pretty much against it if there are any pre-Vista/2008 Windows systems around: in my experience things don't always work as expected, or as fast, without WINS, even though it really shouldn't matter.
The client caches a DFS referral, i.e. when you enter \\domain.name\namespace it will cache which actual server domain.name refers to. Once the referral expires from the cache, the client basically has to "discover" your DFS topology all over again, hence the delay.
Have a look here: http://technet.microsoft.com/en-us/library/cc758234(WS.10).aspx and here http://blogs.technet.com/filecab/archive/2006/01/20/417832.aspx for further info on how this works.
Possible solutions? A hacky way of going about it might be to write a small program that does a "keep-alive" every few minutes, e.g. a C program that calls fopen() on the first file it finds and immediately calls fclose() on it. I haven't tried or tested this, and you would definitely need to give it some careful consideration if you were going to do it.
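A rough Python sketch of that untested keep-alive idea (Python rather than C, for brevity). The share path is hypothetical, and the 240-second interval is an assumption chosen only to stay under the 300-second cache duration mentioned in the question. Whether periodic access actually keeps the referral from expiring, rather than just moving the slow lookup into this background process, is exactly the untested part. Note that a namespace root may contain only folders, in which case the directory listing itself is the only access made.

```python
import os
import time
from typing import Optional


def touch_first_file(share_path: str) -> Optional[str]:
    """List share_path, open and immediately close the first regular file found.

    Returns the file's path, or None if the directory holds no regular files.
    """
    with os.scandir(share_path) as entries:  # listing already touches the namespace
        for entry in entries:
            if entry.is_file():
                with open(entry.path, "rb"):
                    pass  # open/close only, like the fopen/fclose idea
                return entry.path
    return None


def keep_alive(share_path: str, interval_seconds: float = 240.0) -> None:
    """Touch share_path forever, every interval_seconds (hypothetical helper)."""
    while True:
        try:
            touch_first_file(share_path)
        except OSError as exc:  # share unreachable, access denied, etc.
            print(f"keep-alive failed: {exc}")
        time.sleep(interval_seconds)
```

Usage would be something like `keep_alive(r"\\domain.name\namespace1")` run per user session; one process per namespace would be needed, which is part of why the idea needs careful consideration.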
We have had a similar-sounding problem, where users would experience delays (up to a minute) between clicking on a drive mapped to a DFS share, and being able to see and browse to the folders within the share.
The users also had home drives mapped to a different DFS share on the same volume, and had no delay when accessing folders there.
The difference between the two is Access-Based Enumeration (ABE) - the problem share has this enabled (it's a common drive for users, with thousands of folders - ABE means users only see those folders to which they have permissions).
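A simplified model of why ABE can make a large share slower to open: before returning a directory listing, the server checks each entry against the requesting user's permissions and hides the ones the user can't read, so the work per listing grows with the number of folders. Here each folder's ACL is reduced to a hypothetical set of principals with Read access; real NTFS ACL evaluation (deny entries, inheritance, nested groups) is considerably more involved.

```python
from typing import Dict, List, Set


def abe_visible_folders(folder_readers: Dict[str, Set[str]],
                        user_principals: Set[str]) -> List[str]:
    """Simplified ABE: return only the folders the user is allowed to read.

    folder_readers maps folder name -> principals granted Read (a stand-in
    for a real ACL). One access check runs per folder on every listing.
    """
    visible = []
    for name, readers in folder_readers.items():
        if readers & user_principals:  # per-folder access check
            visible.append(name)
    return sorted(visible)


if __name__ == "__main__":
    folders = {  # hypothetical folders and groups
        "Finance": {"FinanceStaff"},
        "HR": {"HRStaff"},
        "Public": {"Domain Users"},
    }
    print(abe_visible_folders(folders, {"Domain Users", "FinanceStaff"}))
```

With thousands of folders, that loop runs thousands of access checks every time a user opens the share, which is consistent with ABE being the difference between the slow share and the fast home-drive share.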
Disabling ABE removed the problem entirely. Obviously this is not a solution as users then see all folders, confusing them. I have replicated the DFS share to a server with some spare disk as a temporary measure, and even with ABE enabled on this new target, the delay has gone.
The problem server is 2k3R2, and has an uptime of over 150 days (!), so it's going to get rebooted and have CHKDSK run over the offending volume. I'll post back here if this makes any difference to the problem. The new target is on a 2k8 server.
dfsutil /spcflush and dfsutil /pktflush can also be a solution. In a multi-site network, make sure that the DFS link for the home site is coming from the local server and not from the cache.
I know the original poster was not using WINS, but I am posting for the benefit of others, as we used this post the most to help solve a very similar problem. For us, it turned out that someone had given their workstation the same name as the domain. So every time the DC did a lookup on the domain name for the DFS referral, it was trying to resolve it to that workstation, which caused a delay of several tens of seconds. A static <20> entry pointing at a DC was placed into WINS, and this solved the problem. If you have no WINS, you could experiment with adding the domain name as a machine name in the LMHOSTS file, pointing at a DC, to get the <20> lookup, and set the name-resolution priority so that LMHOSTS is the first place checked when resolving NetBIOS names.
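A small helper for that LMHOSTS experiment, as I understand the LMHOSTS format: to pin a specific 16th-byte suffix, the name is quoted, padded to 15 characters, and followed by `\0xNN`; `#PRE` preloads the entry into the NetBIOS name cache. The function name and the DC address `10.0.0.1` are hypothetical, and this does not change the resolution-order priority mentioned above.

```python
import ipaddress


def lmhosts_entry(ip: str, netbios_name: str, suffix: int = 0x20,
                  preload: bool = True) -> str:
    """Format an LMHOSTS line mapping NETBIOS_NAME<suffix> to ip (hypothetical helper)."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    name = netbios_name.upper()
    if len(name) > 15:
        raise ValueError("NetBIOS names are at most 15 characters")
    # Quoted form: name padded to 15 characters, then \0xNN as the 16th (suffix) byte.
    line = f'{ip}\t"{name.ljust(15)}\\0x{suffix:02x}"'
    if preload:
        line += "\t#PRE"  # load into the NetBIOS name cache at startup
    return line


if __name__ == "__main__":
    print(lmhosts_entry("10.0.0.1", "DOMAIN"))  # hypothetical DC address
```

After editing LMHOSTS, `nbtstat -R` reloads the #PRE entries into the cache; test the effect on one client before rolling it out, since the proper fix is still to rename the conflicting workstation.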
http://technet.microsoft.com/en-us/library/cc780950(v=ws.10).aspx This page actually mentions both Domain Controllers and DFSN, if that helps.
DFS Domain Controller and Root Server Registry Entries
The following registry entries are located under
on root servers and domain controllers. All entries are REG_DWORD.
So I used this article in my search. I set everything up and still had issues. After spending several days looking into the problem and ruling out everything on the Microsoft side, I guessed it was network-related. It turned out our WAN accelerator was the issue. I had our networking guys turn off acceleration for our domain controllers and everything got better.