My team has a server pointing at the DNS supplied by Active Directory to ensure that it is able to reach any hosts managed by the domain. Unfortunately, my team also needs to run dig +trace frequently, and we sporadically get strange results. I am a DNS admin but not a domain admin, and the team responsible for these servers isn't sure what is going on here either.
The problem seems to have shifted around between OS upgrades, but it's hard to say whether that's a characteristic of the OS version or other settings being changed during the upgrade process.
- When the upstream servers were Windows Server 2003, the first step of dig +trace (request . IN NS from the first entry in /etc/resolv.conf) would occasionally return 0 byte responses.
- When the upstream servers were upgraded to Windows Server 2012, the zero byte response problem went away but was replaced with an issue where we would sporadically get the list of forwarders configured on the DNS server.
Example of the second problem:
$ dig +trace -x 1.2.3.4
; <<>> DiG 9.8.2 <<>> +trace -x 1.2.3.4
;; global options: +cmd
. 3600 IN NS dns2.ad.example.com.
. 3600 IN NS dns1.ad.example.com.
;; Received 102 bytes from 192.0.2.11#53(192.0.2.11) in 22 ms
1.in-addr.arpa. 84981 IN NS ns1.apnic.net.
1.in-addr.arpa. 84981 IN NS tinnie.arin.net.
1.in-addr.arpa. 84981 IN NS sec1.authdns.ripe.net.
1.in-addr.arpa. 84981 IN NS ns2.lacnic.net.
1.in-addr.arpa. 84981 IN NS ns3.apnic.net.
1.in-addr.arpa. 84981 IN NS apnic1.dnsnode.net.
1.in-addr.arpa. 84981 IN NS ns4.apnic.net.
;; Received 507 bytes from 192.0.2.228#53(192.0.2.228) in 45 ms
1.in-addr.arpa. 172800 IN SOA ns1.apnic.net. read-txt-record-of-zone-first-dns-admin.apnic.net. 4827 7200 1800 604800 172800
;; Received 127 bytes from 202.12.28.131#53(202.12.28.131) in 167 ms
In most cases this isn't a problem, but it will cause dig +trace to follow the wrong path if we are tracing within a domain that AD has an internal view for.
Why is dig +trace losing its mind? And why do we seem to be the only ones complaining?
You are being trolled by root hints. This one is tricky to troubleshoot, and it hinges on understanding that the . IN NS query sent at the start of a trace does not set the RD (recursion desired) flag on the packet.
When Microsoft's DNS server receives a non-recursive request for the root nameservers, it may return the configured root hints. So long as you do not add the RD flag to the request, the server will happily continue to return that same response with a fixed TTL all day long.
This is where most troubleshooting efforts break down, because the easy assumption to leap to is that dig @whatever . NS will reproduce the problem. It actually masks it completely. When the server gets a request for the root nameservers with the RD flag set, it will reach out and grab a copy of the real root nameservers, and all subsequent requests for . NS without the RD flag will magically start working as expected. This makes dig +trace happy again, and everyone can go back to scratching their heads until the problem reappears.
Your options are to either negotiate a different configuration with your domain admins, or to work around the problem. So long as the poisoned root hints are good enough in most circumstances (and you're aware of the circumstances where they're not: conflicting views, etc.), this isn't a huge inconvenience.
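To see the RD-flag behaviour described above for yourself, something like the following sketch may help. It asks the same server for . NS twice, first with recursion turned off (dig's +norecurse option, which is what the first step of a trace amounts to) and then with dig's default of RD set. The function name query_root_ns and the way the server is passed in are my own choices for illustration, not anything dig or Windows DNS provides.

```shell
#!/bin/sh
# Sketch: compare the answer to ". NS" with and without the RD flag.
# query_root_ns is a hypothetical helper name; the server address is
# whatever resolver you want to test (for example your AD DNS server).

# $1 = server to query
# $2 = "norecurse" to clear RD (as dig +trace does for its first query);
#      anything else keeps dig's default, which sets RD.
query_root_ns() {
    server=$1
    mode=${2:-recurse}
    if [ "$mode" = "norecurse" ]; then
        dig +norecurse +noall +answer +authority "@$server" . NS
    else
        dig +noall +answer +authority "@$server" . NS
    fi
}

# Only query when a server is given on the command line.
# The non-recursive query runs first on purpose: per the explanation
# above, the recursive one may "repair" the server's answer.
if [ "$#" -ge 1 ]; then
    echo "== RD clear (as sent by dig +trace) =="
    query_root_ns "$1" norecurse
    echo "== RD set (dig's default) =="
    query_root_ns "$1" recurse
fi
```

Saved under a name of your choosing (say rd-check.sh) and run as sh rd-check.sh 192.0.2.11, the two sections should differ while the server is in the bad state and match once it has fetched the real root nameservers.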
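Some workarounds without changing the root hints are:
- Start your trace from a nameserver that answers . NS with the real root nameservers. You can also hardwire this nameserver into ${HOME}/.digrc, but this may confuse others on a shared account or be forgotten by you at some point.
dig @somethingelse +trace example.com
- Seed the real root nameservers yourself with a recursive query prior to running your trace.
dig . NS
dig +trace example.com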
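If you find yourself typing the dig . NS / dig +trace pair often, a small wrapper along these lines might save some keystrokes. It is only a sketch: the function names looks_like_real_roots and safe_trace are made up for this example, and the check assumes that the real root nameservers are the ones named under root-servers.net. It also assumes the seeding query and the trace both land on the same resolver, which may not hold if /etc/resolv.conf lists several servers and dig fails over between them.

```shell
#!/bin/sh
# Sketch: seed the resolver with a recursive ". NS" query only when the
# non-recursive answer does not look like the real root nameservers.
# Both function names are hypothetical helpers for this example.

# Reads nameserver names on stdin (one per line, as printed by dig +short)
# and succeeds if at least one of them is under root-servers.net.
looks_like_real_roots() {
    grep -qi '\.root-servers\.net\.$'
}

# Usage: safe_trace <any arguments you would give to dig +trace>
safe_trace() {
    # Same kind of query a trace starts with: ". NS" with RD clear.
    if ! dig +norecurse +short . NS | looks_like_real_roots; then
        # Recursive query (dig's default sets RD) to seed the resolver.
        dig +short . NS >/dev/null
    fi
    dig +trace "$@"
}
```

After sourcing the file, safe_trace example.com or safe_trace -x 1.2.3.4 behaves like the plain trace, with the seeding step run only when the first-hop answer looks wrong.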