(Rewriting most of this question since a lot of my original tests are irrelevant in light of new information)
I'm having issues with Server 2012R2 DNS servers. The biggest side effect of these issues is Exchange emails not going through. Exchange queries for AAAA records before trying A records. When it gets SERVFAIL for the AAAA record, it doesn't even try the A records; it just gives up.
For some domains, when querying against my Active Directory DNS servers, I get SERVFAIL instead of NOERROR with no results.
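To make the failure mode concrete, here is a minimal Python sketch of the lookup order described above. It assumes a client that queries AAAA first and falls back to A only when the AAAA lookup returns NOERROR with no answers. `classify` and `should_try_a_record` are hypothetical helpers that model the reported Exchange behavior; they are not Exchange's actual code.

```python
# RCODE values from RFC 1035.
NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3

def classify(rcode: int, answer_count: int) -> str:
    """Rough classification of a DNS response by RCODE and answer count."""
    if rcode == NOERROR:
        # "nodata": the name exists but has no records of the requested type
        return "answer" if answer_count > 0 else "nodata"
    if rcode == NXDOMAIN:
        return "nxdomain"  # the name does not exist at all
    return "failure"       # SERVFAIL, REFUSED, etc.

def should_try_a_record(aaaa_rcode: int, aaaa_answer_count: int) -> bool:
    """Hypothetical model of the reported client logic: fall back to an A
    lookup only when the AAAA lookup cleanly says 'no such records'."""
    return classify(aaaa_rcode, aaaa_answer_count) == "nodata"

# Public resolver: NOERROR with no answers, so the client moves on to A.
print(should_try_a_record(NOERROR, 0))   # True
# Domain controller: SERVFAIL, so the client gives up.
print(should_try_a_record(SERVFAIL, 0))  # False
```

Under this model, a SERVFAIL is indistinguishable from a genuine outage, which is why the client never reaches the A record.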
I have tried this from several different Server 2012R2 domain controllers that are running DNS. One of them is in an entirely separate domain, on a different network behind a different firewall and internet connection.
Two addresses that I know cause this problem are smtpgw1.gov.on.ca and mxmta.owm.bell.net.
I've been using dig on a Linux machine to test this (192.168.5.5 is my domain controller):
grant@linuxbox:~$ dig @192.168.5.5 smtpgw1.gov.on.ca -t AAAA
; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @192.168.5.5 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56328
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;smtpgw1.gov.on.ca. IN AAAA
;; Query time: 90 msec
;; SERVER: 192.168.5.5#53(192.168.5.5)
;; WHEN: Wed Oct 21 14:09:10 EDT 2015
;; MSG SIZE rcvd: 46
But queries against a public DNS server work as expected:
grant@home-ssh:~$ dig @4.2.2.1 smtpgw1.gov.on.ca -t AAAA
; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @4.2.2.1 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 269
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 8192
;; QUESTION SECTION:
;smtpgw1.gov.on.ca. IN AAAA
;; Query time: 136 msec
;; SERVER: 4.2.2.1#53(4.2.2.1)
;; WHEN: Wed Oct 21 14:11:19 EDT 2015
;; MSG SIZE rcvd: 46
As I said, I've tried this on two different networks and domains. One is a brand new domain, which definitely has all default settings for DNS. The other has been migrated to Server 2012, so some old settings from 2003/2008 may have carried over. I get the same results on both of them.
Disabling EDNS with dnscmd /config /enableednsprobes 0 fixes it. I see many search results about EDNS being a problem in Server 2003, but not much that matches what I'm seeing in Server 2012. Neither firewall has a problem with EDNS. Disabling EDNS should only be a temporary workaround, though: it prevents the use of DNSSEC and might cause other issues.
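To reproduce the with/without EDNS comparison outside of dig, the sketch below builds the same AAAA query with and without an EDNS0 OPT record (RFC 6891) using only the Python standard library; leaving the OPT record out is roughly what dig +noedns does. `build_query` and `query_rcode` are hypothetical helper names, the 4096-byte payload size is an assumption chosen to match the client query in the DNS debug log, and the receive side does no ID or source validation, so treat it as a test aid only.

```python
import random
import socket
import struct
from typing import Optional

def build_query(name: str, qtype: int = 28, edns_payload: Optional[int] = 4096) -> bytes:
    """Build a recursive DNS query (QTYPE 28 = AAAA); add an EDNS0 OPT record
    unless edns_payload is None."""
    qid = random.randint(0, 0xFFFF)
    flags = 0x0100                                   # RD=1: recursion desired
    arcount = 0 if edns_payload is None else 1
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"                                      # terminating root label
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    if edns_payload is None:
        return header + question
    # OPT pseudo-RR: root owner name, TYPE 41, CLASS = advertised UDP payload
    # size, TTL = extended RCODE/version/flags (all zero here), RDLENGTH 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return header + question + opt

def query_rcode(server: str, packet: bytes, timeout: float = 3.0) -> int:
    """Send one query over UDP port 53 and return the 4-bit header RCODE.
    No ID or source checking: a troubleshooting aid, not a resolver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        response, _ = sock.recvfrom(4096)
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F
```

For example, compare query_rcode("192.168.5.5", build_query("smtpgw1.gov.on.ca")) with the same call using edns_payload=None. The two packets come out at 46 and 35 bytes, the same Msg length values the debug log reports for the client's query and for the server's upstream query.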
I have also seen some posts about issues with Server 2008R2 and EDNS, but those same posts say things are fixed in Server 2012, so it should work properly.
I have also tried enabling the DNS debug log. I can see the packets I expected, but it doesn't give me much insight into why the server returns SERVFAIL. Here are the relevant portions of the DNS server debug log:
First packet - query from client to my DNS server
10/16/2015 9:42:29 AM 0974 PACKET 000000EFF1BF01A0 UDP Rcv 172.16.0.254 a61e Q [2001 D NOERROR] AAAA (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID 0xa61e
    Flags 0x0120
      QR 0 (QUESTION) OPCODE 0 (QUERY) AA 0 TC 0 RD 1 RA 0 Z 0 CD 0 AD 1 RCODE 0 (NOERROR)
    QCOUNT 1 ACOUNT 0 NSCOUNT 0 ARCOUNT 1
    QUESTION SECTION:
      Offset = 0x000c, RR count = 0
      Name "(7)smtpgw1(3)gov(2)on(2)ca(0)" QTYPE AAAA (28) QCLASS 1
    ANSWER SECTION: empty
    AUTHORITY SECTION: empty
    ADDITIONAL SECTION:
      Offset = 0x0023, RR count = 0
      Name "(0)" TYPE OPT (41) CLASS 4096 TTL 0 DLEN 0
      DATA Buffer Size = 4096 Rcode Ext = 0 Rcode Full = 0 Version = 0 Flags = 0
Second packet - query from my DNS server to their DNS server
10/16/2015 9:42:29 AM 0974 PACKET 000000EFF0A22160 UDP Snd 204.41.8.237 3e6c Q [0000 NOERROR] AAAA (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF0A22160
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID 0x3e6c
    Flags 0x0000
      QR 0 (QUESTION) OPCODE 0 (QUERY) AA 0 TC 0 RD 0 RA 0 Z 0 CD 0 AD 0 RCODE 0 (NOERROR)
    QCOUNT 1 ACOUNT 0 NSCOUNT 0 ARCOUNT 0
    QUESTION SECTION:
      Offset = 0x000c, RR count = 0
      Name "(7)smtpgw1(3)gov(2)on(2)ca(0)" QTYPE AAAA (28) QCLASS 1
    ANSWER SECTION: empty
    AUTHORITY SECTION: empty
    ADDITIONAL SECTION: empty
Third packet - response from their DNS server (NOERROR)
10/16/2015 9:42:29 AM 0974 PACKET 000000EFF2188100 UDP Rcv 204.41.8.237 3e6c R Q [0084 A NOERROR] AAAA (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF2188100
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID 0x3e6c
    Flags 0x8400
      QR 1 (RESPONSE) OPCODE 0 (QUERY) AA 1 TC 0 RD 0 RA 0 Z 0 CD 0 AD 0 RCODE 0 (NOERROR)
    QCOUNT 1 ACOUNT 0 NSCOUNT 0 ARCOUNT 0
    QUESTION SECTION:
      Offset = 0x000c, RR count = 0
      Name "(7)smtpgw1(3)gov(2)on(2)ca(0)" QTYPE AAAA (28) QCLASS 1
    ANSWER SECTION: empty
    AUTHORITY SECTION: empty
    ADDITIONAL SECTION: empty
Fourth packet - response from my DNS server to client (SERVFAIL)
10/16/2015 9:42:29 AM 0974 PACKET 000000EFF1BF01A0 UDP Snd 172.16.0.254 a61e R Q [8281 DR SERVFAIL] AAAA (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=4556080, Expire=4556083
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID 0xa61e
    Flags 0x8182
      QR 1 (RESPONSE) OPCODE 0 (QUERY) AA 0 TC 0 RD 1 RA 1 Z 0 CD 0 AD 0 RCODE 2 (SERVFAIL)
    QCOUNT 1 ACOUNT 0 NSCOUNT 0 ARCOUNT 1
    QUESTION SECTION:
      Offset = 0x000c, RR count = 0
      Name "(7)smtpgw1(3)gov(2)on(2)ca(0)" QTYPE AAAA (28) QCLASS 1
    ANSWER SECTION: empty
    AUTHORITY SECTION: empty
    ADDITIONAL SECTION:
      Offset = 0x0023, RR count = 0
      Name "(0)" TYPE OPT (41) CLASS 4000 TTL 0 DLEN 0
      DATA Buffer Size = 4000 Rcode Ext = 0 Rcode Full = 2 Version = 0 Flags = 0
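Decoding the Flags words from the four packets side by side shows where the answer changes: the upstream reply is authoritative (AA) with RCODE 0, while the reply to the client carries RCODE 2 (SERVFAIL). This is a small stdlib-only Python sketch; `decode_flags` is a hypothetical helper, and the bit positions follow the standard DNS header layout (RFC 1035, with AD and CD from RFC 4035).

```python
def decode_flags(flags: int) -> dict:
    """Split a 16-bit DNS header flags word into its named fields."""
    return {
        "QR": (flags >> 15) & 0x1,
        "OPCODE": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 0x1,
        "TC": (flags >> 9) & 0x1,
        "RD": (flags >> 8) & 0x1,
        "RA": (flags >> 7) & 0x1,
        "Z": (flags >> 6) & 0x1,
        "AD": (flags >> 5) & 0x1,
        "CD": (flags >> 4) & 0x1,
        "RCODE": flags & 0xF,
    }

# Flags values copied from the four packets in the debug log.
packets = [
    ("client -> DC", 0x0120),
    ("DC -> upstream", 0x0000),
    ("upstream -> DC", 0x8400),
    ("DC -> client", 0x8182),
]
for label, value in packets:
    # Show only the fields that are non-zero, to keep the comparison short.
    set_fields = {name: bit for name, bit in decode_flags(value).items() if bit}
    print(f"{label:15} 0x{value:04x} {set_fields}")
```

The upstream reply (0x8400) also has NSCOUNT 0 in the log, so it carries no SOA in its authority section.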
Other things of note:
- One of the networks has native IPv6 internet access, the other does not (but the IPv6 stack is enabled on the servers with default settings). It doesn't seem to be an IPv6 network issue.
- It doesn't affect all domains. For example, dig @192.168.5.5 -t AAAA serverfault.com returns NOERROR with no results, and the same query for google.com returns Google's IPv6 addresses properly.
- Tried installing the hotfix from KB3014171; it made no difference.
- The update from KB3004539 is already installed.
Edit Nov 7, 2015
I've set up another non-domain-joined Server 2012R2 machine, installed the DNS Server role, and tested with the command nslookup -type=aaaa smtpgw1.gov.on.ca localhost. It does NOT have the same issues.
Both VMs are on the same host and the same network, which eliminates any network or firewall issues. The difference now comes down to either the patch level or the server being a domain member/domain controller.
Edit Nov 8, 2015
Applied all updates; it made no difference. I went through and double-checked whether there were any configuration differences between my new test server and my domain controller's DNS settings, and there are: the domain controller had forwarders set up.
Now, I'm sure I tried with and without forwarders in my initial tests, but I only tried it using dig from a Linux machine. I do get slightly different results with and without forwarders set up (tried with Google, OpenDNS, 4.2.2.1, and my ISP DNS servers) when I use nslookup on a Windows machine.
With a forwarder set, I get Server failed.
Without a forwarder (so it uses the root DNS servers), I get No IPv6 address (AAAA) records available for smtpgw1.gov.on.ca.
But that's still not the same as what I get for other domains that don't have IPv6 records: nslookup on Windows just returns no results for those.
With or without forwarders, dig still shows SERVFAIL for that name when querying my Windows DNS server.
There IS a small difference between the problem domain and other ones that seems relevant, even when I don't involve my windows DNS server:
dig -t aaaa @8.8.8.8 smtpgw1.gov.on.ca returns no answers and has no authority section.
dig -t aaaa @8.8.8.8 serverfault.com returns no answers, but does have an authority section. So do most other domains I try, no matter what resolver I use.
So why is that authority section missing, and why does Windows DNS server treat it as a failure when other DNS servers don't?
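One way to check for this programmatically is to read the section counts from the raw response header. RFC 2308 describes a NODATA response with no authority section at all (its "type 3" NODATA) and says negative responses without an SOA should not be cached, which may be related to why resolvers handle it differently. The sketch below is an illustration under assumptions, not a description of what Windows DNS does internally: `header_counts` and `describe_negative_answer` are hypothetical helpers, and because records are not parsed, a non-zero NSCOUNT is only assumed to hold the SOA.

```python
import struct

def header_counts(response: bytes) -> dict:
    """Return the RCODE and the four section counts from a DNS message header."""
    _id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", response[:12])
    return {"rcode": flags & 0xF, "qdcount": qd, "ancount": an,
            "nscount": ns, "arcount": ar}

def describe_negative_answer(response: bytes) -> str:
    """Label a NOERROR/no-data reply by whether it has an authority section.
    Records are not parsed, so a non-zero NSCOUNT is only assumed to be the SOA."""
    counts = header_counts(response)
    if counts["rcode"] != 0 or counts["ancount"] > 0:
        return "not a NOERROR/no-data response"
    if counts["nscount"] == 0:
        return "no-data, empty authority section (no SOA for negative caching)"
    return "no-data with authority records (normally the zone SOA)"

# Header of the upstream reply in the debug log: XID 0x3e6c, Flags 0x8400,
# QCOUNT 1, ACOUNT 0, NSCOUNT 0, ARCOUNT 0.
upstream_header = struct.pack("!HHHHHH", 0x3E6C, 0x8400, 1, 0, 0, 0)
print(describe_negative_answer(upstream_header))
```

Fed the header of the upstream reply from the debug log, it reports a no-data response with an empty authority section, matching what dig shows for smtpgw1.gov.on.ca against 8.8.8.8.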
I've looked into the network trace some more and done some reading. The request for the AAAA record, when the record doesn't exist, returns an SOA. It turns out the SOA is for a different domain than the one being requested, and I suspect that's why Windows is rejecting the response. Request: AAAA for mx.atomwide.com. Response: SOA for lgfl.org.uk. I will see if we can make some progress with this information.

EDIT: Just for future reference, temporarily turning off "Secure cache against pollution" will allow the query to succeed. Not ideal, but it proves the issue is with a dodgy DNS record. RFC4074 is also a good reference: Intro and Section.
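The "Secure cache against pollution" option is meant to stop the server from caching records that fall outside the domain tree of the name that was queried. A simplified, hypothetical version of that test is sketched below in Python: it checks whether the SOA owner name is the queried name itself or one of its parent domains. The exact Windows logic is not shown in this thread, so treat `is_in_bailiwick` as an illustration of why an SOA for lgfl.org.uk looks suspicious in an answer about mx.atomwide.com, not as the server's actual rule.

```python
def labels(name: str) -> list:
    """Lower-cased labels of a domain name, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]

def is_in_bailiwick(qname: str, record_owner: str) -> bool:
    """True if record_owner is qname itself or one of its parent domains
    (compared label by label, so 'notatomwide.com' is not under 'atomwide.com')."""
    q, owner = labels(qname), labels(record_owner)
    return len(owner) <= len(q) and q[len(q) - len(owner):] == owner

# The pair from the trace above: SOA owner outside the queried name's tree.
print(is_in_bailiwick("mx.atomwide.com", "lgfl.org.uk"))  # False
# Illustrative parent-zone case (not taken from the trace).
print(is_in_bailiwick("smtpgw1.gov.on.ca", "gov.on.ca"))  # True
```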
According to KB832223, Microsoft has the following resolution:
Microsoft has the following suggestion to work around the issue: