I have a standalone, isolated network running mixed Windows and Linux systems, with a Windows 2008 R2 server performing AD duties and DNS.
I'm seeing 5-second delays when using getaddrinfo() on the Linux systems.
In Wireshark I see (C->S means client to DNS server):
t=0.000 C->S Query A foo.example.com ID=0x1111
t=0.000 C->S Query AAAA foo.example.com ID=0x2222
t=0.004 S->C Response to 0x2222, No error
(Query is echoed)
Authoritative nameservers:
example.com: type SOA, class IN, mname svr01.example.com
Name: example.com
Type: SOA
Class: IN
TTL: 1 hour
Primary name server: svr01.example.com
Refresh interval: 15 minutes
Retry interval: 10 minutes
Expiration limit: 1 day
Minimum TTL: 1 hour
[5 second delay]
t=5.004 C->S Query A foo.example.com ID=0x1111
t=5.005 S->C Query response A 192.168.1.17
If I make the same request again, shortly thereafter, I will see no delay, as expected:
t=0.000 C->S Query A foo.example.com ID=0x3333
t=0.000 C->S Query AAAA foo.example.com ID=0x4444
t=0.001 S->C Query response A 192.168.1.17
I can continue to get immediate responses for some period of time. After a while (still experimenting) the delay will return.
What is going on here? If I use gethostbyname() (which only does IPv4) or nslookup foo.example.com, there is no delay.
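For anyone wanting to reproduce the comparison, something along these lines should do. It is only a rough sketch: time_lookup and compare are names made up for illustration, and Python's socket.getaddrinfo stands in for the C calls, with AF_INET approximating the IPv4-only behaviour of gethostbyname().

```python
import socket
import time


def time_lookup(resolve, *args):
    """Call resolve(*args) and return (elapsed_seconds, result_or_exception)."""
    start = time.monotonic()
    try:
        result = resolve(*args)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = exc
    return time.monotonic() - start, result


def compare(host):
    """Time a lookup for both address families against an IPv4-only lookup."""
    # AF_UNSPEC asks for IPv4 and IPv6 addresses, so A and AAAA queries go out.
    both, _ = time_lookup(socket.getaddrinfo, host, None, socket.AF_UNSPEC)
    # AF_INET restricts the lookup to IPv4, so only an A query is needed.
    v4_only, _ = time_lookup(socket.getaddrinfo, host, None, socket.AF_INET)
    return {"AF_UNSPEC": both, "AF_INET": v4_only}
```

Calling compare("foo.example.com") on an affected client would be expected to show the AF_UNSPEC lookup taking about 5 seconds longer, although a local cache could hide the difference on repeated runs.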
Additional info:
- IPv6 is disabled on the server NICs
Update:
This answer on Ask Ubuntu suggested adding options single-request to /etc/resolv.conf. This seemed to correct the problem for me.
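To confirm the option is really in place on each client, a small check like the one below might help. It is a minimal sketch: resolver_options is a hypothetical helper, and the sample file contents (including the nameserver address) are invented for illustration, not taken from my systems.

```python
def resolver_options(resolv_conf_text):
    """Return the set of option names found on 'options' lines of resolv.conf text."""
    options = set()
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        fields = stripped.split()
        if fields[0] == "options":
            options.update(fields[1:])
    return options


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# made-up example
nameserver 192.168.1.1
search example.com
options single-request
"""

print("single-request" in resolver_options(SAMPLE))
```

On a real client the text would come from reading /etc/resolv.conf instead of the SAMPLE string.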
However, I'm still curious:
- What the SOA record actually means
- Why the server doesn't respond the first time to the A query
Your DNS server appears to be buggy. Two requests are sent to the DNS server, but it sends only a single reply. The client does what clients are supposed to do in that case: it waits a short while and then retransmits the request.
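The effect can be seen by pairing each query in the capture with its reply by transaction ID. The following is a rough sketch rather than a real capture parser: Packet and time_to_answer are illustrative names, the events are transcribed by hand from the question, and the ID of the final reply is assumed to be 0x1111 because the capture does not state it.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "time direction txid")

# Events transcribed from the capture; the last reply's ID is an assumption.
CAPTURE = [
    Packet(0.000, "query", 0x1111),     # A
    Packet(0.000, "query", 0x2222),     # AAAA
    Packet(0.004, "response", 0x2222),  # reply to the AAAA query
    Packet(5.004, "query", 0x1111),     # retransmitted A query
    Packet(5.005, "response", 0x1111),  # reply to the A query
]


def time_to_answer(packets):
    """Map each transaction ID to the seconds between its first query and first reply."""
    first_query = {}
    answered = {}
    for p in sorted(packets, key=lambda p: p.time):
        if p.direction == "query":
            # Retransmissions reuse the ID, so only the first send time is kept.
            first_query.setdefault(p.txid, p.time)
        elif p.txid in first_query and p.txid not in answered:
            answered[p.txid] = p.time - first_query[p.txid]
    # IDs that never received a reply map to None.
    return {txid: answered.get(txid) for txid in first_query}


for txid, seconds in time_to_answer(CAPTURE).items():
    print(hex(txid), seconds)
```

With these events, the AAAA query is answered within milliseconds while the A query is only answered after the retransmission, which is where the 5 seconds go.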
An initial delay of 5 seconds may be reasonable for non-interactive usage. But for interactive usage I would consider that to be way too high.
The proper fix would be to upgrade the DNS server to a version without the bug or to contact the vendor if no fix has been released yet. Everything else is a workaround.
Running man resolv.conf on an Ubuntu system will explain what the single-request and single-request-reopen options do. Those are two different variations of a workaround for a known bug in certain DNS servers. The drawback of those options is that they slow down name resolution by roughly a factor of two. However, given that the bug appears to slow down name resolution by a factor of about 1000, you may still be better off using the workaround.
When requesting a nonexistent record you may receive a response with a SOA record instead. The reason for sending not just an error code but also a SOA record is that the SOA record contains information which will allow the negative result to be cached.
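As an illustration of how the SOA record drives negative caching: per RFC 2308, a caching resolver may keep the negative answer for the smaller of the SOA record's own TTL and its MINIMUM field. The sketch below applies that rule to the values shown in the capture (both 1 hour). The function name is made up for this example.

```python
def negative_cache_ttl(soa_record_ttl, soa_minimum):
    """Seconds a caching resolver may keep a negative answer (RFC 2308 rule)."""
    return min(soa_record_ttl, soa_minimum)


HOUR = 3600

# Values from the SOA record in the capture: TTL 1 hour, Minimum TTL 1 hour.
ttl = negative_cache_ttl(soa_record_ttl=1 * HOUR, soa_minimum=1 * HOUR)
print(ttl)  # 3600
```

So a cache that honours this reply could remember for up to an hour that the AAAA record does not exist.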
The correct way to interpret your packet capture is that you're seeing dropped reply packets for both the A and AAAA record responses.
The SOA record seems to be confusing you and is worth elaboration:
- The SOA record is actually in the authority section, not the answer section.
- NXDOMAIN means "there are no records that have that name". If there are other records with the same name, but different types, the response you will see is NOERROR with zero records in the answer section.
- What you received is a NOERROR response with zero answers and an authority section telling you what zone that answer came from. You can ignore the SOA component entirely. This reply is telling you that there is no AAAA record.
Now that we've established that the AAAA reply is a correctly formatted packet and what you should be seeing in this scenario, it changes the context of what you're looking at entirely. You are seeing cases where A record replies are being lost, in addition to AAAA replies being lost. Your research suggests that AAAA responses are being lost more frequently, but not exclusively.
Based on the information supplied, we're not going to be able to explain what is going on here. You need to set up packet captures on the DNS servers themselves and identify the following factors:
As you can see, there are a lot of things that could be going on here. You're going to need to narrow in on the problem to rule out possibilities. I apologize for this answer not being conclusive, but this was far more than could be covered within a few comments. Feel free to update your question.
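If it helps while reading the captures, the distinction drawn above between NXDOMAIN and an empty NOERROR reply can be written down as a small rule. This is only a sketch: classify_reply is a hypothetical helper, and "NODATA" is the RFC 2308 name for the empty NOERROR case rather than a term used earlier in this answer. The numeric codes are the standard DNS RCODE values (0 for NOERROR, 3 for NXDOMAIN).

```python
NOERROR = 0   # standard DNS RCODE values
NXDOMAIN = 3


def classify_reply(rcode, answer_count):
    """Describe what a DNS reply says about the queried name and record type."""
    if rcode == NXDOMAIN:
        return "no records of any type exist for this name"
    if rcode == NOERROR and answer_count == 0:
        # The name exists, but not with the requested type (NODATA).
        return "name exists, but has no record of the requested type"
    if rcode == NOERROR:
        return "answer present"
    return "other error"


# The AAAA reply in the question: "No error" with nothing in the answer section.
print(classify_reply(NOERROR, 0))
```

Applied to the capture in the question, the AAAA reply falls into the second case, which is the expected outcome for a host that only has an A record.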