I have a really weird problem with my DNS. My domain name (strugee.net) is unresolvable from some networks and resolvable from others.
For example, on my home network (same network the server's on):
% dig strugee.net
; <<>> DiG 9.10.3-P4 <<>> strugee.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10086
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;strugee.net. IN A
;; ANSWER SECTION:
strugee.net. 1800 IN A 216.160.72.225
;; Query time: 186 msec
;; SERVER: 205.171.3.65#53(205.171.3.65)
;; WHEN: Sat Apr 16 15:42:36 PDT 2016
;; MSG SIZE rcvd: 56
However, if I log in to a server I have on Digital Ocean, the domain fails to resolve:
% dig strugee.net
; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> strugee.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58551
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;strugee.net. IN A
;; Query time: 110 msec
;; SERVER: 2001:4860:4860::8844#53(2001:4860:4860::8844)
;; WHEN: Sat Apr 16 18:44:25 EDT 2016
;; MSG SIZE rcvd: 40
But querying the authoritative nameservers directly works just fine:
% dig @dns1.registrar-servers.com strugee.net
; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> @dns1.registrar-servers.com strugee.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30856
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;strugee.net. IN A
;; ANSWER SECTION:
strugee.net. 1800 IN A 216.160.72.225
;; AUTHORITY SECTION:
strugee.net. 1800 IN NS dns3.registrar-servers.com.
strugee.net. 1800 IN NS dns4.registrar-servers.com.
strugee.net. 1800 IN NS dns2.registrar-servers.com.
strugee.net. 1800 IN NS dns1.registrar-servers.com.
strugee.net. 1800 IN NS dns5.registrar-servers.com.
;; Query time: 3 msec
;; SERVER: 216.87.155.33#53(216.87.155.33)
;; WHEN: Sat Apr 16 18:46:36 EDT 2016
;; MSG SIZE rcvd: 172
It's pretty clear that there's a problem with some large network somewhere that's failing to resolve my domain, but I can't seem to figure out where. I skimmed the dig manpage for options that might help but didn't find anything particularly useful.
I use Namecheap both as my domain registrar and for DNS hosting. I have the DNSSEC option turned on. I haven't made any changes to my DNS settings recently.
How can I debug this problem and find the offending nameserver?
daxd5 offered some good starting advice, but the only real answer here is that you need to know how to think like a recursive DNS server. Because numerous misconfigurations at the authoritative layer can result in an inconsistent SERVFAIL, you need either a DNS professional or online validation tools to track them down. Anyway, the goal isn't to cop out of helping you; I just want to make sure you understand that there is no conclusive, one-size-fits-all answer to the question of how to find the offending nameserver.
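One misconfiguration of that kind is an authoritative server that is lame or out of sync with the others, so a resolver's result depends on which server it happens to ask. A minimal Python sketch, assuming you have already collected each server's answer (for example with `dig @server name`) into a dictionary, can flag the servers that disagree with the majority. The helper name and the timed-out `dns3` entry below are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Optional

# None stands for "no usable answer" (timeout, REFUSED, SERVFAIL, ...).
Answer = Optional[FrozenSet[str]]


def find_inconsistent_servers(answers: Dict[str, Answer]) -> List[str]:
    """Return the servers whose answer differs from the most common one.

    Hypothetical helper: `answers` maps each authoritative server name to the
    set of records it returned for the same query.
    """
    if not answers:
        return []
    # Counter works here because frozensets and None are both hashable.
    majority, _ = Counter(answers.values()).most_common(1)[0]
    return sorted(name for name, ans in answers.items() if ans != majority)


if __name__ == "__main__":
    answers = {
        "dns1.registrar-servers.com": frozenset({"216.160.72.225"}),
        "dns2.registrar-servers.com": frozenset({"216.160.72.225"}),
        "dns3.registrar-servers.com": None,  # hypothetical: timed out
    }
    print(find_inconsistent_servers(answers))
```

Note that a check like this would not catch a DNSSEC signing problem: every authoritative server can return the same, consistently expired signatures.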
In your particular case, I noticed that strugee.net appears to be a zone signed with DNSSEC. This is evident from the presence of the DS and RRSIG records in the referral chain.
Before we go any further, we need to check whether or not the signing is valid. DNSViz is a tool frequently used for this purpose, and it confirms that there are indeed problems. The angry red in the picture suggests that you have a problem, but rather than mousing over everything, we can just expand Notices in the left sidebar.
The problem is clear: the signature on your zone has expired, and the zone needs to be re-signed. You are seeing inconsistent results because not all recursive servers have DNSSEC validation enabled. Those that validate reject your answers and return SERVFAIL; for those that do not, it is business as usual. The resolver your Digital Ocean server uses, 2001:4860:4860::8844, is Google Public DNS, which does validate.
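A quick way to confirm this from a single resolver is the CD (Checking Disabled) bit, which asks a validating resolver to skip DNSSEC validation; dig sets it with `+cdflag` (commonly abbreviated `+cd`). If `dig strugee.net` returns SERVFAIL but `dig +cd strugee.net` returns NOERROR against the same resolver, validation is the likely culprit. The Python sketch below pulls the `status:` field out of dig's header line and applies that rule; the function names and the NOERROR sample are hypothetical.

```python
import re

# Matches the RCODE in dig's header, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58551"
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(output: str) -> str:
    """Extract the response status (RCODE name) from dig's text output."""
    match = STATUS_RE.search(output)
    if match is None:
        raise ValueError("no 'status:' field found in dig output")
    return match.group(1)


def diagnose(plain_output: str, cd_output: str) -> str:
    """Compare dig output for the same query without and with +cd."""
    plain, cd = dig_status(plain_output), dig_status(cd_output)
    if plain == "NOERROR":
        return "resolves normally"
    if plain == "SERVFAIL" and cd == "NOERROR":
        return "likely a DNSSEC validation failure"
    if plain == "SERVFAIL":
        return "SERVFAIL even with validation disabled; look elsewhere"
    return f"inconclusive ({plain} / {cd})"


if __name__ == "__main__":
    plain = ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58551"
    cd = ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345"  # hypothetical
    print(diagnose(plain, cd))
```

This only tells you something when the resolver validates and honors CD; a non-validating resolver, like the one on the home network in the question, returns NOERROR either way.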
Edit: Comcast's DNS infrastructure is known to implement DNSSEC validation, and as one of their customers I can confirm that I'm seeing a SERVFAIL as well.
While you are indeed seeing that the authoritative name servers are responding correctly, you need to follow the entire chain of DNS resolution. That is, walk down the whole DNS hierarchy, starting from the root servers.
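dig can do this walk for you with `dig +trace strugee.net`, which starts at the root servers and follows each referral. To make the steps explicit, the Python sketch below lists each level from the root down and builds one non-recursive query (`dig +norecurse`) per step. The function names are hypothetical, the `<server for ...>` parts are placeholders you would fill in from the NS records returned by the previous step, and not every label boundary is necessarily a real zone cut.

```python
from typing import List


def zones_from_root(name: str) -> List[str]:
    """List candidate zones for `name`, from the root down."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    zones = ["."]
    # Add one more label on each step: "net.", then "strugee.net.", ...
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


def trace_commands(name: str, rrtype: str = "A") -> List[str]:
    """Build the non-recursive dig queries a resolver effectively makes."""
    zones = zones_from_root(name)
    commands = []
    for parent, child in zip(zones, zones[1:]):
        # Ask a server of the parent zone which servers are authoritative for the child.
        commands.append(f"dig +norecurse @<server for {parent}> {child} NS")
    # Finally, ask the zone's own nameserver for the record itself.
    commands.append(f"dig +norecurse @<server for {zones[-1]}> {zones[-1]} {rrtype}")
    return commands


if __name__ == "__main__":
    for command in trace_commands("strugee.net"):
        print(command)
```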
This basically checks that the public DNS servers are working, since you're doing the same thing your DNS resolver should be doing. So you should be getting the same answers as above on your Digital Ocean server, unless something's wrong with its DNS resolver:
If the first two queries fail, the problem is the DNS on Digital Ocean's side. Check your /etc/resolv.conf and try querying the secondary DNS server. If the secondary one works, just switch the order of the resolvers and try again.
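To see which resolvers the server uses and in what order, and to put the secondary first, a small Python sketch over the text of /etc/resolv.conf could look like the following. It follows resolv.conf(5) in treating lines that start with `#` or `;` as comments. The function names and the second nameserver entry in the example are hypothetical. On many systems the file is generated by another tool, so a hand edit may be overwritten.

```python
from typing import List


def nameservers(resolv_conf: str) -> List[str]:
    """Return nameserver addresses in the order they appear."""
    servers = []
    for line in resolv_conf.splitlines():
        if line.startswith(("#", ";")):
            continue  # comment line per resolv.conf(5)
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def with_secondary_first(resolv_conf: str) -> str:
    """Return the text with the first two nameserver lines swapped."""
    lines = resolv_conf.splitlines()
    idx = [i for i, line in enumerate(lines) if line.split()[:1] == ["nameserver"]]
    if len(idx) >= 2:
        a, b = idx[0], idx[1]
        lines[a], lines[b] = lines[b], lines[a]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conf = (
        "# example file\n"
        "nameserver 2001:4860:4860::8844\n"
        "nameserver 2001:4860:4860::8888\n"  # hypothetical second entry
    )
    print(nameservers(conf))
    print(with_secondary_first(conf), end="")
```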