Day One
I have to hide the actual host names, so I'm hoping there is still enough information to answer this question...
I'm trying to resolve a certain host name (let's pretend it's www.example.com, but this is not the actual host name). A simple dig request works, but when I try to do a series of dig requests starting from a root nameserver, I hit a dead end. Here's an example:
# Starting with arbitrarily-chosen root nameserver
$ dig @198.41.0.4 www.example.com
(returns the usual list of TLD .com nameservers)
# Using a.gtld-servers.net
$ dig @192.5.6.30 www.example.com
(returns a list of 5 example.com authorities)
At this point, I tried each of the 5 example.com authorities. Three of them fail with status SERVFAIL, and the remaining two time out. Here's a SERVFAIL example:
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 33577
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.example.com. IN A
;; Query time: 74 msec
;; SERVER: <intentionally removed>
;; WHEN: Tue Mar 8 10:10:33 2011
;; MSG SIZE rcvd: 37
I tried this multiple times, from my own machine at home and from a remote machine in our co-lo, and both machines consistently get the same results.
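For anyone trying to reproduce this, here is a rough sketch of how each authority could be probed in turn. It is only an illustration: the function names dig_status and probe are made up, and the server names in the usage comment stand in for the real (hidden) ones. It adds +norecurse because dig sets the RD flag by default (visible as "rd" in the flags line above), whereas a resolver walking the delegation normally sends non-recursive queries; whether that changes the outcome here is an open question.

```shell
#!/bin/sh
# Extract the response status (NOERROR, SERVFAIL, ...) from dig output on stdin.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query one server non-recursively and print "<server> <status>".
# Prints "timeout" as the status when no header line comes back at all.
probe() {
  server=$1
  name=$2
  status=$(dig "@$server" "$name" A +norecurse +time=10 +tries=1 | dig_status)
  echo "$server ${status:-timeout}"
}

# Usage (placeholder server names, not the real ones):
#   for ns in ns1.example.com ns2.example.com; do
#     probe "$ns" www.example.com
#   done
```

Running the loop from both machines would give one line per authority, which makes it easier to compare results between locations.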
However,
- As I mentioned above, dig www.example.com (without specifying an @server) works fine.
- This DNS trace utility is able to resolve the host name, and it clearly shows that it's using one of the name servers that times out for me!
Can anybody help me figure out what's going on?
EDIT 1: In case it helps, what should happen is that this host name should ultimately resolve to a CNAME record pointing to www.example.com.edgesuite.net, which should in turn resolve to another CNAME record pointing to an Akamai edge server.
EDIT 2: Per Joris's recommendation, I ran dig +trace www.example.com, and it actually failed to find a result. It gets to the same list of example.com authorities that I found before, and stops there.
Caching seems like a very likely culprit (and I did think of this earlier), but the weird part is that the actual host name isn't that popular. Would it be cached on two different ISP local nameservers if I'm the first person to request it? :-)
Day Two
OK, I've discovered a few things:
- The two example.com authorities that I thought were timing out (as opposed to the other three, that were returning SERVFAIL) are not actually timing out. They just require a much longer timeout. If I use dig +time=10, for example, then I do eventually get back a result.
- I've tried this from several servers around the U.S., and the story is the same: using dig www.example.com returns a result very quickly, but dig @ns1.example.com (or @ns2.example.com) requires using a large timeout parameter.
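To put numbers on the difference described above, something like the following could be used. This is a sketch, not a definitive method: dig_query_msec and time_server are hypothetical helper names, and it simply reads the "Query time" line that dig prints in its statistics.

```shell
#!/bin/sh
# Print the query time in msec reported by dig output on stdin.
dig_query_msec() {
  sed -n 's/^;; Query time: \([0-9]*\) msec.*/\1/p' | head -n 1
}

# Query one server with a generous timeout and print "<server> <msec>".
# Prints "no-reply" when dig gives up before any answer arrives.
time_server() {
  server=$1
  name=$2
  msec=$(dig "@$server" "$name" A +time=10 +tries=1 | dig_query_msec)
  echo "$server ${msec:-no-reply}"
}

# Usage:
#   time_server ns1.example.com www.example.com    # direct to the authority
#   dig www.example.com | dig_query_msec           # via the default resolver
```

Comparing the two figures from several locations would show whether the slow responses are specific to direct queries.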
So my new questions are:
- Could the result really be cached on a variety of proxying DNS servers, even though it's not a commonly-used host name? The TTL is 54,000 seconds (or 15 hours, if I understand correctly).
- If not, then is it possible that ns1.example.com is somehow configured to return a result more quickly to proxying DNS servers than to my own dig queries (some kind of white list)? Or is that just crazy talk?
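As a quick check on the TTL arithmetic, and one possible way to test the caching theory, here is a small sketch. The helper names ttl_human and answer_ttl are invented for illustration. The idea, assuming the local resolver reports remaining TTLs as most caching resolvers do, is that a cached answer shows a TTL below the authoritative value that keeps counting down between queries.

```shell
#!/bin/sh
# Convert a TTL in seconds to whole hours and leftover minutes.
ttl_human() {
  ttl=$1
  echo "$(( ttl / 3600 ))h $(( ttl % 3600 / 60 ))m"
}

# Print the TTL column of the first answer record from dig output on stdin.
# Answer lines have the form: name TTL class type data
answer_ttl() {
  awk '$1 !~ /^;/ && NF >= 5 { print $2; exit }'
}

# Usage:
#   ttl_human 54000
#   dig +noall +answer www.example.com | answer_ttl   # run twice, a few seconds apart
```

If the second run prints a smaller number than the first, the answer is being served from a cache rather than fetched fresh from the authorities.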