Question

Jeff Atwood

Asked: 2009-07-20 02:30:32 +0800 CST

DNS failing to propagate worldwide

I haven't changed anything related to the DNS entry for serverfault.com, but some users were reporting today that the serverfault.com DNS fails to resolve for them.

I ran a JustPing query and I can sort of confirm this: serverfault.com DNS appears to be failing to resolve in a handful of countries, for no particular reason that I can discern. (Also confirmed via What's My DNS, which does some worldwide pings in a similar fashion, so two different sources confirm the issue.)

Why would this be happening, if I haven't touched the DNS for serverfault.com?
Our registrar is (gag) GoDaddy, and I use default DNS settings for the most part without incident. Am I doing something wrong? Have the gods of DNS forsaken me?
Is there anything I can do to fix this? Any way to goose the DNS along, or force the DNS to propagate correctly worldwide?

Update: as of Monday at 3:30 am PST, everything looks correct. JustPing reports the site is reachable from all locations. Thank you for the many very informative responses; I learned a lot and will refer to this Q the next time this happens.

7 Answers

Alnitak · Answer 1 · 2009-07-20T13:48:28+08:00

This is not directly a DNS problem; it's a network routing problem between some parts of the internet and the DNS servers for serverfault.com. Since the nameservers can't be reached, the domain stops resolving.

As far as I can tell the routing problem is on the (Global Crossing?) router with IP address 204.245.39.50.

As shown by @radius, packets to ns52 (as used by stackoverflow.com) pass from here to 208.109.115.121 and from there work correctly. However packets to ns22 go instead to 208.109.115.201.

Since those two addresses are both in the same /24 and the corresponding BGP announcement is also for a /24 this shouldn't happen.
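The same-/24 claim can be checked with a minimal sketch like the one below. It assumes "those two addresses" refers to the name server addresses quoted elsewhere in this thread (208.109.255.11 for ns22 and 208.109.255.26 for ns52); the variable names are only illustrative.

```python
import ipaddress

# Name server addresses as quoted in the traceroutes elsewhere in this thread.
ns22 = ipaddress.ip_address("208.109.255.11")
ns52 = ipaddress.ip_address("208.109.255.26")

# The /24 containing ns22 (strict=False allows passing a host address).
prefix = ipaddress.ip_network("208.109.255.11/24", strict=False)

# Both servers fall inside the same /24, so a router holding only
# the /24 route has no reason to treat them differently.
same_prefix = ns22 in prefix and ns52 in prefix
print(prefix, same_prefix)
```

If both addresses sit in one announced /24, any per-address difference in forwarding has to come from something more specific than the BGP route.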

I've done traceroutes via my network which ultimately uses MFN Above.net instead of Global Crossing to get to GoDaddy and there's no sign of any routing trickery below the /24 level - both name servers have identical traceroutes from here.

The only times I've ever seen something like this it was broken Cisco Express Forwarding (CEF). This is a hardware level cache used to accelerate packet routing. Unfortunately just occasionally it gets out of sync with the real routing table, and tries to forward packets via the wrong interface. CEF entries can go down to the /32 level even if the underlying routing table entry is for a /24. It's tricky to find these sorts of problems, but once identified they're normally easy to fix.
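To illustrate why a stray /32 entry matters, here is a minimal sketch of the longest-prefix-match rule that routers apply. It is not how CEF is implemented, and the table contents are hypothetical; the next hops are simply borrowed from the traceroutes in this thread.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return the (prefix, next_hop) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical forwarding table: the /24 is the announced route,
# the /32 stands in for a stale host-level cache entry.
table = [
    (ipaddress.ip_network("208.109.255.0/24"), "208.109.115.121"),
    (ipaddress.ip_network("208.109.255.11/32"), "208.109.115.201"),
]

ns22_hop = longest_prefix_match(table, "208.109.255.11")[1]
ns52_hop = longest_prefix_match(table, "208.109.255.26")[1]
print(ns22_hop, ns52_hop)
```

Under these assumptions the /32 wins for ns22 while ns52 still follows the /24, which matches the pattern seen in the traceroutes.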

I've e-mailed GC and also tried to speak to them, but they won't create a ticket for non-customers. If any of you are a customer of GC, please try and report this...

UPDATE at 10:38 UTC: As Jeff has noted, the problem has now cleared. Traceroutes to both servers mentioned above now go via the 208.109.115.121 next hop.

pQd · Answer 2 · 2009-07-20T02:46:07+08:00

your dns servers for serverfault.com [ ns21.domaincontrol.com, ns22.domaincontrol.com ] have been unreachable for the last ~20h, at least from a couple of major isps in sweden [ telia, tele2, bredband2 ].

at the same time 'neighbor' dns servers for stackoverflow.com & superuser.com [ ns51.domaincontrol.com, ns52.domaincontrol.com ] are reachable.

sample traceroute to ns52.domaincontrol.com:

 1. xxxxxxxxxxx
 2. 83.233.28.193           
 3. 83.233.79.81            
 4. 213.200.72.5            
 5. 64.208.110.129          
 6. 204.245.39.50           
 7. 208.109.115.121         
 8. 208.109.115.162         
 9. 208.109.113.62          
10. 208.109.255.26

and to ns21.domaincontrol.com

 1. xxxxxxxxxxxx
 2. 83.233.28.193      
 3. 83.233.79.81       
 4. 213.200.72.5       
 5. 64.208.110.129     
 6. 204.245.39.50      
 7. 208.109.115.201    
 8. ???

maybe screwed up filtering / someone triggered some unwanted ddos protection and blacklisted some parts of internet. probably you should contact your dns service provider - go daddy.

you can verify if the problem is [partially] solved by:

checking if godaddy has reacted and changed name servers - eg lookup serverfault.com at http://www.squish.net/dnscheck/ using record type: ANY
checking if the provided name servers respond to ping [not very scientific since name servers can work fine and still block icmp, but in this case it seems that icmp is allowed to other servers ] from telia via looking glass.
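since ping only tests icmp, a more direct check is to send the name server an actual dns query. the following is a minimal sketch using only the python standard library; the function names `build_query` and `server_answers` are made up for this example, and it assumes plain udp on port 53 with no retries.

```python
import socket
import struct

def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A record)."""
    # Header: id, flags (0 = standard query, no recursion requested),
    # one question, zero answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question

def server_answers(server_ip, name, timeout=3.0):
    """Return the DNS response code (0 = NOERROR), or None if no usable reply arrived."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable-network errors
            return None
    # A valid reply has at least a 12-byte header and echoes our query id.
    if len(reply) < 12 or reply[:2] != query[:2]:
        return None
    flags = struct.unpack("!H", reply[2:4])[0]
    return flags & 0x000F  # low four bits hold the response code

# Example (needs network access):
#   server_answers("208.109.255.11", "serverfault.com")
```

a `None` result from one network and `0` from another would point at a reachability problem rather than at the name server itself.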

edit: traceroutes from working places

poland

 1. xxxxxxxxxxxxxxx
 2. 153.19.40.254               
 3. ???
 4. 153.19.254.236              
 5. 212.191.224.205             
 6. 213.248.83.129              
 7. 80.91.254.171               
 8. 80.91.249.105               
    80.91.251.230
    80.91.254.93
    80.91.251.52
 9. 213.248.89.182              
10. 204.245.39.50               
11. 208.109.115.121             
12. 208.109.115.162             
13. 208.109.113.62              
14. 208.109.255.26

germany

 1. xxxxxxxxxxxx
 2. 89.149.218.181       
 3. 89.149.218.2         
 4. 134.222.105.249      
 5. 134.222.231.205      
 6. 134.222.227.146      
 7. 80.81.194.26         
 8. 64.125.24.6          
 9. 64.125.31.249        
10. 64.125.27.165        
11. 64.125.26.178        
12. 64.125.26.242        
13. 209.249.175.170      
14. 208.109.113.58       
15. 208.109.255.26

edit: all works fine now indeed.

bortzmeyer · Answer 3 · 2009-07-20T23:04:30+08:00

My suggestions: as explained by Alnitak, the problem is not DNS but routing (probably BGP). It is normal that nothing was changed in the DNS setup, since the problem was not in the DNS.

serverfault.com currently has a very poor DNS setup, certainly insufficient for an important site like this:

only two name servers
all the eggs in the same basket (both are in the same AS)

We've just seen the result: a routing glitch (something which is quite common on the Internet) is sufficient to make serverfault.com disappear for some users (depending on their operators, not on their countries).

I suggest adding more name servers, located in other ASes. This would provide resilience against failures. You can either rent them from private companies or ask serverfault users to offer secondary DNS hosting (maybe only if the user has > 1000 rep :-)

radius · Answer 4 · 2009-07-20T03:41:04+08:00

I can confirm that NS21.DOMAINCONTROL.COM and NS22.DOMAINCONTROL.COM are also unreachable from ISP Free.fr in France.
Like pQd's traceroute, mine also ends after 208.109.115.201 for both ns21 and ns22.

traceroute to NS22.DOMAINCONTROL.COM (208.109.255.11), 64 hops max, 40 byte packets
 1  x.x.x.x (x.x.x.x)  2.526 ms  0.799 ms  0.798 ms
 2  78.224.126.254 (78.224.126.254)  6.313 ms  6.063 ms  6.589 ms
 3  213.228.5.254 (213.228.5.254)  6.099 ms  6.776 ms *
 4  212.27.50.170 (212.27.50.170)  6.943 ms  6.866 ms  6.842 ms
 5  212.27.50.190 (212.27.50.190)  8.308 ms  6.641 ms  6.866 ms
 6  212.27.38.226 (212.27.38.226)  68.660 ms  185.527 ms  14.123 ms
 7  204.245.39.50 (204.245.39.50)  48.544 ms  19.391 ms  19.753 ms
 8  208.109.115.201 (208.109.115.201)  19.315 ms  19.668 ms  34.110 ms
 9  * * *
10  * * *
11  * * *
12  * * *

But ns52.domaincontrol.com (208.109.255.26) does work and is in the same subnet as ns22.domaincontrol.com (208.109.255.11)

traceroute to ns52.domaincontrol.com (208.109.255.26), 64 hops max, 40 byte packets
 1  x.x.x.x (x.x.x.x)  1.229 ms  0.816 ms  0.808 ms
 2  78.224.126.254 (78.224.126.254)  12.127 ms  5.623 ms  6.068 ms
 3  * * *
 4  212.27.50.170 (212.27.50.170)  13.824 ms  6.683 ms  6.828 ms
 5  212.27.50.190 (212.27.50.190)  6.962 ms *  7.085 ms
 6  212.27.38.226 (212.27.38.226)  35.379 ms  7.105 ms  7.830 ms
 7  204.245.39.50 (204.245.39.50)  19.896 ms  19.426 ms  19.355 ms
 8  208.109.115.121 (208.109.115.121)  37.931 ms  19.665 ms  19.814 ms
 9  208.109.115.162 (208.109.115.162)  19.663 ms  19.395 ms  29.670 ms
10  208.109.113.62 (208.109.113.62)  19.398 ms  19.220 ms  19.158 ms
11  * * *
12  * * *
13  * * *

As you can see, this time after 204.245.39.50 we go to 208.109.115.121 instead of 208.109.115.201. And pQd has the same traceroute. From a working place I did not cross this 204.245.39.50 router (Global Crossing).
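Spotting the hop where two traces split can be automated. The sketch below is a hypothetical helper, not part of any traceroute tool: it takes two hop lists (copied here from hops 2 to 8 of the traceroutes above, with `None` for a hop that did not reply) and reports the last shared hop and the two differing next hops. It ignores per-hop load balancing.

```python
def first_divergence(path_a, path_b):
    """Return (last_common_hop, next_a, next_b) at the first differing hop, or None."""
    last_common = None
    for hop_a, hop_b in zip(path_a, path_b):
        if hop_a is None or hop_b is None:
            continue  # a hop that did not reply tells us nothing
        if hop_a != hop_b:
            return last_common, hop_a, hop_b
        last_common = hop_a
    return None

# Hops 2 to 8 from the two traceroutes above (None = "* * *").
to_ns22 = ["78.224.126.254", "213.228.5.254", "212.27.50.170", "212.27.50.190",
           "212.27.38.226", "204.245.39.50", "208.109.115.201"]
to_ns52 = ["78.224.126.254", None, "212.27.50.170", "212.27.50.190",
           "212.27.38.226", "204.245.39.50", "208.109.115.121"]

split = first_divergence(to_ns22, to_ns52)
print(split)
```

The last common hop is the router making the differing forwarding decision, which is where to look first.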

More traceroutes from working and non-working locations would help, but it's highly probable that Global Crossing has a bogus routing entry for 208.109.255.11/32 and 216.69.185.11/32, since 208.109.255.10, 208.109.255.12, 216.69.185.10 and 216.69.185.12 are working well.

Why it has a bogus routing entry is hard to know. Probably 208.109.115.201 (Go Daddy) is advertising a non-working route for 208.109.255.11/32 and 216.69.185.11/32.

EDIT: You can telnet route-server.eu.gblx.net to connect to the Global Crossing route server and do traceroute from within Global Crossing network

EDIT: It seems that the same problem already occurred with other name servers a few days ago, see: http://www.newtondynamics.com/forum/viewtopic.php?f=9&t=5277&start=0

womble · Answer 5 · 2009-07-20T02:42:44+08:00

What would be handy would be to see a detailed resolution trace from the locations that are failing... see what layer of the resolution path it's failing on. I'm not familiar with the service you're using, but perhaps it's an option somewhere.

Failing that, it's most likely that the problems are "lower down" in the tree, as failures at the root or TLDs would affect more domains (you'd hope). To increase resilience, you can delegate to a second DNS service to ensure better redundancy in resolution if there are problems with domaincontrol's network(s).

Paul Tomblin · Answer 6 · 2009-07-20T06:16:43+08:00

I'm surprised you don't host your own DNS. The advantage of doing it that way is if the DNS is reachable, so is (hopefully) your site.

Cian · Answer 7 · 2009-07-20T03:23:23+08:00

From UPC at least, I get this response when trying to get your A record from your authoritative server (ns21.domaincontrol.com).

; <<>> DiG 9.5.1-P2 <<>> @ns21.domaincontrol.com serverfault.com
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 38663
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;serverfault.com.       IN  A

;; Query time: 23 msec
;; SERVER: 216.69.185.11#53(216.69.185.11)
;; WHEN: Sun Jul 19 12:09:40 2009
;; MSG SIZE  rcvd: 33

When I try the same thing from a machine on a different network (OVH), I get an answer

; <<>> DiG 9.4.2-P2 <<>> @216.69.185.11 serverfault.com
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33998
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;serverfault.com.               IN      A

;; ANSWER SECTION:
serverfault.com.        3600    IN      A       69.59.196.212

;; AUTHORITY SECTION:
serverfault.com.        3600    IN      NS      ns21.domaincontrol.com.
serverfault.com.        3600    IN      NS      ns22.domaincontrol.com.

;; Query time: 83 msec
;; SERVER: 216.69.185.11#53(216.69.185.11)
;; WHEN: Sun Jul 19 12:11:05 2009
;; MSG SIZE  rcvd: 101

I get similar behaviour for a couple of other domains, so I assume that UPC (at least) is silently redirecting DNS queries to their own caching nameserver, and spoofing the replies. If your DNS had misbehaved briefly, this could explain it, as UPC's nameservers may be caching the failed (SERVFAIL) response.
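The header lines of the two dig outputs above already hint at who really answered. An authoritative server answering for its own zone normally sets the `aa` flag, while `ra` (recursion available) is typical of a recursive resolver. The following is a rough sketch that pulls the status and flags out of dig's text output; the helper names are made up for this example, and it only handles the header format shown above.

```python
import re

def parse_dig_header(text):
    """Extract the status and the flag list from dig's header comment lines."""
    status = re.search(r"status: (\w+)", text)
    flags = re.search(r"flags: ([a-z ]+);", text)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
    }

def looks_authoritative(header):
    """True if the reply carries the 'aa' (authoritative answer) flag."""
    return "aa" in header["flags"]

# Header lines copied from the two dig outputs above.
upc = """;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 38663
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0"""
ovh = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33998
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0"""

upc_header = parse_dig_header(upc)
ovh_header = parse_dig_header(ovh)
print(upc_header, looks_authoritative(upc_header))
print(ovh_header, looks_authoritative(ovh_header))
```

The reply seen from UPC has `ra` and no `aa`, which is consistent with it coming from a resolver in the path rather than from ns21 itself.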
