This is a Canonical Question about DNS geo-redundancy.
It's common knowledge that geo-redundant DNS servers at separate physical locations are highly desirable when providing resilient web services. This is covered in depth by BCP 16, but some of the most frequently mentioned reasons include:
Protection against datacenter disasters. Earthquakes happen. Fires happen in racks and take out nearby servers and network equipment. Multiple DNS servers won't do you much good if physical problems at the datacenter knock out both DNS servers at once, even if they're not in the same row.
Protection against upstream peer problems. Multiple DNS servers won't prevent problems if a shared upstream network peer takes a dirt nap. Whether the upstream problem completely takes you offline, or simply isolates all of your DNS servers from a fraction of your userbase, the end result is that people can't access your domain even if the services themselves are located in a completely different datacenter.
That's all well and good, but are redundant DNS servers really necessary if I'm running all of my services off of the same IP address? I can't see how having a second DNS server would provide me any benefit if no one can get to anything provided by my domain anyway.
I understand that this is considered a best practice, but this really seems pointless!
Note: The content of this Q&A is in dispute; refer to the comments on both answers. Errors have been found, and this Q&A is in need of an overhaul.
I'm removing the accept from this answer for the time being, until the state of this canonical Q&A is properly addressed. (Deleting this answer would also delete the attached comments, which isn't the way to go IMO; it will probably be turned into a community wiki answer after extensive editing.)
I could quote RFCs here and use technical terms, but this is a concept that gets missed by a lot of people on both ends of the knowledge spectrum, so I'm going to try to answer this for the broader audience.
It may seem pointless...but it's actually not!
Recursive servers are very good at remembering when remote servers do not respond to a query, particularly when they retry and still never see a reply. Many implement negative caching of these communication failures and will temporarily put unresponsive nameservers in the penalty box for a period no greater than five minutes (the cap suggested by RFC 2308). Eventually this penalty period expires and they resume communication. If the next query fails again, the server goes right back into the box; otherwise it's back to business as usual.
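To make that concrete, here is a minimal Python sketch of the penalty-box idea, assuming the five-minute cap described above. Every name in it is invented for illustration; this is not the code of any real resolver, and real implementations (BIND, Unbound, etc.) each handle this differently.

```python
import time

# Hypothetical sketch of a recursive resolver's "penalty box" for
# unresponsive nameservers; not taken from any real implementation.
PENALTY_SECONDS = 300  # the five-minute cap suggested by RFC 2308


class PenaltyBox:
    def __init__(self):
        self._benched = {}  # nameserver IP -> time the penalty expires

    def bench(self, server_ip):
        """Called after queries to this server time out despite retries."""
        self._benched[server_ip] = time.monotonic() + PENALTY_SECONDS

    def is_benched(self, server_ip):
        expiry = self._benched.get(server_ip)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._benched[server_ip]  # penalty served; eligible again
            return False
        return True


def pick_usable_server(box, nameserver_ips):
    """Return the first nameserver IP not currently benched.

    With two or more distinct servers, one timing out still leaves a
    candidate. With a single server (or the same IP duplicated across
    NS records), one timeout means resolution fails outright until
    the penalty expires.
    """
    for ip in nameserver_ips:
        if not box.is_benched(ip):
            return ip
    return None  # every server benched: the client gets a failure
```

The point of the sketch is the last function: redundancy only helps if the loop has more than one distinct IP to fall back on.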
This is where we run into the single nameserver problem:
Long story short, if you go with a single DNS server (this includes using the same IP address multiple times across NS records), this is going to happen. It's also going to happen a lot more than you realize, but the problem will be so sporadic that the odds of the failure 1) being reported to you, 2) being reproduced, and 3) being tied to this specific problem are extremely close to zero. They pretty much were zero if you came into this Q&A not knowing how this process worked, but thankfully that shouldn't be the case now!

Should this bother you? It's not really my place to say. Some people won't care about this five minute interruption problem at all, and I'm not here to convince you of that. What I am here to convince you of is that you do in fact sacrifice something by only running a single DNS server, and in all scenarios.
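If you want to check whether a delegation actually has more than one distinct server behind it, here is a short sketch using the third-party dnspython package (my choice for illustration; any DNS library would do), with example.com standing in for your own domain:

```python
# Detects "fake" redundancy: NS records that all resolve to a single
# IP address. Requires dnspython 2.x (pip install dnspython).
import dns.resolver


def nameserver_ips(domain):
    """Map each NS hostname of the domain to its set of IPv4 addresses."""
    ips = {}
    for ns in dns.resolver.resolve(domain, "NS"):
        host = str(ns.target)
        ips[host] = {str(a) for a in dns.resolver.resolve(host, "A")}
    return ips


if __name__ == "__main__":
    ips = nameserver_ips("example.com")  # placeholder domain
    for host, addrs in sorted(ips.items()):
        print(host, sorted(addrs))
    distinct = set().union(*ips.values())
    if len(distinct) < 2:
        print("Warning: all NS records resolve to one IP; this is "
              "effectively a single DNS server.")
```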
OP asks: "Are redundant DNS servers really necessary if I'm running all of my services off of the same IP address?"
Great question!
The best answer is provided by Professor Daniel J. Bernstein (PhD, Berkeley), who is not only a world-renowned researcher, scientist and cryptographer, but also the author of a very popular and well-received DNS suite known as djbdns (last released 2001-02-11, still popular to this day).
http://cr.yp.to/djbdns/third-party.html (2003-01-11)
Pay attention to this short and succinct part:
As such, the original answer for this question couldn't be more wrong.
Yes, short temporary network outages lasting a few seconds do happen every now and then. No, a failure to resolve a name during such an outage would not be cached for any number of minutes (otherwise, even having the best setup of highly-available authoritative nameservers in the world wouldn't help).
Any software that liberally applies the conservative up-to-five-minutes guideline from the March 1998 RFC (RFC 2308) when caching failures is simply broken by design, and having an extra geo-redundant server won't make a dent.
In fact, as per "How long a DNS timeout is cached for?", in BIND the SERVFAIL condition was traditionally NOT cached at all prior to 2014 and, since 2015, is cached by default for only 1 second, less than what it'd take an average user to reach a resolver timeout and hit that Refresh button again. (And even before we get to the above point of whether or not a failed resolution attempt should be cached, it takes more than a couple of dropped packets for even the first SERVFAIL to occur in the first place.)
Moreover, the BIND developers have implemented a ceiling for the feature of only 30s, which, even as a ceiling (i.e., the maximum value the feature will ever accept), is already 10 times lower than the 5-minute (300s) suggestion from the RFC, ensuring that even the most well-intentioned admins (of the resolvers serving eyeball users) can't shoot their own users in the foot.
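As a rough illustration of the policy just described (again Python, purely for readability; this is not BIND's actual code), the effective SERVFAIL cache TTL is the configured value clamped to the 30-second ceiling, with 1 second as the default:

```python
import time

# Sketch of the SERVFAIL caching policy described above: a 1-second
# default TTL and a hard 30-second ceiling. Illustrative only.
DEFAULT_SERVFAIL_TTL = 1   # seconds (the default described above)
SERVFAIL_TTL_CEILING = 30  # seconds (the hard cap described above)


def effective_servfail_ttl(configured_ttl=DEFAULT_SERVFAIL_TTL):
    """Clamp the admin-configured TTL to the ceiling."""
    return min(configured_ttl, SERVFAIL_TTL_CEILING)


class ServfailCache:
    def __init__(self, configured_ttl=DEFAULT_SERVFAIL_TTL):
        self._ttl = effective_servfail_ttl(configured_ttl)
        self._expiry = {}  # query name -> time the cached failure expires

    def record_failure(self, qname):
        self._expiry[qname] = time.monotonic() + self._ttl

    def is_cached_failure(self, qname):
        expiry = self._expiry.get(qname)
        return expiry is not None and time.monotonic() < expiry


# Even an admin who configures the RFC's 300-second upper bound ends
# up with at most 30 seconds:
assert effective_servfail_ttl(300) == 30
```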
In addition, there are many reasons why you may not want to use a third-party DNS service; read through the whole of djbdns/third-party.html for all the details. Renting a tiny extra server just for DNS, to administer by yourself, is hardly warranted when no need other than BCP 16 exists for such an endeavour.

In my personal "anecdotal" experience of owning and setting up domain names since at least 2002, I can tell you with all certainty and honesty that I have in fact had significant downtime of my various domains due to the professionally-run third-party servers of my registrars and hosting providers. One provider at a time, over the years, they all had their incidents: they were unavailable and brought my domains down unnecessarily, at the exact time when my own IP address (where the HTTP and SMTP for a given domain was hosted) was fully reachable otherwise. Do note that these outages happened with multiple independent, respected and professionally-run providers; they are by no means isolated incidents, they happen on a yearly basis, and, being a third-party service, they are entirely outside of your control. It just so happens that few people ever talk about it long-term.
In short:
Geo-redundant DNS is NOT at all necessary for small sites.
If you're running all of your services off of the same IP address, adding a second DNS server is most likely to result in an additional point of failure, and is detrimental to the continued availability of your domain. The "wisdom" of always having to do it in every imaginable situation is a very popular myth indeed; BUSTED.
Of course, the advice would be totally different should some of the domain's services, be that web (HTTP/HTTPS), mail (SMTP/IMAP) or voice/text (SIP/XMPP), already be provided by third parties, in which case eliminating your own IP as a single point of failure would indeed be a very wise approach, and geo-redundancy would indeed be very useful.
Likewise, if you have a particularly popular site with millions of visitors and knowingly require the additional flexibility and protections of geo-redundant DNS as per BCP 16, then you probably aren't using a single server/site for web/mail/voice/text already, so this question and answer obviously don't apply. Good luck!