This is a Canonical Question about DNS geo-redundancy.
It's common knowledge that geo-redundant DNS servers at separate physical locations are highly desirable when providing resilient web services. This is covered in depth by BCP 16, but some of the most frequently mentioned reasons include:
Protection against datacenter disasters. Earthquakes happen. Fires happen in racks and take out nearby servers and network equipment. Multiple DNS servers won't do you much good if physical problems at the datacenter knock out both DNS servers at once, even if they're not in the same row.
Protection against upstream peer problems. Multiple DNS servers won't prevent problems if a shared upstream network peer takes a dirt nap. Whether the upstream problem completely takes you offline, or simply isolates all of your DNS servers from a fraction of your userbase, the end result is that people can't access your domain even if the services themselves are located in a completely different datacenter.
That's all well and good, but are redundant DNS servers really necessary if I'm running all of my services off of the same IP address? I can't see how having a second DNS server would provide me any benefit if no one can get to anything provided by my domain anyway.
I understand that this is considered a best practice, but this really seems pointless!
Note: The content of this Q&A is in dispute; refer to the comments on both answers. Errors have been found, and this Q&A is in need of an overhaul.
I'm removing the accept from this answer for the time being, until the state of this canonical Q&A is properly addressed. (Deleting this answer would also delete the attached comments, which isn't the way to go IMO; it will probably be turned into a community wiki answer after extensive editing.)
I could quote RFCs here and use technical terms, but this is a concept that gets missed by a lot of people on both ends of the knowledge spectrum, so I'm going to try to answer this for the broader audience.
It may seem pointless...but it's actually not!
Recursive servers are very good at remembering when remote servers do not respond to a query, particularly when they retry and still never see a reply. Many implement negative caching of these communication failures and will temporarily put unresponsive nameservers in the penalty box for a period no greater than five minutes (the cap suggested by RFC 2308). Eventually this penalty period expires and they resume communication. If the next query fails again, the server goes right back into the box; otherwise it's back to business as usual.
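To make that concrete, here is a minimal Python sketch of the penalty-box idea, assuming the five-minute cap described above. Every name in it is invented for illustration; this is not the code of any real resolver, and real implementations (BIND, Unbound, etc.) each handle this differently.

```python
import time

# Hypothetical sketch of a recursive resolver's "penalty box" for
# unresponsive nameservers; not taken from any real implementation.
PENALTY_SECONDS = 300  # the five-minute cap suggested by RFC 2308


class PenaltyBox:
    def __init__(self):
        self._benched = {}  # nameserver IP -> time the penalty expires

    def bench(self, server_ip):
        """Called after queries to this server time out despite retries."""
        self._benched[server_ip] = time.monotonic() + PENALTY_SECONDS

    def is_benched(self, server_ip):
        expiry = self._benched.get(server_ip)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._benched[server_ip]  # penalty served; eligible again
            return False
        return True


def pick_usable_server(box, nameserver_ips):
    """Return the first nameserver IP not currently benched.

    With two or more distinct servers, one timing out still leaves a
    candidate. With a single server (or the same IP duplicated across
    NS records), one timeout means resolution fails outright until
    the penalty expires.
    """
    for ip in nameserver_ips:
        if not box.is_benched(ip):
            return ip
    return None  # every server benched: the client gets a failure
```

The point of the sketch is the last function: redundancy only helps if the loop has more than one distinct IP to fall back on.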
This is where we run into the single nameserver problem:
Long story short, if you go with a single DNS server (this includes using the same IP address multiple times across NS records), this is going to happen. It's also going to happen a lot more than you realize, but the problem will be so sporadic that the odds of the failure 1) being reported to you, 2) being reproduced, and 3) being tied to this specific problem are extremely close to zero. They pretty much were zero if you came into this Q&A not knowing how this process worked, but thankfully that shouldn't be the case now!

Should this bother you? It's not really my place to say. Some people won't care about this five minute interruption problem at all, and I'm not here to convince you of that. What I am here to convince you of is that you do in fact sacrifice something by only running a single DNS server, and in all scenarios.
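If you want to check whether a delegation actually has more than one distinct server behind it, here is a short sketch using the third-party dnspython package (my choice for illustration; any DNS library would do), with example.com standing in for your own domain:

```python
# Detects "fake" redundancy: NS records that all resolve to a single
# IP address. Requires dnspython 2.x (pip install dnspython).
import dns.resolver


def nameserver_ips(domain):
    """Map each NS hostname of the domain to its set of IPv4 addresses."""
    ips = {}
    for ns in dns.resolver.resolve(domain, "NS"):
        host = str(ns.target)
        ips[host] = {str(a) for a in dns.resolver.resolve(host, "A")}
    return ips


if __name__ == "__main__":
    ips = nameserver_ips("example.com")  # placeholder domain
    for host, addrs in sorted(ips.items()):
        print(host, sorted(addrs))
    distinct = set().union(*ips.values())
    if len(distinct) < 2:
        print("Warning: all NS records resolve to one IP; this is "
              "effectively a single DNS server.")
```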
OP asks: "Are redundant DNS servers really necessary if I'm running all of my services off of the same IP address?"
Great question!
The best answer is provided by Professor Daniel J. Bernstein (PhD, Berkeley), who is not only a world-renowned researcher, scientist and cryptographer, but also the author of a very popular and well-received DNS suite known as djbdns (last released 2001-02-11, still popular to this day).
http://cr.yp.to/djbdns/third-party.html (2003-01-11)
Pay attention to this short and succinct part:
As such, the original answer for this question couldn't be more wrong.
Yes, short temporary network outages lasting a few seconds do happen every now and then. No, a failure to resolve a name during such an outage would not be cached for any number of minutes (otherwise, even having the best setup of highly-available authoritative nameservers in the world wouldn't help).
Any software that liberally applies the conservative up-to-five-minutes guideline from the March 1998 RFC (RFC 2308) when caching failures is simply broken by design, and having an extra geo-redundant server won't make a dent.
In fact, as per "How long a DNS timeout is cached for?", in BIND the SERVFAIL condition was traditionally NOT cached at all prior to 2014 and, since 2015, is cached by default for only 1 second, less than what it'd take an average user to reach a resolver timeout and hit that Refresh button again. (And even before we get to the above point of whether or not a failed resolution attempt should be cached, it takes more than a couple of dropped packets for even the first SERVFAIL to occur in the first place.)
Moreover, the BIND developers have implemented a ceiling for the feature of only 30s, which, even as a ceiling (i.e., the maximum value the feature will ever accept), is already 10 times lower than the 5-minute (300s) suggestion from the RFC, ensuring that even the most well-intentioned admins (of the resolvers serving eyeball users) can't shoot their own users in the foot.
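As a rough illustration of the policy just described (again Python, purely for readability; this is not BIND's actual code), the effective SERVFAIL cache TTL is the configured value clamped to the 30-second ceiling, with 1 second as the default:

```python
import time

# Sketch of the SERVFAIL caching policy described above: a 1-second
# default TTL and a hard 30-second ceiling. Illustrative only.
DEFAULT_SERVFAIL_TTL = 1   # seconds (the default described above)
SERVFAIL_TTL_CEILING = 30  # seconds (the hard cap described above)


def effective_servfail_ttl(configured_ttl=DEFAULT_SERVFAIL_TTL):
    """Clamp the admin-configured TTL to the ceiling."""
    return min(configured_ttl, SERVFAIL_TTL_CEILING)


class ServfailCache:
    def __init__(self, configured_ttl=DEFAULT_SERVFAIL_TTL):
        self._ttl = effective_servfail_ttl(configured_ttl)
        self._expiry = {}  # query name -> time the cached failure expires

    def record_failure(self, qname):
        self._expiry[qname] = time.monotonic() + self._ttl

    def is_cached_failure(self, qname):
        expiry = self._expiry.get(qname)
        return expiry is not None and time.monotonic() < expiry


# Even an admin who configures the RFC's 300-second upper bound ends
# up with at most 30 seconds:
assert effective_servfail_ttl(300) == 30
```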
In addition, there are many reasons why you may not want to use a third-party DNS service; read through the whole of djbdns/third-party.html for all the details. Renting a tiny extra server just for DNS, to administer by yourself, is hardly warranted when no need other than BCP 16 exists for such an endeavour.

In my personal "anecdotal" experience of owning and setting up domain names since at least 2002, I can tell you with all certainty and honesty that I have in fact had significant downtime of my various domains due to the professionally-run third-party servers of my registrars and hosting providers. One provider at a time, over the years, they all had their incidents: they were unavailable and brought my domains down unnecessarily, at the exact time when my own IP address (where the HTTP and SMTP for a given domain was hosted) was fully reachable otherwise. Do note that these outages happened with multiple independent, respected and professionally-run providers; they are by no means isolated incidents, they happen on a yearly basis, and, being a third-party service, they are entirely outside of your control. It just so happens that few people ever talk about it long-term.
In short:
Geo-redundant DNS is NOT at all necessary for small sites.
If you're running all of your services off of the same IP address, adding a second DNS server is most likely to result in an additional point of failure, and is detrimental to the continued availability of your domain. The "wisdom" of always having to do it in every imaginable situation is a very popular myth indeed; BUSTED.
Of course, the advice would be totally different should some of the domain's services, be that web (HTTP/HTTPS), mail (SMTP/IMAP) or voice/text (SIP/XMPP), already be provided by third parties, in which case eliminating your own IP as a single point of failure would indeed be a very wise approach, and geo-redundancy would indeed be very useful.
Likewise, if you have a particularly popular site with millions of visitors and knowingly require the additional flexibility and protections of geo-redundant DNS as per BCP 16, then you probably aren't using a single server/site for web/mail/voice/text already, so this question and answer obviously don't apply. Good luck!