This is a Canonical Question about securing public DNS resolvers
Open DNS servers seem pretty neat and convenient, as they provide IP addresses that we can use consistently across our company, regardless of where our offices are located. Google and OpenDNS provide this functionality, but I'm not sure that I want these companies to have access to our DNS queries.
I want to set up something like this for use by our company, but I hear a lot about this being a dangerous practice (particularly with regard to amplification attacks) and I want to make sure that we do this right. What things do I need to keep in mind when building this type of environment?
There are a few things you need to understand going into this:
This is a network engineering problem.
Most of the people who are looking to set up this type of environment are system administrators. That's cool, I'm a system administrator too! Part of the job is understanding where your responsibilities end and someone else's begins, and believe me, this is not a problem system administrators can solve on their own. Here's why:
This is not a best practice. The best practice is not to do this.
It's very easy to set up an internet facing DNS resolver. It takes far less research to set one up than to understand the risks involved in doing so. This is one of those cases where good intentions inadvertently enable the misdeeds (and suffering) of others.
Google and OpenDNS do this, so why can't I?
Sometimes it's necessary to weigh enthusiasm against reality. Here are some hard questions to ask yourself:
Is this something you want to set up on a whim, or do you have a few million dollars to invest in doing it right?
Do you have a dedicated security team? Dedicated abuse team? Do both of them have the cycles to deal with abuse of your new infrastructure, and complaints that you'll get from external parties?
Do you have a legal team?
When all of this is said and done, will all of this effort even remotely begin to pay for itself, turn a profit for the company, or exceed the monetary value of dealing with the inconvenience that led you in this direction?
In closing, I know this Q&A is kind of a letdown for most of you who are being linked to it. Serverfault is here to provide answers, and an answer of "this is a bad idea, don't do it" isn't usually perceived as very helpful. Some problems are much more complicated than they appear to be at the outset, and this is one of them.
If you want to try to make this work, you can still ask us for help as you try to implement this kind of solution. The main thing to realize is that the problem is too big by itself for the answer to be provided in convenient Q&A format. You need to have already invested a significant amount of time researching the topic, and to approach us with specific problems that you've encountered during your implementation. The purpose of this Q&A is to give you a better understanding of the larger picture, and to help you understand why we can't answer a question as broad as this one.
Help us keep the internet safe! :)
Whether you are running an open DNS recursor or an authoritative DNS server, the problem is the same and most of the possible solutions are also the same.
The best solution
DNS cookies are a proposed standard that gives DNS servers a way to require clients to echo back a cookie in order to prove that the client IP address has not been spoofed. This costs one additional round trip for the first lookup, which is the lowest overhead any solution could offer.
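To make the idea concrete, here is a minimal sketch of how a server might derive the server half of a cookie from the client cookie, the client IP, and a server-side secret. The proposal leaves the exact function to the implementation; truncated HMAC-SHA256 is used here purely for illustration, and the function name is made up for this example:

```python
import hashlib
import hmac
import ipaddress

def server_cookie(client_cookie: bytes, client_ip: str, secret: bytes) -> bytes:
    """Derive a server cookie bound to the client's cookie and IP address.

    The hash function here (truncated HMAC-SHA256) is an illustrative
    choice, not mandated by the proposal. The key property is that a
    client cannot predict the cookie for an IP address it has spoofed,
    because it never sees the server's secret.
    """
    if len(client_cookie) != 8:
        raise ValueError("client cookie must be exactly 8 bytes")
    msg = client_cookie + ipaddress.ip_address(client_ip).packed
    # Server cookies may be 8-32 bytes; truncate the MAC to 16 here.
    return hmac.new(secret, msg, hashlib.sha256).digest()[:16]
```

On each query, the server recomputes the cookie and compares it with the one the client echoed; a mismatch means the client never received the first response at that IP, i.e. the address was likely spoofed.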
Fallback for older clients
Because DNS cookies are not yet standardized, it will of course be necessary to support older clients now and for years to come.
You can rate limit requests from clients without DNS cookie support, but rate limits make it easier for an attacker to DoS your DNS server. Beware that some DNS servers have a rate-limit feature designed only for authoritative DNS servers; since you are asking about a recursive resolver, such rate-limiting implementations may not be applicable to you. Because the rate limit by design becomes the bottleneck for your server, an attacker needs to send you less traffic to cause legitimate requests to be dropped than he would have if there were no rate limit.
One advantage of rate limits is that, in case an attacker does flood your DNS server with DNS requests, you are more likely to have capacity left over that will allow you to ssh to the server and investigate the situation. Additionally, rate limits can be designed to primarily drop requests from client IPs sending many requests, which may be enough to protect you against DoS from attackers who can't spoof client IPs.
For those reasons, a rate limit set a little below your actual capacity may be a good idea, even if it doesn't actually protect against amplification.
Using TCP
It is possible to force a client to use TCP by sending a response with the TC (truncated) flag set, indicating that the answer is too large for UDP. This has a couple of drawbacks: it costs two additional round trips, and some faulty clients do not support falling back to TCP.
The cost of two additional roundtrips can be limited to only the first request using this approach:
When the client IP has not been confirmed, the DNS server can send a truncated response to force the client to switch to TCP. The truncated response can be as short as the request (or shorter, if the client uses EDNS0 and the response does not), which eliminates the amplification.
Any client IP which completes a TCP handshake and sends a DNS request on the connection can be temporarily whitelisted. Once whitelisted, that IP gets to send UDP queries and receive UDP responses up to 512 bytes (4096 bytes if using EDNS0). If a UDP response triggers an ICMP error message, the IP is removed from the whitelist again.
The method can also be reversed using a blacklist: client IPs are allowed to query over UDP by default, but any ICMP error message causes the IP to be blacklisted, and a TCP query is needed to get off the blacklist.
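The truncated-response step of this approach can be sketched as follows. This builds a header-only reply that copies the query ID, preserves the opcode and RD flag, and sets the QR and TC bits; at 12 bytes it can never be larger than a valid query, so there is nothing to amplify. The function name is invented for this example:

```python
import struct

def truncated_reply(query: bytes) -> bytes:
    """Build a minimal header-only DNS reply with the TC bit set.

    DNS header layout (12 bytes): ID, flags, QDCOUNT, ANCOUNT,
    NSCOUNT, ARCOUNT, each a 16-bit big-endian field.
    """
    if len(query) < 12:
        raise ValueError("not a DNS message")
    qid, flags = struct.unpack("!HH", query[:4])
    opcode = flags & 0x7800              # preserve the opcode bits
    rd = flags & 0x0100                  # preserve Recursion Desired
    resp_flags = 0x8000 | opcode | 0x0200 | rd   # QR=1, TC=1
    # All section counts zero: header only, shorter than any query.
    return struct.pack("!HHHHHH", qid, resp_flags, 0, 0, 0, 0)
```

A compliant client that receives this reply retries the same query over TCP, at which point the completed handshake proves its source address.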
A bitmap covering all relevant IPv4 addresses could be stored in 444MB of memory. IPv6 addresses would have to be stored in some other way.
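As a sketch of such a bitmap: one bit per address over the entire IPv4 space is 2^32 bits, or 512 MiB, and the smaller figure above presumably comes from excluding reserved ranges. This toy class (invented for this example) covers a configurable network so it can be demonstrated without allocating the full bitmap:

```python
import ipaddress

class IPv4Bitmap:
    """One bit per address in a given IPv4 network (illustrative sketch).

    Covering all of 0.0.0.0/0 takes 2^32 bits = 512 MiB; excluding
    reserved ranges would shrink that somewhat.
    """
    def __init__(self, network: str = "0.0.0.0/0"):
        self.net = ipaddress.ip_network(network)
        self.bits = bytearray((self.net.num_addresses + 7) // 8)

    def _index(self, ip: str):
        n = int(ipaddress.IPv4Address(ip)) - int(self.net.network_address)
        if not 0 <= n < self.net.num_addresses:
            raise ValueError("address outside covered network")
        return n >> 3, 1 << (n & 7)   # byte offset, bit mask

    def add(self, ip: str) -> None:
        byte, mask = self._index(ip)
        self.bits[byte] |= mask

    def remove(self, ip: str) -> None:
        byte, mask = self._index(ip)
        self.bits[byte] &= ~mask

    def __contains__(self, ip: str) -> bool:
        byte, mask = self._index(ip)
        return bool(self.bits[byte] & mask)
```

A flat bitmap gives O(1) lookups at a fixed memory cost; for IPv6's vastly larger address space, a sparse structure (hash table, radix tree) with per-entry expiry would be needed instead.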
I do not know if any DNS server has implemented this approach.
It has also been reported that some TCP stacks can be exploited in amplification attacks. That, however, applies to any TCP-based service, not just DNS. Such vulnerabilities should be mitigated by upgrading to a kernel version where the TCP stack has been fixed to send no more than one packet in response to a SYN packet.