We run public-facing recursive DNS servers on Linux machines. We've been used for DNS amplification attacks. Are there any recommended iptables
rules that would help mitigate these attacks?
The obvious solution is just to limit outbound DNS packets to a certain traffic level. But I was hoping to find something a little more clever, so that an attack just results in cutting off traffic to the victim IP address.
I've searched for advice and suggestions, but they all seem to be "don't run public-facing recursive name servers". Unfortunately, we are backed into a situation where things that are not easy to change will break if we stop running them, because of decisions made more than a decade ago, before these attacks were an issue.
The whole thing kind of reeks of a "not my problem" scenario that's not really your fault and should/could be 100% resolved by taking the appropriate action, regardless of how "difficult" or "hard" it is: terminating your open recursive server.
Phase it out: tell the customers that this server is going away as of X date. After that time, they need to install a patch (assuming you have one) to stop it from using your DNS server. This is done all the time. Sysadmins, network admins, helpdesk guys, programmers? We get it; this end-of-life thing happens all the time, because it's standard operating procedure for a vendor/service provider/partner to tell us to stop using something after X date. We don't always like it, but it's a fact of life in IT.
You say you don't have this issue on the current devices, so I'm assuming you've resolved this issue with a firmware update or patch. I know you said you can't touch the devices, but surely they can? I mean, if they're allowing these boxes to essentially phone home to you, they can't really be that anal about who's doing what to their devices; you could have a reverse proxy set up for all they know, so why not have them install a patch that fixes this, or tell them to use their own DNS servers? Surely your device supports DHCP; I can't think of a network device (no matter how old/frail/odd) that doesn't.
If you can't do that, the next thing to do is control who can access your recursive server: you say that it's "hard to tell" who's using it and how, but it's time to find out for certain and start dropping traffic that's not legitimate.
These are "quasi-military/government" organizations, right? Well, they likely are part of a legitimate netblock that they own; these devices aren't home routers behind dynamic IPs. Find out. Contact them, explain the problem and how you are saving them a lot of money by not forcing a firmware or product replacement if only they can confirm the netblock/IP address that the device will be using to access your DNS server.
This is done all the time: I have several customers who restrict extranet access or HL7 listeners to healthcare partners in this way; it's not that hard to get them to fill out a form and provide the IP and/or netblock I should be expecting traffic from: if they want access to the extranet, they have to give me an IP or subnet. And this is rarely a moving target, so it's not like you're going to get inundated with hundreds of IP change requests every day: big campus hospital networks that own their own netblocks with hundreds of subnets and thousands and thousands of host IPs routinely give me a handful of IP addresses or a subnet I should be expecting; again, these aren't laptop users wandering all around campus all the time, so why would I expect to see UDP source packets from an ever-changing IP address? Clearly I'm making an assumption here, but I'll bet it's not as many as you think for < 100s of devices. Yes, it'll be a lengthy ACL, and yes, it requires some maintenance and communication (gasp!), but it's the next best thing outside of shutting it down completely.
If for some reason the channels of communication are not open (or somebody's too afraid or can't be bothered to contact these legacy device owners and do this properly), you need to establish a baseline of normal usage/activity so you can formulate some other strategy that will limit (but not prevent) your participation in DNS amplification attacks.
A long-running tcpdump filtering on incoming UDP port 53 should work, along with verbose logging on the DNS server application. I would also want to start collecting source IP addresses/netblocks/geoIP information (are all your clients in the US? Block everything else) because, as you say, you're not adding any new devices, you're merely providing a legacy service to existing installations. This will also help you understand what record types are being requested, for what domains, by whom, and how often: for DNS amplification to work as intended, the attacker needs to be able to request a large record type (1) to a functioning domain (2).
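For example, something like this to build that baseline (the interface name and capture path are assumptions; rotating hourly keeps the files manageable):

tcpdump -i eth0 -nn -G 3600 -w /var/tmp/dns-in-%Y%m%d%H.pcap 'udp dst port 53'

You can then pull the top talkers out of a capture afterwards, e.g.:

tcpdump -nn -r /var/tmp/dns-in-2024010100.pcap 'udp dst port 53' | awk '{print $3}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head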
"large record type": do your devices even need TXT or SOA records to be able to be resolved by your recursive DNS server? You may be able to specify which record types are valid on your DNS server; I believe it's possible with BIND and perhaps Windows DNS, but you'd have to do some digging. If your DNS server responds with
SERVFAIL
to any TXT or SOA records, and least that response is an order of magnitude (or two) smaller than the payload that was intended. Obviously you're still "part of the problem" because the spoofed victim would still be getting thoseSERVFAIL
responses from your server, but at least you're not hammering them and perhaps your DNS server gets "delisted" from the harvested list(s) the bots use over time for not "cooperating"."functioning domain": you may be able to whitelist only domains that are valid. I do this on my hardened data center setups where the server(s) only need Windows Update, Symantec, etc. to function. However, you're just mitigating the damage you're causing at this point: the victim would still get bombarded with
NXDOMAIN
orSERVFAIL
responses from your server because your server would still respond to the forged source IP. Again, Bot script might also automatically update it's open server list based on results, so this could get your server removed.I'd also use some form of rate limiting, as others have suggested, either at the application level (i.e. message size, requests per client limitations) or the firewall level (see the other answers), but again, you're going to have to do some analysis to ensure you're not killing legitimate traffic.
An Intrusion Detection System that's been tuned and/or trained (again, need a baseline here) should be able to detect abnormal traffic over time by source or volume as well, but would likely take regular babysitting/tuning/monitoring to prevent false positives and/or see if it's actually preventing attacks.
At the end of the day, you have to wonder if all this effort is worth it or if you should just insist that the right thing is done and that's eliminating the problem in the first place.
It depends on the kind of rate limiting you want to do.
Rate limiting with iptables is really more intended for limiting incoming packets, since packets up to the limit will match the filter and have the specified target applied (e.g., ACCEPT). You would presumably have a subsequent target to DROP packets not matched by the filter. And although iptables has a QUEUE target, it merely passes the packet to user space, where you need to supply your own queuing application. You can also rate limit outgoing packets, but few people really want to start dropping outgoing traffic.

iptables rate limit dropping:
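Something along these lines, for example (purely a sketch: eth0, the per-destination rate, and the burst are assumptions you'd tune against your own traffic):

iptables -A OUTPUT -o eth0 -p udp --sport 53 -m hashlimit --hashlimit-name dns-out --hashlimit-mode dstip --hashlimit-upto 5/second --hashlimit-burst 5 -j ACCEPT
iptables -A OUTPUT -o eth0 -p udp --sport 53 -j DROP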
Using hashlimit rather than limit will give you rate limiting per destination IP. I.e., with limit, five packets to 8.8.8.8 that hit the limit will also prevent packets being sent to 8.8.4.4, while with hashlimit, if 8.8.8.8 is maxed out you can still reach 8.8.4.4, which sounds more like what you want.

If you don't want packets past the limit to be dropped, then what you really want is tc. tc will regulate the flow to get a nice steady stream rather than lots of bursty traffic. On the incoming side, packets are delivered to the application more slowly but will all arrive in order. On the outgoing side, packets will leave your application as fast as possible but are placed on the wire in a consistent stream. I haven't used tc much, but here's an example of rate limiting ICMP which you can probably adapt easily for DNS.
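The original example isn't reproduced here, but the usual shape of it is something like this (rates and device are placeholders); for DNS you'd match UDP source port 53 instead of ICMP:

# default traffic goes to band 3; ICMP (IP protocol 1) is steered into band 1, which has a tbf attached
tc qdisc add dev eth0 root handle 1: prio priomap 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 100kbit buffer 1600 limit 3000
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip protocol 1 0xff flowid 1:1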
Here's one thing you can do to potentially mitigate responses to spoofed queries, but it takes some work:
First, take a look at your security logs and find an IP address that is getting spoofed.
Then run a tcpdump using that source IP (10.11.12.13) like this:
tcpdump -n -v -X -S src 10.11.12.13 and udp dst port 53
You'll get something like this:
Now the fun part! Open up rfc1035 at https://www.rfc-editor.org/rfc/rfc1035 and turn to section 4.1.1.
It's time to translate the results of the tcpdump and figure out a pattern we can use to create a packet level filter.
The ID of the DNS header starts at 0x1C (these offsets count from the start of the IP header: 20 bytes of IP header plus 8 bytes of UDP header), so we've got some flags at 0x1E, the QDCOUNT at 0x20, the ANCOUNT at 0x22, the NSCOUNT at 0x24 and the ARCOUNT at 0x26.
That leaves the actual question at 0x28, which in this case is null (ROOT) for the NAME, 0xFF for QTYPE = ANY, and 0x01 for QCLASS = IN.
To make a longish story short, I've found that adding the following iptables rule blocks over 95% of the spoofed queries which are requesting ANY records IN ROOT:
iptables -A INPUT -p udp --dport domain -m u32 --u32 "0x28=0x0000ff00" -j DROP
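To sanity-check it, you can send yourself the same kind of query the bots do (192.0.2.53 here stands in for your server's address); with the rule loaded it should simply time out, while lookups for real names still resolve:

dig @192.0.2.53 . ANY +tries=1 +time=2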
Your mileage may vary... hope this helps.
Using tc and queueing disciplines in Linux for outbound port 53 UDP:
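For example, something along these lines (a sketch only; eth0 and the rates are assumptions):

tc qdisc add dev eth0 root handle 1: htb default 30
tc class add dev eth0 parent 1: classid 1:10 htb rate 10mbit ceil 10mbit
tc class add dev eth0 parent 1: classid 1:30 htb rate 1000mbit
tc filter add dev eth0 parent 1: protocol ip prio 1 handle 1 fw flowid 1:10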
This will set you up with a qdisc limited to 10mbit for any packet with firewall mark '1'. Firewall markings are only internal to the firewall and don't modify the packet, just how the queueing discipline handles it. Here is how you use iptables to make the firewall markings:
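For example (eth0 again being an assumption):

iptables -t mangle -A POSTROUTING -o eth0 -p udp --sport 53 -j MARK --set-mark 1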
Modify to your liking to exclude trusted subnets and/or destinations. The -o eth0 limits the shaping to outbound packets only. Hope this helps.

Depending on the network 'position' you're in [having multiple BGP feeds or being at the 'end' of the internet, as a stub network], you can try something like uRPF to prevent source address spoofing. Other source of info.
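If the box in question is itself a Linux router, the rough equivalent is strict reverse-path filtering, e.g.:

sysctl -w net.ipv4.conf.all.rp_filter=1
sysctl -w net.ipv4.conf.default.rp_filter=1

Note this protects the networks you route for against spoofed sources; it doesn't stop spoofed queries arriving at the DNS server from elsewhere on the internet.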
I'd try to compose a list of all clients that rely on your externally facing recursive resolvers. Start with a day or so of packet traces on the DNS boxes. From there, begin creating iptables rules to allow that traffic you recognize and authorize. The default will eventually be to drop traffic to 53/tcp and 53/udp. If that breaks something, fine tune your rules.
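For example, once you've identified the legitimate sources (203.0.113.0/24 is just a placeholder netblock):

iptables -A INPUT -p udp --dport 53 -s 203.0.113.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 53 -s 203.0.113.0/24 -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j DROP
iptables -A INPUT -p tcp --dport 53 -j DROP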
Are these devices still under a support contract? If so, reach out to your customers. Let them know that the internet has evolved a little in the last decade, and in order to continue to provide name resolution for these devices you'll need to know the SRC IP to expect queries from. Set a date ~6 months in the future at which time you will no longer be able to service unknown clients, and stick to it. This is pretty common in the industry. If these devices are no longer under a support contract... sounds like a business decision. How long does your company intend to expend resources on an ancient product that no longer generates revenue?
These sound like specialized devices; are they so specialized that you can reasonably predict which domains to expect legitimate queries for? BIND supports views; create a public view that only does recursion for those domains.
Use this as a learning opportunity, if you have not already done so, stop releasing products where you do not have the ability to fix bugs. That's what this is, a bug. One that will certainly EOL this device prematurely, sooner or later.
From NANOG somewhere, this:
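The original rule isn't reproduced here, but it was a per-source rate limit along these lines (the numbers are illustrative, not the ones from the post):

iptables -A INPUT -p udp --dport 53 -m hashlimit --hashlimit-name dns-clients --hashlimit-mode srcip --hashlimit-srcmask 24 --hashlimit-upto 30/second --hashlimit-burst 10 -j ACCEPT
iptables -A INPUT -p udp --dport 53 -j DROP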
This is not ideal. It might be better to allow fewer packets per second, and have a higher burst.
Here is a solution that I've used a couple of times against DDoS attacks. It's not perfect, but it helped me out. The solution consists of a script that is called every N minutes (1, 2, 3, etc.) by cron and blocks IPs that are creating more connections than the number given in the script:
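Roughly, the idea is something like this (the threshold of 100 is an assumption you'd tune; and since DNS queries are mostly stateless UDP, you may want to count queries from logs or captures rather than connections):

#!/bin/bash
# Block source IPs with more than $LIMIT current connections.
# Meant to be run from cron, e.g.: */2 * * * * /usr/local/sbin/block-flooders.sh
LIMIT=100
netstat -ntu | awk 'NR>2 {print $5}' | cut -d: -f1 | grep -E '^[0-9.]+$' | sort | uniq -c | \
while read count ip; do
    if [ "$count" -gt "$LIMIT" ]; then
        # -C checks whether an identical rule already exists before appending
        iptables -C INPUT -s "$ip" -j DROP 2>/dev/null || iptables -A INPUT -s "$ip" -j DROP
    fi
done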