We're interested in restricting the number of requests per second and/or available bandwidth to HTTP clients, to stop accidental DoS. We provide free scientific data and web services, and sadly some users' scripts aren't well behaved.
I know there are lots of Apache mods that let you throttle per client IP address, but the problem is that we sometimes see people doing distributed crawling from their clusters (today this caused an incident with a load average of over 200!).
What I'd really like to do is throttle per /24 subnet, but without having to specify which subnets in advance.
Ideally, I'd also like to be able to do this as a proportion of a maximum cap, so if we're only seeing requests from one subnet, they get to use all the server's resources, but if two subnets are competing, they get to use half each.
Is this possible with either:
- Apache mods
- Traffic control
- Proxy server
- Something else?
Thanks!
EDIT: A couple of further things... If anything needs to be done at the network infrastructure level (e.g. routers), that's outside our responsibility and becomes an instant PITA, so I'm hoping to find a solution that only requires changes at the server level. Also, please don't be offended if I take a while to pick a winner - this is a new topic to me, so I want to read around the suggestions a bit :-)
If you are using HAProxy, or can use it, see if this blog post helps.
</end_shameless_promotion_of_a_fellow_admin_and_company :)>
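For what it's worth, one way to do this sort of thing in HAProxy is with stick-tables: track a request rate per source address, masked to the /24, and deny (or queue) clients whose subnet exceeds a threshold. A rough sketch - the limits and backend name are made up, and the ipmask converter needs a reasonably recent HAProxy:

```
frontend www
    bind *:80
    # one stick-table entry per /24, tracking HTTP request rate over 10s
    stick-table type ip size 100k expire 10m store http_req_rate(10s)
    # count each request against its source address masked to /24
    http-request track-sc0 src,ipmask(24)
    # refuse clients whose /24 is making more than ~100 requests per 10s
    http-request deny if { sc_http_req_rate(0) gt 100 }
    default_backend apache_servers

backend apache_servers
    server apache1 127.0.0.1:8080
```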
Be very careful. Simply slowing the network down means that you will be compounding any DoS attack - you need to limit connections before they arrive at the webserver.
Consider - disks are very slow, and only handle one request at a time. One of the most important factors in determining webserver performance is the amount of I/O caching the OS can do - and this is limited by the amount of free memory on the system. Whenever a request comes in, an Apache process (or thread) is scheduled to handle it. That process sits there hogging memory and CPU for the whole time it needs to compose the response and send it across the internet to the client, denying that memory to the I/O cache. One way to minimise the impact of this is to use a suitable reverse proxy in front of the webserver - e.g. squid, which runs as a single-threaded server.
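If you do put a caching reverse proxy in front of Apache, a bare-bones squid accelerator setup looks roughly like this (ports, hostname and cache size are placeholders, not recommendations):

```
# squid.conf - illustrative accelerator (reverse proxy) setup
# listen on port 80 and act as a reverse proxy for the site
http_port 80 accel defaultsite=data.example.org
# forward cache misses to Apache listening on an internal port
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=apache
# keep hot objects in memory instead of hitting Apache/disk again
cache_mem 256 MB
# only accept requests for our own site
acl our_site dstdomain data.example.org
http_access allow our_site
http_access deny all
```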
Assuming you can avoid the problem of gumming up your webserver, you might want to have a look at running a traffic shaper - at the perimeter of your network if you can, or on the web server itself if the routers are off limits. Linux now comes with tc as standard.
(/me just googled 'linux tc' and got a picture of a girl in a bikini ;)
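To give a flavour of what tc can do on the server itself: cap outbound bandwidth with HTB and hang a fair-queueing qdisc underneath, so no one set of flows can monopolise the link (the device name and rates below are placeholders, and aggregating fairness per /24 rather than per flow would need extra filters on top of this):

```
# cap outbound traffic on eth0 at 50 Mbit (illustrative figure)
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 50mbit ceil 50mbit
# share that cap fairly between flows using stochastic fair queueing
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
```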
In terms of identifying crawlers versus a real DDoS, the answer is a lot trickier. Certainly there's no off-the-shelf solution I'm aware of that works reliably for HTTP. However, it should be possible to amend the detector in fail2ban to trigger a lockout or throttling when it spots an aberrant pattern - and out of the box it can already treat a high volume of requests from a particular endpoint as such a pattern.
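As a sketch of that fail2ban idea (the filter name, thresholds and log path are invented for illustration): a filter that matches every request line in the Apache access log, plus a jail that bans an address once it exceeds a rate:

```
# /etc/fail2ban/filter.d/http-flood.conf (hypothetical filter)
[Definition]
# match any request line in a common/combined-format access log
failregex = ^<HOST> .*"(GET|POST|HEAD)
ignoreregex =
```

```
# /etc/fail2ban/jail.local (hypothetical jail)
[http-flood]
enabled  = true
filter   = http-flood
port     = http,https
logpath  = /var/log/apache2/access.log
# ban an address that makes more than 300 requests in 60 seconds, for 10 minutes
maxretry = 300
findtime = 60
bantime  = 600
```

Note that stock fail2ban bans individual IPs rather than whole /24s, so against a cluster it will pick off the noisy hosts one by one rather than the subnet as a whole.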