I want to reduce, or mitigate the effects of, malicious layer 7 traffic (targeted attacks, generic evil automated crawling) that reaches my backend and makes it very slow or even unavailable. This concerns load-based attacks as described in https://serverfault.com/a/531942/1816
Assume that:
- I use a not very fast backend/CMS (e.g ~1500ms TTFB for every dynamically generated page). Optimizing this is not possible, or simply very expensive in terms of effort.
- I've fully scaled up, i.e I'm on the fastest H/W possible.
- I cannot scale out, i.e the CMS does not support master-to-master replication, so it's only served by a single node.
- I use a CDN in front of the backend, powerful enough to handle any traffic, which caches responses for a long time (e.g 10 days). Cached responses (hits) are fast and do not touch my backend again. Misses will obviously reach my backend.
- The IP of my backend is unknown to attackers/bots.
- Some use cases, e.g POST requests or logged in users (small fraction of total site usage), are set to bypass the CDN's cache so they always end up hitting the backend.
- Changing anything on the URL in a way that makes it new/unique to the CDN (e.g addition of a `&_foo=1247895239`) will always end up hitting the backend.
- An attacker who has studied the system first will very easily find very slow use cases (e.g paginated pages down to the 10,000th result) which they'll be able to abuse together with the random parameters of #7 to bring the backend to its knees.
- I cannot predict all known and valid URLs and legit parameters of my backend at a given time in order to somehow whitelist requests or sanitize the URL on the CDN and reduce unnecessary requests from reaching the backend. e.g `/search?q=whatever` and `/search?foo=bar&q=whatever` will 100% produce the same result because `foo=bar` is not something that my backend uses, but I cannot sanitize that on the CDN level.
- Some attacks are from a single IP, others are from many IPs (e.g 2000 or more) which cannot be guessed or easily filtered out via IP ranges.
- The CDN provider and the backend host provider both offer some sort of DDoS protection feature, but the attacks that can bring my backend down are very small (e.g only 10 requests per second) and are never classified as DDoS attacks by these providers.
- I do have monitoring in place and instantly get notified when the backend is stressed, but I don't want to be manually banning IPs because this is not viable (I may be sleeping, working on something else, on vacation or the attack may be from many different IPs).
- I am hesitant to introduce a per-IP limit on connections or requests per second on the backend (see the sketch after this list for the kind of rule I mean), since I will, at some point, end up denying access to legit users. e.g imagine a presentation/workshop about my service taking place in a university or large company, where tens or hundreds of browsers will use the service almost simultaneously from a single IP address. If these are logged in, they'll always reach my backend and not be served by the CDN. Another case is public sector users all accessing the service from a very limited number of IP addresses (provided by the government). So this would deny access to legit users, and it would not help at all against attacks from many IPs, each of which only makes a couple of requests.
- I do not want to permanently blacklist large IP ranges of whole countries which are sometimes the origins of attacks (e.g China, Eastern Europe), because this is unfair and wrong, it would deny access to legit users from those areas, and it would do nothing against attacks coming from elsewhere.
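For reference, the kind of per-IP limit I'm hesitant about would look roughly like the following (nginx shown purely as an example; the zone name, rate and backend address are made up):

```
# Inside the http {} block: allow roughly 5 requests/second per client IP,
# with a small burst allowance. This is exactly the sort of rule that would
# lock out a classroom or an office building sharing a single IP.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=5r/s;

server {
    listen 80;

    location / {
        limit_req zone=per_ip burst=20 nodelay;
        proxy_pass http://127.0.0.1:8080;   # placeholder for the real backend
    }
}
```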
So, what can I do to handle this situation? Is there a solution that I've not taken into consideration in my assumptions that could help?
I live in a similar environment (I don't manage it directly, but I work with the team that does) and we've found two solutions that work well together. In our case we host the application ourselves, so we have full control over traffic flow, but the idea remains the same.
Some of your constraints are quite hard, and I'd argue contradictory, but I think they can be worked around. I'm not sure what your CDN is, but I presume it's a black box that you don't really control.
I would suggest setting up another (caching) layer in front of your application to control and modify traffic; we use Varnish for this, mostly for caching but also for mitigating malicious traffic. It can be quite small and doesn't have to cache for as long as the CDN, since it should only see very little traffic.
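To give an idea, here is a minimal VCL 4.0 sketch along those lines (not our exact setup; the backend address, TTLs and timeouts are placeholders you'd tune to your own site):

```
vcl 4.0;

import std;

backend origin {
    .host = "127.0.0.1";          # placeholder: your real backend
    .port = "8080";
    .first_byte_timeout = 10s;    # headroom for a ~1500ms-TTFB backend
}

sub vcl_recv {
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 become one cache object.
    set req.url = std.querysort(req.url);
}

sub vcl_backend_response {
    # Cache for a while and keep stale copies around ("grace") so that a slow
    # or overloaded backend mostly results in slightly stale pages, not errors.
    set beresp.ttl = 10m;
    set beresp.grace = 24h;
}
```

Two useful side effects: for cacheable content Varnish collapses concurrent misses for the same URL into a single backend request, and per-client rate limiting can be bolted on later (e.g with the vsthrottle vmod from varnish-modules) if you decide you need it.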
I feel your pain - but you are facing an impossible task. You can't have your cake and eat it.
Normally I would suggest fail2ban as the tool to address this (if webserver rate limiting is not an option). However, not only do you explicitly say that you can't support even a temporary ban, but since your traffic comes via a CDN you would also need to build a lot of functionality to report the offending address and to apply the block at the CDN rather than on your own firewall. A hypothetical sketch of the "normal" setup follows, mostly to show what you'd be giving up.
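Purely for illustration, that "normal" fail2ban approach would be something like this (hypothetical jail and filter; paths and thresholds are made up). Behind a CDN the source address in your access log is the CDN edge, not the attacker, which is exactly why this doesn't fit here without extra plumbing:

```
# /etc/fail2ban/jail.local (hypothetical): ban any IP that makes more than
# 60 requests within 60 seconds, for 10 minutes.
[http-flood]
enabled  = true
port     = http,https
filter   = http-flood
logpath  = /var/log/nginx/access.log
maxretry = 60
findtime = 60
bantime  = 600

# /etc/fail2ban/filter.d/http-flood.conf (hypothetical): treat every request
# line as a "failure", capturing the client IP as <HOST>.
[Definition]
failregex = ^<HOST> -
ignoreregex =
```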
You only have two courses of action left that I can see:
1) flatten the site into HTML files and serve them as static content (a rough sketch follows below)
2) get a job somewhere else
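For option 1, a crude sketch of the flattening step (the hostname and target directory are placeholders, and anything genuinely dynamic such as search, logins or POSTs is lost by definition):

```
# Crawl the site into a static mirror that any web server (or the CDN itself)
# can serve. --convert-links rewrites internal links so the copy works on its
# own; --adjust-extension appends .html where needed; --wait is just politeness
# towards your own slow backend.
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-parent \
     --wait=1 \
     --directory-prefix=/var/www/static-mirror \
     https://www.example.com/
```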