Is there a way I can make Nginx notify me if hits from a referrer go beyond a threshold?
e.g. If my website is featured on Slashdot and all of a sudden I have 2K hits coming in an hour, I want to be notified when it goes beyond 1K hits an hour.
Is it possible to do this in Nginx, ideally without Lua? (My production build is not compiled with Lua.)
I think this would be far better done with logtail and grep. Even if it's possible to do with lua inline, you don't want that overhead for every request and you especially don't want it when you have been Slashdotted.
Here's a 5-second version. Stick it in a script and put some more readable text around it and you're golden.
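The original snippet was not preserved in this copy; a minimal sketch of what such a logtail-plus-grep check might look like (the log path, threshold, and mail address are assumptions, and `logtail` comes from the logcheck package):

```shell
#!/bin/sh
# Sketch only: assumes the logtail utility (logcheck package), a working
# mail command, and an access log format that includes the Referer field.
# logtail prints only the lines appended since its previous run, so
# invoking this from an hourly cron job yields an hourly count.
LOG=/var/log/nginx/access.log
THRESHOLD=1000

hits=$(logtail "$LOG" | grep -c 'slashdot\.org')
if [ "$hits" -gt "$THRESHOLD" ]; then
    printf 'Slashdot referrals in the last hour: %s\n' "$hits" \
        | mail -s 'Slashdotted!' you@example.com
fi
```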
Of course, that completely ignores reddit.com and facebook.com and all of the million other sites that could send you lots of traffic. Not to mention 100 different sites sending you 20 visitors each. You should probably just have a plain old traffic threshold that causes an email to be sent to you, regardless of referrer.
The nginx `limit_req_zone` directive can base its zones on any variable, including `$http_referer`.
You will also want to do something to limit the amount of state required on the web server, though, as the Referer headers can be quite long and varied and you may see an infinite variety. You can use the nginx `split_clients` feature to set a variable for all requests that is based on the hash of the Referer header. The example below uses only 10 buckets, but you could do it with 1000 just as easily. So if you got Slashdotted, people whose referrer happened to hash into the same bucket as the Slashdot URL would get blocked too, but you could limit that to 0.1% of visitors by using 1000 buckets in `split_clients`.
It would look something like this (totally untested, but directionally correct):
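A sketch along those lines (untested, as the answer says; the bucket names, zone size, and rate are illustrative assumptions):

```nginx
http {
    # Hash the Referer header into one of ten buckets.
    split_clients "$http_referer" $referer_bucket {
        10%  "ref0";
        10%  "ref1";
        10%  "ref2";
        10%  "ref3";
        10%  "ref4";
        10%  "ref5";
        10%  "ref6";
        10%  "ref7";
        10%  "ref8";
        *    "ref9";
    }

    # Ten buckets keep the shared-memory zone small; the limit applies
    # per bucket, not per individual referrer.
    limit_req_zone $referer_bucket zone=referer_limit:1m rate=100r/s;

    server {
        listen 80;
        location / {
            limit_req zone=referer_limit burst=50 nodelay;
            # ... normal request handling ...
        }
    }
}
```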
The most efficient solution might be to write a daemon that would `tail -f` the `access.log` and keep track of the `$http_referer` field.

However, a quick and dirty solution would be to add an extra `access_log` file, to log only the `$http_referer` variable with a custom `log_format`, and to automatically rotate the log every X minutes.

This can be accomplished with the help of standard logrotate scripts, which might need to do graceful restarts of nginx in order to have the files reopened (e.g., the standard procedure; take a look at /a/15183322 on SO for a simple time-based script)…
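A minimal sketch of such a referrer-only log (the file paths and the format name `referer_only` are assumptions; `log_format` must sit at the `http {}` level):

```nginx
# In the http {} context: one line per request, containing only the Referer.
log_format referer_only '$http_referer';

server {
    listen 80;
    location / {
        # The regular access log stays as-is; this is an extra one.
        access_log /var/log/nginx/access.log;
        access_log /var/log/nginx/http_referer.log referer_only;
        # ... normal request handling ...
    }
}
```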
Or, by using variables within `access_log`, possibly by getting the minute specification out of `$time_iso8601` with the help of a `map` or an `if` directive (depending on where you'd like to put your `access_log`).

So, with the above, you may have 6 log files, each covering a period of 10 minutes, `http_referer.Txx{0,1,2,3,4,5}x.log`, e.g., by getting the first digit of the minute to differentiate each file.

Now, all you have to do is have a simple shell script that runs every 10 minutes, `cat`s all of the above files together, pipes that to `sort`, then to `uniq -c`, then to `sort -rn`, then to `head -16`, and you have a list of the 16 most common `Referer` variations; you are free to decide if any combination of numbers and fields exceeds your criteria, and to perform a notification.

Subsequently, after a single successful notification, you could remove all of these 6 files and, in subsequent runs, not issue any notification UNLESS all six of the files are present (and/or a certain other number as you see fit).
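The script itself might look something like this sketch (the log glob, threshold, and mail address are assumptions matching the rotation scheme above):

```shell
#!/bin/sh
# Sketch only: meant to run from cron every 10 minutes against the
# rotated referer-only logs described above.
LOGS='/var/log/nginx/http_referer.Txx*.log'
THRESHOLD=1000

# Rank the most common Referer values across the last hour of logs.
# $LOGS is deliberately unquoted so the glob expands to all six files.
cat $LOGS 2>/dev/null \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -16 \
    | while read -r count referer; do
        if [ "$count" -gt "$THRESHOLD" ]; then
            printf 'Referer %s seen %s times in the last hour\n' \
                "$referer" "$count" \
                | mail -s 'Referer spike' you@example.com
        fi
    done
```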
Yes, of course it is possible in NGINX!
What you could do is implement the following DFA:
1. Implement rate limiting, based on `$http_referer`, possibly using some regex through a `map` to normalise the values. When the limit is exceeded, an internal error page is raised, which you can catch through an `error_page` handler as per a related question, going to a new internal location as an internal redirect (not visible to the client).

2. In the above location for exceeded limits, you perform an alert request, letting external logic perform the notification; this request is subsequently cached, ensuring you will only get 1 unique request per given time window.

3. Catch the HTTP status code of the prior request (by returning a status code ≥ 300 and using `proxy_intercept_errors on`, or, alternatively, use the not-built-by-default `auth_request` or `add_after_body` to make a "free" subrequest), and complete the original request as if the prior step wasn't involved. Note that we need to enable recursive `error_page` handling for this to work.

Here's my PoC and an MVP, also at https://github.com/cnst/StackOverflow.cnst.nginx.conf/blob/master/sf.432636.detecting-slashdot-effect-in-nginx.conf:
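The full config isn't reproduced in this copy; see the link above for the actual PoC. A simplified sketch of the moving parts (the zone names, ports, alert endpoint, and cache settings here are illustrative assumptions, not the linked config):

```nginx
# Simplified sketch of the approach; not the actual PoC from the link.
limit_req_zone $http_referer zone=slash:10m rate=1r/s;

# Cache for alert subrequests: at most one alert per referrer per window.
proxy_cache_path /var/cache/nginx/alert keys_zone=alert:1m;

server {
    listen 80;

    recursive_error_pages on;   # needed to chain error_page handlers

    location / {
        limit_req zone=slash nodelay;
        limit_req_status 429;
        error_page 429 = /send_alert;     # internal redirect on overflow
        proxy_pass http://127.0.0.1:8080; # the real backend
    }

    # Reached only when the limit is exceeded.
    location /send_alert {
        internal;
        proxy_pass http://127.0.0.1:8081/alert;  # external notifier
        proxy_cache alert;
        proxy_cache_key $http_referer;
        proxy_cache_valid any 10m;
        proxy_intercept_errors on;
        # The notifier replies with a status >= 300 so we can catch it
        # here and still complete the original request afterwards.
        error_page 429 = /proceed;
    }

    location /proceed {
        internal;
        proxy_pass http://127.0.0.1:8080;
    }
}
```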
Note that this works as expected:

You can see that the first request results in one front-end and one backend hit, as expected (I had to add a dummy backend to the location that has `limit_req`, because a `return 200` would take precedence over the limits; a real backend isn't necessary for the rest of the handling).

The second request is above the limit, so we send the alert (getting `200`) and cache it, returning `429` (this is necessary due to the aforementioned limitation that requests below 300 cannot be caught), which is subsequently caught by the front-end, which is now free to do whatever it wants.

The third request is still exceeding the limit, but we've already sent the alert, so no new alert gets sent.
Done! Don't forget to fork it on GitHub!