With the nginx HttpLimitReq module, requests can be limited by IP. However, I don't understand what the "nodelay" option does.
If delaying the excess requests within the burst is not desired, you should use the `nodelay` parameter:
limit_req zone=one burst=5 nodelay;
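For context, the `zone=one` referenced by that directive has to be declared separately at the `http` level. A minimal sketch (the zone size and rate here are illustrative assumptions, not taken from the question):

```nginx
http {
    # Track clients by IP; "one" is a 10 MB shared-memory zone
    # allowing an average of 1 request per second per IP.
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

    server {
        location /search/ {
            # Allow a burst of up to 5 extra requests; with nodelay
            # they are served immediately instead of being smoothed
            # out to 1r/s.
            limit_req zone=one burst=5 nodelay;
        }
    }
}
```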
TL;DR: The nodelay option is useful if you want to impose a rate limit without constraining the allowed spacing between requests.
I had a hard time digesting the other answers, and then I discovered new documentation from Nginx with examples that answer this: https://www.nginx.com/blog/rate-limiting-nginx/
Here's the pertinent part. Given:
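The configuration quoted from the article is not reproduced above; from memory it is roughly the following (zone name, rate, and path are as I recall them from the post, so verify against the article):

```nginx
# Key clients by IP; allow an average of 10 requests/second per IP.
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /login/ {
        # Queue up to 20 excess requests and serve them at 10r/s;
        # anything beyond that is rejected.
        limit_req zone=mylimit burst=20;
    }
}
```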
If you add nodelay:
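Roughly, per the article (again from memory; `mylimit` is the zone name used there), the only change is the extra parameter:

```nginx
location /login/ {
    # Up to 20 excess requests are now served immediately rather than
    # being paced out; a burst slot still only frees up every 100 ms,
    # and requests arriving with no free slot are rejected.
    limit_req zone=mylimit burst=20 nodelay;
}
```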
The documentation here has an explanation that sounds like what you want to know:
From what I understand, requests over the burst will be delayed (take more time and wait until they can be served); with the `nodelay` option the delay is not used and excess requests are denied with a 503 error.

This blog post (archive.org) gives a good explanation of how the rate limiting works on nginx:
The way I see it is as follows:

1. Requests will be served as fast as possible until the zone rate is exceeded. The zone rate is "on average", so if your rate is `1r/s` and burst is `10`, you can have 10 requests in a 10-second window.

2. After the zone rate is exceeded:

   a. Without `nodelay`, further requests up to `burst` will be delayed.

   b. With `nodelay`, further requests up to `burst` will be served as fast as possible.

3. After the `burst` is exceeded, the server will return an error response until the burst window expires. E.g. for rate `1r/s` and `burst=10`, a client will need to wait up to 10 seconds for the next accepted request.

The setting defines whether requests will be delayed so that they conform to the desired rate, or whether they will simply be rejected... somewhat whether the rate limiting is managed by the server or the responsibility is passed to the client.
With `nodelay` present: requests will be handled as quickly as possible; any requests sent over the specified limit will be rejected with the code set as `limit_req_status`.

With `nodelay` absent (aka delayed): requests will be handled at a rate that conforms to the specified limit. So, for example, if a rate of 10 req/s is set, each request will be handled in >= 0.1 (1/rate) seconds, thereby not allowing the rate to be exceeded, but allowing the requests to get backed up. If enough requests back up to overflow the bucket (which would also be prevented by a concurrent connection limit), they are rejected with the code set as `limit_req_status`.

The gory details are here: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c#L263 where that logic kicks in when the limit has not yet been passed and the delay is optionally going to be applied to the request. The application of `nodelay` from the directive comes into play here: https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_limit_req_module.c#L495 causing the value of `delay` above to be 0, triggering that handler to immediately return `NGX_DECLINED` (which passes the request to the next handler), rather than `NGX_AGAIN` (which will effectively requeue it to be processed again).

I didn't understand that at first when I was reading the introduction from https://www.nginx.com/blog/rate-limiting-nginx/.
Now I am sure I understand and my answer is so far the best. :)
Suppose `10r/s` is set, the server's max capability is e.g. `10000r/s` (which is `10r/ms`), and there is only 1 client at the moment.

So here's the main difference between `10r/s per IP burst=40 nodelay` and `10r/s per IP burst=40`.

As https://www.nginx.com/blog/rate-limiting-nginx/ documents (I strongly recommend reading the article first, except the Two-Stage Rate Limiting section), this behaviour fixes one problem. Which one?
Check the draft I made: the 40th request gets a response at `1s`, while the other 40th gets a response at `4s`.

This can make the best use of the server's capability: it sends back responses as quickly as possible while still keeping the `x r/s` constraint for a given client/IP.

But there is also a cost here. The cost will be:
If you have many clients queuing on the server, let's say clients `A`, `B` and `C`:

Without `nodelay`, the requests are served in an order similar to `ABCABCABC`.

With `nodelay`, the order is more likely to be `AAABBBCCC`.
I would like to sum up the article https://www.nginx.com/blog/rate-limiting-nginx/ here.

Above all, the most important configuration is `x r/s`.

1. With `x r/s` only, excess requests are rejected immediately.

2. With `x r/s` + `burst`, excess requests are queued.

Comparing 1. and 2.: the cost is that, on the client side, the queued requests take up the chances of later requests which would otherwise have had the chance of being served. For example, with `10r/s burst=20` vs `10r/s`, the 11th request is supposed to be rejected immediately under the latter condition, but now it is queued and will be served. The 11th request takes up the 21st request's chance.

3. With `x r/s` + `burst` + `nodelay`, already explained above.

P.S. The Two-Stage Rate Limiting section of the article is very confusing. I don't understand it, but that doesn't seem to matter.
For example: 8 r/s? Seriously? There are 17 requests within 3 seconds shown in the image; 17 / 3 = 8?
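The three cases in the summary above map onto `limit_req` configuration roughly like this; a sketch assuming a zone named `mylimit` at `10r/s` (the zone name, paths, and burst size are illustrative, and the `location` blocks belong inside a `server` block):

```nginx
# Assumed zone: clients keyed by IP, average rate 10 requests/second.
limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

# 1. x r/s only: excess requests are rejected immediately.
location /a/ {
    limit_req zone=mylimit;
}

# 2. x r/s + burst: up to 20 excess requests are queued and
#    drip-fed to the server at 10r/s.
location /b/ {
    limit_req zone=mylimit burst=20;
}

# 3. x r/s + burst + nodelay: up to 20 excess requests are served
#    immediately; beyond that, requests are rejected until burst
#    slots free up again (one every 100 ms).
location /c/ {
    limit_req zone=mylimit burst=20 nodelay;
}
```

In all three cases the rejection status code defaults to 503 and can be changed with `limit_req_status`.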