When nginx's proxy_pass returns a 502, there can be a broad range of reasons. What I want is to be able to detect when a 502 was returned because the upstream host was not found (that is, failed to resolve).
I know of proxy_intercept_errors, but it doesn't seem to be helpful in my case.
What I have
I have an nginx gateway server running in a Kubernetes pod. It is configured to route requests to the appropriate Kubernetes service according to the first part of the hostname (the word before the first dot, e.g. service-name.example.com should route to a service called service-name).
Here is a simplified config section responsible for this logic:
server {
    listen 80;
    resolver 172.16.2.3; # Pod IP address
    server_name "~^(?<svc>[\w-]+)\.";

    location / {
        # Each Kubernetes service has an internal domain name matching the following pattern
        proxy_pass "http://$svc.default.svc.cluster.local";
        proxy_set_header Host $host;

        # Proxy `X-Forwarded` headers sent by ELB: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/x-forwarded-headers.html
        proxy_set_header X-Forwarded-For $http_x_forwarded_for;
        proxy_set_header X-Forwarded-Port $http_x_forwarded_port;
        proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    }
}
Problem
No matter why the upstream is not accessible (it refuses connections, fails internally, or just does not exist), nginx returns 502. The actual cause is only visible in the nginx error log.
Since the gateway is publicly available through AWS ELB, it is often accessed by IP address or by random hostnames, which creates noise in monitors set up to react to 5XX error spikes.
What I want to do
Set up nginx to return a less alarming error (say, 404) if the service's hostname can't be resolved by the Kubernetes resolver.
For example, I send the following request:
curl -H "Host: non-existent-service.example.com" http://gateway.example.com
I want nginx to be able to detect the fact that the hostname corresponding to the service could not be internally resolved, and then return a 404 instead of 502.
Currently the logs look as follows:
error log:
2017/11/10 16:03:58 [error] 22#22: *482894 non-existent-service.default.svc.cluster.local could not be resolved (3: Host not found), client: 172.16.1.2, server: ~^(?<svc>[\w-]+)\., request: "GET / HTTP/1.1", host: "non-existent-service.example.com"
access log:
172.16.1.2 - - [10/Nov/2017:16:03:58 +0000] "non-existent-service.example.com" "GET / HTTP/1.1" 502 173 "-" "curl/7.43.0" "194.126.122.250" "EE"
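One possible direction, given as an untested sketch rather than a confirmed fix: error_page can redirect 502s generated by nginx itself to a named location, and that location can check whether any upstream was contacted. The sketch assumes that $upstream_addr is empty when name resolution failed, since no upstream server was ever selected, while a refused connection still records the peer address. The @bad_gateway location name is hypothetical and not part of the original config; the behaviour should be verified on the nginx version in use.

```nginx
server {
    listen 80;
    resolver 172.16.2.3;
    server_name "~^(?<svc>[\w-]+)\.";

    location / {
        proxy_pass "http://$svc.default.svc.cluster.local";
        proxy_set_header Host $host;

        # proxy_intercept_errors stays off (the default), so only 502s
        # generated by nginx itself are redirected here; a 502 returned by
        # the service is passed through to the client unchanged.
        error_page 502 = @bad_gateway;
    }

    # Hypothetical named location, added for illustration only.
    location @bad_gateway {
        # Assumption: $upstream_addr is empty when the hostname could not be
        # resolved, because no upstream server was ever selected.
        if ($upstream_addr = "") {
            return 404;
        }

        # Any other nginx-generated 502 (e.g. connection refused) stays a 502.
        return 502;
    }
}
```

With this in place, the curl request above would be expected to produce a 404 in the access log instead of a 502, while services that exist but refuse connections would still show up as 502.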
UPDATE
I should have mentioned this in the first place. A "catch-all" default server block was the first thing I tried. It turned out that this block is never reached, because virtually any hostname (including a bare IP address) matches the regexp.
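A minimal sketch of one way to make a catch-all block reachable, assuming that real services are only ever addressed as <service>.example.com (that domain restriction is an assumption, not something the original config enforces). Anchoring the regexp to the expected domain stops IP addresses and random names from matching, so they fall through to the default server:

```nginx
# Catch-all: requests by IP address or with unknown hostnames end up here.
server {
    listen 80 default_server;
    server_name _;
    return 404;
}

server {
    listen 80;
    resolver 172.16.2.3;

    # Assumption: services are only ever addressed as <service>.example.com.
    server_name "~^(?<svc>[\w-]+)\.example\.com$";

    location / {
        proxy_pass "http://$svc.default.svc.cluster.local";
        proxy_set_header Host $host;
    }
}
```

This only filters out hostnames outside the expected domain. A name such as non-existent-service.example.com still matches the regexp and still produces a 502 when it fails to resolve.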
Just re-enable the default virtual host and ignore anything that hits it (as such requests are querying the IP directly, or are malicious).
For example, as seen in the nginx 1.12.x nginx.conf: