While trying to figure out why our Varnish 4.1 install (on CentOS7 via the varnish-cache.org repo) was not following vcl rules set out to log the client IP address in an X-Forwarded-For header (see: Varnish 4 logging proxy/load balancer instead of client IP addresses) I happened to notice something odd while looking through varnishlog files:
- Begin req 9353447 rxreq
- Timestamp Start: 1488771709.337974 0.000000 0.000000
- Timestamp Req: 1488771709.337974 0.000000 0.000000
- ReqStart 172.25.20.65 19903
- ReqMethod GET
- ReqURL /about-us/
- ReqProtocol HTTP/1.1
- ReqHeader host: www.<notreallythishost>.com
- ReqHeader Accept: */*
- ReqHeader Accept-Encoding: gzip, deflate
- ReqHeader Cache-Control: no-cache
- ReqHeader From: bingbot(at)microsoft.com
- ReqHeader Pragma: no-cache
- ReqHeader User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
- ReqHeader X-Forwarded-For: 40.77.167.41
- ReqHeader X-Forwarded-Port: 80
- ReqHeader X-Forwarded-Proto: http
- ReqHeader Connection: keep-alive
- ReqUnset X-Forwarded-For: 40.77.167.41
- ReqHeader X-Forwarded-For: 40.77.167.41, 172.25.20.65
- VCL_call RECV
- ReqUnset X-Forwarded-For: 40.77.167.41, 172.25.20.65
- ReqHeader X-Forwarded-For: 172.25.20.65
- ReqUnset Accept-Encoding: gzip, deflate
- ReqHeader Accept-Encoding: gzip
- ReqUnset User-Agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
- VCL_return hash
- VCL_call HASH
- VCL_return lookup
- VCL_call MISS
- VCL_return fetch
- Link bereq 9353449 fetch
- Timestamp Fetch: 1488771709.338395 0.000421 0.000421
- RespProtocol HTTP/1.1
This fully explains why we were never able to get anything other than the load balancer IP address logged through varnishncsa at any point regardless of the logging technique we used.
It looks as though in the request processing it builds up the X-Forwarded-For header by adding the AWS Load Balancer IP address to the header, but as vc_call is called summarily decontructs it again and removes the original client IP address. So how do I go about keeping the X-Forwarded-For intact and why is Varnish shifting IPs off the left side rather than than just adding them onto the X-Forwarded-For header like it should? Bug?
So the work-around for this (possible bug in 4.1.3-1.el7.x86_64) was a clue I found while browsing other varnish logging questions, specifically one on disabling the x-forwarded-for header altogether.
While this isn't what I wanted to do, it provided the hint about preventing varnish from appending its own vcl_recv content to the buttom of my vcl_recv function definition. Specifically provide your own return (lookup) (though that's varnish <= 3) or return (hash) (varnish 4.x).
So now I have this at the top of the vcl_recv():
Then at the very end of the function I have of course as I mentioned a:
So it is now successfully logging just the client IP address as it should be with the addition of a varnishncsa option of:
Hopefully someone else finds this information useful.
UPDATE: Just as a note I found this post about spoofing in this nginx answer and it would be nice to have a sort of real_ip_from or a trusted_ip_from headers in varnish as well but it doesn't appear to natively at the moment. My initial solution will pick out the spoofed address in this case. So it would be better to have a regex with known IPs removed and you pick up the first unhandled client IP instead. Something like this would work:
where 172.25.20.65 and 172.25.10.228 are my trusted IPs (proxies or load balancers picked up and added to X-Forwarded-For, etc). Depending on whether you expect to see a proxy/LB in front of it your regex can either be this if you expect there to always be at least one load balancer/proxy in your header:
or this if it is allowable to not have anything before your varnish server:
Although if the latter is the case why would you be looking at the x-forwarded-for header...