I recently updated the settings on a public-facing NGINX instance to add support for http2. While looking at the logs afterwards to get a feel for how often it was being used, I saw a rapid rise in new log entries not related to the hosted site.
First were a bunch of entries making `CONNECT` requests. These are all failing with 400 errors because the NGINX instance is not configured as a forward proxy. I've set up fail2ban rules to drop traffic from the many source IP addresses. I'm not particularly worried about this (please add a comment if I should be).
The next set of entries are `GET` requests, but rather than having paths they have full URLs as the target, e.g.
222.223.121.231 - - [16/Jul/2020:12:57:37 +0100] "GET http://api.gxout.com/proxy/check.aspx HTTP/1.1" 404 199 "http://api.gxout.com/proxy/check.aspx" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
Most of these are getting 404 responses, again as expected, and I've added another fail2ban rule to drop packets from the source IP addresses (again, not really bothered about these).
There are some similar requests that are getting 200 responses, and these are the ones I'm worried about, e.g.
35.236.60.202 - - [16/Jul/2020:11:52:28 +0100] "GET http://www.nike.com/ HTTP/1.1" 200 396 "-" "python-requests/2.20.0"
I have the following questions:
- Why would NGINX return a 200 for this request?
- Suggestions for how to debug this?
All the incoming traffic should be https (required for http2) and I'm pinned at TLS 1.2 or 1.3, so I don't think capturing the traffic with tcpdump is going to help (I'm assuming I can't feed the private key into Wireshark and decode the packets?).
The only other option I can think of is adding some custom logging (Is it possible to log the response data in nginx access log?) to NGINX to log the whole request/response. I've done this in the past to debug OAuth 2.0 token exchange issues, but only on a system where I had full control over all the incoming traffic.
I think there's no need to debug this further, as some things are obvious:
- The `python-requests/2.20.0` User-Agent indicates some Python script. The popular `requests` Python library makes it very easy to write simple bots, whether good or bad.
- Returning 200 to an unknown hostname may be quite typical if you have a default server in NGINX which responds to any `Host:` header.

Pardon my wording, but by default, the default server in NGINX will respond to any `Host:`. Then all it takes for a `200` to be returned is for your app to not check the domain name and not issue a redirect to the canonical domain name of your website.

As in a typical situation "you know what domains you host", any requests with an alien domain name (or none) can be deemed malicious/unwanted.
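As a rough sketch of that idea, a catch-all default server can absorb every request whose host is not one you serve, while a named server handles the real site. `example.com` and all certificate paths below are placeholders of mine, not values from the question. As far as I understand, when the request line carries a full URL, NGINX uses the host from that URL when picking the server block, so a request like `GET http://www.nike.com/` should land in the catch-all as well.

```nginx
# Catch-all: selected whenever the requested host matches no other server_name.
server {
    listen 80 default_server;
    listen 443 ssl default_server;
    server_name _;

    # Placeholder paths; a self-signed certificate is enough here.
    ssl_certificate     /etc/nginx/ssl/catchall.crt;
    ssl_certificate_key /etc/nginx/ssl/catchall.key;

    # 444 is NGINX-specific: close the connection without sending a response.
    return 444;
}

# The real site: only requests for these names reach the application.
server {
    listen 443 ssl http2;
    server_name example.com www.example.com;

    # Placeholder paths for the site's real certificate.
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;

    root /var/www/example.com;
}
```

Returning 444 rather than a 404 keeps the response off the wire entirely, and the entries still show up in the access log with status 444 if you want Fail2ban to act on them.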
You may want to look at a honeypot blocking approach for such requests where the "domain is not yours": the majority of malicious/bad bots will actually send nothing but a bare IP as the value of the `Host:` header, simply because they are too lazy to check what domains are located on a given IP (mind that they find their victims simply by enumerating networks/IP addresses).

As for the requests with a full URL instead of a path, these can be anything, including badly written bots, proxy checkers, etc.
If you have a lot of those requests, and generating a 404 hits your backend, I would recommend denying them directly in the config with a simple rule, and possibly adding an instant block as opposed to using Fail2ban.
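One possible shape for such a rule, as a minimal sketch rather than a tested drop-in: match the raw request line and drop anything whose target is a full URL. It assumes the fragment is placed inside the `server` block that currently answers these requests, and that no legitimate client of yours sends absolute-form requests.

```nginx
# $request holds the original request line,
# e.g. "GET http://www.nike.com/ HTTP/1.1".
# A normal request has a path ("GET / HTTP/1.1") and does not match.
if ($request ~* "^[A-Z]+ https?://") {
    # Close the connection without a response, before any backend is involved.
    return 444;
}
```

Because the `return` is evaluated during the rewrite phase, the request should never be proxied or passed to the application.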