I have a NodeJS server with HTTPS and certificates. The DNS is on CloudFlare.
The host receives a connection every day at 3:32am from 3.8.236.120, an AWS host in the UK. The crawler requests the same 4 pages repeatedly around that time: 26 requests in 5 minutes. Although that volume seems manageable, this crawler is the most likely culprit, as it is the only activity the web server shows at that time every day.
I blocked the IP in the CloudFlare Web Application Firewall, along with others that have attacked in the past:
(ip.src in {194.48.199.78 5.43.32.229 92.220.10.100 107.172.137.111 193.169.254.179 149.3.170.66 47.243.233.244 3.8.236.120}) or (ip.geoip.country in {"IN" "ID"})
I checked that the rule is enabled, and yet this IP address still reaches the server through the domain name (the logged Host header is ginja.org). Here is the server log for one request:
GET /pt/sumario 200 143.483 ms - 13972
[2024-02-16T03:31:18.155Z] /pt/amostras :: ::ffff:3.8.236.120 :: Python-urllib/3.11
[2024-02-16T03:31:18.156Z] Host = ginja.org
I suspect there is some setting on CloudFlare that conflicts with this rule. How can I block all requests from this IP address?
Update
To answer the comments:
- I blocked the IPv6 address on CloudFlare and still get traffic from it.
- I don't know if the crawler accessed robots.txt because it was in the public directory and I didn't track its visits. I am tracking them now.
- One request every 12 seconds seems light indeed. But I get traffic from this crawler every day at the same time, and some days my server is down at that time, so I want to rule out this bot first. Also, I have another server checking every minute whether this one is up and running, and on the days I consider under attack, the server doesn't log any activity from the health checker, so it's possible that the crawler floods my server with so many requests at the same time that they don't get logged.
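To test the flood hypothesis, one option would be to count requests per IP per minute before any other handling, so that a burst shows up even if the regular log lines are lost. The sketch below is only an illustration under that assumption: `recordRequest` and the `counts` map are hypothetical names, the counts live in memory only, and the map is never pruned.

```javascript
// Count requests per client IP in one-minute buckets (in memory only).
const counts = new Map(); // key: "<minute>|<ip>" -> number of requests

function recordRequest(ip, now = new Date()) {
  // toISOString() is always UTC; the first 16 characters are "YYYY-MM-DDTHH:MM".
  const minute = now.toISOString().slice(0, 16);
  const key = `${minute}|${ip}`;
  const total = (counts.get(key) || 0) + 1;
  counts.set(key, total);
  return total; // running total for this IP in this minute
}
```

Calling `recordRequest` at the very start of the request handler and dumping `counts` periodically would show whether the 3:31 burst is really 26 requests or something much larger.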