I'm having an issue with a certain individual who keeps scraping my site aggressively, wasting bandwidth and CPU resources. I've already implemented a system which tails my web server access logs, adds each new IP to a database, keeps track of the number of requests made from that IP, and then, if the same IP exceeds a certain threshold of requests within a certain time period, blocks it via iptables. It may sound elaborate, but as far as I know, there is no pre-made solution designed to limit a given IP to a certain amount of bandwidth/requests.
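For readers who want to picture the threshold check described above, here is a minimal sketch of a per-IP sliding-window counter. It is an illustration only, not the actual system from this question: the class name `RequestTracker`, its parameters, and the in-memory storage (instead of a database) are all assumptions.

```python
from collections import defaultdict, deque


class RequestTracker:
    """Hypothetical per-IP sliding-window request counter (in memory)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # allowed requests per window
        self.window_seconds = window_seconds  # window length in seconds
        self.hits = defaultdict(deque)        # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one request; return True if the IP is now over the threshold."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_requests
```

A log-tailing loop would call `record(ip, time.time())` for each access-log line and add an iptables rule whenever it returns `True`.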
This works fine for most crawlers, but one extremely persistent individual gets a new IP from their ISP's pool each time they're blocked. I would like to block the ISP entirely, but don't know how to go about it.
Doing a whois on a few sample IPs, I can see that they all share the same "netname", "mnt-by", and "origin/AS". Is there a way I can query the ARIN/RIPE database for all subnets using the same mnt-by/AS/netname? If not, how else could I go about getting every IP belonging to this ISP?
Thanks.
whois [IP address] (or whois -a [IP address]) will usually give you a CIDR mask or an address range that belongs to the company/provider in question, but parsing the results is left as an exercise for the reader (there are at least 2 common whois output formats).
Note that such wholesale blocking can also knock out legitimate users. Before taking this approach you should contact the abuse desk at the ISP in question (usually listed in the whois information for their netblock or DNS domain; otherwise abuse@ is a good place to start) to see if the situation can be resolved diplomatically rather than technically.
Also note that there are some pre-made solutions to limit requests per second by IP. Check out mod-qos or your system's firewall/traffic shaping capabilities.
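As a rough illustration of the parsing step, the sketch below pulls networks out of whois text. It assumes the two layouts commonly seen: ARIN-style `NetRange:` / `CIDR:` lines and RIPE-style `inetnum:` / `inet6num:` lines. The function name `networks_from_whois` is made up for this example, and other registries may format their output differently.

```python
import ipaddress


def networks_from_whois(text):
    """Extract networks from whois output (assumed ARIN- or RIPE-style fields)."""
    found = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not sep or not value:
            continue
        if key in ("inetnum", "netrange") and "-" in value:
            # "first - last" address range: convert to the covering CIDR blocks.
            first, _, last = value.partition("-")
            found.extend(ipaddress.summarize_address_range(
                ipaddress.ip_address(first.strip()),
                ipaddress.ip_address(last.strip())))
        elif key in ("cidr", "inet6num"):
            # One or more comma-separated prefixes already in CIDR form.
            for part in value.split(","):
                found.append(ipaddress.ip_network(part.strip(), strict=False))
    # collapse_addresses cannot mix IPv4 and IPv6, so handle them separately.
    v4 = ipaddress.collapse_addresses(n for n in found if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in found if n.version == 6)
    return list(v4) + list(v6)
```

Feeding it the captured output of the whois command gives a de-duplicated list of `ipaddress` network objects that can be printed in CIDR form.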
Figured it out on my own. Sort of.
robtex.com lists all announced IP ranges for a given AS at: http://www.robtex.com/as/as123.html#bgp
I still don't know how or where robtex gets this data. If someone else wants to chime in and explain where it comes from, that would be great.
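Once a list of announced prefixes for the AS is in hand, turning it into block rules is mechanical. The sketch below is one possible way to do it, not something taken from robtex: the function name `iptables_drop_rules` is hypothetical, it only builds the command strings (it does not run them), and it handles IPv4 only.

```python
import ipaddress


def iptables_drop_rules(prefixes):
    """Build one iptables DROP command per collapsed IPv4 prefix."""
    nets = [ipaddress.ip_network(p.strip(), strict=False)
            for p in prefixes if p.strip()]
    # Merge adjacent and overlapping prefixes so fewer rules are needed.
    collapsed = ipaddress.collapse_addresses(n for n in nets if n.version == 4)
    return [f"iptables -I INPUT -s {net} -j DROP" for net in collapsed]


for rule in iptables_drop_rules(["192.0.2.0/25", "192.0.2.128/25"]):
    print(rule)
```

Reviewing the printed commands before running them as root is a sensible precaution, since a whole AS can cover a lot of legitimate users.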
Since you have access to iptables, I will assume you have root access on the system anyway. In that case, I would suggest installing Fail2Ban, which blocks an IP (for a period you choose) if it abuses a service (HTTP, DNS, mail, SSH, etc.) by hitting the service port N times within a period X. Both values are user-defined.
I am using it on my server and getting very good results, especially against the Chinese hackers trying to break into my SSH.
See my home page for more information. I have a blog post all about Fail2Ban.
You can use Hurricane Electric's BGP Service.
If you have an IP address and want to know all address blocks registered to the same ASN, do this:
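One way to do this kind of lookup, sketched here under assumptions and not necessarily the method this answer had in mind, is to ask the RADb whois server for route objects whose origin is the ASN in question (the `-i origin` inverse query). Note that this returns routes registered in the routing registry, which may differ from what is actually announced in BGP. The function names and the example ASN are illustrative.

```python
import socket


def parse_routes(whois_text):
    """Collect the prefixes from route: / route6: lines of a whois reply."""
    routes = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("route", "route6"):
            routes.append(value.strip())
    return routes


def query_radb(asn, host="whois.radb.net", port=43, timeout=10):
    """Send an inverse origin query (e.g. asn='AS64500') and return the raw reply."""
    query = f"-i origin {asn}\r\n".encode("ascii")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query)
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Usage would look like `parse_routes(query_radb("AS64500"))`, with AS64500 replaced by the origin AS shown in the whois output for one of the offending IPs.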
You can try this tool. It is not fast, but it works.