We've recently begun having trouble with web-scraper/DDoS service 80legs taking down our servers a couple of times per week due to their abusive crawling practices. Initially we were simply dropping the following in at the bottom of the affected sites' .htaccess files:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*80legs
RewriteRule .* - [F,L]
</IfModule>
However, it's getting to the point where we just need to block them at the server level across all servers.
According to the Apache docs, this config is valid in the server config context (i.e. httpd.conf), but placing it there has no effect. Is there a particular approach we can take to block/deny/redirect requests based on User-Agent at the server level on an Apache server with Virtual Hosts enabled?
Note: it is not possible to block this at the firewall level because:
- 80legs uses what is essentially an opt-in botnet to crawl pages. Their last "incident" involved 5250 unique IPs from approximately 900 different networks/IP blocks from around the world.
- We do not currently have the ability to do deep-packet inspection.
According to http://www.80legs.com/spider.html their user-agent string is 008, not the "80legs" that you used. Additionally, they say that their crawler respects the robots.txt file, so you should give that a try: update your robots.txt to disallow that user agent.
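A minimal robots.txt entry for this might look like the following. It assumes the crawler still identifies itself with the `008` token described on the 80legs page and that you want it kept off the whole site; adjust the `Disallow` path if only part of the site should be off limits.

```
# Assumption: 80legs matches the "008" user-agent token
User-agent: 008
Disallow: /
```

This only helps if the crawler actually honours robots.txt, and it has to be deployed in the document root of every affected virtual host.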
Sorry, but I don't know what you mean by "we just need to block them at the server level across all servers" if it is not "block this at the firewall". Indeed, that is exactly where I would block them, using fail2ban.
The number of source IPs doesn't matter: it's trivial to script the action to set the block to (say) an 8-bit network, or, if you're feeling adventurous, map out the ASN and block that. Very long rule chains can impact performance (though a lot less than allowing the traffic through, by the sound of things), but you can adjust the duration of the ban to prevent this.
Deep-packet inspection is not needed: you use Apache to handle the HTTP traffic and redirect to a script which triggers fail2ban to implement its action.
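As a rough sketch of the fail2ban side, here is a log-based variant rather than the redirect-to-script approach described above: Apache logs each request and fail2ban bans any client whose User-Agent field matches. The filter and jail name `apache-80legs`, the log path, and the ban time are all hypothetical, and the regex assumes Apache's combined log format, where the User-Agent is the last quoted field on the line.

```ini
# /etc/fail2ban/filter.d/apache-80legs.conf  (hypothetical file name)
[Definition]
# Match the client IP at the start of the line and "008" or "80legs"
# anywhere inside the final quoted field (the User-Agent).
failregex = ^<HOST> .*"[^"]*(?:\b008\b|80legs)[^"]*"$
ignoreregex =

# /etc/fail2ban/jail.local  (append this section)
[apache-80legs]
enabled  = true
port     = http,https
filter   = apache-80legs
# Assumption: adjust to wherever your virtual hosts write their access logs
logpath  = /var/log/apache2/access.log
# Ban on the first matching request, for one hour
maxretry = 1
bantime  = 3600
```

Keeping `bantime` fairly short is one way to stop the firewall rule chain from growing without bound, in line with the performance caveat above. The pattern can be checked against a real log with `fail2ban-regex` before the jail is enabled.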