I periodically check my server logs and notice a lot of crawlers searching for the location of phpmyadmin, zencart, roundcube, administrator sections and other sensitive data. There are also crawlers under the name "Morfeus Fucking Scanner" or "Morfeus Strikes Again" searching for vulnerabilities in my PHP scripts, and crawlers that perform strange (XSS?) GET requests such as:
GET /static/)self.html(selector?jQuery(
GET /static/]||!jQuery.support.htmlSerialize&&[1,
GET /static/);display=elem.css(
GET /static/.*.
GET /static/);jQuery.removeData(elem,
Until now I've been manually storing these IPs and blocking them with iptables. But since these requests are only made a handful of times from any one IP, I'm having doubts whether blocking them provides any real security advantage.
I'd like to know whether it does any good to block these crawlers in the firewall, and if so, whether there's a (not too complex) way of doing this automatically. And if it's wasted effort, perhaps because the requests come from new IPs after a while, I'd appreciate an explanation and suggestions for more efficient ways of denying/restricting malicious crawler access.
FYI: I'm also already blocking w00tw00t.at.ISC.SANS.DFind:) crawls using these instructions: http://spamcleaner.org/en/misc/w00tw00t.html
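For reference, the kind of rules I'm using look roughly like this (the string-match rule follows the linked instructions, but the exact options are on that page; 192.0.2.1 is just a placeholder for one of the manually collected IPs):

# drop packets whose payload contains the scanner's signature string
iptables -I INPUT -p tcp --dport 80 -m string --algo bm \
    --string "GET /w00tw00t.at.ISC.SANS.DFind" -j DROP
# and the manual per-IP blocks are of this form
iptables -A INPUT -s 192.0.2.1 -j DROP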
We use Cisco hardware firewalls rather than server software-based ones; they watch for patterns of abusive activity and block the offending IPs for quite a while (30-90 days IIRC). I'm sure other firewalls can do this too, but I don't have experience with them. Basically, what I'm saying is that if your firewall can use rules to look for abuse, you'll see more benefit than from simply blocking known culprits one at a time.
Whether it's worthwhile is debatable, and I don't really know.
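If you want to approximate that kind of temporary, rule-triggered blocking with plain iptables, the "recent" module can do it. This is only a rough sketch, not something from our Cisco setup: the matched string and the 3600-second window are arbitrary examples to adjust.

# put any source that requests the phpMyAdmin setup script on a "badbots" list and drop it
iptables -A INPUT -p tcp --dport 80 -m string --algo bm \
    --string "phpmyadmin/scripts/setup.php" \
    -m recent --name badbots --set -j DROP
# drop anything else from an IP seen on that list within the last hour
iptables -I INPUT -p tcp --dport 80 -m recent --name badbots --rcheck --seconds 3600 -j DROP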
As for your complaint that they come from different IPs and you can only react by blocking each IP after the fact: you can address this with a reverse proxy, such as Apache in reverse-proxy mode (with something like mod_proxy / mod_security) or HAProxy. Basically, if you know the patterns ahead of time, you can drop those requests before they even reach the web server.
Also, for a bit of vocabulary: these are called Web Application Firewalls (WAFs). They operate at Layer 7 by examining the HTTP requests and responses.
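As a rough illustration rather than a drop-in config, a ModSecurity rule on the proxy to reject the jQuery-fragment probes from your logs could look something like this; the rule id, status code and regex are just examples:

SecRuleEngine On
SecRule REQUEST_URI "@rx /static/\)|htmlSerialize|removeData\(elem" \
    "id:100001,phase:1,t:none,log,deny,status:403,msg:'suspicious jQuery-fragment probe'"

HAProxy can do the same sort of thing with an ACL on the request path plus an http-request deny.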
You could always take some of the strings/GETs you're finding and, since you already have the iptables string module set up, log/drop those packets, and potentially automate adding the offending IPs to the firewall with a script.
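A rough sketch of what that automation could look like; the log path, chain name and patterns are assumptions to adapt to your setup, and iptables -C needs a reasonably recent iptables:

#!/bin/sh
# scan the access log for the probe strings from the question and
# drop each offending client IP via a dedicated chain
LOG=/var/log/apache2/access.log
CHAIN=CRAWLERBLOCK
# create the chain once and hook it into INPUT (ignore errors if it already exists)
iptables -N "$CHAIN" 2>/dev/null
iptables -C INPUT -j "$CHAIN" 2>/dev/null || iptables -I INPUT -j "$CHAIN"
# pull the client IP (first field of a combined log) for each suspicious request
grep -E 'GET /static/[)]|GET /(phpmyadmin|w00tw00t)' "$LOG" \
  | awk '{print $1}' | sort -u \
  | while read -r ip; do
      # skip IPs that are already blocked
      iptables -C "$CHAIN" -s "$ip" -j DROP 2>/dev/null \
        || iptables -A "$CHAIN" -s "$ip" -j DROP
    done

Run it from cron every few minutes and the blocks stay reasonably current.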
Generally speaking, I would say it's worth blocking those IPs, because the machines behind them may have been compromised in some way or another, and if they've been compromised and you're only catching one attack, you might be missing another.