I work for a site that often gets attacked by bot networks. We have started using this tool: http://deflate.medialayer.com/ , which auto-bans IPs that have more open connections than a set threshold. By default the threshold is 150; we are currently using 250.
I would like to know: how low can this limit safely go without search bots and normal visitors getting blocked?
As low as you can get without search bots and normal visitors being blocked. (i.e. "There's no way to tell without empirical data; it depends on your site, the kinds of browsers/crawlers being used and how many simultaneous connections they'll attempt to open, whether or not users are behind proxies/NATs that make many users appear to come from one IP address, and so on.")
Practical advice: if you want to use an auto-ban like this, err on the side of false negatives (allowing attacks to continue) rather than false positives (banning legitimate users). 200 simultaneous connections from one IP seems like a reasonable value, provided you don't have hundreds of users behind a proxy somewhere, all looking like one IP and all hitting the site at once.
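To gather that empirical data, you can watch how many concurrent connections your real visitors actually open before deciding on a number. Here is a minimal sketch of that kind of per-IP count; it illustrates the idea rather than the tool's own implementation, and the psutil dependency plus the ESTABLISHED-only filter are my assumptions:

```python
# count_connections.py - rough per-IP count of concurrent TCP connections.
# Illustration only; DDoS-deflate-style tools do their own counting.
# Requires: pip install psutil (and usually root to see every socket).
from collections import Counter

import psutil

def connections_per_ip():
    """Return a Counter of remote IP -> number of ESTABLISHED TCP connections."""
    counts = Counter()
    for conn in psutil.net_connections(kind="tcp"):
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
            counts[conn.raddr.ip] += 1
    return counts

if __name__ == "__main__":
    for ip, count in connections_per_ip().most_common(20):
        print(f"{count:5d}  {ip}")
```

Running something like this during peak hours shows the high-water mark your legitimate traffic (including proxies and NATs) actually reaches, which is a better basis for picking a threshold than guessing.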
A typical browser will not make more than a dozen connections at a time. The problem you'll have, however, is people behind NATs, particularly on large networks, where a dozen people connecting at the same time might push the concurrent connection count over a hundred.
There's really no great answer for this question; the best we can say is to try it and see. You may be able to set up two levels: at 250 an IP is blocked, and at some lower proposed limit (say 100) it is only logged, so you can review the log and see whether any legitimate traffic ever hits that number.
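One way to approximate that two-level setup is a small watcher that only records offenders above the lower limit, leaving the actual blocking to the existing tool at 250. A rough sketch, where the 100-connection soft limit, poll interval, and log path are arbitrary choices for illustration:

```python
# soft_limit_watch.py - log (but do not block) IPs above a proposed lower limit.
# Sketch only; the real ban threshold stays wherever your existing tool sets it.
import time
from collections import Counter
from datetime import datetime

import psutil

SOFT_LIMIT = 100                             # proposed new threshold to evaluate
POLL_SECONDS = 30
LOG_PATH = "/var/log/soft_limit_watch.log"   # hypothetical path

def connections_per_ip():
    """Count ESTABLISHED TCP connections per remote IP."""
    counts = Counter()
    for conn in psutil.net_connections(kind="tcp"):
        if conn.status == psutil.CONN_ESTABLISHED and conn.raddr:
            counts[conn.raddr.ip] += 1
    return counts

while True:
    now = datetime.now().isoformat(timespec="seconds")
    with open(LOG_PATH, "a") as log:
        for ip, count in connections_per_ip().items():
            if count >= SOFT_LIMIT:
                # Only record the IP; no ban is issued here.
                log.write(f"{now} {ip} {count}\n")
    time.sleep(POLL_SECONDS)
```

After running it for a while, the log tells you whether any legitimate traffic (search bots, office NATs, proxies) ever crosses the proposed limit before you actually lower the ban threshold.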
If there's any other log showing some telltale activity by the bots, you may benefit from using Fail2Ban (or something similar). For example, if they're constantly requesting pages that don't exist, you could monitor the access logs for 404 statuses.
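For illustration, here is a rough sketch of what such a filter would key on: counting 404 responses per client IP in a combined-format access log. The log path and the "combined" format are assumptions, and in practice Fail2Ban's own filter/jail configuration would do this for you:

```python
# count_404s.py - tally 404 responses per IP from a combined-format access log.
# Sketch of the idea behind a 404-based Fail2Ban filter; not Fail2Ban itself.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"   # assumed location and format

# Combined log format: IP ident user [date] "METHOD /path HTTP/x.x" STATUS SIZE ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')

def count_404s(path):
    """Return a Counter of client IP -> number of 404 responses."""
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = LINE_RE.match(line)
            if match and match.group(2) == "404":
                counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for ip, hits in count_404s(LOG_PATH).most_common(20):
        print(f"{hits:6d}  {ip}")
```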
To quote @chopper3: "As low as you can get without search bots and normal visitors being blocked."
In all seriousness, however, there isn't a definitive answer. It depends on several factors, such as the type of site and its content, and how many users sit behind NATs or large networks. Your best option is to test a proposed setting and see whether any legitimate traffic gets blocked.