Because of an amateurish DDoS attack on my website, I had to deny some traffic with .htaccess, which worked fine.
Unfortunately, it also blocks Googlebot and bingbot:
Order Allow,Deny
deny from 54.
SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "^Wget" bad_user
Deny from env=bad_user
It simply blocks all traffic from 54.x.x.x
(the only traffic I get from that range comes from infected Amazon cloud machines. I know I could block just the 30 IP ranges used by the Amazon cloud instead of the whole 54.x.x.x,
but I needed a fast solution).
The rest of the bots (most of them from China, Taiwan and so on) don't send a referrer, so:
SetEnvIfNoCase Referer "^$" bad_user
blocks them all.
But it also has side effects:
- When somebody visits my page from a bookmark or types the address directly into the browser (e.g. after reading it on a business card), they won't see my website.
- Googlebot, bingbot (as well as other, less important bots) usually don't send a referrer either.
#1 is an inconvenience, but #2 is a real problem I have to solve quickly.
I've found that the bots important to me identify themselves like this in my access log (note the User-Agent at the end of each line):
66.249.64.119 - - [...] "GET /robots.txt HTTP/1.1" 403 534 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.119 - - [...] "GET /programowanie/ HTTP/1.1" 403 537 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.115 - - [...] "GET /3d-graphic/ HTTP/1.1" 403 535 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
207.46.13.4 - - [...] "GET /robots.txt HTTP/1.1" 403 534 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.4 - - [...] "GET / HTTP/1.1" 403 524 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
Is it possible in .htaccess to somehow merge my rules with an overriding rule like "if the User-Agent contains "Googlebot" or "bingbot", let the request through" (even if they don't send a referrer)?
If not, can I add something to robots.txt to tell Google/Bing that they should send a referrer with their requests (I doubt they would take it into account)?
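
To make the first question concrete, here is a rough, untested sketch of the kind of merged rule set I have in mind. It assumes Apache 2.2-style access directives (on 2.4 these would need mod_access_compat) and relies on the mod_setenvif form `!variable`, which unsets a variable set by an earlier directive. The `Allow from all` line and the Googlebot/bingbot pattern are my own additions and not part of my current file:

```apache
# Sketch only: Apache 2.2-style access control (2.4 needs mod_access_compat).
# With Order Allow,Deny a request must match an Allow and no Deny to get through.
Order Allow,Deny
Allow from all
Deny from 54.

# Flag requests with an empty referrer or a Wget user agent.
SetEnvIfNoCase Referer "^$" bad_user
SetEnvIfNoCase User-Agent "^Wget" bad_user

# Assumption: SetEnvIf directives run in order, so this later line
# removes the flag again when the User-Agent contains Googlebot or bingbot.
SetEnvIfNoCase User-Agent "(Googlebot|bingbot)" !bad_user

# Deny whatever is still flagged.
Deny from env=bad_user
```

I realize that anyone can put "Googlebot" in their User-Agent, so a rule like this would also let through attackers who fake it. I am mainly asking whether this kind of exception is possible at all.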