I've seen plenty of robots.txt stuff, and some mod_rewrite solutions that looked promising… but I haven't been able to find a simple solution to block spiders / scrapers / whoever the hell I want to block. I'd rather do this by hostname / domain, since it seems simpler than relying on user-agents, etc…
For example, say I were to see this in the Apache logs:

    msnbot-207-46-192-48.search.msn.com - - [07/Dec/2011:23:01:41 -0500] "GET /%3f/$/bluebox/blog/2011/iphoto/ HTTP/1.1" 404 366
OK… I want to prevent `*.search.msn.com` from ever coming here, or to any of my sites - in any of my folders - VHOST or otherwise…
Typically, I have MANY `<VirtualHost *:80>`'s set up, and DO NOT want to have to repeat the config for each host. In that same vein, I have many `DocumentRoot`'s… so putting some file in each of them, aka `.htaccess`, really isn't an option.
I had been using something in `httpd.conf` that resembled…
    RewriteEngine On
    # ([OR] only belongs on a RewriteCond when another condition follows it)
    RewriteCond %{HTTP_USER_AGENT} ^BadBot
    RewriteRule ^(.*)$ http://go.away/ [R,L]
How can I use the hostnames provided by `UseCanonicalName On` to blanket-`Deny all` any domain I so desire?
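In my head it's something like this (just a sketch, untested; `%{REMOTE_HOST}` only carries a name when Apache resolves client addresses, e.g. with `HostnameLookups On`, and `search.msn.com` stands in for whatever domain I'm blocking):

    RewriteEngine On
    # Refuse any client whose resolved hostname ends in .search.msn.com
    RewriteCond %{REMOTE_HOST} \.search\.msn\.com$ [NC]
    RewriteRule ^ - [F,L]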
Might not be the best idea to do it by hostname, since Apache would have to do a DNS lookup for each request. Why not do it with iptables?
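For instance, something like this (a sketch only; iptables resolves a hostname argument just once, at rule-insertion time, so you'd block by IP range instead; the /24 below is inferred from the log line above, not a verified crawler range):

    # Drop web traffic from the example bot's range (assumed /24, for
    # illustration only; verify the crawler's published ranges first)
    iptables -A INPUT -p tcp --dport 80 -s 207.46.192.0/24 -j DROP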
`UseCanonicalName` is for the server hostname, not the client's. This will work just fine in your global config, outside of any `VirtualHost`, as long as you don't have an `Order` directive in the vhosts:
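Something along these lines (Apache 2.2 `Order`/`Allow`/`Deny` syntax; `search.msn.com` is the example domain from the question):

    <Location />
        Order allow,deny
        Allow from all
        # Denies any client whose double-reverse DNS name ends in
        # search.msn.com, e.g. msnbot-207-46-192-48.search.msn.com
        Deny from search.msn.com
    </Location>

Note that handing `Deny from` a hostname makes Apache do a double-reverse DNS lookup for every request it checks, regardless of the `HostnameLookups` setting, which is exactly the per-request cost the comment above warns about.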