We have a XAMPP Apache development web server set up with virtual hosts and want to stop search engines from crawling all our sites. This is easily done with a robots.txt file. However, we'd rather not include a disallow robots.txt in every vhost and then have to remove it when we go live with the site on another server.
Is there a way with an apache config file to rewrite all requests to robots.txt on all vhosts to a single robots.txt file?
If so, could you give me an example? I think it would be something like this:
RewriteEngine On
RewriteRule .*robots\.txt$ C:\xampp\vhosts\override-robots.txt [L]
Thanks!
Apache mod_alias is designed for this. It is available in the core Apache system, can be set in one place, and adds almost no processing overhead, unlike mod_rewrite.
With a single Alias directive in the apache2.conf file, outside all the vhosts, http://example.com/robots.txt - on any website the server hosts - will output the given file.
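A sketch of such a directive, reusing the override file path from the question (forward slashes work fine on Windows):

```apache
# In apache2.conf (or any server-wide config file), outside every <VirtualHost>:
Alias /robots.txt "C:/xampp/vhosts/override-robots.txt"

# Apache also needs permission to read the target (Apache 2.4 syntax):
<Directory "C:/xampp/vhosts">
    Require all granted
</Directory>
```

Because the Alias sits at server level, every vhost inherits it unless that vhost defines a more specific mapping of its own.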
Put your common global robots.txt file somewhere in your server's filesystem that is accessible to the Apache process. For the sake of illustration, I'll assume it's at /srv/robots.txt.

Then, to set up mod_rewrite to serve that file to clients who request it, put the following rules into each vhost's <VirtualHost> config block:

RewriteEngine on
RewriteRule ^/robots\.txt$ /srv/robots.txt [L]

If you're putting the rewrite rules into per-directory .htaccess files rather than <VirtualHost> blocks, you will need to modify the rules slightly:

RewriteEngine on
RewriteBase /
RewriteRule ^robots\.txt$ /srv/robots.txt [L]

Not sure if you're running XAMPP on Linux or not, but if you are, you could create a symlink from all virtual hosts to the same robots.txt file. You just need to make sure that your Apache configuration for each virtual host is allowed to follow symlinks (via the <Directory> directive's Options FollowSymLinks).

A different approach to the solution:
I host multiple (more than 300) virtual hosts in my cluster environment. To protect my servers from being hammered by crawlers, I define a Crawl-delay of 10 seconds.

However, I cannot force a fixed robots.txt configuration on all my clients. I let my clients use their own robots.txt if they wish to.
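The shared default with that crawl delay can be as small as:

```
User-agent: *
Crawl-delay: 10
```

Note that Crawl-delay is a nonstandard extension: some crawlers (e.g. Bing, Yandex) honor it, while Googlebot ignores it.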
The rewrite module first checks whether the file exists. If it does not exist, the module rewrites the request to my default configuration. Code example below...

To keep the rewrite internal, an alias should be used. Instead of defining a new alias, which could cause conflicts on the user side, I located my robots.txt inside the /APACHE/error/ folder, which already has an alias in the default configuration.
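A sketch of that conditional rewrite, under the assumptions above (the shared default sits in the error folder, reachable through the stock /error/ alias; the exact paths depend on your layout):

```apache
RewriteEngine On
# If this vhost's document root has no robots.txt of its own...
RewriteCond %{DOCUMENT_ROOT}/robots.txt !-f
# ...rewrite the request internally to the shared default behind /error/
RewriteRule ^/?robots\.txt$ /error/robots.txt [L]
```

Clients that upload their own robots.txt are served their file untouched; everyone else gets the default with the crawl delay.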