I have several sites in a /24 network that all get crawled by Google on a fairly regular basis. Normally this is fine. However, when Google starts crawling all of the sites at the same time, the small set of servers that back this IP block can take a pretty big hit on load.
With Google Webmaster Tools you can rate limit Googlebot on a given domain, but I haven't found a way to limit the bot across an entire IP network. Does anyone have experience with this? How did you fix it?
I found these notes worth pursuing.
You can create an account with Google Webmaster Tools and then control the crawl rate for each site: go to Site Configuration::Settings::Crawl Rate. I don't believe this lets you schedule your sites in any particular order, but you can at least slow the crawl down for all of them.
If you run BGP, you could simply rate-limit AS15169 (AS-GOOGLE), but doing it by hand is likely to be far too error-prone.
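If you would rather shape the traffic at the firewall than in BGP, the same idea can be scripted. Below is a minimal Python sketch that prints per-prefix iptables hashlimit rules for Googlebot source ranges; the prefix list, rate, and port are assumptions you would replace with the current AS15169/Googlebot ranges and a limit tuned to your servers.

```python
#!/usr/bin/env python3
# Sketch: print iptables rules that rate-limit new HTTP connections per
# source IP from assumed Googlebot prefixes. The prefix list, rate, and
# port below are placeholders -- substitute your own AS15169/Googlebot
# ranges and a limit appropriate for your hardware.

GOOGLEBOT_PREFIXES = [
    "66.249.64.0/19",   # example Googlebot crawl range (assumption)
]

RATE = "10/second"      # max new connections per source IP (assumption)
PORT = 80

def rules(prefixes, rate=RATE, port=PORT):
    """Yield one iptables command per prefix."""
    for prefix in prefixes:
        yield (
            f"iptables -A INPUT -p tcp --dport {port} -s {prefix} "
            f"-m conntrack --ctstate NEW "
            f"-m hashlimit --hashlimit-mode srcip "
            f"--hashlimit-name googlebot --hashlimit-above {rate} -j DROP"
        )

if __name__ == "__main__":
    for rule in rules(GOOGLEBOT_PREFIXES):
        print(rule)
```

Review the output before piping it into a shell; packets above the limit are dropped, so set the rate generously to avoid hurting indexing.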
No, that's not doable. You have to put the directives into a robots.txt on every site. Google, rightly, does not offer tools for "IP address owners", so to speak. All control comes from the robots.txt on each website.
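If the annoying part is keeping dozens of per-site robots.txt files in sync, a small script can push one shared copy into every document root. This is only a sketch under assumed paths (/etc/robots/robots.txt as the master and /var/www/<site>/htdocs as each docroot); adjust to your own layout.

```python
#!/usr/bin/env python3
# Sketch: copy one shared robots.txt into every vhost's document root so
# all sites serve the same crawl directives. Both paths are assumptions.

import shutil
from pathlib import Path

MASTER = Path("/etc/robots/robots.txt")   # hypothetical shared master copy
VHOST_ROOT = Path("/var/www")             # hypothetical parent of the docroots

def deploy(master=MASTER, vhost_root=VHOST_ROOT):
    """Copy the master robots.txt into each <site>/htdocs directory."""
    for docroot in sorted(vhost_root.glob("*/htdocs")):
        shutil.copy2(master, docroot / "robots.txt")
        print(f"updated {docroot / 'robots.txt'}")

if __name__ == "__main__":
    deploy()
```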