Just a few hours after having made some changes in the HTML of my site, I found that Google had updated its search result against my website. The Internet is so huge, how did the Google crawler do that? Doesn't it use too much bandwidth?
Google's spiders are constantly crawling the web. Many machines fetch pages in parallel, adding new and updated pages to Google's massive index all the time.
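The core loop is simpler than it sounds. Here is a minimal sketch of a breadth-first crawler; Google's real pipeline is distributed across many machines, and the fetch/parse callbacks here are stand-ins for real HTTP requests and HTML parsing:

```python
from collections import deque
from urllib.parse import urljoin

# Illustrative sketch only: fetch a page, extract its links,
# queue any links you haven't seen yet, repeat.
def crawl(seeds, fetch, extract_links, max_pages=100):
    seen = set(seeds)
    queue = deque(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)                  # network I/O in a real crawler
        pages[url] = html
        for link in extract_links(html):
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages

# Tiny in-memory "web" standing in for real HTTP and HTML parsing.
web = {
    "http://example.com/":  ["/a", "/b"],
    "http://example.com/a": ["/b"],
    "http://example.com/b": [],
}
crawled = crawl(["http://example.com/"],
                fetch=lambda u: web[u],
                extract_links=lambda links: links)
print(sorted(crawled))
```

The `seen` set is what keeps a real crawler from fetching the same URL twice, even when many pages link to it.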
Reasons it's fast:

...among many other factors.

Edit:
Google has an abundance of storage and bandwidth, so don't worry about them! As of January 2008, Google was sorting (on average) 20 PB a day. 20 PB (petabytes) is 20,000 terabytes, or 20 million gigabytes. And that's just sorting; it's only a fraction of all the data they handle.
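The unit conversion quoted above is easy to check (using decimal SI units, where each step is a factor of 1,000):

```python
# Quick sanity check of the figures quoted above (decimal SI units).
petabytes = 20
terabytes = petabytes * 1_000   # 1 PB = 1,000 TB
gigabytes = terabytes * 1_000   # 1 TB = 1,000 GB
print(terabytes, gigabytes)     # 20000 20000000
```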
Simply incredible.
I suspect Google uses a few extra signals to decide when to re-crawl: account activity in Google Analytics or Google Webmaster Tools, Twitter activity, search activity, toolbar activity, Chrome URL completion, and perhaps requests to their DNS service.
Then they need to look up when a listing page was last updated and, if it has changed, mine it for newly created pages. The sitemap is the preferred listing page (SuperUser has one), then feeds, then the home page, which tends to list recent pages and is therefore updated whenever another page is.
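The sitemap check described above is straightforward to sketch. This example follows the sitemaps.org protocol format and inlines a small sitemap rather than fetching one over HTTP; the URLs and dates are invented for illustration:

```python
# Illustrative sketch: read <lastmod> entries from a sitemap to find
# pages updated since a cutoff date. Sitemap content is inlined here;
# a crawler would fetch it from /sitemap.xml.
import xml.etree.ElementTree as ET
from datetime import datetime

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2010-01-01</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2011-06-15</lastmod></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def pages_updated_since(xml_text, cutoff):
    root = ET.fromstring(xml_text)
    updated = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod and datetime.fromisoformat(lastmod) >= cutoff:
            updated.append(loc)
    return updated

print(pages_updated_since(SITEMAP_XML, datetime(2011, 1, 1)))
# → ['https://example.com/new']
```

Only the pages whose `<lastmod>` postdates the last visit need to be re-fetched, which is exactly why a sitemap saves the crawler so much bandwidth.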
Google's crawling frequency is determined by many factors, such as PageRank, links to a page, and crawling constraints like the number of parameters in a URL.
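As a hypothetical sketch of how such signals might combine: the weights and formula below are invented for illustration only (Google's actual scoring is not public); the point is just that high-value pages get crawled eagerly while parameter-heavy URLs get deprioritized:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical scoring function: weights are made up for illustration.
def crawl_priority(pagerank, inbound_links, url):
    n_params = len(parse_qs(urlparse(url).query))
    parameter_penalty = 0.5 ** n_params  # parameter-heavy URLs crawled less eagerly
    return (pagerank + 0.1 * inbound_links) * parameter_penalty

print(crawl_priority(0.8, 12, "http://example.com/page"))
print(crawl_priority(0.8, 12, "http://example.com/page?sort=asc&filter=new"))
```

The same page scores lower once query parameters appear, mirroring the constraint mentioned above.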
Here's an excellent article on how it is done:
The Anatomy of a Large-Scale Hypertextual Web Search Engine