If the robots.txt file is missing from the root directory of a website, how is the site treated?
- the site is not indexed at all
- the site is indexed without any restrictions
Logically, I would expect the second one. I ask in reference to this question.
The purpose of a robots.txt file is to keep crawlers out of certain parts of your website. Not having one should result in all your content being indexed.
The implication from the first comment on that Meta question was that the robots.txt file existed but was inaccessible (for whatever reason), rather than not being there at all. That might cause the web crawlers some issues, but that's speculation.
I don't have a robots.txt on my blog (a self-hosted WordPress installation) and that's indexed.
Robots.txt is a strictly voluntary convention amongst search engines; they're free to ignore it, or implement it in any way they choose. That said, barring the occasional spider looking for email addresses or the like, they pretty much all respect it. Its format and logic are very, very simple, and the default rule is allow (since the original standard only lets you disallow). A site without a robots.txt will be fully indexed.
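To illustrate the "default is allow" point, here is a minimal sketch using Python's standard urllib.robotparser module. It is only one parser's interpretation, not a statement about how any particular search engine behaves; the helper name can_crawl, the agent name ExampleBot, and the example.com URLs are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    `robots_lines` is the file content split into lines; an empty list
    stands in for a site that serves no rules at all.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical inputs: no rules at all, and a single Disallow rule.
no_rules = []
with_rule = ["User-agent: *", "Disallow: /private/"]

print(can_crawl(no_rules, "https://example.com/private/page.html"))
print(can_crawl(with_rule, "https://example.com/private/page.html"))
print(can_crawl(with_rule, "https://example.com/blog/post.html"))
```

With no rules, nothing is excluded; once a Disallow line is present, only the paths it names are blocked and everything else stays crawlable.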
I haven't had a robots.txt on dozens of domains I've had registered, some as far back as 1994, and have never had a problem with them getting listed in Google, Yahoo, etc.
Even my personal website gets 150-200 users a day from google, and doesn't have a robots.txt file.
(Love the three minute pause requirement between answering questions. Next I'll get the robot captcha. Sometimes it just isn't worth trying to be helpful.)
robots.txt is completely optional. If you have one, standards-compliant crawlers will respect it; if you have none, everything not disallowed in HTML META elements (Wikipedia) is crawlable.
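As a rough sketch of the META side of this, the snippet below pulls the directives out of a page's robots META element using Python's standard html.parser. The class name RobotsMetaParser, the helper robots_directives, and the sample HTML are all invented for the example; real crawlers may interpret these tags differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots META directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page)))
```

A page with no such element yields an empty set, which matches the idea that nothing is excluded unless it is explicitly disallowed.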
The site will be indexed without limitations. Spiders will follow whatever they find, and I don't think you want that. Some spiders, like Baidu's, can be very aggressive about it; they can even evaluate URLs found in JavaScript code.
Here is detailed information: http://www.robotstxt.org/orig.html
PS: You will also get many 404 entries in your web server logs, which is a nuisance when reading them. And don't forget to add a favicon.ico file; that is another stupid file that all browsers demand on every page.
(I could not find a way to add a comment, but) I would also like to add that not having a robots.txt is a problem in the sense that you cannot use it to advertise a Sitemap. Remember that Sitemaps are only located either by being specified in the robots.txt file or through direct submission to search engines, and the latter means you have to submit to each engine one by one, rather than having all of them find it quickly.
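For what the Sitemap line looks like in practice, here is a small sketch that reads it back with Python's standard urllib.robotparser (the site_maps() method needs Python 3.8 or later). The robots.txt content and the example.com URL are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks nothing and only advertises a Sitemap.
robots_lines = [
    "User-agent: *",
    "Disallow:",
    "Sitemap: https://example.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# site_maps() returns the listed Sitemap URLs, or None when there are none.
sitemaps = parser.site_maps()
print(sitemaps)

# The empty Disallow line leaves every path crawlable.
print(parser.can_fetch("ExampleBot", "https://example.com/any/page.html"))
```

So a robots.txt can exist purely to point crawlers at the Sitemap, without restricting anything.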