Question

Lazer

Asked: 2010-06-26 03:39:11 +0800 CST

What happens if a website does not have a robots.txt file?

772

If the robots.txt file is missing from the root directory of a website, which of the following happens?

the site is not indexed at all
the site is indexed without any restrictions

Logically, I would expect the second one. I ask in reference to this question.

6 Answers

ChrisF · Answer 1 · 2010-06-26T04:06:04+08:00

Best Answer

The purpose of a robots.txt file is to keep crawlers out of certain parts of your website. Not having one should result in all your content being indexed.

The implication from the first comment on that Meta question was that the robots.txt file existed but was inaccessible (for whatever reason), rather than not being there at all. That might cause the web crawlers some issues, but that's speculation.
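To illustrate the difference between "missing" and "inaccessible", here is a minimal sketch of how one client could treat the HTTP status it gets when requesting robots.txt. The function name `policy_for_robots_status` is hypothetical, and the mapping is modelled on what Python's standard library `urllib.robotparser.RobotFileParser.read` does; real search engine crawlers may well behave differently.

```python
def policy_for_robots_status(status_code):
    """Hypothetical helper: map the HTTP status of a robots.txt request
    to a crawl policy, modelled on Python's urllib.robotparser."""
    if status_code in (401, 403):
        # File exists but access is refused: treat the whole site as off limits.
        return "disallow_all"
    if 400 <= status_code < 500:
        # File is simply not there (for example 404): everything may be crawled.
        return "allow_all"
    # Other statuses (success, redirects, server errors) are not modelled here.
    return "undetermined"


print(policy_for_robots_status(404))
print(policy_for_robots_status(403))
```

So under this assumption a plain 404 means "crawl everything", while a 401 or 403 means "stay out".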

I don't have a robots.txt on my blog (a self-hosted WordPress installation), and it is indexed.

7

BMDan · Answer 2 · 2010-06-26T05:55:43+08:00

Robots.txt is a strictly voluntary convention amongst search engines; they're free to ignore it or to implement it in any way they choose. That said, barring the occasional spider looking for email addresses or the like, they pretty much all respect it. Its format and logic are very simple, and the default rule is allow (since you can only disallow). A site without a robots.txt will be fully indexed.
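The "default is allow" logic can be checked with the robots.txt parser in Python's standard library. This is only a sketch of how one well-behaved client evaluates rules; the helper `is_allowed` and the example.com URLs are made up for illustration, and actual search engines use their own implementations.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="*"):
    """Hypothetical helper: evaluate a URL against robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


# No rules at all, as with a site that has no robots.txt.
no_rules = []
# A single disallow rule that applies to every crawler.
with_rule = ["User-agent: *", "Disallow: /private/"]

print(is_allowed(no_rules, "http://example.com/private/page.html"))   # True
print(is_allowed(with_rule, "http://example.com/private/page.html"))  # False
print(is_allowed(with_rule, "http://example.com/public/page.html"))   # True
```

With no rules, nothing is disallowed, so every URL is fair game.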

6

karmawhore · Answer 3 · 2010-06-26T03:51:51+08:00

I haven't had a robots.txt on dozens of domains I've registered, some as far back as 1994, and have never had a problem getting them listed in Google, Yahoo, etc.

Even my personal website gets 150-200 users a day from Google, and it doesn't have a robots.txt file.

(Love the three minute pause requirement between answering questions. Next I'll get the robot captcha. Sometimes it just isn't worth trying to be helpful.)

1

weeheavy · Answer 4 · 2010-06-26T04:06:26+08:00

robots.txt is completely optional. If you have one, standards-compliant crawlers will respect it. If you have none, everything not disallowed in HTML meta elements (see Wikipedia) is crawlable.
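For the meta element side, here is a rough sketch of how a crawler could read the robots directives from a page, using Python's standard `html.parser`. The class `RobotsMetaParser`, the helper `robots_directives`, and the sample page are all invented for illustration; this is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Hypothetical helper: return the set of robots meta directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

A page without such a meta element yields an empty set, which corresponds to "no restrictions".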

1

risyasin · Answer 5 · 2010-06-26T04:06:50+08:00

The site will be indexed without limitations. Spiders will follow whatever they find, and I don't think you want that. Some spiders, such as Baidu's, can be very aggressive about it and can even evaluate URLs in JavaScript code.

Detailed information is available here: http://www.robotstxt.org/orig.html

P.S. You will also get many 404 entries in your web server logs, which is a disadvantage when reading them. And don't forget to add a favicon.ico file; that is another stupid file that all browsers demand on every page.

1

Carlos Aguilar Mares · Answer 6 · 2010-06-27T09:57:34+08:00

(I could not find a way to add a comment, so I am posting this as an answer.) Not having a robots.txt is also a problem in the sense that you cannot use it to advertise a Sitemap. Remember that Sitemaps are located only by being specified in the robots.txt file or through direct submission to search engines. The latter means you have to submit to each search engine one by one, rather than letting all of them find it quickly on their own.
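As a small illustration of the Sitemap line, the sketch below parses a made-up robots.txt with Python's standard `urllib.robotparser` and lists the Sitemap URLs it advertises. It assumes Python 3.8 or later, where `RobotFileParser.site_maps()` is available; the helper `sitemaps_from` and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from(robots_lines):
    """Hypothetical helper: return Sitemap URLs listed in robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


robots_txt = [
    "User-agent: *",
    "Disallow:",
    "Sitemap: http://example.com/sitemap.xml",
]

print(sitemaps_from(robots_txt))  # ['http://example.com/sitemap.xml']
print(sitemaps_from([]))          # []
```

Note the empty `Disallow:` line: it keeps the whole site crawlable while still giving crawlers a place to discover the Sitemap.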

1
