I am denying indexing to a folder called pdf via robots.txt. However, I do link directly to a few files in that directory.
Will search engines such as Google index those files, or ignore them because they reside in the pdf folder?
Short answer: No.
Compliant crawlers will not crawl anything under the URL prefix you disallow in robots.txt, even if you link to those files directly.
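As a rough check, Python's standard-library urllib.robotparser module can evaluate a robots.txt file roughly the way a compliant crawler would. This sketch assumes the rule in question is "Disallow: /pdf/" for all user agents; the host example.com and the file report.pdf are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the /pdf/ folder for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Linking to the file elsewhere does not change this check:
# a compliant crawler consults robots.txt before fetching the URL.
blocked = parser.can_fetch("Googlebot", "https://example.com/pdf/report.pdf")
allowed = parser.can_fetch("Googlebot", "https://example.com/index.html")
print(blocked, allowed)  # False True
```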
Longer answer: It depends.
The Allow keyword is not part of the original standard, but some robots will follow it. You can use it to Allow a particular URL while you Disallow the entire subtree that contains that URL. Most bots work on a first-match-wins basis, so the order of the lines matters to them. Google and Bing work on a longest-match-wins basis, regardless of the order of the Allow and Disallow lines.
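To illustrate the difference between the two strategies, here is a rough Python sketch. It assumes the folder is /pdf/ and that public-brochure.pdf is a hypothetical file inside it that you want crawled. RobotFileParser from the standard library applies rules in file order (first match wins), while longest_match_allowed is a small hypothetical helper approximating the longest-match behaviour described above; it ignores user-agent groups, wildcards and $ anchors, and breaking ties in favour of Allow is an assumption of this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /pdf/ but allow one file in it.
# The broader Disallow line deliberately comes first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
Allow: /pdf/public-brochure.pdf
"""
# Same rules with the more specific Allow line listed first.
REORDERED = "User-agent: *\nAllow: /pdf/public-brochure.pdf\nDisallow: /pdf/\n"

URL_PATH = "/pdf/public-brochure.pdf"  # hypothetical file name


def parse_rules(text):
    """Collect (is_allow, prefix) pairs, skipping user-agent lines.
    Simplification: ignores groups, wildcards (*) and end anchors ($)."""
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def longest_match_allowed(rules, path):
    """Longest matching prefix wins, regardless of line order.
    Ties go to Allow (assumption). No matching rule means allowed."""
    best_len, allowed = -1, True
    for is_allow, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# Standard-library parser: the first rule that matches in file order decides.
first_match = RobotFileParser()
first_match.parse(ROBOTS_TXT.splitlines())
reordered = RobotFileParser()
reordered.parse(REORDERED.splitlines())

print(first_match.can_fetch("*", URL_PATH))                      # False
print(longest_match_allowed(parse_rules(ROBOTS_TXT), URL_PATH))  # True
print(reordered.can_fetch("*", URL_PATH))                        # True
```

With the Disallow line first, a first-match robot blocks the file while a longest-match robot allows it; listing the specific Allow line before the broader Disallow line makes both kinds of robot reach the same decision.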