I would like to block a bot with IIS. With Apache you can add a command to your .htaccess file, as outlined here. How would I accomplish this with IIS 7.5?
Update
In addition to the answer below, there are a total of three approaches I discovered since posting this question:
- The URLScan option listed in the accepted answer.
- Define a Request Filtering rule (example below)
- Define a URL Rewriting rule (example below)
Request Filter Rule
<system.webServer>
    <security>
        <requestFiltering>
            <filteringRules>
                <filteringRule name="BlockSearchEngines" scanUrl="false" scanQueryString="false">
                    <scanHeaders>
                        <clear />
                        <add requestHeader="User-Agent" />
                    </scanHeaders>
                    <appliesTo>
                        <clear />
                    </appliesTo>
                    <denyStrings>
                        <clear />
                        <add string="YandexBot" />
                    </denyStrings>
                </filteringRule>
            </filteringRules>
        </requestFiltering>
    </security>
    [...]
</system.webServer>
URL Rewriting rule
<rule name="RequestBlockingRule1" patternSyntax="Wildcard" stopProcessing="true">
    <match url="*" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="*YandexBot*" />
    </conditions>
    <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="Get Lost." />
</rule>
For my last project I ended up going with option 2, since it is security focused and based on the URLScan functionality that is integrated into IIS 7.
I know this is an old question, but in IIS 7.5 you can deny by user agent if you use Request Filtering.
In IIS, go to the website to which you wish to apply the filter, and then in the right pane click the Request Filtering icon. (You may have to enable this feature through Server Manager.)
Click the Rules tab, and then in the list along the far right, select "Add Filtering Rule".
Give it a name, and then in the Scan Headers section, put "User-Agent".
You can add any specific file type(s) to block in Applies To, or you can leave it blank to make it apply to all file types.
In Deny Strings, enter all of the user agent strings you want to block. In the case of this question, you would put "Yandex" here.
I confirmed these changes in Chrome using the User Agent Switcher extension.
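If you would rather test from the command line, curl can send a spoofed User-Agent header (the host name below is just a placeholder); with the rule in place, the second request should be rejected (Request Filtering returns a 404 by default):

# Normal request - should return the page as usual
curl -I http://www.example.com/

# Same request pretending to be YandexBot - should now be blocked
curl -I -A "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" http://www.example.com/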
For crawlers that do not respect robots.txt, you can use URL Rewrite on the server to block them based on their user agent; see: http://chrisfulstow.com/using-the-iis-7url-rewrite-module-to-block-crawlers/
For more info: http://www.iis.net/download/URLRewrite
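In case it helps, here is roughly where such a rule lives in web.config. This is only a sketch using the default regular-expression pattern syntax (regex conditions match substrings, so "YandexBot" matches any User-Agent containing it); the rule name and the 403 response are just examples:

<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <!-- Reject any request whose User-Agent header contains "YandexBot" -->
                <rule name="BlockYandexBot" stopProcessing="true">
                    <match url=".*" />
                    <conditions>
                        <add input="{HTTP_USER_AGENT}" pattern="YandexBot" />
                    </conditions>
                    <action type="CustomResponse" statusCode="403"
                            statusReason="Forbidden" statusDescription="Blocked by user-agent rule" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>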
Normally you use robots.txt. It will work on all well-behaved bots.
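For example, to ask Yandex specifically to stay out of the entire site, a robots.txt in the site root would contain:

User-agent: Yandex
Disallow: /

A well-behaved crawler reads that and stops on its own; no server configuration is needed.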
For bots that are not well behaved there is often little you can do. You can limit connection counts or bandwidth in your firewall or web server, but major bots will typically use multiple IP addresses. Limiting based on user-agent strings is usually not a good idea, as those are trivial for the bot to spoof, and bots that do not care about robots.txt have a tendency to spoof user-agent strings as well. It works only in the specific case where the bot sends a correct user agent but does not obey robots.txt.
Edit: If you really want to block based on user agent instead of pushing it back to your firewall or similar, I think the easiest way is to use URLScan. You write a rule that looks something like this:
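(A rough sketch, assuming URLScan 3.1's custom rules feature; the rule and section names are placeholders, so double-check the key names against the URLScan reference.)

[Options]
RuleList=DenyYandex

[DenyYandex]
; Scan the User-Agent header for the strings listed in the DenyDataSection
ScanHeaders=User-Agent
DenyDataSection=Agents

[Agents]
; Requests whose User-Agent contains this string are rejected
Yandex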