I would like to block a bot with IIS. With Apache you can add a command to your .htaccess file, as outlined here. How would I accomplish this with IIS 7.5?
Update
In addition to the answer below, there are a total of three approaches I discovered since posting this question:
- The URLScan option listed in the accepted answer.
- Define a Request Filtering rule (example below)
- Define a URL Rewriting rule (example below)
Request Filter Rule
<system.webServer>
    <security>
        <requestFiltering>
            <filteringRules>
                <filteringRule name="BlockSearchEngines" scanUrl="false" scanQueryString="false">
                    <scanHeaders>
                        <clear />
                        <add requestHeader="User-Agent" />
                    </scanHeaders>
                    <appliesTo>
                        <clear />
                    </appliesTo>
                    <denyStrings>
                        <clear />
                        <add string="YandexBot" />
                    </denyStrings>
                </filteringRule>
            </filteringRules>
        </requestFiltering>
    </security>
    [...]
</system.webServer>
URL Rewriting rule
<rule name="RequestBlockingRule1" patternSyntax="Wildcard" stopProcessing="true">
    <match url="*" />
    <conditions>
        <add input="{HTTP_USER_AGENT}" pattern="*YandexBot*" />
    </conditions>
    <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="Get Lost." />
</rule>
For my last project I ended up going with option 2, since it is security focused and based on the URLScan functionality that is integrated into IIS 7.
I know this is an old question, but in IIS 7.5 you can deny by user agent if you use Request Filtering.
In IIS, go to the website to which you wish to apply the filter, and then in the right pane click the Request Filtering icon. (You may have to enable this feature through Server Manager.)
Click the Rules tab, and then in the list along the far right, select "Add Filtering Rule".
Give it a name, and then in the Scan Headers section, put "User-Agent".
You can add any specific file type(s) to block in Applies To, or you can leave it blank to make it apply to all file types.
In Deny Strings, enter all of the user agent strings you want to block. In the case of this question, you would put "Yandex" here.
I confirmed these changes in Chrome using the User Agent Switcher extension.
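If you would rather test from the command line, curl can send a spoofed User-Agent header (the host name below is just a placeholder); with the rule in place, the second request should be rejected (Request Filtering returns a 404 by default):

# Normal request - should return the page as usual
curl -I http://www.example.com/

# Same request pretending to be YandexBot - should now be blocked
curl -I -A "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" http://www.example.com/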
For crawlers that do not respect robots.txt, you can use URL Rewrite on the server to block them based on their user agent; see: http://chrisfulstow.com/using-the-iis-7url-rewrite-module-to-block-crawlers/
For more info: http://www.iis.net/download/URLRewrite
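In case it helps, here is roughly where such a rule lives in web.config. This is only a sketch using the default regular-expression pattern syntax (regex conditions match substrings, so "YandexBot" matches any User-Agent containing it); the rule name and the 403 response are just examples:

<configuration>
    <system.webServer>
        <rewrite>
            <rules>
                <!-- Reject any request whose User-Agent header contains "YandexBot" -->
                <rule name="BlockYandexBot" stopProcessing="true">
                    <match url=".*" />
                    <conditions>
                        <add input="{HTTP_USER_AGENT}" pattern="YandexBot" />
                    </conditions>
                    <action type="CustomResponse" statusCode="403"
                            statusReason="Forbidden" statusDescription="Blocked by user-agent rule" />
                </rule>
            </rules>
        </rewrite>
    </system.webServer>
</configuration>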
Normally you use robots.txt. It will work on all well-behaved bots.
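For example, to ask Yandex specifically to stay out of the entire site, a robots.txt in the site root would contain:

User-agent: Yandex
Disallow: /

A well-behaved crawler reads that and stops on its own; no server configuration is needed.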
For bots that are not well behaved there is often little you can do. You can limit connection counts or bandwidth in your firewall or web server, but major bots will typically use multiple IP addresses. Limiting based on user-agent strings is usually not a good idea, as those are trivial for the bot to spoof, and bots that do not care about robots.txt have a tendency to spoof user-agent strings as well. It works only in the specific case where the bot sends a correct user agent but does not obey robots.txt.
Edit: If you really want to block based on user agent instead of pushing it back to your firewall or similar, I think the easiest way is to use URLScan. You write a rule that looks something like this:
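(A rough sketch, assuming URLScan 3.1's custom rules feature; the rule and section names are placeholders, so double-check the key names against the URLScan reference.)

[Options]
RuleList=DenyYandex

[DenyYandex]
; Scan the User-Agent header for the strings listed in the DenyDataSection
ScanHeaders=User-Agent
DenyDataSection=Agents

[Agents]
; Requests whose User-Agent contains this string are rejected
Yandex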