We have a page on our extranet website that exposes information we would like to protect from data harvesting.
We have done our due diligence: the URL parameters are encrypted, which makes it hard for end users to generate their own links for harvesting, and each URL has a 30-minute time-to-live, after which the page returns an error to the end user instead of the content.
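For readers wondering how a time-limited URL can be enforced server-side, here is a minimal sketch. It swaps in a different technique from the one described: instead of encrypting the parameters, it signs them together with an issue timestamp using an HMAC, so the parameters stay readable but cannot be altered or reused after 30 minutes. `SECRET_KEY`, `make_token` and `check_token` are hypothetical names, and this only covers expiry and tampering, not harvesting within the window.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret; a real site would load a long random value from secure config.
SECRET_KEY = b"replace-with-a-long-random-secret"
TTL_SECONDS = 30 * 60  # 30-minute lifetime, matching the question
TAG_LEN = 32  # SHA-256 digest size in bytes


def make_token(params: str, now: Optional[float] = None) -> str:
    """Pack the query parameters with an issue timestamp and an HMAC tag."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}|{params}".encode("utf-8")
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + tag).decode("ascii")


def check_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the original parameters, or None if the token is forged or expired."""
    try:
        raw = base64.urlsafe_b64decode(token.encode("ascii"))
    except ValueError:  # binascii.Error and UnicodeError both subclass ValueError
        return None
    if len(raw) <= TAG_LEN:
        return None
    payload, tag = raw[:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered with, or signed with a different key
    issued, _, params = payload.decode("utf-8").partition("|")
    current = time.time() if now is None else now
    if current - int(issued) > TTL_SECONDS:
        return None  # older than the 30-minute time-to-live
    return params
```

A link would carry `make_token("customer=42&page=3")` as its single query value, and the page would serve content only when `check_token` returns something other than `None`.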
The next step is to stop end users from harvesting the search results during that 30-minute window. On average, the users who are scraping wait about 3 seconds between harvested pages of the search results.
Writing this ourselves would be hard and time-consuming, so we are looking for something that integrates with IIS 6 on Windows Server 2003 x64 and blocks a user's IP address from the site for a period of time once a request threshold is met.
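To make the requested behaviour concrete, this is a minimal sketch of the threshold logic: a per-IP sliding window that bans an address for a fixed time once it exceeds a request count. It is not an IIS 6 add-on; it only illustrates the logic you would port into whatever runs the site. The class name `IpThrottle` and the defaults (more than 10 requests in 60 seconds triggers a 15-minute ban) are hypothetical. A scraper pausing 3 seconds between pages makes 20 requests a minute, so it would trip these limits within about half a minute.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class IpThrottle:
    """Ban an IP for `ban_seconds` once it makes more than `max_requests`
    requests inside any `window_seconds` sliding window."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0,
                 ban_seconds: float = 900.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._banned_until: Dict[str, float] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Record a request from `ip` and return False if it should be refused."""
        now = time.monotonic() if now is None else now
        until = self._banned_until.get(ip)
        if until is not None:
            if now < until:
                return False  # still inside the ban period
            del self._banned_until[ip]  # ban has expired
        hits = self._hits[ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False
        return True
```

State lives in one process's memory, so a web garden or several front-end servers would need the counters in shared storage instead.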
Any suggestions would be appreciated. Since I work on the development side, the keywords I've tried aren't turning up anything useful in Google.
It's not a complete out-of-the-box solution, but Evan Anderson released a script here on Server Fault that is similar to fail2ban for *nix: basically, once a set threshold is crossed, it adds a firewall block entry. You can find it here.
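As an illustration of the fail2ban-style approach (this is not Evan Anderson's script), here is a minimal sketch that scans an IIS log and reports client IPs that requested one page too often within a time window. It assumes the W3C Extended log format with the `date`, `time`, `c-ip` and `cs-uri-stem` fields enabled. The function name `find_offenders`, the page `/search.aspx` and the thresholds are hypothetical. Adding the firewall block entry for each reported IP is left to the script linked above.

```python
from collections import defaultdict, deque
from datetime import datetime
from typing import Deque, Dict, Iterable, List, Set


def find_offenders(lines: Iterable[str], uri_stem: str,
                   max_hits: int, window_seconds: float) -> Set[str]:
    """Return client IPs that requested `uri_stem` more than `max_hits`
    times within any `window_seconds` span of a W3C extended log."""
    fields: List[str] = []
    hits: Dict[str, Deque[datetime]] = defaultdict(deque)
    offenders: Set[str] = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The directive names the columns of the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # other directives, blank lines, or no header seen yet
        row = dict(zip(fields, line.split(" ")))
        if row.get("cs-uri-stem", "").lower() != uri_stem.lower():
            continue
        ip, date, clock = row.get("c-ip"), row.get("date"), row.get("time")
        if not (ip and date and clock):
            continue  # the required fields are not being logged
        ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        window = hits[ip]
        window.append(ts)
        # Drop hits that have slid out of the time window.
        while window and (ts - window[0]).total_seconds() >= window_seconds:
            window.popleft()
        if len(window) > max_hits:
            offenders.add(ip)
    return offenders
```

Run periodically against the current log file (for example `find_offenders(open(path), "/search.aspx", 10, 60)`), this would flag a client paging through results every 3 seconds well before its 30-minute URL window closes.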
Like I said, it's not a straight plug-in solution, but if you don't find anything else, you should be able to modify it to trigger on other metrics.