I am working on a site that will offer file downloads to users; there will be around 2,000,000 downloadable files.
We want to discourage people from crawling and taking all of these documents, so we would like to limit the number of requests we serve that match a URL pattern within a certain time period. We are happy for the rest of the site to be crawled, so we don't want to limit that.
We are putting an exclusion in robots.txt to discourage crawlers from fetching the files, but we are more worried about malicious or misbehaving crawlers.
We would like to use Apache to limit downloads of the documents to about 1 per minute per IP address.
Is there a best practice way to do this?
We are using CentOS with Apache 2.2.
There are a lot of similar questions, but most of them seem to center on bandwidth limiting, which is not what I want.
I don't think a module exists that limits connections per time period per IP. But you could experiment a bit with limitipconn and mod_cband; together they can probably do it. Or you can use limitipconn with iptables.
To do that, you should probably use iptables:
I haven't tested this rule; it is just a hint at what you should look for.
If you use iptables, you should have two IP addresses and separate virtual hosts for your main site and your document section, so that you limit only the IP (virtual host) that serves the documents.
Regards
You should be able to use mod_evasive, which lets you limit how many requests an IP address may make to a specific URI or site in a certain period of time.
If an IP address exceeds this limit, it is blocked for a period of time and gets a 403 error if the user tries to access the URI again. You can also have it send an email or execute a script when an IP address exceeds the limit.
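To make the count, interval and blocking-period model concrete, here is a minimal Python sketch of that kind of logic. It is not mod_evasive's code or configuration; the class name, method name and parameters are all hypothetical and chosen only for illustration.

```python
import time


class BlockingLimiter:
    """Hypothetical model: too many hits in an interval triggers a timed block."""

    def __init__(self, max_hits, interval, block_period, clock=time.monotonic):
        self.max_hits = max_hits          # hits allowed per interval
        self.interval = interval          # counting window, in seconds
        self.block_period = block_period  # how long an offender stays blocked
        self.clock = clock                # injectable so the logic can be tested
        self.hits = {}                    # (ip, uri) -> recent hit timestamps
        self.blocked_until = {}           # ip -> time at which the block ends

    def status_for(self, ip, uri):
        """Return the HTTP status a request would receive: 200 or 403."""
        now = self.clock()
        if self.blocked_until.get(ip, 0.0) > now:
            return 403
        key = (ip, uri)
        # Keep only the hits that still fall inside the counting window.
        recent = [t for t in self.hits.get(key, []) if now - t < self.interval]
        recent.append(now)
        self.hits[key] = recent
        if len(recent) > self.max_hits:
            self.blocked_until[ip] = now + self.block_period
            return 403
        return 200
```

Note that this sketch counts hits per exact URI, so it would not catch a crawler that fetches each document only once. For the case in the question, the key would need to be the IP alone, or the IP plus a path prefix.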
For more information: http://www.zdziarski.com/blog/?page_id=442
You seem to be looking to do something similar to what sites like RapidShare do. As far as I am aware, you can't do this within the configuration files of Apache; it will require at least server-side scripting (likely PHP) with a small database to keep track of requests and serve the downloads if the user meets the criteria.
Here is an example with PHP and MySQL that would need to be adapted a bit to fit your situation: http://www.web-development-blog.com/archives/limit-the-number-of-downloads-per-client/
The code at that link limits the number of connections to a single download, but as you can see, the concept can be extended to limit the total number of downloads.
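As a rough illustration of the "small database to keep track of requests" idea, here is a sketch in Python with SQLite standing in for PHP and MySQL. It is not the code from the linked article. The `/documents/` path prefix, the table name and the function names are assumptions made up for this example, and the 60-second gap reflects the "about 1 per minute per IP" goal from the question.

```python
import re
import sqlite3
import time

# Hypothetical pattern for the protected downloads; adjust to the real URLs.
DOWNLOAD_PATTERN = re.compile(r"^/documents/")
MIN_SECONDS_BETWEEN_DOWNLOADS = 60  # "about 1 per minute per IP"


def open_db(path=":memory:"):
    """Create the tracking table if needed and return the connection."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS last_download ("
        "ip TEXT PRIMARY KEY, ts REAL NOT NULL)"
    )
    return db


def may_download(db, ip, url_path, now=None):
    """Return True if this request should be served, False if rate-limited."""
    if not DOWNLOAD_PATTERN.match(url_path):
        return True  # the rest of the site is not limited
    now = time.time() if now is None else now
    row = db.execute(
        "SELECT ts FROM last_download WHERE ip = ?", (ip,)
    ).fetchone()
    if row is not None and now - row[0] < MIN_SECONDS_BETWEEN_DOWNLOADS:
        return False
    # Record this download as the most recent one for the IP.
    db.execute(
        "INSERT OR REPLACE INTO last_download (ip, ts) VALUES (?, ?)", (ip, now)
    )
    db.commit()
    return True
```

The download script would call `may_download()` before streaming a file and return an error page when it gets False. Only served downloads update the timestamp, so a client that keeps retrying is not penalised beyond the original 60 seconds.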