I am using owncloud and sometimes share links with people via Facebook.
I am concerned about automatic crawling, so I would like to deny Facebook access to my owncloud.thomas-steinbrenner.net server (it visits all shared links in order to fetch preview pictures, preview text, etc.).
Is there a way to do this in nginx, e.g. via hostname or user agent? (I feel like blocking by IP is a game one cannot win.)
If not: is there some other way, like a blacklist project with a list of gov-, FB-, etc. IPs for iptables?
TCP wrappers? I believe those can do host/domain-based denies. Also, have you tried a simple robots.txt? I would be surprised if Facebook didn't respect it; I'd think they could not afford the controversy of ignoring it.
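If you want to try that route, a minimal robots.txt along these lines would target Facebook's crawler specifically (facebookexternalhit is the user-agent string Facebook documents for its link-preview crawler):

```
# Block only Facebook's link-preview crawler; all other bots unaffected
User-agent: facebookexternalhit
Disallow: /
```

Whether the crawler actually honors it is something you would need to verify.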
nginx supports the $http_user_agent variable out of the box:
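For example, a sketch like the following returns 403 to any request whose User-Agent contains the facebookexternalhit substring (Facebook's documented crawler user agent; adjust the pattern if you want to match other bots too):

```nginx
# Inside your server block: deny requests from Facebook's
# link-preview crawler by matching its User-Agent header.
# ~* makes the regex match case-insensitive.
if ($http_user_agent ~* "facebookexternalhit") {
    return 403;
}
```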
Hostname verification can be done via 3rd party module — ngx_http_rdns_module: http://wiki.nginx.org/HttpRdnsModule (https://github.com/flant/nginx-http-rdns)
Like this:
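Going by the module's documentation, a configuration along these lines should deny clients whose reverse DNS resolves to a facebook.com host. The directive names (rdns, rdns_deny) are taken from the module's README, and the resolver address is only an example; this is a sketch, not a tested setup:

```nginx
server {
    listen 80;
    server_name owncloud.thomas-steinbrenner.net;

    # The rDNS lookups need a resolver; 8.8.8.8 is just an example.
    resolver 8.8.8.8;

    # "double" mode: reverse lookup, then forward-confirm the result
    # so a spoofed PTR record cannot fake the hostname.
    rdns double;

    # Deny clients whose verified hostname is under facebook.com.
    rdns_deny "\.facebook\.com$";

    # ... rest of your ownCloud configuration ...
}
```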