We are looking for a way to block spam based on geographic location, using GeoIP filtering.
Context: we rarely have any email correspondence outside of the USA, so we would like to block all incoming email from outside the US, except for maybe one or two countries.
After a little Googling I have found a couple of solutions that may work (or not), but I would like to know what other sysadmins are currently doing or what they would recommend as a solution.
Here is what I have found so far:
Using PowerDNS and its GeoIP backend, it is possible to use GeoIP data for filtering. Normally this backend is used to distribute load as a kind of load balancing, but I don't see why it couldn't be used to kill spam as well.
Possibly use the MaxMind lite country database and some scripting to do a similar job.
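To make the "database plus some scripting" idea concrete, here is a minimal Python sketch of a country allowlist check. It assumes the country database has been exported to rows of (CIDR block, country code); the sample rows, `ALLOWED_COUNTRIES`, and all function names are hypothetical placeholders, not part of any MaxMind tooling.

```python
import ipaddress

# Hypothetical sample rows (documentation address ranges). A real setup would
# load (CIDR, country) pairs exported from a GeoIP country database instead.
SAMPLE_BLOCKS = [
    ("192.0.2.0/24", "US"),
    ("198.51.100.0/24", "CA"),
    ("203.0.113.0/24", "CN"),
]

# Assumption: the short list of countries we are willing to accept mail from.
ALLOWED_COUNTRIES = {"US", "CA"}


def build_table(rows):
    """Parse (CIDR, country) rows into (network, country) pairs."""
    return [(ipaddress.ip_network(cidr), country) for cidr, country in rows]


def country_for(ip, table):
    """Return the country code for ip, or None if no block matches."""
    addr = ipaddress.ip_address(ip)
    for network, country in table:
        if addr in network:
            return country
    return None


def should_accept(ip, table, allowed=ALLOWED_COUNTRIES, accept_unknown=True):
    """Accept allowed countries; unknown addresses follow accept_unknown."""
    country = country_for(ip, table)
    if country is None:
        return accept_unknown
    return country in allowed


TABLE = build_table(SAMPLE_BLOCKS)
```

The linear scan is only for illustration. A full country database has a very large number of blocks, so anything handling real load would want a sorted or tree-based lookup.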
Ideally what I am looking for is a solution that would handle decent load and scale well too...aren't we all! ;)
Thanks in advance for your help! :-)
From this research paper on SNARE, I present this nugget:
My personal observations mirror yours and note that even now in 2014, geographic location continues to be an excellent predictor of spam. As others pointed out, GeoIP location (country or distance) alone is not a sufficiently reliable basis for blocking connections. However, combining GeoIP distance with a few other pieces of data about the connection, such as FCrDNS, HELO hostname validity, sender OS (via p0f), and SPF provides a 99.99% reliable basis (as in, a .01% chance of a FP) for rejecting 80% of connections before the DATA phase.
Unlike some SMTP tests (such as a DNSBL listing in zen.spamhaus.org) which have very low FP rates, none of the aforementioned tests is individually a sufficient basis for rejecting connections. Here's another pattern that falls into that category: the envelope sender user matches the envelope recipient user. I've noticed that about 30% of spam follows this pattern: from: [email protected] to: [email protected]. It happens far more frequently in spam than in valid mail flows. Another spammer pattern is a mismatch between the envelope sender domain and the header From domain.
By heuristically scoring these "spam appearing" characteristics, the basis for an extremely reliable filtering system can be assembled. SpamAssassin already does (or can do) most of what I described. But you also asked for a solution that would handle sufficient load and scale well. While SpamAssassin is great, I didn't see "massively reduced resource consumption" anywhere in the 3.4 release notes.
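As a rough illustration of heuristic scoring over pre-DATA signals, here is a small Python sketch. It is not how SpamAssassin or any particular plugin works; the weights, the distance cutoff, the rejection threshold, and every name in it are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical weights: negative values mean "looks more like spam".
WEIGHTS = {
    "no_fcrdns": -3,
    "bad_helo": -2,
    "spf_fail": -3,
    "spf_pass": 1,
    "far_away": -2,
    "same_user": -2,
}
FAR_AWAY_KM = 4000   # assumed cutoff for "geographically distant"
REJECT_AT = -6       # assumed threshold: reject at or below this score


@dataclass
class Connection:
    fcrdns_ok: bool
    helo_valid: bool
    spf_result: str        # e.g. "pass", "fail", "none"
    geo_distance_km: float
    env_from: str
    env_to: str


def local_part(address):
    """Return the lowercased part of an address before the last '@'."""
    return address.rsplit("@", 1)[0].lower()


def score(conn):
    """Sum the weights of every spam-like trait the connection shows."""
    total = 0
    if not conn.fcrdns_ok:
        total += WEIGHTS["no_fcrdns"]
    if not conn.helo_valid:
        total += WEIGHTS["bad_helo"]
    if conn.spf_result == "fail":
        total += WEIGHTS["spf_fail"]
    elif conn.spf_result == "pass":
        total += WEIGHTS["spf_pass"]
    if conn.geo_distance_km > FAR_AWAY_KM:
        total += WEIGHTS["far_away"]
    sender = local_part(conn.env_from)
    # Envelope sender user equal to envelope recipient user (null sender excluded).
    if sender and sender == local_part(conn.env_to):
        total += WEIGHTS["same_user"]
    return total


def verdict(conn):
    """Reject before DATA only when several weak signals add up."""
    return "reject" if score(conn) <= REJECT_AT else "continue"
```

No single trait reaches the threshold here, which reflects the point above: each test is weak alone, and only the combination is treated as grounds for rejection.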
All the tests I listed in the first paragraph occur before SMTP DATA. Combining those early tests forms a sufficient basis for rejecting spammy connections before SMTP DATA with virtually no false positives. Rejecting the connection before SMTP DATA avoids the bandwidth costs of transferring the message as well as the heavier CPU and network load of content-based filters (SpamAssassin, dspam, header validation, DKIM, URIBL, antivirus, DMARC, etc.) for the vast majority of connections. Doing far less work per connection scales much better.
For the smaller subset of messages that are indeterminate at SMTP DATA, the connection is allowed to proceed and I score the message with results from the content filters.
To accomplish all I have described, I've done a bit of hacking on a node.js based SMTP server called Haraka. It scales very, very well. I have written a custom plugin called Karma which does the heuristics scoring, and I've put all the weighting scores into a config file. To get an idea of how karma works, have a look at the karma.ini config file. I've been getting "better than gmail" filtering results.
Have a look at the tests run by the FCrDNS, helo.checks, and data.headers plugins as well. They might provide you with additional filtering ideas. If you have further ideas for reliably detecting spam with cheap (pre-DATA) tests, I'm interested to hear them.
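For readers unfamiliar with FCrDNS (forward-confirmed reverse DNS), the idea can be sketched in a few lines of Python. This is a generic illustration using the standard `socket` module, not the Haraka plugin's implementation; the function names are made up, and the resolver arguments exist only so the logic can be exercised without touching DNS.

```python
import socket


def _reverse(ip):
    """PTR lookup: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    """Forward lookup: return the set of addresses a hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def fcrdns_ok(ip, reverse=_reverse, forward=_forward):
    """True if the IP's PTR name resolves back to the same IP."""
    try:
        hostname = reverse(ip)
        return ip in forward(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (missing PTR, NXDOMAIN, ...).
        return False
```

The comparison is on address strings, which is adequate for IPv4; IPv6 text forms would need normalizing first.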
Other questions you need to ask: what is your acceptable false-positive and false-negative rate (how much legit mail are you willing to lose and how much junk are you willing to accept?)
What additional latency are you willing to accept? Some very effective low-falsing anti-spam techniques (e.g. greylisting) can delay mail. This can irritate some users who (unrealistically) expect email to be immediate communication.
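The latency cost of greylisting is easier to see in a sketch. The Python below shows the usual idea: temporarily reject the first delivery attempt for an unseen (client IP, sender, recipient) triplet and accept a retry after a minimum delay. The class name, the 300-second default, and the return values are assumptions for illustration, and a real implementation would also expire old entries.

```python
import time


class Greylist:
    """Minimal in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay   # seconds a new triplet must wait
        self.clock = clock           # injectable so the logic can be tested
        self.first_seen = {}

    def check(self, client_ip, env_from, env_to):
        key = (client_ip, env_from.lower(), env_to.lower())
        now = self.clock()
        # Record the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"   # would map to an SMTP 4xx reply
```

Legitimate senders retry after the temporary failure, so their mail is delayed by at least `min_delay`; senders that never retry are filtered out.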
Reflect on how much you wish to externalize your costs when planning an anti-spam system. For example, ipfilter-based blacklists are unforgiving but do not materially affect any other system. Greylisting conserves sender and recipient bandwidth but keeps mail in remote queues longer. Mail bounce messages and challenge/response systems can be (ab)used to mailbomb an unrelated third party. Techniques like tarpitting actively externalize costs by intentionally holding open SMTP connections for long periods of time. DNSBLs require you cede some amount of control to a third party (blacklist maintainers) but ultimately as a mail admin you are responsible for explaining your blocking policy to your users & management. The upshot is that there are ethical considerations that go along with each technology and it's important to be aware of your effect on others.
How tolerant will you be toward misconfigured external systems? (e.g. those without FCrDNS, broken HELO/EHLO strings, unauthorized pipelining, those that don't properly retry after a temporary failure code 4xx, etc.)
How much time, money, bandwidth, and hardware do you want to devote to the problem?
No single technique is sufficient on its own, but a concerted defense-in-depth approach can substantially reduce inbound garbage. DNSBLs, URIBLs, greylisting, content filtering, and hand-tuned white- and blacklisting work well on my small domain, but I can afford to be more liberal in what I reject.
Unless things have changed recently, blacklisting IPs by country of origin is not terribly effective. I had the idea of using ASN and OS fingerprint (via p0f) to judge the quality of an inbound connection but didn't pursue it; the statistics would be interesting to look at but I'm not convinced it would be any more useful than the standard techniques already described. The upside to using GeoIP, ASN, and OS fingerprint info is that while they may individually be weak predictors of connection quality, they are available at TCP/IP connection time, long before you reach the SMTP layer (fsvo "long".) In combination, they may prove to be useful and that would be helpful because spam becomes costlier to block as it approaches the end user.
I'm not trying to be a naysayer; 'oddball' character encodings and GeoIP information probably correlate well with spam but may not be reliable enough to use as the sole criterion for blocking mail. However, they may well be helpful indicators in a system like SpamAssassin. The takeaway is that spam defense is a complex problem in cost-risk-benefit analysis and it's important to know what your values are before implementing or changing a system.
Wouldn't you simply be better off using something like the Spamhaus ZEN block list (http://www.spamhaus.org/zen/) instead? It's free if your organization's email traffic is less than 100,000 SMTP connections and 300,000 DNSBL queries per day.
Granted, their usage requirements (http://www.spamhaus.org/organization/dnsblusage.html) may require a subscription to the Data feed if you do a lot more daily email traffic, but at that level of usage (read the fine print at the bottom of the page) you probably don't want the administrative burden of managing your own block list anyways.
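For anyone who has not queried a DNSBL directly, here is a hedged Python sketch of an IPv4 lookup against a zone such as zen.spamhaus.org: reverse the octets, append the zone, and resolve the name. The function names are hypothetical, the resolver is injectable for testing, and treating 127.255.255.x answers as error codes rather than listings is my understanding of Spamhaus's convention, so check their documentation before relying on it.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org", resolve=socket.gethostbyname):
    """True if the DNSBL returns a 127.x answer for the address."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False   # no record: the address is not listed
    # Assumption: 127.255.255.x signals a query error, not a listing.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

In practice most MTAs have DNSBL support built in, so a script like this is mainly useful for testing or log analysis.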
There is also the geoip patch for netfilter/iptables for Linux. You could use this to block port 25 for your email server if it runs Linux, or you could put a Linux firewall with this iptables patch in front of your email server. Best part is that it is free :-)
Using a combination of DNSBL at the mail server/app level and GeoIP at the server level should remove the majority of your spam without your having to deal with spam scoring, false positives, etc.
This is especially true if your mail server/company only receives email from a handful of known countries, such as the USA and Canada.
Argosoft mail server for Windows does a good job of this, but I am not aware of a similar Linux-based solution. That's why I'd recommend using a trusted MTA on Linux (preferably with DNSBL) and then adding a GeoIP solution at the server level.
Hope this helps.
I thought you might like to hear that there are already commercial anti-spam vendors who do this, though IME it adds to a spam "score" for countries outside your home territory, to prevent overblocking.
Might fare well integrated with SpamAssassin for example?
You might also consider what character set the email is in.
As for implementation, there are a few inexpensive, commercially obtainable geo-IP databases. I would be inclined to integrate these "manually", so you can at least let the message get up to the point where you know the sender/recipient pair, for logging.
HTH,
Tom
I have been using Sendmail together with milter-greylist for exactly this purpose for several years on several medium-volume mail-servers with no issues whatsoever. It is easy to configure the desired GeoIP, SPF, whitelist, DNSBL etc. rules in greylist.conf to selectively enforce greylisting.
Here is an in-line appliance that works with routers and firewalls to block IP traffic by country and by IP blacklists at line speed. And Arclight is correct: unless you are willing to accept increased latency and dropped TCP connections, you need a specialized device like this IP Blocker from TechGuard to maintain line-speed protection. TechGuard also just released data on the impact of ACLs on routers and firewalls.