In everything I could find on the internet about greylisting, the following triplet is used to uniquely identify an incoming e-mail:
- Source IP
- Source e-mail address
- Destination e-mail address
Now the source IP can cause problems, because large mail services use multiple IP addresses (possibly from completely different IP ranges) to re-send blocked e-mails.
Question
Why is it relevant at all to consider the source IP address? Why not just use the source and destination e-mail addresses as the key to identify a given e-mail (sender-receiver link)?
Why not use the subject instead, to identify a specific e-mail more uniquely?
Reasoning
Even after quite some thinking about what kind of problems could arise when the source IP is simply ignored, I couldn't find any case where the source IP would be relevant.
- When the same IP sends twice from the same source to the same destination e-mail address (and waits the required time), the e-mail is delivered.
- When two different IPs send from the same source to the same destination e-mail address, the e-mail should also be delivered (e.g. large mail services).
- Some greylisting solutions allow a subnet mask for the source IP. But this is very imprecise and does not cover all situations, especially not ultra-large mail services whose MTAs sit in completely different subnets.
- What about a legitimate sender who, for the first time, sends 2 different e-mails to the same destination e-mail address within the "try later" period?
- Using the triplet of source e-mail address, destination e-mail address and subject should theoretically let greylisting treat each individual e-mail more accurately, even when coming from the same sender to the same recipient.
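To make the cases in the list above concrete, here is a minimal sketch of a greylist keyed on the classic triplet, with an optional subnet mask on the source IP. The class name `Greylist`, the `delay` and `prefix_len` parameters and the in-memory dictionary are assumptions made for illustration (IPv4 only), not how any particular greylisting package is implemented.

```python
import ipaddress
import time


class Greylist:
    """Toy in-memory greylist keyed on (network, sender, recipient)."""

    def __init__(self, delay=300, prefix_len=32, clock=time.time):
        self.delay = delay            # seconds a new triplet has to wait
        self.prefix_len = prefix_len  # 32 = exact IPv4 address, 24 = /24 subnet
        self.clock = clock            # injectable so the logic can be tested
        self.first_seen = {}          # key -> timestamp of the first attempt

    def key(self, ip, sender, recipient):
        # strict=False turns a host address into its enclosing network
        net = ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)
        return (str(net), sender.lower(), recipient.lower())

    def check(self, ip, sender, recipient):
        """Return True to accept, False to answer with a temporary error."""
        k = self.key(ip, sender, recipient)
        now = self.clock()
        first = self.first_seen.setdefault(k, now)
        return now - first >= self.delay
```

With `prefix_len=32` a retry from a different address starts a new waiting period; with `prefix_len=24` a retry from a neighbouring address in the same /24 is treated as the same sender, but a retry from an unrelated range still is not.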
But my main question is: why include the source IP in the triplet at all? (The chance that 2 different external entities will send with the same source e-mail address to the same destination e-mail address seems extremely unlikely to me.)
One of the things most greylisting techniques do is check whether the specified source address is authorized to send from that IP, which can be done with SPF records. If the source IP is listed in the source domain's SPF record as a valid sender for that domain, many (I'd like to say most, but my experience is limited) greylist filters will auto-whitelist the email. The source IP is thus not necessarily part of the controlling triplet, but it is important for knowing why something was or was not greylisted, and so should be retained.
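The ordering described here, SPF first and greylisting only as a fallback, can be sketched roughly as below. Both `spf_passes` and `greylist_accepts` are hypothetical callables supplied by the caller; no real SPF library is used, and actual filters may order or combine these checks differently.

```python
def decide(ip, sender, recipient, spf_passes, greylist_accepts):
    """Return 'accept' or 'tempfail' for a single delivery attempt.

    spf_passes(ip, sender) and greylist_accepts(ip, sender, recipient)
    are placeholders for whatever SPF lookup and greylist store are in use.
    """
    # An IP that the sender's domain authorizes via SPF skips greylisting.
    if spf_passes(ip, sender):
        return "accept"
    # Otherwise fall back to the usual triplet-based greylist decision.
    if greylist_accepts(ip, sender, recipient):
        return "accept"
    return "tempfail"
```

Note that the source IP is needed for the SPF step even if the greylist key itself were to ignore it.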
So to more directly address "why it's in the triplet" we have to look at the most common ways that systems with multiple outbound SMTP addresses use their mail servers. A hosting group like GoDaddy will usually have a number of hosts feeding a single server, and that server will have a queue of outgoing messages. While there will be multiple servers, on multiple IP addresses, each server will have its own hosts and its own message queue. If a message is refused for greylisting, it is still queued in the same mailserver at the same IP, and so will be tied to the same IP the next time it's tried.
GoDaddy may not be the best example there, because in fact they seem to have a round-robin queue that selects among three or more Internet-facing servers for any message coming out of an intermediate server. However, although I can't be certain, what I've seen of their output emails suggests that a temporary error will not result in that message being pushed back to the intermediate server; it's the Internet-facing server that has received the temporary error, and is managing the holdback timing. So the message is still tied to the same IP address, because it's been left in charge of a specific server.
The specific case that greylisting was initially designed to trap, the infected home machine, will of course simply not try again after the first send attempt, and may in fact already be disconnected and off to the next server when the error message comes back. If the spammer doesn't see that the message was refused, he can probably still charge for having sent it, and those $0.000001 per message can add up.
And the subject is not really a valid key for greylisting, partly because there can be a large number of different valid messages with the same subject, and partly because Joe Spammer will be sending out a crap-ton of messages with the same sender and subject, from different IP addresses, in hopes that some get through.
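A small simulation illustrates the difference between the two keys. The function names, addresses, timestamps and the 300-second delay are made up for the example; it is only a sketch of the argument, not a model of any real filter.

```python
def simulate(key_fn, attempts, delay=300):
    """Return one accept (True) / tempfail (False) decision per attempt."""
    first_seen = {}
    decisions = []
    for ts, ip, sender, rcpt, subject in attempts:
        key = key_fn(ip, sender, rcpt, subject)
        first = first_seen.setdefault(key, ts)  # remember the first attempt
        decisions.append(ts - first >= delay)
    return decisions


def ip_key(ip, sender, rcpt, subject):
    return (ip, sender, rcpt)        # classic triplet


def subject_key(ip, sender, rcpt, subject):
    return (sender, rcpt, subject)   # proposed alternative without the IP


# Two infected machines each fire the same spam once and never retry.
BOTNET = [
    (0, "198.51.100.7", "promo@example.org", "victim@example.net", "Cheap pills"),
    (600, "203.0.113.9", "promo@example.org", "victim@example.net", "Cheap pills"),
]

# A real mail server retries from the same IP after the delay.
RETRYING_MTA = [
    (0, "192.0.2.25", "alice@example.org", "bob@example.net", "Hello"),
    (600, "192.0.2.25", "alice@example.org", "bob@example.net", "Hello"),
]
```

Under these assumptions, `simulate(ip_key, BOTNET)` rejects both attempts, while `simulate(subject_key, BOTNET)` accepts the second one because the second bot looks like a retry of the first. The retrying mail server gets through with either key.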