I learned that both the FROM address and the TO address are repeated in a hidden element called the "envelope", and then repeated again in the "body".
Question
- Why isn't the envelope data copied into the "header"?
- Why does this duplication exist? Why couldn't the necessary features be embedded into the message itself?
- Do all (non-SMTP) message transports do this?
- What alternatives to SMTP are there? (so I can better understand the reasoning)
The addresses in an email message header serve different purposes than the envelope sender and recipient (which really aren't hidden per se, they just aren't part of the message).
The envelope sender and recipient, which you never see in a message, are part of the SMTP protocol, and specify delivery instructions, that is, to which mailbox the mail server is expected to deliver the message, or where to return it in case of some failure. Neither address is required to have any relation to the semantic content of the message. These are explained in detail in RFC 5321 sections 4.1.1.2 and 4.1.1.3.
Logically these are analogous to the addresses printed on the envelope of a piece of postal mail.
The originator and destination addresses which appear in the message itself indicate semantic meanings, rather than explicit delivery instructions. These are explained in detail in RFC 5322 section 3.6.3 and RFC 6854 section 2.1 (which replaces RFC 5322 section 3.6.2).
In brief, From: in the message indicates the mailbox of whoever wrote the message, Sender: indicates the entity which sent a message on behalf of someone else, and To: and Cc: indicate the intended recipient mailbox. The RFCs define other header fields you may be interested in, as well.
Logically these are analogous to the addresses printed on the correspondence inside a piece of postal mail.
Often, the envelope sender and recipient are the same as the From: and To: addresses. But it is common for them to have no correspondence at all, for instance, in the case of mailing lists.
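To make the mailing list case concrete, here is a minimal sketch in Python. The addresses and the helper name `smtp_envelope_commands` are made up for illustration. It only builds the SMTP command lines as strings (using the `MAIL FROM` and `RCPT TO` commands from RFC 5321) and never opens a connection, so treat it as a picture of the idea rather than a working mail client.

```python
from email.message import EmailMessage


def smtp_envelope_commands(envelope_from, envelope_to):
    """Return the SMTP commands that carry the envelope (hypothetical helper)."""
    commands = [f"MAIL FROM:<{envelope_from}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_to]
    commands.append("DATA")  # the message (headers + body) follows this command
    return commands


# The message itself: what the author wrote and whom it was written to.
msg = EmailMessage()
msg["From"] = "alice@example.org"           # hypothetical author
msg["To"] = "discuss@lists.example.net"     # hypothetical list address
msg["Subject"] = "Hello list"
msg.set_content("Hi everyone")

# The envelope the list server might use when delivering to one subscriber:
# bounces go back to the list, delivery goes to the subscriber's mailbox.
commands = smtp_envelope_commands(
    "discuss-bounces@lists.example.net",    # hypothetical bounce address
    ["bob@example.com"],                    # hypothetical subscriber
)

for line in commands:
    print(line)
print(msg.as_string())
```

Note that the subscriber's address appears only in the `RCPT TO` command and nowhere in the message text.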
The most common scenario where you will see a difference is during delivery of an email with multiple recipients.
Let's say you are about to send an email to:
When your mail client is sending the email to your mail server, all three addresses will be repeated on both envelope and headers. Next, your mail server will look up the MX records for example.com and example.net to continue delivery.
Your mail server will now establish two separate SMTP connections, one with each of the receiving servers, to send the email further.
When communicating with the MX for example.com, all three receivers will still be in the To header, but there will only be a single envelope receiver.
When communicating with the MX for example.net, all three receivers will still be in the To header, but there will only be two envelope receivers.
As an analogue to the above, imagine you printed out three copies of a letter with three recipients written on the paper. You would then put those three pieces of paper into three separate envelopes and write just one address on each envelope.
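The split described above can be sketched roughly as follows. The three addresses are placeholders I picked to match the scenario (one at example.com, two at example.net), and `group_recipients_by_domain` is a hypothetical helper. A real mail server would group by the result of the MX lookup rather than by the bare domain, so this is a simplification.

```python
from collections import defaultdict


def group_recipients_by_domain(recipients):
    """Group addresses by the domain part; one envelope per group (simplified)."""
    groups = defaultdict(list)
    for address in recipients:
        domain = address.rsplit("@", 1)[1].lower()
        groups[domain].append(address)
    return dict(groups)


# Placeholder recipients; all three stay in the To header on every copy.
header_to = ["alice@example.com", "bob@example.net", "carol@example.net"]

# Each value is the envelope recipient list for one outgoing SMTP connection.
envelopes = group_recipients_by_domain(header_to)

for domain, rcpts in envelopes.items():
    print(f"connection to MX of {domain}:")
    print(f"  To header:           {', '.join(header_to)}")
    print(f"  envelope recipients: {', '.join(rcpts)}")
```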
There are other scenarios where it makes a difference, such as when using bcc and when forwarding email.
As a slightly contrived analogue, imagine that you are exchanging letters with some entity. Unknown to that entity, you create a photocopy of each of those letters which you put in an envelope addressed to your lawyer.
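For bcc, the idea is that the blind recipient appears on the envelope but not in the message that gets transmitted. Below is a small sketch of that using Python's standard `email` package. The function name `envelope_and_visible_message` and the addresses are invented for the example. As far as I know, `smtplib.SMTP.send_message` does something similar when you let it derive the recipients from the headers, but the code here does not send anything.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_and_visible_message(msg):
    """Collect envelope recipients from To/Cc/Bcc, then drop the Bcc header."""
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields += [str(value) for value in msg.get_all(name, [])]
    envelope_rcpts = [addr for _display_name, addr in getaddresses(fields)]
    del msg["Bcc"]  # no error if the header is absent
    return envelope_rcpts, msg


letter = EmailMessage()
letter["From"] = "me@example.org"           # hypothetical sender
letter["To"] = "entity@example.com"         # hypothetical correspondent
letter["Bcc"] = "lawyer@example.net"        # hypothetical blind copy
letter["Subject"] = "Our agreement"
letter.set_content("Please find my reply below.")

rcpts, visible = envelope_and_visible_message(letter)
print(rcpts)
print(visible.as_string())
```

The lawyer's address ends up in the envelope recipient list only, which is the photocopy-in-a-separate-envelope situation from the analogy.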