I have an application that sends smtp email via a local Exchange server at a customer. All the email sends ok (it gets to the recipient) but some email gets sent multiple times because my application never receives the 221 respose code to say that it has worked ok and so re-tries. We could remve the retries but often this is needed for genuine problems.
It happens more commonly on certain desination email addresses and will repeatedly happen on that email once it has.
I have captured the TCP stream of one that does work and one that doesn't. Obviously names changed to protect the guilty.
Did work:
220 SBS4S1.example.local Microsoft ESMTP MAIL Service ready at Thu, 16 Apr 2015 11:51:16 +0100
EHLO example-PC
250-SBS4S1.example.local Hello [192.168.111.15]
250-SIZE 10485760
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-STARTTLS
250-AUTH
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250-XEXCH50
250 XSHADOW
RSET
250 2.0.0 Resetting
MAIL FROM:<[email protected]>
250 2.1.0 Sender OK
RCPT TO:<[email protected]>
250 2.1.5 Recipient OK
DATA
354 Start mail input; end with <CRLF>.<CRLF>
--DATA Here, identical other than MIME/Content IDs in both emails--
.
250 2.6.0 <[email protected]> [InternalId=26602] Queued mail for delivery
QUIT
221 2.0.0 Service closing transmission channel
.
Did NOT work:
220 SBS4S1.example.local Microsoft ESMTP MAIL Service ready at Thu, 16 Apr 2015 11:43:39 +0100
EHLO example-PC
250-SBS4S1.example.local Hello [192.168.111.15]
250-SIZE 10485760
250-PIPELINING
250-DSN
250-ENHANCEDSTATUSCODES
250-STARTTLS
250-AUTH
250-8BITMIME
250-BINARYMIME
250-CHUNKING
250-XEXCH50
250 XSHADOW
RSET
250 2.0.0 Resetting
MAIL FROM:<[email protected]>
250 2.1.0 Sender OK
RCPT TO:<[email protected]>
250 2.1.5 Recipient OK
DATA
354 Start mail input; end with <CRLF>.<CRLF>
--DATA Here, identical other than MIME/Content IDs in both emails--
.
QUIT
250 2.6.0 <[email protected]> [InternalId=26573] Queued mail for delivery
--This then times out but never gets the 221--
The only difference I can see other than the missing 221 is that the 250 and the QUIT are the other way around in the message that "fails". Is this a known Exchange quirk we have to work around, something we could fix or should cope with in our application or something we could ask to be changed on the mail server? We don't want to break anything for the normal, common situation.
Update: I thought we wouldn't be able to get hold of the Exchange logs (the admins didn't even know where they were) but I have! So the relevent line is:
Tarpit for '0.00:00:32.619' due to 'DelayedAck',Expired;Timeout
It looks like it might be due to:
https://mikecrowley.wordpress.com/2010/07/24/delayed-smtp-acknowledgement/
Got to try and get it tested first.
It seems that sometimes your application sends
QUIT
command before receiving reply for the "the final dot" of the message (250 for OK).It seems that the MTA (MS Exchange) ignores commands received before it sends the reply.
Suggested Fixes:
1) Increase timeout for waiting for the reply, do not send QUIT before receiving it (even when PIPELINING extension is advertised).
2) Do not resend if you receive
250
reply for "the final dot". In such case MTA (MS Exchange) takes over responsibility for delivering the message.The reason for the problem was the "DelayedAck" setting on the Exchange transport. The receiving server on the "other side" of the relay was taking too long to respond (more than 30 secs) and so the connection we were making timed out and SMTP conversation never completed. (se url I added in the question).
The Exchange admin turned off DelayedAck on the internal transport and the problem disappeared.
We could/should have altered our application to work around this but we have a 3rd part component that made this tricky and so we are unable to verify that this would work.