A customer is trying to send emails with attachments (both small and large) to one of our Exchange servers, but the connection is reset once the timeout is reached. To me, it looks as if the sending server does not receive our ACKs and therefore retransmits, which results in duplicate ACKs from our side. We are using a Cisco ASA, but we are not using any SMTP/ESMTP inspection policies (aka fixup smtp) on any of the involved interfaces (that inspection is used for a completely different VLAN, where no Exchange server resides).
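For context on the "Dup ACK" entries: TCP acknowledgments are cumulative, so a receiver keeps repeating the same ACK number whenever a segment arrives beyond a gap, and also when it receives a retransmission of data it already has. The sketch below models an idealized receiver (no SACK, no delayed ACKs, segments never partially overlap). The function name and the sequence numbers are illustrative; the starting offset of 144 is chosen only so that the second ACK equals the Ack=2904 seen in the Wireshark dump further down.

```python
from typing import Dict, List, Tuple


def cumulative_acks(segments: List[Tuple[int, int]], initial_ack: int) -> List[int]:
    """Return the ACK number an idealized receiver sends after each segment.

    Each segment is (seq, length) in relative sequence numbers. Out-of-order
    data is buffered, and the ACK always names the next byte expected in order.
    Assumes segments never partially overlap. A repeated value is what
    Wireshark labels "Dup ACK".
    """
    expected = initial_ack
    buffered: Dict[int, int] = {}  # seq -> length of out-of-order segments
    acks: List[int] = []
    for seq, length in segments:
        if seq == expected:
            expected += length
            # Absorb buffered segments that have become contiguous.
            while expected in buffered:
                expected += buffered.pop(expected)
        elif seq > expected:
            # A gap before this segment: keep it, but keep ACKing `expected`.
            buffered[seq] = length
        # seq < expected: data already received (e.g. a spurious retransmission).
        acks.append(expected)
    return acks


if __name__ == "__main__":
    # Two in-order segments, two arriving after a missing one, then the missing
    # segment is retransmitted and fills the gap.
    arrivals = [(144, 1380), (1524, 1380), (4284, 1380), (5664, 1380), (2904, 1380)]
    print(cumulative_acks(arrivals, 144))  # [1524, 2904, 2904, 2904, 7044]
```

The three consecutive 2904 values are the pattern Wireshark reports as an ACK followed by "Dup ACK #1" and "Dup ACK #2".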
1.1.1.1: receiving SMTP server
2.2.2.2: sending SMTP server
Everything works like a charm up to and including "S: 354 Start mail input; end with <CRLF>.<CRLF>". The problem only starts once the message data is sent.
Wireshark dump
    No.  Time        Source   Destination  Protocol  Length  Info
    20   504.923698  1.1.1.1  2.2.2.2      SMTP      78      S: 250 2.1.5 Recipient OK
    21   505.304394  2.2.2.2  1.1.1.1      SMTP      60      C: DATA
    22   505.304713  1.1.1.1  2.2.2.2      SMTP      100     S: 354 Start mail input; end with <CRLF>.<CRLF>
    23   505.599857  2.2.2.2  1.1.1.1      SMTP      1434    C: DATA fragment, 1380 bytes
    24   505.620808  2.2.2.2  1.1.1.1      SMTP      1434    C: DATA fragment, 1380 bytes
    25   505.620823  1.1.1.1  2.2.2.2      TCP       54      smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
    26   505.919899  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
    27   505.919912  1.1.1.1  2.2.2.2      TCP       54      [TCP Dup ACK 25#1] smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
    28   505.940785  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
    29   505.940797  1.1.1.1  2.2.2.2      TCP       54      [TCP Dup ACK 25#2] smtp > 55346 [ACK] Seq=450 Ack=2904 Win=64860 Len=0
    30   505.961793  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes
    31   505.982494  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes
    32   505.982508  1.1.1.1  2.2.2.2      TCP       54      smtp > 55346 [ACK] Seq=450 Ack=4284 Win=64860 Len=0
    33   506.302829  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Previous segment lost] C: DATA fragment, 1380 bytes
    34   506.302846  1.1.1.1  2.2.2.2      TCP       54      [TCP Dup ACK 32#1] smtp > 55346 [ACK] Seq=450 Ack=4284 Win=64860 Len=0
    35   506.323446  2.2.2.2  1.1.1.1      SMTP      1434    [TCP Retransmission] C: DATA fragment, 1380 bytes
...and so on, until the timeout is reached.
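The frame lengths in the dump can be checked against the header sizes. A minimal sketch, assuming Ethernet II framing of 14 bytes with no FCS in the capture (consistent with the 60-byte minimum frame for `DATA`) and IPv4 and TCP headers without options (20 bytes each); the helper names are hypothetical:

```python
# Assumed header sizes: Ethernet II without FCS, IPv4 and TCP without options.
ETH_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER = 20


def ip_packet_size(frame_len: int) -> int:
    """Size of the IPv4 packet inside a captured Ethernet frame."""
    return frame_len - ETH_HEADER


def tcp_payload(frame_len: int) -> int:
    """TCP payload bytes in a captured frame (no IP or TCP options assumed)."""
    return frame_len - ETH_HEADER - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(ip_packet_size(1434))  # 1420: IP packet size of each DATA segment
    print(tcp_payload(1434))     # 1380: matches "DATA fragment, 1380 bytes"
    print(tcp_payload(54))       # 0: the bare ACKs, matches "Len=0"
    print(tcp_payload(100))      # 46: the "354 Start mail input..." reply plus CRLF
```

On these assumptions every DATA segment travels as a 1420-byte IP packet, a figure worth comparing with the ping results below.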
We run other Exchange servers, and the sender can send the very same email to them. All of our Exchange servers sit behind the same firewalls, routers and switches; probably only the patch cabling differs.
Oh, and sending 15 MB attachments from Gmail to the server works.
Normal continuous ping:

    Ping statistics for 2.2.2.2:
        Packets: Sent = 249, Received = 249, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 82ms, Maximum = 546ms, Average = 138ms
    ^C

    # unfragmented packet of 992 bytes works
    C:\Users\someadmin>ping -f -l 992 2.2.2.2

    Pinging 2.2.2.2 with 992 bytes of data:
    Reply from 2.2.2.2: bytes=992 time=100ms TTL=48
    Reply from 2.2.2.2: bytes=992 time=101ms TTL=48
    Reply from 2.2.2.2: bytes=992 time=101ms TTL=48
    Reply from 2.2.2.2: bytes=992 time=100ms TTL=48

    Ping statistics for 2.2.2.2:
        Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 100ms, Maximum = 101ms, Average = 100ms

    # unfragmented packet of 993 bytes fails
    C:\Users\someadmin>ping -f -l 993 2.2.2.2

    Pinging 2.2.2.2 with 993 bytes of data:
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.

    Ping statistics for 2.2.2.2:
        Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
I can, however, ping Google's DNS servers with large unfragmented packets:
    ping -f -l 1472 8.8.8.8

    Pinging 8.8.8.8 with 1472 bytes of data:
    Reply from 8.8.8.8: bytes=64 (sent 1472) time=31ms TTL=51

    ping -f -l 1472 8.8.4.4

    Pinging 8.8.4.4 with 1472 bytes of data:
    Reply from 8.8.4.4: bytes=64 (sent 1472) time=30ms TTL=51
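The ping tests above can be turned into numbers. On Windows, `-l` sets the ICMP echo payload size and `-f` sets the Don't Fragment bit, so each IPv4 packet is the payload plus an 8-byte ICMP header and a 20-byte IPv4 header (assuming no IP options). A minimal sketch with hypothetical helper names:

```python
ICMP_ECHO_HEADER = 8   # ICMP echo request header
IPV4_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20        # TCP header without options


def path_mtu_from_ping(max_payload: int) -> int:
    """Path MTU implied by the largest 'ping -f -l N' payload that still gets a reply."""
    return max_payload + ICMP_ECHO_HEADER + IPV4_HEADER


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    mtu = path_mtu_from_ping(992)                # 992 passes, 993 fails
    print("path MTU to 2.2.2.2:", mtu)           # 1020
    print("usable TCP MSS:", mss_for_mtu(mtu))   # 980
    # The SMTP DATA segments in the dump are 1420-byte IP packets (1380-byte payload).
    print("DATA segments fit:", 1420 <= mtu)     # False
    print("path MTU to 8.8.8.8:", path_mtu_from_ping(1472))  # 1500
```

On these assumptions the path to 2.2.2.2 (from the machine where the ping was run) carries at most 1020-byte packets, while the path to Google's DNS servers passes a regular 1500-byte MTU.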
Cisco ASA policies
    class-map inspection_default
     match default-inspection-traffic
    !
    !
    policy-map type inspect dns preset_dns_map
     parameters
      message-length maximum client auto
      message-length maximum 3096
      no dns-guard
      no protocol-enforcement
      no nat-rewrite
    policy-map global_policy
     class inspection_default
      inspect dns preset_dns_map
      inspect ftp
      inspect h323 h225
      inspect h323 ras
      inspect rsh
      inspect rtsp
      inspect sqlnet
      inspect skinny
      inspect sunrpc
      inspect xdmcp
      inspect sip
      inspect netbios
      inspect tftp
      inspect ip-options
      inspect icmp
    policy-map shape_policy
     class class-default
      police input 10000000 5000
      police output 10000000 5000
    !
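Since the question rests on SMTP inspection not touching this traffic, a quick way to double-check a pasted configuration is to list the `inspect` lines under `global_policy`. A minimal sketch, assuming the config text keeps the one-space/two-space sub-command indentation of `show running-config`; the function name `inspected_protocols` is hypothetical, and it only looks at the policy-map you name, not at any other service policy applied to an interface:

```python
CONFIG = """\
policy-map global_policy
 class inspection_default
  inspect dns preset_dns_map
  inspect ftp
  inspect h323 h225
  inspect h323 ras
  inspect rsh
  inspect rtsp
  inspect sqlnet
  inspect skinny
  inspect sunrpc
  inspect xdmcp
  inspect sip
  inspect netbios
  inspect tftp
  inspect ip-options
  inspect icmp
policy-map shape_policy
 class class-default
  police input 10000000 5000
  police output 10000000 5000
!
"""


def inspected_protocols(config: str, policy: str = "global_policy") -> set[str]:
    """Return the protocol names from 'inspect ...' lines inside one policy-map."""
    protocols: set[str] = set()
    in_policy = False
    for line in config.splitlines():
        if line and not line[0].isspace():
            # Top-level command: we are inside the wanted policy-map only if it opens it.
            tokens = line.split()
            in_policy = tokens[0] == "policy-map" and tokens[-1] == policy
            continue
        stripped = line.strip()
        if in_policy and stripped.startswith("inspect "):
            protocols.add(stripped.split()[1])
    return protocols


if __name__ == "__main__":
    found = inspected_protocols(CONFIG)
    print(sorted(found))
    print("esmtp inspected:", "esmtp" in found)  # False
```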
Where should I start looking? Should I start by asking the sender to run the same Wireshark/tcpdump trace on their side?
It's hard to know for sure, but in my opinion you have a path MTU issue here. Run a path MTU discovery and reduce the MTU of your gateway (or server NIC) accordingly. If that solves your problem, you have your proof that some node in the path isn't handling MTU correctly (either dropping the ICMP type 3, code 4 "fragmentation needed" packets or simply not sending them back).
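The path MTU discovery suggested above can be done by hand with repeated `ping -f -l` runs; a binary search just gets there in about a dozen probes. A minimal sketch with hypothetical helper names: `max_unfragmented_payload` searches with any probe function, and `windows_ping_probe` wraps the Windows `ping -f -l SIZE -n 1 HOST` command, treating output that contains `TTL=` as a successful reply. That success heuristic is an assumption (it matches the English output shown in the question, not necessarily every locale or error message), so verify it before relying on it.

```python
import subprocess
from typing import Callable


def max_unfragmented_payload(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the largest payload size for which probe(size) succeeds.

    Assumes probe(low) succeeds and that success is monotonic: if a size
    passes, every smaller size passes too.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid fits: the answer is mid or larger
        else:
            high = mid - 1   # mid is too big: the answer is below mid
    return low


def windows_ping_probe(host: str) -> Callable[[int], bool]:
    """Hypothetical probe: one DF-flagged Windows ping of the given payload size."""
    def probe(size: int) -> bool:
        result = subprocess.run(
            ["ping", "-f", "-l", str(size), "-n", "1", host],
            capture_output=True,
            text=True,
        )
        # Heuristic (assumption): echo replies contain "TTL=", failures do not.
        return "TTL=" in result.stdout
    return probe


if __name__ == "__main__":
    # Offline example: simulate the path seen in the question (992 passes, 993 fails).
    payload = max_unfragmented_payload(lambda size: size <= 992, 0, 1472)
    print("largest payload:", payload)        # 992
    print("path MTU:", payload + 8 + 20)      # 1020 (ICMP + IPv4 headers added)
    # Live use (Windows only): max_unfragmented_payload(windows_ping_probe("2.2.2.2"), 0, 1472)
```

With the question's numbers the search lands on 992 bytes, i.e. a 1020-byte path MTU, which is the figure the gateway or server NIC MTU would be lowered to under this theory.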