We've been trying to get to the bottom of a really weird problem: when we wget a page from Apache 2.2.19 on Solaris 10, some permutations of requests reliably take various fixed lengths of time to respond.
It looks to be tied to the closing of the TCP socket. In a tcpdump on the client, we typically see a pause between the server filling the TCP window with its response for the final time and the arrival of the final lump of data along with the FIN from the server.
So on the wire it just hangs mid-transfer of the HTTP response, but a netstat on the server shows the socket in FIN_WAIT_1. We can't run a tcpdump on the server to confirm, but it looks to us as though the OS has handed the TCP conversation off to the hardware and so believes it has started the four-way close, yet the NIC has not put that packet on the wire, nor the outstanding data packets (maybe 1 or 2 at 1500 bytes and a straggler of 400 or so).
That's about as clean a picture as we can paint, apart from one recent test that looked striking. A file of 64076 bytes is served by Apache with a 432 ms delay before the last packet. When we add ONE character to the file, an additional packet is created: 53 bytes, which also carries the FIN, as opposed to the empty 52-byte FIN we see with the slightly smaller file (the one with the delay). The existence of this extra packet reliably changed how the FIN exchange happened, and took the conversation from almost half a second down to a few milliseconds.
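As a sanity check on those two packet sizes, here is a minimal Python sketch of the arithmetic. It assumes the 52 and 53 byte figures are IP packet lengths, and that each packet carries a 20-byte IPv4 header, a 20-byte TCP header and 12 bytes of TCP options (timestamps plus padding). Those header sizes are our assumption, not something read out of the capture.

```python
# Assumed header sizes (not taken from our capture):
IP_HEADER = 20     # IPv4 header without options
TCP_HEADER = 20    # TCP header without options
TCP_OPTIONS = 12   # timestamps option plus padding


def payload_bytes(ip_packet_length):
    """Return the TCP payload size for a packet of the given IP length."""
    overhead = IP_HEADER + TCP_HEADER + TCP_OPTIONS
    return ip_packet_length - overhead


print(payload_bytes(52))  # FIN seen with the 64076-byte file
print(payload_bytes(53))  # FIN seen after adding one character
```

If those assumptions hold, the 52-byte FIN carries no data at all, while the 53-byte one carries exactly the one extra character.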
Most typically over the life of our investigations, this delay has been 4.6 seconds. Again we see the window size fluctuate and SACKs going back to Apache when needed, but the LAST time that window gets full, the transfer hangs for 4.6 seconds, and then back comes a final 2 or 3 KB of data and the FIN that Solaris thinks it sent out ages ago.
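For anyone wanting to look for the same stall in their own capture, this is roughly how the pause can be located once the packet timestamps have been exported from tcpdump. It is only a sketch: the function name `largest_gap` is made up for illustration, and the timestamps in the example are invented values, not taken from our traces.

```python
def largest_gap(timestamps):
    """Return (gap_seconds, index) for the longest pause between packets.

    `timestamps` is a list of packet arrival times in seconds, in capture
    order. `index` is the position of the packet that arrived after the gap.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [
        (later - earlier, i)
        for i, (earlier, later) in enumerate(
            zip(timestamps, timestamps[1:]), start=1
        )
    ]
    return max(gaps)


# Invented example: steady packets, then a stall before the last two.
example = [0.000, 0.002, 0.004, 0.436, 0.437]
gap, index = largest_gap(example)
print(f"longest pause {gap:.3f}s before packet {index}")
```

In our traces the packet that follows the longest pause is the first of the final data packets, with the FIN arriving right behind it.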
Our tcpdumps are taken on an F5 BigIP, so there is an ASIC for the traffic to traverse, as well as a Cisco 6509 (L2 only). However, we see the same user experience when doing a wget on a neighboring Solaris box, so we don't believe it's any black magic that the BigIP is doing.
All of this is mixed up with our confusion about window sizes, MSS values and the like, but if this sounds familiar to anyone, we're all ears!