Settings: this is a quad-CPU machine, plenty strong, not loaded at all (neither CPU nor network). The client is Windows Server 2008 64-bit, the server is a Linux box.
I have four threads that are all issuing HTTP requests starting at the same time. The connections are initiated to IPs X, X, Y, Z (two connections to X, one to Y and Z). All targets are on the local LAN.
I am seeing that the connections to X, Y and Z are formed (SYN, SYN/ACK) right away, but the second connection to X is delayed by 100 ms. Meaning, the machine does not send the second SYN to X for a full 100 ms.
Could this be related to TCP Offload Engine? What else could be causing this delay?
Edit - Another suspect is the client code - it's written in Java, uses HttpURLConnection.
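One way to separate the Java HTTP layer from the OS/NIC would be to time bare TCP connects with plain sockets, released from four threads at the same moment. This is only a rough sketch: the class name ConnectTiming, the method timeConnects and the 2000 ms timeout are made up for illustration, and the targets are whatever host:port pairs stand in for X, X, Y and Z. If the second connect to X is still about 100 ms slower here, HttpURLConnection is probably not the cause.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original client code.
class ConnectTiming {

    /**
     * Starts one thread per target, releases them together, and returns the
     * time each TCP connect took in milliseconds (-1 if the connect failed).
     * Expects at least one target.
     */
    static long[] timeConnects(List<InetSocketAddress> targets, int timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(targets.size());
        CountDownLatch start = new CountDownLatch(1);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (InetSocketAddress target : targets) {
                futures.add(pool.submit(() -> {
                    start.await();                      // wait for the common start signal
                    long t0 = System.nanoTime();
                    try (Socket s = new Socket()) {
                        s.connect(target, timeoutMs);   // returns once the handshake completes
                        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
                    } catch (IOException e) {
                        return -1L;
                    }
                }));
            }
            start.countDown();                          // release all threads at once
            long[] result = new long[targets.size()];
            for (int i = 0; i < result.length; i++) {
                result[i] = futures.get(i).get();
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    // Usage: java ConnectTiming ipX:80 ipX:80 ipY:80 ipZ:80
    public static void main(String[] args) throws Exception {
        List<InetSocketAddress> targets = new ArrayList<>();
        for (String arg : args) {
            int colon = arg.lastIndexOf(':');
            targets.add(new InetSocketAddress(arg.substring(0, colon),
                    Integer.parseInt(arg.substring(colon + 1))));
        }
        long[] ms = timeConnects(targets, 2000);
        for (int i = 0; i < ms.length; i++) {
            System.out.println(targets.get(i) + " -> " + ms[i] + " ms");
        }
    }
}
```

Passing IP addresses rather than host names keeps name resolution out of the measurement. Running a packet capture alongside it lets you compare the reported times with the SYN timestamps on the wire.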
A network trace (e.g., Wireshark) will show if the delay is in waiting for a response. It would also point out other "detours" like the suggestion about DNS. Sounds like you may have done this already, but you didn't say.
A different possibility: the limit on outgoing half-open connections that was introduced in Windows XP SP2, which defaults to 10. I'm not sure how to see how many connections are in this state, but I believe that if this rate limiter kicks in, it will show up in the event log.
Half-open.com
Does it have to do a DNS lookup for each request? Is that limiting the rate?
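A quick way to check the DNS theory from the Java side might be to time the lookup on its own. This is a minimal sketch: DnsTiming and resolveMillis are invented names, and the host names come from the command line. Note that the JVM caches successful lookups, so a slow first call followed by a fast second call would point at resolution, while an IP literal involves no lookup at all.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only.
class DnsTiming {

    /** Returns how long resolving the given host name (or IP literal) took, in milliseconds. */
    static long resolveMillis(String host) throws UnknownHostException {
        long t0 = System.nanoTime();
        InetAddress.getByName(host);   // does a lookup unless the answer is cached or host is an IP literal
        return (System.nanoTime() - t0) / 1_000_000;
    }

    // Usage: java DnsTiming hostX hostY hostZ
    public static void main(String[] args) throws Exception {
        for (String host : args) {
            long first = resolveMillis(host);
            long second = resolveMillis(host);   // usually served from the JVM's address cache
            System.out.println(host + ": first " + first + " ms, second " + second + " ms");
        }
    }
}
```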
The initial connection goes through fine, but your second connection is getting queued. I would review the client implementation. I don't know whether more recent JDKs have changed this, but it used to be that even if you created separate HttpURLConnection instances, the underlying protocol handler would still reuse the socket connection.
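To test whether connection reuse in the protocol handler is involved, one option is to switch keep-alive off and see if the delay changes. The sketch below assumes the stock HttpURLConnection implementation: it sets the documented http.keepAlive system property and also sends "Connection: close" per request. The class name NoKeepAlive, the method openWithoutReuse and the timeout values are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration only.
class NoKeepAlive {

    /** Prepares (but does not yet open) a connection that asks not to be kept alive. */
    static HttpURLConnection openWithoutReuse(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Connection", "close"); // ask for the socket to be closed after the response
        conn.setConnectTimeout(2000);                   // assumed values, adjust as needed
        conn.setReadTimeout(5000);
        return conn;
    }

    // Usage: java NoKeepAlive http://ipX/ http://ipX/ http://ipY/ http://ipZ/
    public static void main(String[] args) throws Exception {
        // JVM-wide switch; set it before the first HTTP request is made.
        System.setProperty("http.keepAlive", "false");
        for (String url : args) {
            HttpURLConnection conn = openWithoutReuse(url);
            long t0 = System.nanoTime();
            int status = conn.getResponseCode();        // connects and sends the request
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(url + " -> HTTP " + status + " in " + ms + " ms");
            conn.disconnect();
        }
    }
}
```

The main method here issues the requests one after another just to show the calls; to reproduce the original scenario you would call openWithoutReuse from your four threads instead.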
You should check in at StackOverflow and see if some of them have dealt with this issue before.
OK, there's a LOT of possible places this could be going wrong. You mentioned the TCP offload engine, and that's a reasonable suspect (especially if you've got Broadcom NICs in there), so let's rule it out and disable it (consult your documentation for this).
After that you want to start reducing other possible candidates, so look to switches, network cables and so on. If you can, connect source to target via a crossover and see if you can reproduce it there.
It's also worthwhile trying dear old ping - from the sounds of it, you should be able to reproduce the aberrant behaviour with 4 concurrent pings. But what it boils down to is that there is no point in suspecting anything in particular at this early stage, as there are just too many places where it could be going wrong (including your app).
Have you checked your firewall settings? There might be something in the firewall settings that is rate limiting the connections.
Do you have any special sysctl settings for the server? There are lots of minor tweaks that one can do for networking in sysctl.
Have you checked against different servers/clients? This is to help isolate the cause of the problem - whether it is the particular server, client or both.