We are using a FortiBalancer in front of our web servers (Windows Server 2012 with IIS) and have run into a strange issue. IE users experience timeouts (~77s) while waiting for a response from our servers. Packet traces show ZeroWindow probes and ACKs at the time of the timeouts.
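To check whether the ~77s timeouts line up with zero-window periods, one option is to export each TCP segment from the trace as (time, sender, receiver, advertised window) and measure how long each direction kept advertising a window of 0. Wireshark's `frame.time_relative`, `ip.src`/`ip.dst`, `tcp.srcport`/`tcp.dstport` and `tcp.window_size` (the calculated, already-scaled window) fields could be exported for this. The `Segment`/`Stall` record format and `find_zero_window_stalls` below are hypothetical helpers, a minimal sketch rather than a Wireshark feature:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Segment:
    """One TCP segment as (time, sender, receiver, window); a hypothetical export format."""
    time: float   # seconds since the start of the capture
    src: str      # sender "address:port"
    dst: str      # receiver "address:port"
    window: int   # advertised receive window in bytes, already scaled


@dataclass
class Stall:
    """An interval during which `advertiser` kept telling `peer` its window was 0."""
    advertiser: str
    peer: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_zero_window_stalls(segments: Iterable[Segment]) -> List[Stall]:
    """Pair each run of zero-window advertisements with the segment that reopens it.

    A stall starts at the first segment in one direction advertising window 0 and
    ends at the next segment in the same direction with a non-zero window.
    Stalls still open when the capture ends are not reported.
    """
    zero_since: Dict[Tuple[str, str], float] = {}
    stalls: List[Stall] = []
    for seg in sorted(segments, key=lambda s: s.time):
        direction = (seg.src, seg.dst)
        if seg.window == 0:
            # Keep the earliest zero advertisement; repeats (e.g. probe ACKs) extend the stall.
            zero_since.setdefault(direction, seg.time)
        elif direction in zero_since:
            stalls.append(Stall(seg.src, seg.dst, zero_since.pop(direction), seg.time))
    return stalls


if __name__ == "__main__":
    # Made-up addresses and timings, for illustration only.
    trace = [
        Segment(0.0, "10.0.0.10:443", "10.0.0.20:80", 16384),
        Segment(1.0, "10.0.0.10:443", "10.0.0.20:80", 0),
        Segment(30.0, "10.0.0.10:443", "10.0.0.20:80", 0),
        Segment(78.0, "10.0.0.10:443", "10.0.0.20:80", 8192),
    ]
    for stall in find_zero_window_stalls(trace):
        print(f"{stall.advertiser} -> {stall.peer}: window 0 for {stall.duration:.1f} s")
```

Grouping by direction matters because one side of a connection can be stalled while the other keeps advertising a normal window. The side that reports the long stalls points at which device is not draining its receive buffer.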
These are the facts:
1. When we bypass the load balancers, there is no issue and no Zero Window packets (let alone probes).
2. Packet traces on the servers show Zero Window packets going to the load balancers, but not to the servers.
3. Wireshark shows the highest packet 'size' as 16KB when traffic goes through the load balancers, but 64KB when clients connect directly to the servers.
4. The issue is not related to load: it can happen with almost no traffic or during periods of high traffic.
5. We cannot replicate the problem, but it tends to happen around predictable times (~9:30am or ~3:30am), though not every day. (Nothing special happens in our environment at those times.)
6. Firefox users NEVER experience the problem.
7. The IE version does not seem to matter: users of IE 8 through 11 all have the same problem.
8. The LBs are up to date. They perform SSL offloading as well as link and load balancing. CPU usage on the LBs has never exceeded 10%.
Because of #1, we know the servers themselves are not the issue.
Because of #2, it seems that the LBs are the bottleneck.
Number 3 gives me pause: there seems to be no way to increase the window size (we've tried, and we can't get it above 16KB).
Number 6 is the real killer. Our application does not work well enough in other browsers to test with them, but Firefox is the one non-IE browser that does, and its users have never, ever experienced a delay. Firefox is so reliable that we have started moving clients over to it, and we still have not seen ZeroWindows from those clients, while IE users continue to experience them. In the Firefox packet traces, the packets sent to the LBs are 100-200 bytes larger than in the IE packet streams.
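If the 16KB and 64KB figures are the advertised TCP receive windows rather than packet sizes (an assumption, since the post puts 'size' in quotes), two standard TCP facts put them in context. First, 65,535 bytes is the largest window TCP can advertise without the RFC 7323 window scale option, so a ceiling near 64KB may mean scaling was never negotiated. Second, a window-limited connection can move at most one window per round trip. The sketch below encodes both. The 50 ms RTT is an assumed value, not a measurement, and a small window by itself only caps throughput; a zero window means the receiver's buffer filled because the receiving application was not reading.

```python
MAX_SHIFT = 14  # RFC 7323 caps the window scale shift at 14


def effective_window(raw_window: int, shift: int) -> int:
    """Receive window in bytes: the 16-bit header field shifted left by the window
    scale negotiated in the SYN/SYN-ACK (shift is 0 if scaling was not negotiated)."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("raw window must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale shift must be between 0 and 14")
    return raw_window << shift


def window_limited_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one connection's throughput in bytes/s: at most one full
    window can be in flight per round trip."""
    if rtt_s <= 0:
        raise ValueError("RTT must be positive")
    return window_bytes / rtt_s


if __name__ == "__main__":
    rtt = 0.05  # assumed 50 ms round trip, not a measured value
    for label, window in (("16 KB", 16 * 1024), ("64 KB", effective_window(0xFFFF, 0))):
        kib_per_s = window_limited_throughput(window, rtt) / 1024
        print(f"{label} window at {rtt * 1000:.0f} ms RTT: at most {kib_per_s:.0f} KiB/s")
```

Comparing the SYN and SYN-ACK window scale options in the LB-side and direct traces would show whether scaling is being negotiated on each path.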
Question:
What can I test next to find a direction for fixing the problem? Does anyone have ideas about what the problem could be?