I have a set of very beefy, very underutilized servers which are running a couple of virtual machines with Windows Server 2008 R2 and IIS 7.5.
The problem: Sometimes requests take a very long time to be processed. The user sees their browser spinning, apparently not getting any response from IIS.
Some stats & attempts at resolving:
- The load on the servers (host and VM) is negligible. CPU never goes over 5%, there are 10+ GB of RAM available, everything is connected via MPIO to a fast SAN.
- I get between 30 and 50 requests per second, a mix of dynamic and static content, only GET and POST. Most hits are cached (80% hit rate), so the load on the fabric/SAN/IO is close to nothing.
- TCP offloading is disabled on both the host and the virtual machine, on the network adapter as well as by disabling TCP Chimney.
- The web applications are running in ASP.NET 4.0 Integrated Mode. They do not make long-running calls to third party web services or the like.
- I have tried both the processModel autoConfig, as well as setting maxWorkerThreads, maxConnections, maxIOThreads etc. to very high numbers with no difference.
- Database queries are all completing in less than 1 second. I profiled for an entire day and didn't capture a single query that took longer.
- I've looked at tons of performance monitor counters; the ASP.NET queues and Application queues are always empty, and nothing appears to get queued (which wouldn't make sense anyway, considering that the server isn't breaking a sweat at all).
- Using appcmd list requests, I've found that requests sometimes get 'stuck' for 20-60 seconds in IIS' "SendResponse" stage. This is the only thing I could find so far that would make any sense as to why we're seeing users getting stuck when navigating around. Note that most requests are processed fast, though; it just looks like random requests from different application pools get stuck here.
Any idea what else I can look at? What would cause requests to get stuck in the 'SendResponse' stage in IIS?
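To make the appcmd observation above repeatable, here is a minimal Python sketch that filters captured `appcmd list requests` output for requests sitting in a given stage longer than a threshold. The line format in the regular expression is an assumption about what appcmd prints, so adjust it to match your real output. The function name `stuck_requests`, the thresholds, and the sample lines (including the IPs and URLs) are made up for illustration.

```python
import re

# ASSUMPTION: one request per line, shaped like the sample lines below.
REQUEST_LINE = re.compile(
    r'REQUEST "(?P<id>[^"]+)" \(url:(?P<verb>\S+) (?P<url>.*?), '
    r'time:(?P<msec>\d+) msec, client:(?P<client>[^,]+), '
    r'stage:(?P<stage>[^,]+), module:(?P<module>[^)]+)\)'
)


def stuck_requests(lines, stage="SendResponse", min_msec=20000):
    """Return parsed requests in `stage` that have run for at least `min_msec`."""
    hits = []
    for line in lines:
        m = REQUEST_LINE.search(line)
        if m and m.group("stage") == stage and int(m.group("msec")) >= min_msec:
            hits.append(m.groupdict())
    return hits


# Made-up sample lines, for illustration only.
sample = [
    'REQUEST "fa00000080000001" (url:GET /img/logo.png, time:34211 msec, '
    'client:203.0.113.7, stage:SendResponse, module:IIS Web Core)',
    'REQUEST "fa00000080000002" (url:POST /default.aspx, time:120 msec, '
    'client:198.51.100.4, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)',
]

for req in stuck_requests(sample):
    print(req["client"], req["url"], req["msec"])
```

Running something like this on a schedule and keeping the client IPs and URLs of the hits makes it possible to correlate them with the IIS logs afterwards.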
Have you found any solution to this? I've regularly been seeing the same thing for quite a while now, where static content such as JS, PNG and GIF files is stuck in the "SendResponse" state in the "IIS Web Core" module. I'm in the same situation with IIS 7/ASP.NET 4.0. I'm monitoring these with this Microsoft.Web.Administration code, rather than appcmd.
UPDATE: On some further research, one possibility I've come up with is that it may be a dropped network connection. In this thread, the person reports Win32 status codes of 1236 in his IIS logs, which is "The network connection was aborted by the local system.". I'm not sure however if this means the requester cancelled the request, or the web server has aborted the request. Conceivably the requester might navigate to another page on your site before all these HTTP requests for page content (images, JS, etc) have completed which would probably abort all the pending requests to the web server (i.e. he clicks on a link on the page when it first renders). I've found some 1236 Win32 status codes in my IIS logs (mostly for static content such as GIF, PNG and JS with a few of them tied to ASPX pages), however, I'm not sure if these are the same requests I'm seeing stuck in the "SendResponse" state.
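To check whether the 1236 entries line up with the stuck requests, it helps to pull them out of the logs in bulk. Below is a minimal sketch that counts entries with a given `sc-win32-status` per URL in a W3C-format IIS log. It reads the column order from the `#Fields:` header rather than assuming one, so it only works if `sc-win32-status` logging is enabled. The function name `count_win32_status` and the sample log lines are made up for illustration.

```python
from collections import Counter


def count_win32_status(lines, status="1236"):
    """Count log entries per cs-uri-stem whose sc-win32-status equals `status`."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns; it can change mid-file.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split(" ")))
        if row.get("sc-win32-status") == status:
            counts[row.get("cs-uri-stem", "?")] += 1
    return counts


# Made-up log excerpt, for illustration only.
sample_log = [
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status sc-substatus sc-win32-status time-taken",
    "2012-01-01 10:00:00 GET /img/a.png 203.0.113.7 200 0 1236 41230",
    "2012-01-01 10:00:01 GET /default.aspx 198.51.100.4 200 0 0 95",
    "2012-01-01 10:00:02 GET /img/a.png 203.0.113.9 200 0 1236 22010",
]

print(count_win32_status(sample_log))
```

For a real log, pass an open file object instead of the sample list. Comparing the `c-ip` and `time-taken` values of the matching rows against the requests seen stuck in "SendResponse" would show whether they are the same requests.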
This is usually due to mobile devices on slow data connections downloading large asset files. When I say "large" I mean relative to the speed of the connection.
The requests aren't "hanging", they're just taking a long time because they're dependent on the user's network speed. The requests will be dumped if the user disconnects, so they're probably patiently waiting for the page to load.
Check the IPs listed next to the "hanging" requests and look them up. You'll probably find that they belong to mobile phone operators.
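As a rough illustration of "large relative to the speed of the connection", the arithmetic can be sketched like this. The file size and link speeds are made-up example numbers, and the calculation ignores latency, TCP slow start and packet loss, so real transfers will be slower.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized time to push `size_bytes` through a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


# A 500 KB asset over a 64 kbit/s mobile link versus a 10 Mbit/s link.
print(transfer_seconds(500_000, 64_000))      # 62.5
print(transfer_seconds(500_000, 10_000_000))  # 0.4
```

A request that spends about a minute in "SendResponse" is therefore plausible for a modest asset on a slow link, without anything being wrong on the server.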
Wanted to share my recent experience with the same symptoms you described. We had periodic request hangs where the browser seemed to be waiting a long time (over 10 seconds, up to ~100 seconds) for random requests. This was happening with static content requests as well as dynamic. After watching the requests using the same method as you (appcmd list requests), I could see they would be stuck in SendResponse.
To replicate the issue, I would loop 100,000 requests to a small static .jpg file and could generate thousands of 1236 responses (verified by checking the IIS logs).
After disabling the Windows Defender real-time scan, my 100k request test generated zero 1236 responses. In our UAT environment we were getting about 10-100 1236 errors in our IIS logs every day (out of about 500k requests). Since disabling the Windows Defender real-time scan, we have not had a single 1236 Win32 status for any incoming request.
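A replication loop like the one described above could be sketched as follows. This is not the original test script, just a minimal standard-library version. The names `http_fetch` and `run_loop`, the 10-second "slow" threshold and the example URL are assumptions. The client only sees slow or failed requests, so the 1236 statuses still have to be confirmed in the IIS logs.

```python
import http.client
import time
import urllib.request


def http_fetch(url, timeout=120.0):
    """Download `url` completely and discard the body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def run_loop(url, count, fetch=http_fetch, slow_after=10.0):
    """Request `url` `count` times; tally ok, slow and failed requests."""
    stats = {"ok": 0, "slow": 0, "failed": 0}
    for _ in range(count):
        start = time.monotonic()
        try:
            fetch(url)
        except (OSError, http.client.HTTPException):
            # URLError and socket timeouts are OSError subclasses.
            stats["failed"] += 1
            continue
        elapsed = time.monotonic() - start
        stats["slow" if elapsed >= slow_after else "ok"] += 1
    return stats
```

Calling `run_loop("http://localhost/test.jpg", 100_000)` (a hypothetical URL, point it at a small static file on the server under test) before and after changing the real-time scan setting gives a like-for-like comparison.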