We're running an IIS7 server hosting several dozen websites. Several of these websites are part of the same legacy app we've developed. These sites all run the same code in the same app pool.
Roughly once a month over the past few months, we've found that all requests for this app pool start hanging indefinitely. When this happens, we receive an alert and we recycle the app pool. After that, the sites start working again.
This only ever affects this one app pool, never any others on the same server. A couple of times, before recycling the pool, I've looked at the currently executing requests in the worker process. They all show up as executing inside the WindowsAuthenticationModule, which is strange because the vast majority of the application does not require authentication. There is a small admin section which uses Windows auth, but all the other requests should be anonymous.
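A rough sketch of how that inspection could be scripted, so the module breakdown is captured before the pool gets recycled. It assumes the text output of `appcmd list requests` has the shape `REQUEST "id" (url:..., time:..., client:..., stage:..., module:...)`; the sample lines are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape of `appcmd list requests` output (not verified on this server):
# REQUEST "<id>" (url:GET /page.aspx, time:1234 msec, client:..., stage:..., module:...)
REQUEST_LINE = re.compile(r'^REQUEST "(?P<id>[^"]+)" \((?P<fields>.*)\)$')


def parse_request(line):
    """Return the key/value fields of one REQUEST line, or None if it does not match."""
    match = REQUEST_LINE.match(line.strip())
    if not match:
        return None
    fields = {}
    # Naive split: a URL that itself contains ", " would confuse this.
    for part in match.group("fields").split(", "):
        key, _, value = part.partition(":")
        fields[key] = value
    return fields


def count_by_module(lines):
    """Count in-flight requests per pipeline module."""
    counts = Counter()
    for line in lines:
        fields = parse_request(line)
        if fields and "module" in fields:
            counts[fields["module"]] += 1
    return counts


# Illustrative, made-up sample lines (not captured from the real server).
SAMPLE = [
    'REQUEST "fb00000080000487" (url:GET /index.aspx, time:81234 msec, '
    'client:10.0.0.5, stage:AuthenticateRequest, module:WindowsAuthenticationModule)',
    'REQUEST "fb00000080000488" (url:GET /foo/index.aspx, time:80110 msec, '
    'client:10.0.0.6, stage:AuthenticateRequest, module:WindowsAuthenticationModule)',
]
```

Feeding it the saved output of `appcmd list requests` from a hung pool would show at a glance whether every stuck request really is sitting in the same module and stage.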
Does anyone have any idea as to what might be causing this?
There are several unusual things about the way these sites are set up. As I mentioned, they all run the same code - multiple sites point at the same physical directory. The only difference is the host header bindings. I'm not sure why there isn't just one site with all the host headers, but that's how it works.
In several of these sites, the same physical directory is mapped at two levels - as the root of the site and again as an application within the site. So if a user goes to http://oursite.com/index.aspx, that maps to c:\files\oursite\index.aspx. If a user goes to http://oursite.com/foo/index.aspx, that also maps to c:\files\oursite\index.aspx. I think there is code which looks at the request URL and handles the two requests differently.
This is strange because the same web.config ends up being interpreted as a site config file, and also as an application config file within the site. I don't know if this might be related to the authentication problem.
If we can't find the cause, we're thinking of a few workarounds we could try:
Move the admin section into a separate site, and give the client a new admin URL. Run that separate site in its own app pool. Then in the web.config shared by all the other sites, remove the WindowsAuthenticationModule. That way there should be no possibility of a hang within the WindowsAuthenticationModule.
Try running all these sites in the classic pipeline instead of the integrated pipeline. They were working fine on our old IIS6 server...
(If we get desperate) Set up a watchdog script which monitors the sites and auto-recycles the app pool when it detects that requests are getting stuck.
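For the watchdog option, a minimal sketch of what such a script might look like. The pool name, probe URL, thresholds, and function names are all placeholders, and it assumes `appcmd.exe` lives in the default `inetsrv` directory; it recycles only after several consecutive failed probes so that a single slow request doesn't trigger it.

```python
import subprocess
import time
import urllib.error
import urllib.request

APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"  # assumed default install path
APP_POOL = "LegacyAppPool"                    # hypothetical pool name
PROBE_URL = "http://oursite.com/index.aspx"   # an anonymous page served by the pool
TIMEOUT_SECONDS = 30
FAILURES_BEFORE_RECYCLE = 3


def probe(url, timeout):
    """Return True if the server answers at all within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status, so it is not hung.
        return True
    except Exception:
        # Timeout, connection refused, DNS failure, and so on.
        return False


def next_failure_count(previous, healthy):
    """Reset the counter on a healthy probe, otherwise increment it."""
    return 0 if healthy else previous + 1


def should_recycle(failures, threshold=FAILURES_BEFORE_RECYCLE):
    """Recycle only after `threshold` consecutive failed probes."""
    return failures >= threshold


def recycle(pool):
    """Ask IIS to recycle the named app pool via appcmd."""
    subprocess.run([APPCMD, "recycle", "apppool", f"/apppool.name:{pool}"], check=True)


def watch(interval=60):
    """Probe forever, recycling the pool when it looks stuck."""
    failures = 0
    while True:
        failures = next_failure_count(failures, probe(PROBE_URL, TIMEOUT_SECONDS))
        if should_recycle(failures):
            recycle(APP_POOL)
            failures = 0
        time.sleep(interval)
```

Calling `watch()` from a scheduled task or service would start the loop. It would make sense to capture a dump or the request list before the `recycle` call, since recycling destroys the evidence of what the requests were stuck on.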
What do you think?
Thanks for your help,
Richard
I'd start by capturing a memory dump while the pool is hung: in Task Manager, find the w3wp.exe process running your hanging app pool, right-click it, and select "Create dump file".
Either:
OR:
Activate tracing with netsh to see Kerberos token requests as they happen (I haven't tried this IRL):
PS C:\> netsh trace show providers | select-string kerberos
PS C:\> netsh trace show providers | select-string auth
...and then something like:
netsh trace start provider={5BBB6C18-AA45-49B1-A15F-085F7ED0AA90}
OR: