Consider an ASP.NET SOAP web service that starts up fine, but craters hard when receiving its first hit.
Please note that this deployment works in the Test environment, but not in the PreProd environment. Both run Windows 2003 SP3 + IIS 6 + ASP.NET 3.5, all up to date.
The behaviour that we're seeing is:
- restart the site & app pool
- the app pool is configured to run under Network Service.
- browsing to the .asmx and .wsdl responds normally, as expected.
- send a normal well-formed SOAP request / normal payload to the web service
- 100% CPU usage
- after 5 seconds, the page request / site returns "Service Unavailable"
- no entry is created in the IIS log file (i.e. c:\windows\system32\logfiles\W3C-foo)
- the app pool ends up being stopped
The processes that hit the CPU hard are dw20.exe. I am unsure why Dr. Watson is involved here.
The Event Log shows an ASP.NET runtime error with the following text:
EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, P4 errormanagement, P5 1.0.0.0, P6 4b86a13f, P7 24, P8 0, P9 system.stackoverflowexception, P10 NIL.
Questions
Any thoughts on what this System.StackOverflowException might be? Given that the code is the same in both environments, might it be a payload problem? Could it be a configuration issue? You can see the name of my .NET assembly, "ErrorManagement", in the exception message.
The resolution to this (likely unique) problem:
Stack overflow exceptions are a special case, because the affected application can't do anything anymore (e.g. log a stack trace). In this case, the application pool process (w3wp.exe) is terminated by the OS. That's why Dr. Watson/DW20 gets involved. You could try to debug the dump that DW20 saved using WinDbg with the SOS extension (expect a steep learning curve if you're not familiar with that tool set; I hope this becomes easier with VS2010 as promised).
The high CPU usage (and often high memory usage) is caused by DW20, which is especially annoying if the crash-and-restart loop runs faster than DW20, so that several DW20 processes accumulate.
The default IIS application pool setting is to restart crashed applications no more than 3 times in a short period of time; beyond that, the pool is stopped to protect the server from DoS.
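To make the crash-counting behaviour concrete, here is a minimal Python model of a rapid-fail rule: a sliding window of recent crash timestamps, with the pool stopped once the count reaches a threshold. It is not IIS code; the class name, the `max_failures=3` value (taken from the answer above) and the 300-second window are illustrative assumptions. In IIS 6 the real thresholds are set under "Enable rapid-fail protection" on the application pool's Health tab.

```python
from collections import deque


class RapidFailGuard:
    """Illustrative model (not IIS code) of rapid-fail protection:
    once `max_failures` crashes fall inside a sliding window of
    `window_seconds`, the pool is stopped instead of restarted."""

    def __init__(self, max_failures=3, window_seconds=300):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.crash_times = deque()
        self.stopped = False

    def record_crash(self, timestamp):
        """Register a worker-process crash at `timestamp` (seconds, non-decreasing).
        Returns True if the pool is restarted, False if it is (or stays) stopped."""
        if self.stopped:
            return False
        self.crash_times.append(timestamp)
        # Forget crashes that are older than the window.
        while timestamp - self.crash_times[0] > self.window_seconds:
            self.crash_times.popleft()
        if len(self.crash_times) >= self.max_failures:
            self.stopped = True
            return False
        return True


if __name__ == "__main__":
    # A crash-on-every-request loop: three crashes within seconds.
    guard = RapidFailGuard(max_failures=3, window_seconds=300)
    print([guard.record_crash(t) for t in (0, 1, 2)])  # [True, True, False]
```

Crashes spread further apart than the window never accumulate, which is why a service that crashes on every request gets stopped almost immediately, while an occasional crash does not.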
Regarding the root cause, the stack overflow: it could be anything... but how about this wild guess: database access is failing due to misconfiguration, an exception is generated, and your application is logging exceptions to the database, without catching exceptions thrown by the exception handling itself ;)
I had this issue and found out that I had a LINQ statement that was trying to delete some rows. It was failing every day, so the number of rows just kept going up and up. The exceptions were actually handled and logged to a table that I had made, so I found it there. I found the problem table, and it held 700k+ very heavy rows. My LINQ looked like this:
So it was pulling the full objects and getting OutOfMemory exceptions. I changed this code to a stored proc:
I just wanted to post this because, if you are not seeing the exceptions, you might want to check the row counts of your SQL tables and see whether some LINQ statements could be timing out.
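The answerer's original LINQ and stored procedure are not shown, so the following is only a generic illustration of the same idea in Python, with SQLite standing in for SQL Server. It contrasts materializing every matching row in application memory before deleting it with a single set-based `DELETE` that runs inside the database, and adds a quick per-table row count as suggested above. The table `audit_log`, its columns and the helper names are hypothetical.

```python
import sqlite3


def purge_by_loading_rows(conn, cutoff):
    """Problematic pattern: pull every matching row, heavy payload included,
    into application memory, then delete the rows one at a time."""
    rows = conn.execute(
        "SELECT id, payload FROM audit_log WHERE created < ?", (cutoff,)
    ).fetchall()  # the entire result set is held in memory here
    for row_id, _payload in rows:
        conn.execute("DELETE FROM audit_log WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)


def purge_set_based(conn, cutoff):
    """Set-based alternative: a single DELETE executed by the database engine;
    no row data travels to the application."""
    cur = conn.execute("DELETE FROM audit_log WHERE created < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def row_counts(conn):
    """Row count per table, to spot a table that keeps growing."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, created INTEGER, payload BLOB)")
    conn.executemany("INSERT INTO audit_log (created, payload) VALUES (?, ?)",
                     [(day, b"x" * 1024) for day in range(100)])
    conn.commit()
    print(row_counts(conn))           # {'audit_log': 100}
    print(purge_set_based(conn, 90))  # 90
    print(row_counts(conn))           # {'audit_log': 10}
```

Both functions delete the same rows; the difference is that the first one's memory use grows with the size of the backlog, which is exactly what goes wrong when a cleanup job has been failing for a while.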
It sounds like you may be getting an unhandled exception, which in turn kills the ASP.NET worker process. See if this helps:
http://support.microsoft.com/?id=911816
Check the Event Viewer for errors (typically in the Application log); it is under Administrative Tools.
Also, @markus makes a good point: IIS has a fairly strict default of "no more than X worker process crashes per Y time", so if you hit the page just a couple of times with such an error, the entire application pool is taken down. Again, check the Event Viewer.