Long story short -- we have two web servers (IIS 6) that run a third-party ASP.NET application. Randomly (so far) they just...stop working. I have an outside check that tells me within a minute or so when one stops working. Right now I have to get onto the machines through RDP and issue an iisreset, which is fine until I'm not at a machine and have to get to one PDQ.
I wrote a simple page that will issue an iisreset on the offending remote machine(s). This works, usually. Sometimes "iisreset \\machinename" will stop the IIS service but not restart it, which is bad.
Ideally, I'd like to know if I can just stop the service, try to start it, and if it doesn't start in 10 seconds, try to start it again. But I don't know how to monitor the status of a service remotely.
Can someone point me in the right direction?
I think you'd be better off recycling the affected app pool on the unresponsive server: http://technet.microsoft.com/en-us/library/cc770764(WS.10).aspx
If you run IIS6, this might be interesting (in a broad sense it also applies to IIS7): http://blogs.msdn.com/b/david.wang/archive/2006/01/26/thoughts-on-application-pool-recycling-and-application-availability.aspx
But the best thing you could do would probably be to troubleshoot the application and make sure it doesn't stop working in the first place.
UPDATE: Since you use IIS6, read the above article, and have a look at this: http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/1eee28e2-b319-4b4e-8267-a8c0aa0dcf36.mspx?mfr=true
Chris Adams made a blog post some years back with a little application that recycles IIS app pools (using WMI). This might come in handy: http://blogs.iis.net/chrisad/archive/2006/08/30/Recycling-Application-Pools-using-WMI-in-IIS-6.0.aspx
I dealt with a similar issue four or five years ago, except it was with an in-house app that was being rewritten.
If any kind of event is raised in the event log, you could set up a sink that watches for that event and issues a scripted reset via vbs / wmi.
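For the scripted reset itself, here is a rough Python sketch of the stop / start / retry-after-10-seconds loop asked about in the question. It is only a sketch under some assumptions: it shells out to the built-in sc command (sc \\machine query|stop|start service), it assumes English sc output, it assumes the account running it has rights on the remote box, and it only touches W3SVC, whereas iisreset also cycles related services such as IISADMIN. The function names are made up for illustration.

```python
import re
import subprocess
import time

SERVICE = "W3SVC"  # assumption: only the World Wide Web Publishing service


def sc(machine, action, service=SERVICE):
    """Run `sc \\machine <action> <service>` and return its text output."""
    result = subprocess.run(
        ["sc", rf"\\{machine}", action, service],
        capture_output=True, text=True,
    )
    return result.stdout


def parse_state(sc_output):
    """Pull the state name (RUNNING, STOPPED, ...) out of sc output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def wait_for_state(machine, wanted, timeout=10, run=sc, sleep=time.sleep):
    """Poll about once a second until the service reaches `wanted`."""
    for _ in range(timeout):
        if parse_state(run(machine, "query")) == wanted:
            return True
        sleep(1)
    # One last check after the final sleep.
    return parse_state(run(machine, "query")) == wanted


def restart_service(machine, attempts=3, timeout=10, run=sc, sleep=time.sleep):
    """Stop the service, then start it, retrying if it is not up in time."""
    run(machine, "stop")
    wait_for_state(machine, "STOPPED", timeout, run, sleep)
    for _ in range(attempts):
        run(machine, "start")
        if wait_for_state(machine, "RUNNING", timeout, run, sleep):
            return True
    return False
```

Calling restart_service("machinename") returns True once the service reports RUNNING and False if it never comes up after the given number of start attempts, so the calling page can show an alert instead of silently leaving IIS stopped.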
Check out the official MS docs.
EDIT - Ugly.. If you have Nagios you could set up an event handler based on page response time that does a reset / recycle.
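A rough idea of what such a handler could look like, in Python. This is a sketch, not a tested setup: it assumes the Nagios command definition passes $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$ and $HOSTNAME$ as the four arguments, and that the page check goes CRITICAL when the response time is too slow. The function names and the "third soft attempt" threshold are illustrative choices, and the actual reset / recycle call is left as a placeholder.

```python
import sys


def should_reset(state, state_type, attempt, soft_attempt=3):
    """Decide whether this event warrants a reset / recycle."""
    if state != "CRITICAL":
        return False  # OK, WARNING and UNKNOWN are left alone
    if state_type == "HARD":
        return True  # retries exhausted, definitely act
    # Act early on a chosen soft retry, before the state goes HARD.
    return state_type == "SOFT" and int(attempt) == soft_attempt


def main(args):
    state, state_type, attempt, host = args
    if should_reset(state, state_type, attempt):
        # Placeholder: call your own reset / recycle routine here.
        print("reset needed on " + host)
    return 0


# Do nothing unless Nagios passed all four values.
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```

Acting on one specific soft attempt and again on the hard state means the reset can fire at most twice per outage instead of on every failed check.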