I have 3 IIS web servers in an ARR web farm. When we do rolling releases, we take one server offline as a backup server and move it into an "Unavailable" state. I have noticed that with ARR, servers will not stay in this state; they come back online automatically hours or days later. Does anyone know how to remedy this? This is very bad, as the server that is down is typically not running the correct version of our code.
I need to keep a server unavailable until I tell it otherwise.
I am running ARR 2.5 on Windows Server 2008 R2 Datacenter SP1, and have experienced the same problem when transitioning a farm server by selecting either "Disallow New Connections" or "Make Server Unavailable Immediately". The controller will eventually revert the farm server to available. I checked the access logs on the affected server and, in one case, it became available and began serving requests after 2 hours.
With regard to Jim B's solution, I have one issue with it. If you deploy the correct code to the primary and don't reconfigure the health test before the next check, then all farm servers that have been provisioned with the new code will be marked as unhealthy, which in most cases is all of those in the farm. I don't see how even a momentary lapse of farm server health is worth this workaround. If I'm missing the picture, please let me know.
I thought I'd be clever and set the farm server as unhealthy first, then set it to unavailable. The server immediately became unavailable and healthy.
Personally, the only way that I can be sure the farm server will not be made available automatically is by removing it from load balancing completely after you have reasonable assurance that the connections have drained.
Regardless, it seems to be a bug. I can't be sure a particular farm server will not handle session traffic when I try to funnel away from it (for updates and restarts, for example).
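The "drain, then remove" approach above could be scripted roughly as follows. This is only a sketch: get_active_connections is a hypothetical callable that you would have to supply yourself (for example, something that reads a connection counter for the farm server); it is not an ARR API.

```python
import time
from typing import Callable


def wait_for_drain(get_active_connections: Callable[[], int],
                   timeout_s: float = 300.0,
                   poll_interval_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the server has no active connections.

    get_active_connections is a hypothetical, caller-supplied function.
    Returns True once the count reaches zero, False if timeout_s elapses first.
    """
    waited = 0.0
    while True:
        if get_active_connections() <= 0:
            return True          # drained: safe to remove from the farm
        if waited >= timeout_s:
            return False         # still busy: do not remove yet
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Only when this returns True would you go on to remove the server from the farm; on False you would wait longer or investigate the remaining connections.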
I would suggest configuring a health test that checks the deployed version against the version of the code that you want to run. When you make changes, simply change the response match. You should also disallow new connections on a server that is in a maintenance window. This will drain the existing connections and not allow new ones, regardless of the health of the server.
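To illustrate how a version-based response match behaves, here is a small simulation. It is not ARR code; it assumes each server exposes some page whose body contains its build version (the server names and version strings are made up), and it models a health test as "status code acceptable and body contains the response match".

```python
def passes_health_test(status_code: int, body: str, response_match: str,
                       acceptable=range(200, 400)) -> bool:
    """Simplified model of a health test: status in range and body contains the match."""
    return status_code in acceptable and response_match in body


def healthy_servers(farm_bodies: dict, response_match: str) -> list:
    """Return the names of servers whose (assumed 200 OK) response passes the test."""
    return sorted(name for name, body in farm_bodies.items()
                  if passes_health_test(200, body, response_match))


# Hypothetical farm: web3 is the server held back on the old build.
farm = {"web1": "build 1.4.0", "web2": "build 1.4.0", "web3": "build 1.3.9"}

print(healthy_servers(farm, "1.4.0"))  # servers matching the new version
print(healthy_servers(farm, "1.3.9"))  # what happens if the match is not updated
```

The second call shows the trade-off: if the response match still names the old version after the new code has been deployed, only the stale server passes the test.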
I figured out when (not why) this is happening - and how to fix it!
It happens when the app pool of the default web site (ARR Process) on the load balancer is terminated or recycled.
Follow Microsoft's recommendation and set the app pool idle timeout to 0 (see the Microsoft-recommended ARR setup guide).
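For reference, the idle timeout is stored as the idleTimeout attribute on the app pool's processModel element, with 00:00:00 meaning "never time out". The sketch below applies that change to a sample XML fragment only; the pool name DefaultAppPool and the fragment itself are assumptions, and on a real server you would normally make the change through IIS Manager or appcmd rather than by editing the configuration by hand.

```python
import xml.etree.ElementTree as ET

# Sample fragment (an assumption for illustration, not a full config file).
SAMPLE = """<applicationPools>
  <add name="DefaultAppPool">
    <processModel idleTimeout="00:20:00" />
  </add>
</applicationPools>"""


def disable_idle_timeout(xml_text: str, pool_name: str) -> str:
    """Set idleTimeout to 00:00:00 for the named pool and return the new XML."""
    root = ET.fromstring(xml_text)
    for pool in root.findall("add"):
        if pool.get("name") == pool_name:
            process_model = pool.find("processModel")
            if process_model is None:
                # Pool had no explicit processModel element; create one.
                process_model = ET.SubElement(pool, "processModel")
            process_model.set("idleTimeout", "00:00:00")
    return ET.tostring(root, encoding="unicode")


print(disable_idle_timeout(SAMPLE, "DefaultAppPool"))
```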