SCOM supports putting discrete objects/classes/targets into maintenance mode. This gives very fine control over which objects/classes/targets have alerts forwarded.
Unfortunately, our operations team doesn't want that level of control.
They want to put an entire server, or groups of servers, into maintenance mode, where "maintenance mode" means no alerting of any kind. Period. Fin.
Today we come close by putting WindowsComputer and HealthService (which also seems to cover Agent) into maintenance mode. That allows us to do application deployments (service stops, etc.) and anything requiring a reboot.
However, we still get occasional alerts from objects in management packs like the Dell MP or BizTalk MP. These alerts don't tend to target WindowsComputer, or anything in its inheritance chain(?).
We tried putting the Entity object/class/target into maintenance mode, but this seemed to send the RMS server into a tizzy. For example, if we made 50 requests for 50 different servers, maybe 1 in 5 would actually be placed into maintenance mode. The remainder would be ignored.
We are using the SCOM API via PowerShell, or the SCOM SDK object model, to put things into maintenance mode.
Is there a recommended way to put a server, and all its contained objects, into maintenance mode, reliably?
Is there some reason our team should be considering for why we wouldn't want to put everything into maintenance mode?
According to the documentation, you can easily place a whole server in maintenance mode:
This article might help clarify a few things:
http://blogs.technet.com/b/momteam/archive/2012/05/23/kb-understanding-operations-manager-maintenance-mode.aspx
Putting the computer object into maintenance mode should work.
Since SCOM 2007 R2 there is no need to separately put the agent and agent watcher into maintenance mode. Just be sure to check the 'Selected Objects and all their contained objects' option if using the console, or pass TraversalDepth.Recursive if using the SDK (the PowerShell cmdlet does this by default).
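As a rough illustration of the SDK route, here is a minimal PowerShell sketch. It assumes the OperationsManager module that ships with SCOM 2012 (on 2007 R2 the cmdlet names differ), an existing connection to the management group, and a hypothetical server name, reason and duration; adjust all of these to your environment.

```powershell
# Sketch only: assumes the SCOM 2012 OperationsManager module and an
# existing management group connection.
Import-Module OperationsManager

# Hypothetical values for illustration.
$serverName = 'server01.contoso.com'
$comment    = 'Application deployment'
$hours      = 2

# Find the Windows Computer instance for the server.
$computerClass = Get-SCOMClass -Name 'Microsoft.Windows.Computer'
$computer = Get-SCOMClassInstance -Class $computerClass |
    Where-Object { $_.DisplayName -eq $serverName }

if (-not $computer) {
    throw "No Windows Computer object found for $serverName"
}

# SDK call on the monitoring object. TraversalDepth.Recursive includes
# the computer's contained objects.
$start = [DateTime]::UtcNow
$end   = $start.AddHours($hours)
$computer.ScheduleMaintenanceMode(
    $start,
    $end,
    [Microsoft.EnterpriseManagement.Monitoring.MaintenanceModeReason]::PlannedOther,
    $comment,
    [Microsoft.EnterpriseManagement.Common.TraversalDepth]::Recursive
)
```

The lookup by DisplayName is just one way to find the object; a group or distributed application instance could be passed through the same call.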
You could try to identify the top-level distributed applications (DAs) or groups that contain the objects raising the alerts, and put those DAs and groups into maintenance mode.
Consider: