We have an array of servers, any of which could go down, generating a medium-priority notification:
define host {
host_name foo1
contacts medium-priority
use default-host
}
...
However, we'd like a higher-priority notification whenever more than two such servers are in trouble. To that end, we've set up a separate service definition using the Nagios/Icinga check_cluster utility:
define service {
service_description foo-cluster
servicegroups cluster-checks
display_name Foo Cluster
check_command check_cluster_host!Foo Cluster!0!3!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo2$,...,$HOSTSTATEID:fooN$
contacts high-priority
hostgroup_name clusters
notes Check that no more than 2 hosts in group foo are in trouble
use default-service
}
The above will probably work, but I'd like this service check to be triggered not on a timer, but only by a change in the status of any of the "underlying" hosts.
We generate Icinga's config-files with Ansible and so can construct complex dependencies programmatically -- but can such triggering be implemented at all?
You could define an event handler on the host. An event handler is basically a small script that does something based on the parameters it receives, and you can pass it the host's state attributes as command parameters via runtime macros.
https://www.icinga.com/docs/icinga1/latest/en/eventhandlers.html
I would define a custom variable on the host that lists the services to trigger when the event handler fires. That way you don't need to hardcode them inside the script.
Your script may then decide to force a new service check via the external command pipe. You should decide whether a SOFT state is enough or whether to wait for a HARD state. Keep in mind, though, that event handlers fire only on a state change, not on DOWN->DOWN->DOWN, for example.
Example: https://github.com/Icinga/icinga-core/blob/master/contrib/eventhandlers/submit_check_result.in
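As a rough illustration of the approach, here is a minimal sketch of such an event handler in Python. Unlike the linked example, which submits a check result, it asks the daemon to run a forced check through the SCHEDULE_FORCED_SVC_CHECK external command. Everything specific in it is an assumption for illustration: the command pipe path, the script's two arguments ($HOSTSTATETYPE$ and a host custom variable, hypothetically named _TRIGGER_SERVICES and passed as $_HOSTTRIGGER_SERVICES$), the "host:service" list format, and the host name cluster-host that the cluster service is attached to.

```python
#!/usr/bin/env python3
"""Host event handler sketch: force re-checks of dependent services."""
import sys
import time

# Assumed location of the external command pipe; it varies by installation.
COMMAND_PIPE = "/var/lib/icinga/rw/icinga.cmd"


def parse_targets(raw):
    """Split 'host:service,host:service' into (host, service) pairs.

    The list format is a convention invented for this sketch.
    Entries without both a host and a service are skipped.
    """
    targets = []
    for item in raw.split(","):
        host, _, service = item.strip().partition(":")
        host, service = host.strip(), service.strip()
        if host and service:
            targets.append((host, service))
    return targets


def build_commands(state_type, raw_targets, now=None):
    """Return one SCHEDULE_FORCED_SVC_CHECK line per target service.

    Returns an empty list unless the state type is HARD, so only
    confirmed host state changes trigger a re-check.
    """
    if state_type != "HARD":
        return []
    now = int(time.time()) if now is None else int(now)
    return [
        f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{now}"
        for host, service in parse_targets(raw_targets)
    ]


def main(state_type, raw_targets):
    commands = build_commands(state_type, raw_targets)
    if commands:
        # The pipe is a FIFO read by the daemon; write one command per line.
        with open(COMMAND_PIPE, "w") as pipe:
            for line in commands:
                pipe.write(line + "\n")


# Expected call: script.py <HOSTSTATETYPE> <host:service,...>
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

With a custom variable value such as cluster-host:foo-cluster on each fooN host, a HARD state change on any of them would queue an immediate check of foo-cluster. Dropping the HARD test in build_commands would make it react to SOFT changes as well.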
Note: The triggered service should have active checks disabled, and it should use the actual service check command rather than a dummy command, since the forced check executes that command.
(Such check result submission was also used in the old Nagios/Icinga 1 world for somewhat hackish distributed monitoring, if you're looking for more examples involving the command pipe and event handlers.)