I've read up on config management/provisioning tools like Ansible and SaltStack. These sound pretty good to me and I intend to use one of the two heavily (not yet decided which, although I'm leaning towards Ansible). Ideally I want to use one of the two to control all aspects of configuration and command execution in the system, i.e. initial bootstrapping and ad-hoc commands, but also the response when a system-wide exception happens.
To this end it seems that I could use Nagios event handlers (when correctly set up) to in turn execute configured Ansible playbooks (or the SaltStack equivalent) to try to bring the system back into a correct state.
Is this setup often used? Any reasons this would not be a good idea?
I'm asking because it seems logical/convenient to me to have all config under one tool (Ansible or SaltStack), but information on using a combination of Nagios (or similar) and Ansible (or similar) as described seems to be really sparse or non-existent.
It's a reasonable idea, but you have to be VERY careful that your automated actions are precise and accurate.
You need to be absolutely sure that the failure state you're experiencing can be resolved with those automated actions to reset it (accurate).
You also need to make damn sure that your actions are totally idempotent, in case something goes wrong and the handler triggers the wrong reconfiguration, or the right one more than once (precise).
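As a rough illustration of the "accurate" part, here is a minimal sketch of an event handler script that only acts on a confirmed failure. It assumes Nagios passes the standard $SERVICESTATE$, $SERVICESTATETYPE$ and $HOSTNAME$ macros as arguments; the playbook path and the function names are hypothetical, and the playbook itself still has to be idempotent.

```python
import subprocess
import sys

# Hypothetical playbook path; adjust to your own layout.
PLAYBOOK = "/etc/ansible/playbooks/restore_service.yml"


def should_remediate(state: str, state_type: str) -> bool:
    """Act only on a confirmed (HARD) CRITICAL state, never on SOFT retries."""
    return state == "CRITICAL" and state_type == "HARD"


def build_command(host: str, playbook: str = PLAYBOOK) -> list:
    """Build the ansible-playbook call, restricted to the affected host."""
    return ["ansible-playbook", playbook, "--limit", host]


def main(argv: list) -> int:
    """Expects: script.py <SERVICESTATE> <SERVICESTATETYPE> <HOSTNAME>."""
    if len(argv) != 4:
        return 2
    state, state_type, host = argv[1:4]
    if not should_remediate(state, state_type):
        return 0  # Nothing to do for OK, WARNING or SOFT states.
    return subprocess.run(build_command(host)).returncode


# Run only when called with the three macro arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

In the Nagios command definition, the command line would then look something like `/usr/local/bin/remediate.py $SERVICESTATE$ $SERVICESTATETYPE$ $HOSTNAME$`, where the script location is again just an assumption.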
It's not a bad idea altogether, but the problems surrounding state flapping are the ones that'll catch you the quickest. From experience, they even catch me: I've had something restart automatically and not been aware that it was stuck (or running, depending on your point of view).
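One way to limit the damage from flapping is to cap how often the handler may act on the same host/service pair and leave it to a human after that. The following is only a sketch: the class name, the limits and the key format are made up for illustration, and because Nagios runs each event handler as a separate process, the attempt history would need to be stored somewhere persistent (a file, for example) rather than in memory as it is here.

```python
import time
from collections import defaultdict, deque


class RemediationGuard:
    """Allow at most max_attempts automated fixes per key within a time window."""

    def __init__(self, max_attempts: int = 3, window_seconds: float = 3600.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # key -> timestamps of past attempts

    def allow(self, key: str, now: float = None) -> bool:
        """Return True and record the attempt if the key is under its limit."""
        now = time.time() if now is None else now
        attempts = self._attempts[key]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # Too many recent restarts: stop and escalate instead.
        attempts.append(now)
        return True


guard = RemediationGuard(max_attempts=2, window_seconds=60)
first = guard.allow("web1/httpd", now=0)    # allowed
second = guard.allow("web1/httpd", now=10)  # allowed
third = guard.allow("web1/httpd", now=20)   # refused, limit reached
```

When `allow` returns False, the handler should skip the playbook and let the normal Nagios notification reach someone, so that a service stuck in a restart loop does not go unnoticed.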