A problem I keep running into with Ansible is that one deployment step should run whenever any of a number of preparation steps has changed, but the changed status is lost due to fatal errors.
When Ansible cannot continue after one successful preparation step, I still want the machine to eventually reach the state the playbook was meant to achieve. But Ansible forgets, e.g.:
- name: "(a) some task is changed"
git:
update: yes
...
notify:
# (b) ansible knows about having to call handler later!
- apply
- name: "(c) connection lost here"
command: ...
notify:
- apply
- name: apply
# (d) handler never runs: on the next invocation git-fetch is a no-op
command: /bin/never
Since preparation step (a) is now a no-op, running the playbook again does not recover this information.
For some tasks, just running ALL handlers is good enough. For others, one can rewrite the handlers into tasks that know when to run. But some tasks and checks are expensive and/or unreliable, so this is not always good enough.
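For instance, a sketch of such a rewrite, in which the apply task determines for itself whether work is pending instead of relying on a notification (the paths and the apply command are hypothetical):

- name: read the currently deployed revision, empty if none
  command: cat /srv/app/REVISION   # hypothetical version marker
  register: deployed
  changed_when: false
  failed_when: false

- name: read the revision of the fresh checkout
  command: git -C /srv/checkout rev-parse HEAD   # hypothetical path
  register: checked_out
  changed_when: false

- name: apply whenever the two differ, regardless of how earlier runs ended
  command: /usr/local/bin/apply   # hypothetical; expected to update REVISION
  when: deployed.stdout != checked_out.stdout

The cost is an extra check on every run, which is exactly what makes this unattractive when the check is expensive or unreliable.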
Partial solutions:
- Write out a file and check for its existence later, instead of relying on the Ansible handler (a sketch follows this list). This feels like an antipattern: after all, Ansible knows what's left to do; I just do not know how to get it to remember that across multiple attempts.
- Stay in a loop until it works or a manual fix is applied, however long that may take. This seems like a bad trade, because now I might not be able to use Ansible against the same group of targets, or I have to safeguard against undesirable side effects of multiple concurrent runs.
- Require higher reliability of the targets, so that failures are rare enough to justify always resolving these situations manually, using --start-at-task= and checking which handlers are still needed. Experience says things do occasionally break, and right now I am adding more things that can.
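As an illustration of the first option, a minimal sketch of the marker-file pattern (the marker path and the apply command are hypothetical):

- name: "(a) some task is changed"
  git:
    repo: https://example.com/repo.git   # hypothetical
    dest: /srv/checkout
  register: checkout

- name: leave a marker so the pending apply survives a crashed run
  file:
    path: /var/tmp/needs_apply   # hypothetical marker file
    state: touch
  when: checkout is changed

- name: apply if this or a previous run left the marker
  command: /usr/local/bin/apply   # hypothetical
  args:
    removes: /var/tmp/needs_apply   # run only while the marker exists

- name: remove the marker only after a successful apply
  file:
    path: /var/tmp/needs_apply
    state: absent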
Is there a pattern, feature or trick to properly handle such errors?
The Ansible docs you linked to suggest a way to deal with this: force_handlers, which makes notified handlers run even when a later task fails. Placing it in ansible.cfg will ensure that it is the default behavior for every playbook and role you run.
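A minimal sketch of that setting in ansible.cfg; the same switch also exists per play (force_handlers: true) and on the command line (--force-handlers):

[defaults]
force_handlers = True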
Very little can save you if the host dies during a playbook run.
It seems that currently the only way to tackle this problem is the one Michael Hampton pointed out.
IMHO this is not a fully viable solution, since the handlers themselves can fail because of the same underlying error that crashed the playbook run. A better solution would persist handler notification state between playbook executions, ideally on the remote hosts. Ansible already has the concept of facts and custom facts, which keep some state on the remote host's disk.
Currently I have no working concept of how to implement that.
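For what it's worth, a rough, untested sketch of how the custom-facts idea might look (the repo, file names, and the apply command are all hypothetical):

- name: ensure the custom facts directory exists
  file:
    path: /etc/ansible/facts.d
    state: directory

- name: "(a) some task is changed"
  git:
    repo: https://example.com/repo.git   # hypothetical
    dest: /srv/checkout
  register: checkout

- name: persist the pending notification on the target's disk
  copy:
    dest: /etc/ansible/facts.d/pending.fact
    content: '{"apply": true}'
  when: checkout is changed

- name: re-read local facts so the flag is visible within this run too
  setup:
    filter: ansible_local

- name: apply when this run, or an earlier crashed one, requested it
  command: /usr/local/bin/apply   # hypothetical
  when: (ansible_local.pending | default({})).apply | default(false)

- name: clear the flag only after a successful apply
  file:
    path: /etc/ansible/facts.d/pending.fact
    state: absent
  when: (ansible_local.pending | default({})).apply | default(false)

The flag is written immediately after the change and removed only after a successful apply, so a crash anywhere in between leaves the notification on the host for the next run to pick up.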