I'm trying to choose a configuration management system for 500-2000 very-geographically-distributed hosts. Due to varying network reliability, it's possible that a number of hosts may be temporarily unavailable at any given time. For this reason, my initial choice was Chef, since it uses a "pull" model, and when hosts come online and check in, they'll immediately get current configuration.
However, if my hosts only poll the Chef server for new configuration every 30 minutes, rapid deployments are impossible. Also, I am not a Rubyist. I would prefer to use a push-based model, where I can push configuration to hosts as rapidly as possible. So, the natural choices seem to be Ansible or SaltStack (probably SaltStack). But my question is: How do Ansible and SaltStack handle failed or down hosts? Is there some way to keep retrying a push forever until a host comes back online? Are there existing patterns for properly handling eventual consistency of down-hosts with either of these tools? Thanks!
I can only answer this for Ansible.
Ansible itself does not handle hosts which are not reachable. It will try to connect once, and if that is not possible, the host is dropped from the current play. But Ansible gives you some tools to deal with this yourself.
First, there is the wait_for module. With it you can wait, with a very high timeout, until the hosts are available.
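A hedged sketch of such a task (the port and timeout are illustrative assumptions); delegated to the control machine, it blocks until the target's SSH port answers:

```yaml
- name: Wait until the host's SSH port is reachable
  wait_for:
    host: "{{ inventory_hostname }}"
    port: 22
    timeout: 3600          # wait up to an hour for the host to come back
  delegate_to: localhost   # run the check from the control machine
```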
On its own, though, this would be a problem when you run the play, because by default Ansible does not process any further tasks until all hosts have passed the current task, which is counterproductive in this case. Going by your description, the first hosts could be unavailable again by the time the last host finally becomes reachable.
To solve this you need Ansible 2, which has a new feature called strategies.
Setting `strategy: free` allows every host to run through the play as fast as it can, meaning each host executes its tasks as soon as it is available, without waiting for the others. Still, a connection could go down, and in that case there is no built-in way to retry automatically: if the SSH connection cannot be established, a fatal error is thrown for that host, and as of Ansible ~1.9 there is no way to catch this kind of connection error. That does not affect the other hosts, though; they will all play on fine.
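Putting the two together, a play could look roughly like this (a minimal sketch under assumptions; the host pattern and the final task are placeholders, not the author's playbook):

```yaml
- hosts: all
  strategy: free              # each host moves through the task list at its own pace
  tasks:
    - name: Wait until the host is reachable
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        timeout: 3600
      delegate_to: localhost

    - name: Apply the actual configuration
      ping:                   # stand-in for the real configuration tasks
```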
You can retry, though. Failed hosts are stored in a file named `<playbook-name>.retry` next to the playbook itself, and you can then re-run the playbook limited to just those hosts.
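For example, assuming the playbook is called `site.yml` (so the retry file is `site.retry`):

```sh
ansible-playbook site.yml --limit @site.retry
```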
Salt runs in a pull model from the nodes to the master, but you can issue global commands from the master.
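For instance, a highstate targeted at a glob of minion IDs (a hedged sketch; the exact invocation is an assumption based on the sentence that follows):

```sh
salt 'api*.domain.com' state.highstate
```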
That will run a highstate on all hosts whose id (hostname) matches `api*.domain.com`. A highstate is like a full Chef run.
Usually people will either have the master schedule highstate runs on the minions, or they will configure the schedule on the minions themselves, e.g. to run a highstate every 10 minutes.
So if a node is down and you run a command on the master to apply a state, Salt will report the node as down in its run output, which can be formatted in many different ways for you to ingest. It can even be logged to MySQL, for example.
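For instance (a hedged sketch using standard Salt CLI options; the targeting is an assumption):

```sh
# structured output that a script can ingest
salt --out=json 'api*.domain.com' state.highstate

# send job results to a configured returner, e.g. MySQL
salt --return mysql 'api*.domain.com' state.highstate
```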
So, for example, say you ran the above command on the master to run a highstate on all `api*.domain.com` nodes, and 2 of the 5000 were currently rebooting: once `salt-minion` came back online on those nodes, they would get the event from the master via the message bus and run the highstate.

Salt also has a thing called proxy nodes to help with the load on a master. You could have a single master somewhere and a proxy node in each datacenter; all the commands sent from the master go through the proxy nodes, and the minions in those datacenters talk to their proxy node and never to the master.
To extend Mike's answer, you can do push and pull simultaneously with Salt. Pushing is as easy as a single command from the master.
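For example (a hedged sketch; pushing the full highstate to every minion is an assumption, not the original snippet):

```sh
# push the full configured state to all minions right now
salt '*' state.highstate
```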
At the same time, your minions can do a scheduled pull every X minutes or hours via the built-in scheduler. My preferred method is to configure it via pillar, but adding it to the minion config works too.
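Something like this hedged sketch (the job name and the 30-minute interval are illustrative assumptions); the same block can live in pillar or in the minion config:

```yaml
schedule:
  periodic_highstate:          # arbitrary job name
    function: state.highstate  # pull and apply the full state
    minutes: 30                # run every 30 minutes
```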