I'm setting up a three-manager Docker Swarm using Docker CE 19.03 on three CentOS 7 machines.
I've installed Docker via yum, then enabled and started the service.
I've created a firewalld 'service' and allowed the following ports (per the Docker docs):
- TCP port 2377 for cluster management communications
- TCP and UDP port 7946 for communication among nodes
- UDP port 4789 for overlay network traffic
I was able to init the swarm and join a worker. However, when I tried to join the third server as a manager, it failed with:
Error response from daemon: manager stopped: can't initialize raft node: rpc error: code = Unknown desc = could not connect to the prospective new cluster member using its advertised address: rpc error: code = DeadlineExceeded desc = context deadline exceeded
So I backed out with docker swarm leave and tried to join as a worker instead, which succeeded without a problem. Why would joining as a manager fail when joining as a worker succeeds?
I also tried allowing 2376/tcp (per https://www.digitalocean.com/community/tutorials/how-to-configure-the-linux-firewall-for-docker-swarm-on-centos-7) and disabling firewalld on the existing manager to see whether it was a firewall issue, but I got the same error.
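So, lesson learned: trust the error messages.
I had applied my firewalld service template on the third server but forgotten to reload firewalld, so the ports were not actually open. That matches the error: the existing manager could not connect to the new member on its advertised address. Apparently a node does not need those ports open to join as a worker, but it does to join as a manager.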
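For anyone hitting the same thing, here is a minimal sketch of opening the ports from the Docker docs list and then reloading firewalld. It assumes plain port rules rather than the service template I used, and FWCMD is a hypothetical override variable (not part of firewalld) that only exists so the function can be dry-run.

```shell
# Sketch only: opens the swarm ports permanently, then reloads firewalld so the
# permanent rules actually take effect in the running firewall.
# FWCMD is a hypothetical override for dry runs; it defaults to firewall-cmd.
open_swarm_ports() {
  fw="${FWCMD:-firewall-cmd}"
  "$fw" --permanent --add-port=2377/tcp &&   # cluster management
  "$fw" --permanent --add-port=7946/tcp &&   # node-to-node communication
  "$fw" --permanent --add-port=7946/udp &&   # node-to-node communication
  "$fw" --permanent --add-port=4789/udp &&   # overlay network traffic
  "$fw" --reload &&                          # the step I had forgotten
  "$fw" --list-ports                         # show what is open right now
}
```

Run open_swarm_ports as root on each node. Running it as FWCMD=echo open_swarm_ports only prints the arguments instead of changing the firewall.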
I had noticed in my load balancer that containers on that host were frequently showing as failed when they were not, which led me to double-check everything.
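A quick way to double-check from another node is to test whether the TCP ports answer at all. This is a rough sketch, assuming bash and the coreutils timeout command are available; check_tcp is a helper name I made up. It only covers TCP, so the UDP ports (7946 and 4789) are not tested. As far as I understand, 2377 only listens on manager nodes and 7946 only on nodes that have already joined the swarm, so a failure before joining does not by itself prove a firewall problem.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT: succeeds if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp redirection; check_tcp is a hypothetical helper name.
check_tcp() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# The node to test is taken from the first script argument; nothing runs without it.
NODE="${1:-}"
if [ -n "$NODE" ]; then
  for port in 2377 7946; do
    if check_tcp "$NODE" "$port"; then
      echo "$NODE:$port/tcp reachable"
    else
      echo "$NODE:$port/tcp NOT reachable"
    fi
  done
fi
```

Saved as, say, check_swarm_ports.sh (a made-up filename), it would be run from another swarm node as bash check_swarm_ports.sh 192.0.2.10, where the address is only a placeholder for the node being tested.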
The join token may also no longer be valid, for example if it has been rotated since you copied it. You can generate the token again on a manager and use it to join.
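Getting the token again can be sketched as below. It has to run on an existing manager. print_join_command and the DOCKER override variable are hypothetical names used only for this example; the underlying command is docker swarm join-token.

```shell
# Prints the full "docker swarm join ..." command for the given role.
# ROLE is "manager" (default) or "worker".
# DOCKER is a hypothetical override for dry runs; it defaults to docker.
print_join_command() {
  "${DOCKER:-docker}" swarm join-token "${1:-manager}"
}
```

Calling print_join_command manager prints the command to paste on the node that should join. If the old token should stop working, docker swarm join-token --rotate manager issues a new one and invalidates the previous one.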