This is a new cluster built using Kubespray on bare metal.
The issue is that calicoctl reports BGP peers that are not in the Established state, StatefulSet members cannot communicate with each other, and the majority of Ingress requests take around 10 seconds to open a sample Nginx page.
Everything else looks okay: etcd, the pods, and the output of sudo kubectl get cs and sudo kubectl cluster-info dump.
The calico-node pods on master-1 (192.168.250.111) and node-1 (192.168.250.112) report no errors in their logs.
The calico-node pods on master-2 (192.168.240.111) and node-2 (192.168.240.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.240.240 (port 36597)
This IP is the VPN router's IP (the gateway for these servers).
The calico-node pods on master-3 (192.168.230.111) and node-3 (192.168.230.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.230.230 (port 35029)
This IP is the VPN router's IP (the gateway for these servers).
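BIRD, the BGP daemon inside calico-node, rejects incoming sessions whose source address does not match a configured peer. These messages therefore suggest that BGP connections from the other sites reach the 192.168.240.x and 192.168.230.x nodes with the local VPN router's address as their source rather than the originating node's. A minimal sketch for collecting those source addresses from the logs, assuming the log text arrives on standard input (bgp_unknown_sources is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct source address that BIRD rejected
# with "Unexpected connect from unknown address", one per line.
bgp_unknown_sources() {
  grep -o 'Unexpected connect from unknown address [0-9.]*' \
    | awk '{print $NF}' \
    | sort -u
}

# Usage sketch (assumes kubectl access, that the calico-node pods carry the
# k8s-app=calico-node label and that the container is named calico-node, as
# in the upstream Calico manifests):
# for pod in $(kubectl get pods -n kube-system -l k8s-app=calico-node -o name); do
#   echo "== $pod"
#   kubectl logs -n kube-system "$pod" -c calico-node | bgp_unknown_sources
# done
```

An address that shows up here but is not one of the node IPs points at something rewriting source addresses between the nodes.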
192.168.250.112 (node-1):
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up | 19:54:47 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.230.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.230.112 | node-to-node mesh | up | 20:42:30 | Established |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
192.168.240.112 (node-2):
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:09 | Passive |
| 192.168.240.111 | node-to-node mesh | up | 19:54:37 | Established |
| 192.168.230.111 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.250.112 | node-to-node mesh | start | 19:52:09 | Passive |
| 192.168.230.112 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection |
| | | | | reset by peer |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
192.168.230.112 (node-3):
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive |
| 192.168.230.111 | node-to-node mesh | up | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
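To compare the nodes quickly, each peer table can be reduced to the peers that are not Established. A minimal awk sketch, assuming the table layout shown above, where wrapped INFO continuation lines have an empty PEER ADDRESS cell (bgp_down_peers is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "calicoctl node status" output on stdin and print
# "PEER_ADDRESS STATE" for every peer whose INFO column is not "Established".
bgp_down_peers() {
  awk -F'|' '
    # Data rows have an IPv4 address in the first cell; this skips border
    # lines, the header row and wrapped INFO continuation lines.
    $2 ~ /^ *[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ *$/ {
      peer = $2; state = $4; info = $6
      gsub(/ /, "", peer); gsub(/ /, "", state)
      if (info !~ /Established/) print peer, state
    }'
}
```

Running sudo calicoctl node status | bgp_down_peers on node-2 with the table above would list every peer except 192.168.240.111.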
I tried setting the exact network interface to see if it helps, but it didn't:
era@server-master-1:~$ kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens3
daemonset.apps/calico-node env updated
I also tested port 179 with nc from every node and master to every other node and master, and all connections succeeded.
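A successful TCP connect on port 179 only shows that the port is reachable; it does not show which source address the receiving node sees, and BIRD matches incoming sessions by that address. A minimal reachability sketch that avoids differences between nc variants by using bash's /dev/tcp pseudo-device and coreutils timeout (check_tcp_port is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a TCP connection to HOST:PORT (default 179, BGP)
# via bash's /dev/tcp; returns 0 if the connect succeeds within 3 seconds.
check_tcp_port() {
  local host=$1 port=${2:-179}
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Loop over the node addresses from this post:
# for ip in 192.168.250.111 192.168.250.112 192.168.240.111 \
#           192.168.240.112 192.168.230.111 192.168.230.112; do
#   if check_tcp_port "$ip"; then echo "$ip:179 open"; else echo "$ip:179 FAILED"; fi
# done
```

To see the source address that actually arrives, running sudo tcpdump -ni any tcp port 179 on the receiving node while a peer connects prints each packet with its source address.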
The operating system is Ubuntu 18.04.
Any suggestions on what else to debug in Calico to solve this issue? Any hint that gets me closer to a resolution would be useful.
Update
I have found that the issue correlates with missing routes.
Below is the output for 192.168.250.112 (node-1). It can't reach the node and master in 192.168.240.x because there are no routes to them:
era@server-node-1:~$ ip route | grep tun
10.233.76.0/24 via 192.168.230.112 dev tunl0 proto bird onlink
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up | 21:39:05 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection |
| | | | | reset by peer |
| 192.168.230.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection |
| | | | | reset by peer |
| 192.168.230.112 | node-to-node mesh | up | 20:42:30 | Established |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
Below is the output for 192.168.240.112 (node-2). It can't reach the nodes and masters in 192.168.250.x and 192.168.230.x because there are no routes to them:
era@server-node-2:~$ ip r | grep tunl
10.233.66.0/24 via 192.168.240.111 dev tunl0 proto bird onlink
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:10 | Passive |
| 192.168.240.111 | node-to-node mesh | up | 19:54:38 | Established |
| 192.168.230.111 | node-to-node mesh | start | 22:05:18 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.250.112 | node-to-node mesh | start | 19:52:10 | Passive |
| 192.168.230.112 | node-to-node mesh | start | 22:05:22 | Active Socket: Connection |
| | | | | reset by peer |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
Below is the output for 192.168.230.112 (node-3). It can't reach the node and master in 192.168.240.x because there are no routes to them:
era@server-node-3:~$ ip r | grep tunl
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
10.233.100.0/24 via 192.168.250.112 dev tunl0 proto bird onlink
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up | 21:36:50 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive |
| 192.168.230.111 | node-to-node mesh | up | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
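Comparing the route tables with the BGP tables shows the pattern: each node has a tunl0 route only via peers whose session is Established. For example, node-1's routes point at 192.168.250.111, 192.168.230.111 and 192.168.230.112, exactly its three Established peers, so the pod routes are learned over BGP and are absent wherever the session is down. A minimal sketch for listing nodes that no tunl0 route points at, assuming ip route output on standard input and the other nodes' addresses as arguments (missing_tunl_routes is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each node address (from the arguments) that is
# not the next hop of any route on tunl0 in the "ip route" output on stdin.
missing_tunl_routes() {
  local routes node
  # Collect the next-hop ("via") addresses of routes installed on tunl0.
  routes=$(awk '/dev tunl0/ { for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }')
  for node in "$@"; do
    grep -qxF "$node" <<<"$routes" || echo "$node"
  done
}
```

On node-1, ip route | missing_tunl_routes 192.168.250.111 192.168.240.111 192.168.240.112 192.168.230.111 192.168.230.112 would print the two 192.168.240.x addresses.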
So why are these routes missing, and how can I change this behaviour so that they get added? If I add a route manually, it is automatically removed.
The issue was NAT applied on the VPN TUN interface (layer 3). Calico doesn't support this (or at least I'm not familiar with any NAT-compatible solutions).
Solution: use routing instead of NAT.
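With NAT on the tunnel, BGP connections reach the far side from the VPN router's address instead of the originating node's, which matches the "Unexpected connect from unknown address" errors above; routing instead means the routers forward traffic between the subnets with the original source addresses preserved. The router's setup isn't described here, so this is only a sketch assuming a Linux router using iptables with tunnel interfaces named tun*; NAT could equally be configured in the VPN software or in nftables. It lists the NAT rules that apply to traffic leaving a tunnel interface (tun_nat_rules is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "iptables -t nat -S" output on stdin and print the
# MASQUERADE/SNAT rules that match traffic leaving a tun* interface.
tun_nat_rules() {
  grep -E -- '-o tun[^ ]*' | grep -E -- '-j (MASQUERADE|SNAT)'
}

# Usage on the VPN router (assumes iptables is in use there):
# sudo iptables -t nat -S | tun_nat_rules
```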