I have a Kubernetes 1.25.0 cluster with several nodes (Ubuntu Server machines). I use Calico, installed from https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml. Now I am adding a new node that is identical to the others, with one exception: it has a 2.5 Gbit network port instead of a 1 Gbit one. On this node, both calico-node and kube-proxy keep crashing; on all other nodes they work fine. calico-node reports the following as the reason for the crash:
Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W0724 00:54:46.157624 73 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
Back-off restarting failed container
kube-proxy simply crashes with "Back-off restarting failed container" as well.
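For reference, the probe message means that nothing accepted a connection on BIRD's control socket, /var/run/calico/bird.ctl, at the moment the probe ran. The Python sketch below performs the same kind of connect test so it can be repeated by hand wherever that path is visible and Python is available (an assumption; the calico-node image may not ship Python). It only approximates the check and is not Calico's probe code; `unix_socket_accepts` is a hypothetical helper name.

```python
import socket

# Path taken from the readiness probe error message.
BIRD_CTL = "/var/run/calico/bird.ctl"


def unix_socket_accepts(path: str, timeout: float = 1.0) -> bool:
    """Return True if a process is listening on the Unix socket at `path`."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # ECONNREFUSED: socket file exists but nobody listens ("connection refused").
        # ENOENT: socket file does not exist. Timeouts are OSError subclasses too.
        return False
    finally:
        s.close()


if __name__ == "__main__":
    state = "accepting" if unix_socket_accepts(BIRD_CTL) else "refused or missing"
    print(BIRD_CTL, state)
```

Running it repeatedly while the pod restarts shows whether BIRD ever gets as far as listening on its control socket.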
The logs of both look fine: no errors, not even warnings. Here is part of the log from the calico-node container:
2023-07-24 01:35:34.609 [INFO][115] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_B", State:"", Index:76}
2023-07-24 01:35:34.609 [INFO][115] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Set[string](nil)}
2023-07-24 01:35:34.609 [INFO][115] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_B", Addrs:set.Set[string](nil)}
2023-07-24 01:35:34.609 [INFO][115] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"calico_tmp_A", State:"", Index:77}
2023-07-24 01:35:34.609 [INFO][115] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Set[string](nil)}
2023-07-24 01:35:34.609 [INFO][115] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"calico_tmp_A", Addrs:set.Set[string](nil)}
2023-07-24 01:35:34.609 [INFO][115] felix/int_dataplane.go 1803: Dataplane updates throttled
2023-07-24 01:35:35.603 [INFO][115] felix/int_dataplane.go 1770: Dataplane updates no longer throttled
bird: device1: Initializing
bird: direct1: Initializing
bird: device1: Starting
bird: device1: Initializing
bird: direct1: Initializing
bird: Mesh_192_168_178_58: Initializing
bird: Mesh_192_168_178_25: Initializing
bird: Mesh_192_168_178_70: Initializing
bird: Mesh_192_168_178_38: Initializing
bird: Mesh_192_168_178_72: Initializing
bird: device1: Starting
bird: device1: Connected to table master
bird: device1: Connected to table master
bird: device1: State changed to feed
bird: device1: State changed to feed
bird: direct1: Starting
bird: direct1: Connected to table master
bird: direct1: State changed to feed
bird: direct1: Starting
bird: Graceful restart started
bird: direct1: Connected to table master
bird: Graceful restart done
bird: direct1: State changed to feed
bird: Started
bird: Mesh_192_168_178_58: Starting
bird: Mesh_192_168_178_58: State changed to start
bird: device1: State changed to up
bird: Mesh_192_168_178_25: Starting
bird: Mesh_192_168_178_25: State changed to start
bird: Mesh_192_168_178_70: Starting
bird: Mesh_192_168_178_70: State changed to start
bird: Mesh_192_168_178_38: Starting
bird: direct1: State changed to up
bird: Mesh_192_168_178_38: State changed to start
bird: Mesh_192_168_178_72: Starting
bird: Mesh_192_168_178_72: State changed to start
bird: Graceful restart started
bird: Started
bird: device1: State changed to up
bird: direct1: State changed to up
bird: Mesh_192_168_178_58: Connected to table master
bird: Mesh_192_168_178_58: State changed to wait
bird: Mesh_192_168_178_25: Connected to table master
bird: Mesh_192_168_178_25: State changed to wait
bird: Mesh_192_168_178_72: Connected to table master
bird: Mesh_192_168_178_72: State changed to wait
bird: Mesh_192_168_178_70: Connected to table master
bird: Mesh_192_168_178_70: State changed to wait
bird: Mesh_192_168_178_38: Connected to table master
bird: Mesh_192_168_178_38: State changed to wait
bird: Graceful restart done
bird: Mesh_192_168_178_58: State changed to feed
bird: Mesh_192_168_178_25: State changed to feed
bird: Mesh_192_168_178_70: State changed to feed
bird: Mesh_192_168_178_38: State changed to feed
bird: Mesh_192_168_178_72: State changed to feed
bird: Mesh_192_168_178_58: State changed to up
bird: Mesh_192_168_178_25: State changed to up
bird: Mesh_192_168_178_70: State changed to up
bird: Mesh_192_168_178_38: State changed to up
bird: Mesh_192_168_178_72: State changed to up
2023-07-24 01:35:41.982 [INFO][115] felix/health.go 336: Overall health status changed: live=true ready=true
+---------------------------+---------+----------------+-----------------+--------+
| COMPONENT                 | TIMEOUT | LIVENESS       | READINESS       | DETAIL |
+---------------------------+---------+----------------+-----------------+--------+
| CalculationGraph          | 30s     | reporting live | reporting ready |        |
| FelixStartup              | -       | reporting live | reporting ready |        |
| InternalDataplaneMainLoop | 1m30s   | reporting live | reporting ready |        |
+---------------------------+---------+----------------+-----------------+--------+
2023-07-24 01:36:27.256 [INFO][115] felix/int_dataplane.go 1836: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"storage-controller" ipv4_addr:"192.168.178.72/24" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"storage-controller" > labels:<key:"kubernetes.io/os" value:"linux" > labels:<key:"specialServerType" value:"storage" >
2023-07-24 01:36:29.551 [INFO][115] felix/int_dataplane.go 1836: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"node1" ipv4_addr:"192.168.178.25/24" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"node1" > labels:<key:"kubernetes.io/os" value:"linux" > labels:<key:"node-role.kubernetes.io/control-plane" value:"" > labels:<key:"node.kubernetes.io/exclude-from-external-load-balancers" value:"" >
2023-07-24 01:36:34.389 [INFO][117] monitor-addresses/autodetection_methods.go 103: Using autodetected IPv4 address on interface enp11s0: 192.168.178.88/24
2023-07-24 01:36:37.850 [INFO][115] felix/summary.go 100: Summarising 20 dataplane reconciliation loops over 1m3.5s: avg=13ms longest=180ms (resync-filter-v4,resync-ipsets-v4,resync-mangle-v4,resync-nat-v4,resync-raw-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,update-filter-v4,update-ipsets-4,update-mangle-v4,update-nat-v4,update-raw-v4)
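The BIRD lines in this excerpt show every node-to-node mesh session moving through start, wait and feed to up. To check that quickly across several restarts, a small Python sketch can pull out the last state logged for each mesh peer. `last_mesh_states` is a hypothetical helper, not part of Calico, and the list of state names is simply the set seen in this excerpt (an assumption; extend it if BIRD logs others).

```python
import re

# BIRD protocol state changes for Calico's node-to-node mesh peers, e.g.
# "bird: Mesh_192_168_178_58: State changed to up".
# State names are the ones seen in this excerpt (assumption: extend as needed).
# Listing them explicitly also copes with interleaved output such as
# "...Mesh_192_168_178_58: State changed to startdevice1: ...".
STATE_RE = re.compile(r"(Mesh_[0-9_]+): State changed to (start|wait|feed|up)")


def last_mesh_states(log_text: str) -> dict:
    """Map each mesh peer's IP address to the last state logged for it."""
    states = {}
    for peer, state in STATE_RE.findall(log_text):
        # "Mesh_192_168_178_58" -> "192.168.178.58"
        ip = peer[len("Mesh_"):].replace("_", ".")
        states[ip] = state  # later lines overwrite earlier ones
    return states
```

For example, save the output of `kubectl logs` for the calico-node pod to a file and pass its contents to `last_mesh_states`; for the excerpt above, all five peers end in `up`.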
I really don't understand this. I have already rebuilt the whole node, updated calico-node, and tried other Kubernetes versions (including 1.25.11). No firewall is installed. Can anyone help me here? Thanks
PS: I have already tried all IP autodetection methods. Right now I use IP_AUTODETECTION_METHOD=can-reach=8.8.8.8.
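For reference, `can-reach=<destination>` is meant to select the local address the kernel would use as the source when sending to that destination. A minimal Python sketch of the same idea (an approximation, not Calico's implementation; `source_ip_for` is a hypothetical name): calling connect() on a UDP socket only performs a route lookup and fixes the local address, so no packet is actually sent.

```python
import socket


def source_ip_for(dest: str, port: int = 80) -> str:
    """Return the local IPv4 address the kernel would use to reach `dest`.

    connect() on a UDP socket only does a route lookup and binds the local
    address; nothing is transmitted, so the port value is arbitrary.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest, port))
        return s.getsockname()[0]
```

Calling `source_ip_for("8.8.8.8")` on the new node and comparing the result with the "Using autodetected IPv4 address" line in the calico-node log shows whether the two agree.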
Output of ifconfig on the node:
ifconfig -a
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:50:db:52:31 txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

enp11s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.178.88 netmask 255.255.255.0 broadcast 192.168.178.255
        inet6 fe80::67c:16ff:fec8:53e6 prefixlen 64 scopeid 0x20<link>
        inet6 2a02:908:523:bd80:67c:16ff:fec8:53e6 prefixlen 64 scopeid 0x0<global>
        ether 04:7c:16:c8:53:e6 txqueuelen 1000 (Ethernet)
        RX packets 8748 bytes 6634723 (6.6 MB)
        RX errors 0 dropped 242 overruns 0 frame 0
        TX packets 6342 bytes 944277 (944.2 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 390 bytes 47600 (47.6 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 390 bytes 47600 (47.6 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

tunl0: flags=193<UP,RUNNING,NOARP> mtu 1480
        inet 192.168.53.192 netmask 255.255.255.255
        tunnel txqueuelen 1000 (IPIP Tunnel)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 39 bytes 13183 (13.1 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlp12s0: flags=4098<BROADCAST,MULTICAST> mtu 1500
        ether 60:e9:aa:5e:01:95 txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
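Since the only hardware difference is the 2.5 Gbit port, it may help to compare the basic link parameters of enp11s0 with those of the NICs on the working nodes. The Linux kernel exposes them under /sys/class/net; the Python sketch below (hypothetical helper names) reads a few of them. `speed` is reported in Mbit/s, and reading it fails with EINVAL when the link is down, which the sketch treats as unknown (None).

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysfs directory with one entry per network interface.
SYSFS_NET = Path("/sys/class/net")


def read_attr(iface: str, attr: str) -> Optional[str]:
    """Read one sysfs attribute of `iface`; return None if missing or unreadable."""
    try:
        return (SYSFS_NET / iface / attr).read_text().strip()
    except OSError:
        # Covers a missing interface/attribute and EINVAL from `speed` on a down link.
        return None


def link_summary(iface: str) -> dict:
    """Collect a few link parameters worth comparing between nodes."""
    return {attr: read_attr(iface, attr) for attr in ("operstate", "speed", "mtu", "address")}


if __name__ == "__main__":
    if SYSFS_NET.is_dir():
        for iface in sorted(p.name for p in SYSFS_NET.iterdir()):
            print(iface, link_summary(iface))
```

Running it on the new node and on one of the working nodes gives two short listings that can be diffed side by side.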