I'm having some issues getting my AKS pods/containers connected to our on-prem network.
I have a virtual network with the address spaces 172.16.20.0/22 and 172.16.24.0/29.
It has 2 subnets, each using one of the above ranges as its subnet range.
The AKS cluster is bound to the 172.16.20.0/22 subnet, and the nodes as well as the pods get an IP address in that range. I also added a regular VM to this subnet for temporary debugging.
In the 172.16.24.0/29 subnet, we have a Virtual Network Gateway (it has no IP in this subnet) which connects that subnet to our on-prem network.
The VN Gateway has a matching local network gateway with address space 172.17.151.0/24. In our local network we have an SMTP server on 172.17.151.254, listening on port 25.
On the VM I spun up for debugging, I can connect to the SMTP server just fine. I can also ping the VM from the SMTP server without problems.
From the pods, however, I cannot connect to SMTP (tested with netcat -zv 172.17.151.254 25), nor can I ping a pod's IP address from the SMTP server.
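For container images that do not ship netcat, the same TCP check can be reproduced with Python's standard library. This is a minimal sketch; `can_connect` is a hypothetical helper name, and the 3 second timeout is an arbitrary assumption.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection performs the full TCP handshake, like `netcat -zv`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Calling `can_connect("172.17.151.254", 25)` from both the debug VM and a pod gives a like-for-like comparison of the two network paths.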
Neither of the subnets has a network security group (NSG) attached, so it can't be a misconfigured NSG rule. What else could be causing the connection to fail? The pods get the same basic network configuration from the DHCP server in the subnet:
- An IP address in 172.16.20.0/22
- 172.16.20.1 as their default gateway
Our IT staff, who maintain the on-prem device that connects to the Azure VNG, helped me debug. They say that when I initiate an SMTP connection to 172.17.151.254, they see the packet arriving and a response packet from the server going back into the VPN tunnel, so it seems the response packet is getting dropped somewhere in Azure.
Edit: During a further debug session with our IT staff, we noticed that the source IP of the packets coming from our misbehaving pod is 172.17.20.5 instead of 172.16.20.21. 172.17.20.5 is the IP of the VMSS node the pod is running on, so that could make sense, but it would mean that the internal routing on that node isn't configured correctly.
Or is this something specific to Kubernetes that is causing this to fail?
What I've tried so far:
- On VM: ping to 172.16.20.21 (pod): works fine
- On VM: ping to 172.17.151.254: works fine
- On VM: tracert 172.17.151.254 succeeds in 1 hop (shouldn't this show at least 2 hops, as it passes through the default gateway?)
- On pod: ping to 172.16.20.4 (vm): works fine
- On pod: ping to 172.17.151.254: fails
- On pod: traceroute 172.17.151.254 fails with no hops showing
- On on-prem VPN device: ping to 172.16.20.4 (vm): works fine
- On on-prem VPN device: ping to 172.16.20.21 (pod): fails
Extra info:
ifconfig -a from pod:
eth0: flags=67<UP,BROADCAST,RUNNING> mtu 1500
inet 172.16.20.21 netmask 255.255.252.0 broadcast 0.0.0.0
ether de:c7:74:e3:c5:24 txqueuelen 1000 (Ethernet)
RX packets 386868 bytes 35746728 (34.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 511891 bytes 43865660 (41.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 5 bytes 504 (504.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 504 (504.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
route output from pod:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 172.16.20.1 0.0.0.0 UG 0 0 0 eth0
172.16.20.0 0.0.0.0 255.255.252.0 U 0 0 0 eth0
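To make the pod's routing table easier to read, here is a small sketch of the decision it encodes, using Python's `ipaddress` module. The addresses are copied from the output above; `next_hop` is a hypothetical helper that only models these two routes (on-link subnet and default gateway), not a full kernel lookup.

```python
import ipaddress

# Values copied from the pod's ifconfig and route output.
pod_if = ipaddress.ip_interface("172.16.20.21/255.255.252.0")
default_gw = ipaddress.ip_address("172.16.20.1")
debug_vm = ipaddress.ip_address("172.16.20.4")
smtp = ipaddress.ip_address("172.17.151.254")


def next_hop(dst):
    """On-link destinations are delivered directly; all others go to the default gateway."""
    if dst in pod_if.network:
        return "on-link via eth0"
    return f"via {default_gw}"


print(pod_if.network)      # 172.16.20.0/22
print(next_hop(debug_vm))  # on-link via eth0
print(next_hop(smtp))      # via 172.16.20.1
```

From the pod's point of view the SMTP server is simply "not local", so its traffic is handed to the default gateway; nothing in this table treats 172.17.151.254 specially.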
ipconfig /all from debug VM:
Windows IP Configuration
Host Name . . . . . . . . . . . . : debug-vm
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : nedz0ha4spbubmi5cnxgsnswdh.ax.internal.cloudapp.net
Ethernet adapter Ethernet:
Connection-specific DNS Suffix . : nedz0ha4spbubmi5cnxgsnswdh.ax.internal.cloudapp.net
Description . . . . . . . . . . . : Microsoft Hyper-V Network Adapter
Physical Address. . . . . . . . . : 00-0D-3A-2D-DC-BA
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::e9bb:fede:66cc:398c%6(Preferred)
IPv4 Address. . . . . . . . . . . : 172.16.20.4(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.252.0
Lease Obtained. . . . . . . . . . : Friday, August 28, 2020 7:15:08 AM
Lease Expires . . . . . . . . . . : Friday, October 8, 2156 1:20:49 PM
Default Gateway . . . . . . . . . : 172.16.20.1
DHCP Server . . . . . . . . . . . : 168.63.129.16
DHCPv6 IAID . . . . . . . . . . . : 100666682
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-26-DA-67-54-00-0D-3A-2D-DC-BA
DNS Servers . . . . . . . . . . . : 168.63.129.16
NetBIOS over Tcpip. . . . . . . . : Enabled
route print from debug VM:
===========================================================================
Interface List
6...00 0d 3a 2d dc ba ......Microsoft Hyper-V Network Adapter
1...........................Software Loopback Interface 1
===========================================================================
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 172.16.20.1 172.16.20.4 10
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
168.63.129.16 255.255.255.255 172.16.20.1 172.16.20.4 11
169.254.169.254 255.255.255.255 172.16.20.1 172.16.20.4 11
172.16.20.0 255.255.252.0 On-link 172.16.20.4 266
172.16.20.4 255.255.255.255 On-link 172.16.20.4 266
172.16.23.255 255.255.255.255 On-link 172.16.20.4 266
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 172.16.20.4 266
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 172.16.20.4 266
===========================================================================
Persistent Routes:
None
IPv6 Route Table
===========================================================================
Active Routes:
If Metric Network Destination Gateway
1 331 ::1/128 On-link
6 266 fe80::/64 On-link
6 266 fe80::e9bb:fede:66cc:398c/128
On-link
1 331 ff00::/8 On-link
6 266 ff00::/8 On-link
===========================================================================
Persistent Routes:
None
The problem was found after extensive troubleshooting with the help of Microsoft support.
The root cause was the IP address of the SMTP server (VPN endpoint), 172.17.151.254: it overlaps with the default docker bridge network of 172.17.0.0/16, which was configured on the K8S nodes. As this bridge was not present on the debug VM I started, the problem didn't manifest itself there.
Lesson learned: Steer clear of the 172.17.0.0/16 range when using AKS.
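A check like this can be done up front when planning address spaces. The following is a minimal sketch using Python's `ipaddress` module; the ranges are the ones from the question, and `find_conflicts` and the range labels are hypothetical names chosen for illustration.

```python
import ipaddress

# Default docker bridge range named above.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")

# Address ranges taken from the question.
RANGES = {
    "AKS subnet": "172.16.20.0/22",
    "gateway subnet": "172.16.24.0/29",
    "on-prem (local network gateway)": "172.17.151.0/24",
}


def find_conflicts(ranges, reserved=DOCKER_BRIDGE):
    """Return the names of all ranges that overlap the reserved network."""
    return [
        name
        for name, cidr in ranges.items()
        if ipaddress.ip_network(cidr).overlaps(reserved)
    ]


print(find_conflicts(RANGES))
# ['on-prem (local network gateway)']
print(ipaddress.ip_address("172.17.151.254") in DOCKER_BRIDGE)
# True
```

Only the on-prem range collides with the bridge network, which matches the symptom: traffic inside the VNet worked, while traffic to the SMTP server did not.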