I have an issue with private network traffic not being masqueraded in very specific circumstances.
The network is a group of VMware guests on the 10.1.0.0/18 network. The problematic host is 10.1.4.20 (netmask 255.255.192.0), and the only gateway it is configured to use is 10.1.63.254. The gateway server, 37.59.245.59, should be masquerading all outbound traffic and forwarding it through 37.59.245.62, but for some reason 10.1.4.20 occasionally ends up with 37.59.245.62 in its routing cache.
ip -s route show cache 199.16.156.40
199.16.156.40 from 10.1.4.20 via 37.59.245.62 dev eth0
cache used 149 age 17sec ipid 0x9e49
199.16.156.40 via 37.59.245.62 dev eth0 src 10.1.4.20
cache used 119 age 11sec ipid 0x9e49
ip route flush cache 199.16.156.40
ping api.twitter.com
PING api.twitter.com (199.16.156.40) 56(84) bytes of data.
64 bytes from 199.16.156.40: icmp_req=1 ttl=247 time=93.4 ms
ip -s route show cache 199.16.156.40
199.16.156.40 from 10.1.4.20 via 10.1.63.254 dev eth0
cache age 3sec
199.16.156.40 via 10.1.63.254 dev eth0 src 10.1.4.20
cache used 2 age 2sec
The question is, why am I seeing a public IP address in my routing cache on a private network?
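One way to catch the stray entry being (re)installed is to watch ICMP control traffic from the gateway while the cache is clean (a diagnostic sketch, assuming tcpdump is available and eth0 is the interface shown below):

# On the app server: watch ICMP messages arriving from the gateway
# while waiting for the bad cache entry to reappear
tcpdump -ni eth0 'icmp and src host 10.1.63.254'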
Network information for the app server (lo omitted):
ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:a4:48:20
inet addr:10.1.4.20 Bcast:10.1.63.255 Mask:255.255.192.0
inet6 addr: fe80::250:56ff:fea4:4820/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1523222895 errors:0 dropped:407 overruns:0 frame:0
TX packets:1444207934 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1524116772058 (1.5 TB) TX bytes:565691877505 (565.6 GB)
Network information for the VPN gateway (again, lo omitted):
eth0 Link encap:Ethernet HWaddr 00:50:56:a4:56:e9
inet addr:37.59.245.59 Bcast:37.59.245.63 Mask:255.255.255.192
inet6 addr: fe80::250:56ff:fea4:56e9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7030472688 errors:0 dropped:1802 overruns:0 frame:0
TX packets:6959026084 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7777330931859 (7.7 TB) TX bytes:7482143729162 (7.4 TB)
eth0:0 Link encap:Ethernet HWaddr 00:50:56:a4:56:e9
inet addr:10.1.63.254 Bcast:10.1.63.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0:1 Link encap:Ethernet HWaddr 00:50:56:a4:56:e9
inet addr:10.1.127.254 Bcast:10.1.127.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.8.1.1 P-t-P:10.8.1.2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:477047415 errors:0 dropped:0 overruns:0 frame:0
TX packets:833650386 errors:0 dropped:101834 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:89948688258 (89.9 GB) TX bytes:1050533566879 (1.0 TB)
eth0 leads to the outside world, and tun0 to an OpenVPN network of VMs on which the app server sits.
ip r on the VPN gateway:
default via 37.59.245.62 dev eth0 metric 100
10.1.0.0/18 dev eth0 proto kernel scope link src 10.1.63.254
10.1.64.0/18 dev eth0 proto kernel scope link src 10.1.127.254
10.8.1.0/24 via 10.8.1.2 dev tun0
10.8.1.2 dev tun0 proto kernel scope link src 10.8.1.1
10.9.0.0/28 via 10.8.1.2 dev tun0
37.59.245.0/26 dev eth0 proto kernel scope link src 37.59.245.59
ip r on the app server:
default via 10.1.63.254 dev eth0 metric 100
10.1.0.0/18 dev eth0 proto kernel scope link src 10.1.4.20
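A quick way to see which next hop is in effect at any instant is to ask the kernel directly (a small sketch; ip route get performs the same lookup a forwarded packet would, so it reflects the cached entry):

# On the app server: show the route the kernel would pick right now
ip route get 199.16.156.40
# Healthy output says "via 10.1.63.254"; while the problem is active it
# says "via 37.59.245.62" instead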
Firewall rules on the VPN gateway (nat table):
Chain PREROUTING (policy ACCEPT 380M packets, 400G bytes)
pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 127M packets, 9401M bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 1876K packets, 137M bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 223M packets, 389G bytes)
pkts bytes target prot opt in out source destination
32M 1921M MASQUERADE all -- * eth0 10.1.0.0/17 0.0.0.0/0
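The MASQUERADE rule covers 10.1.0.0/17, which includes both /18 subnets, so the NAT rule itself looks right. To verify that the app server's flows are actually being translated, the gateway's connection-tracking table can be inspected (a sketch assuming the conntrack userspace tool is installed):

# On the VPN gateway: list tracked flows originating from the app server
conntrack -L -s 10.1.4.20
# Masqueraded flows should show the reply direction addressed to
# 37.59.245.59 rather than to 10.1.4.20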
Unfortunately, most of what you're seeing is due to routing issues between external routers. They obtain and update their routing information dynamically to help route traffic around problematic areas, but when those routes change often (usually because of availability problems) it is called route flapping. That instability is being reflected down to you; normally end users never see any of this.
You could attempt to disable your route cache, as explained here (note the caveats; it doesn't seem to offer much upside), but I think you'd be better off talking to the local network admin(s), since it appears to be their routing that is really unstable.
I am, of course, assuming that you aren't the one responsible for network administration.
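If you do want to experiment, note that on these kernels there is no single switch that disables the cache outright; the closest ad-hoc equivalent is flushing it (a sketch, run as root):

# Flush the entire IPv4 routing cache by hand
ip route flush cache
# Equivalent, via the procfs knob
echo 1 > /proc/sys/net/ipv4/route/flush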
Have someone (or do it yourself) take a look at the router/L3 device in front of 10.1.4.20. It looks like it might be receiving bad routes from an upstream peer that are then withdrawn and re-advertised.
I asked this somewhere else, and it turns out the solution was to turn off ICMP redirects. That fits the symptoms: the gateway receives the app server's packets on eth0 and forwards them back out the same eth0 toward 37.59.245.62, so the kernel emits an ICMP redirect telling 10.1.4.20 to reach that destination via 37.59.245.62 directly, and that redirect is what ends up in the routing cache.
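For reference, this is roughly what that looks like on Linux: stop the gateway from sending redirects, and stop the app server from honouring them (a sketch using the standard sysctl knobs; interface names match the ones above):

# On the VPN gateway: stop announcing "better" next hops
sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl -w net.ipv4.conf.eth0.send_redirects=0

# On the app server: ignore any redirects that still arrive
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv4.conf.eth0.accept_redirects=0

# Then clear the poisoned entry once
ip route flush cache

# Add the corresponding lines to /etc/sysctl.conf to persist across reboots.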