This comes from an implementation of OpenStack, and we have a fix, but I don't understand why we need this fix.
- My machine has an IP of 192.168.20.45
- My default gateway (pfSense) has an IP of 192.168.20.254
- My OpenStack compute node has an IP of 192.168.20.58
- When a VM is running, it is on 10.60.60.12
I can SSH to this VM from 192.168.20.45 and then ping google.com from there. The session stays connected for about 20 pings, then the SSH session times out and disconnects.
Adding some hard-coded routes solved this issue, but it makes me wonder WHY it ever worked at all.
We added static routes for 10.60.60.x via 192.168.20.58 on pfSense, pushed them to all clients via DHCP, and added them to the OpenVPN and MS VPN configurations.
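To make the effect of the static route concrete, here is a minimal sketch of a longest-prefix-match lookup on a client's routing table, before and after the fix. The /24 prefix lengths and the "on-link" label are my assumptions (I only know the VM network as 10.60.60.x), and the `next_hop` helper is purely illustrative, not how any real network stack is implemented.

```python
import ipaddress

def next_hop(routes, destination):
    """Return the next hop for destination using longest-prefix match."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dest in net]
    if not matches:
        return None
    # The most specific (longest prefix) matching route wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Client routing table before the fix (prefix lengths are assumed).
before = [
    (ipaddress.ip_network("192.168.20.0/24"), "on-link"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.20.254"),  # pfSense default gateway
]

# After the fix: an explicit route to the VM network via the compute node.
after = before + [
    (ipaddress.ip_network("10.60.60.0/24"), "192.168.20.58"),
]

print(next_hop(before, "10.60.60.12"))  # 192.168.20.254
print(next_hop(after, "10.60.60.12"))   # 192.168.20.58
```

Without the static route, the client's own table sends traffic for 10.60.60.12 to the default gateway rather than to the compute node, which is what makes the earlier partial success so puzzling.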
I know how the static routes solve the problem, but what eludes me is how it worked at all before they were in place, and why the connection went from functioning to not functioning. Why would TCP deal with the first 60+ packets, and then quit working?
In the end, we fixed it by adding routes. Once we added routes on both sides of the interface (OpenStack as well as 192.168.20.x), everything worked great. What we never managed to figure out was why it worked at all before that.
I still have no good answer for how it ever connected, let alone passed data for a short while.