We have a pair of VMs running as virtual routers on QEMU/KVM, with BGP peering over TCP between the two. Each VM has a tap interface connected to a Linux bridge whose only members are those two taps.
Everything works, except that conntrack on the host reports the TCP sessions between these two VMs. Initially we thought the TCP sessions were leaking into the host and that this was a security hole, but netstat reports nothing, so the host OS does not appear to allocate a TCB for them (which is correct); phew. The guest OS traffic should be transparent to the host OS, and mostly it seems to be.
This conntrack behavior is an issue because if both VMs are reset at the same time, neither end is left running to send traffic on the guest TCP sessions and trigger a TCP reset, so we get a conntrack "leak" on the host OS. We have a lot of BGP sessions in this test, so over time the entries build up and eventually the host OS runs out of resources. This looks like a way for a guest OS to do a DoS on a host OS...
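For anyone wanting to measure the buildup, here is a minimal Python sketch of how the stale entries could be counted on the host. It assumes the kernel exposes the table as text in /proc/net/nf_conntrack (this needs root and is not available on every build) and that the sessions use the well-known BGP port 179. The function name `count_bgp_entries` and the sample lines are illustrative only, not output from our setup.

```python
import re
from collections import Counter

BGP_PORT = 179  # well-known BGP port; assumed to be what the guests use

# Illustrative lines in the /proc/net/nf_conntrack text format (made-up addresses).
SAMPLE = [
    "ipv4     2 tcp      6 431999 ESTABLISHED src=10.0.0.1 dst=10.0.0.2 "
    "sport=40000 dport=179 src=10.0.0.2 dst=10.0.0.1 sport=179 dport=40000 "
    "[ASSURED] mark=0 zone=0 use=2",
    "ipv4     2 tcp      6 100 TIME_WAIT src=10.0.0.1 dst=10.0.0.2 "
    "sport=40001 dport=22 src=10.0.0.2 dst=10.0.0.1 sport=22 dport=40001 "
    "[ASSURED] mark=0 zone=0 use=2",
]


def count_bgp_entries(lines, port=BGP_PORT):
    """Count TCP conntrack entries that involve `port`, grouped by TCP state."""
    states = Counter()
    for line in lines:
        fields = line.split()
        if "tcp" not in fields:
            continue  # skip UDP, ICMP, and other protocols
        ports = re.findall(r"\b[sd]port=(\d+)", line)
        if str(port) not in ports:
            continue  # not a session on the port of interest
        # The state is the first all-caps token, e.g. ESTABLISHED or TIME_WAIT.
        state = next(
            (f for f in fields if re.fullmatch(r"[A-Z][A-Z_0-9]*", f)),
            "UNKNOWN",
        )
        states[state] += 1
    return states


def count_from_host(path="/proc/net/nf_conntrack"):
    """Read the live table on the host; requires root and procfs support."""
    with open(path) as f:
        return count_bgp_entries(f)
```

Calling `count_from_host()` before and after resetting both VMs should show whether the ESTABLISHED count for port 179 keeps growing; `count_bgp_entries(SAMPLE)` exercises the parser without touching the host.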
Is this valid behaviour for conntrack? This is private VM to VM communication over an L2 bridge. Why should Linux be snooping and recording such TCP sessions? Is this a bug or a feature?
Most approaches we have found for stopping this involve iptables rules, and we don't really want to have to ask the customer to set those up. Any other suggestions?