This issue is driving me crazy. I run a fresh install of Ubuntu 18.04, with:
- ufw to manage the firewall
- a br0 bridge
- lxd and libvirt (KVM)
I tried the stock docker.io package and the packages from Docker's own deb repository.
I want to be able to deploy Docker containers, choosing the IP their ports bind to (e.g. -p 10.58.26.6:98800:98800), and then open the port with UFW.
But Docker seems to create iptables rules that disrupt the br0 bridge (e.g. the host cannot ping libvirt guests).
I have looked all around and cannot find a good, security-aware solution.
Manually running iptables -I FORWARD -i br0 -o br0 -j ACCEPT seems to make everything work.
Setting "iptables": false for the Docker daemon also lets the bridge behave normally, but it breaks the egress network of Docker's containers.
I found this solution, which seemed simple since it only edits a single UFW file (https://stackoverflow.com/a/51741599/1091772), but it doesn't work at all.
What would be the best-practice, secure way of solving this permanently, in a way that survives reboots?
EDIT:
I ended up adding -A ufw-before-forward -i br0 -o br0 -j ACCEPT at the end of /etc/ufw/before.rules, before the COMMIT. Can I consider this a fix, or does it raise some issues?
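For reference, this is roughly where such a line sits in /etc/ufw/before.rules: inside the *filter section, just before its COMMIT. The surrounding stock content is abbreviated, and the comment on the rule was added for illustration.

```
# /etc/ufw/before.rules (end of the *filter section, stock rules abbreviated)
# ...

# accept frames switched between two ports of br0, which br_netfilter
# would otherwise submit to FORWARD filtering
-A ufw-before-forward -i br0 -o br0 -j ACCEPT

# don't delete the 'COMMIT' line or these rules won't be processed
COMMIT
```

The change takes effect after `sudo ufw reload`. Because it lives in ufw's own file, it is reapplied at every boot.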
The problem, actually a feature: br_netfilter
From the description, I believe the only logical explanation is that the bridge netfilter code is enabled. It is intended, among other uses, for stateful bridge firewalling or for leveraging iptables' matches and targets from the bridge path without having to (or being able to) duplicate them all in ebtables. Quite disregarding network layering, the Ethernet bridge code, working at network layer 2, now makes upcalls to iptables, which works at the IP level, i.e. network layer 3. For now it can only be enabled globally: either for the host and every container, or for none. Once you understand what's going on and know what to look for, you can make adapted choices.
The netfilter project describes the various ebtables/iptables interactions when br_netfilter is enabled. Of special interest is section 7, which explains why some rules without apparent effect are sometimes needed to avoid unintended effects from the bridge path. One example is an exception rule in nat/POSTROUTING that accepts traffic whose source and destination are both in the LAN. This prevents two systems on the same LAN from being NATed by... the bridge (see the example below).
You have a few choices to avoid your problem. The one you took is probably the best if you don't want to learn all the details, or check whether some iptables rules (sometimes hidden in other namespaces) would be disrupted:
Permanently prevent the br_netfilter module from being loaded. Usually blacklist isn't enough: install must be used. This choice is prone to issues for applications relying on br_netfilter: obviously Docker, Kubernetes, ...

Have the module loaded, but disable its effects. For iptables' effects, that means setting the sysctl net.bridge.bridge-nf-call-iptables to 0. If you do this at startup, the module must be loaded first, or this toggle won't exist yet.
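Both choices can be made persistent with configuration files. The file names below are examples, and only one of the two options should be used. On systemd-based systems such as Ubuntu 18.04, systemd-sysctl.service is ordered after systemd-modules-load.service, which takes care of the loading-order remark above.

```
# Option 1, file /etc/modprobe.d/no-br_netfilter.conf (example name).
# "blacklist br_netfilter" only ignores the module's aliases, so an explicit
# "modprobe br_netfilter" still loads it; "install" makes modprobe run
# /bin/true instead of loading the module.
install br_netfilter /bin/true

# Option 2 (instead of option 1), two files:
# /etc/modules-load.d/br_netfilter.conf: load the module early at boot
br_netfilter
# /etc/sysctl.d/90-br_netfilter.conf: then turn off its iptables upcalls
# (the ip6tables/arptables lines are the sibling toggles, if wanted too)
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0
```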
These two previous choices will certainly disrupt the iptables match -m physdev. When the xt_physdev module is itself loaded, it auto-loads the br_netfilter module (this would happen even if the loading was triggered by a rule added from a container). With br_netfilter no longer loaded or no longer effective, -m physdev will probably never match.

Work around br_netfilter's effects when needed, as OP did: add those apparently no-op rules in various chains (PREROUTING, FORWARD, POSTROUTING) as described in section 7. OP's FORWARD rule, which accepts what both enters and leaves through br0, is one of them.

Those rules should never match, because traffic within the same IP LAN is not routed, except in some rare DNAT setups. Thanks to br_netfilter, though, they do match: they are first called for switched frames ("upgraded" to IP packets) traversing the bridge. They are then called again for routed packets traversing the router to an unrelated interface, but won't match at that point.
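Here is a minimal sketch of such rules, assuming a bridge carrying a single IPv4 LAN prefix. The helper name bridge_bypass is hypothetical, and the bridge name and prefix are placeholders. It inserts OP's FORWARD rule and a section 7 style exception in nat/POSTROUTING. Setting RUN=echo prints the commands instead of running them.

```sh
#!/bin/sh
# Hypothetical helper: "no-op" rules that are meant to match only frames
# switched inside one bridge, which br_netfilter submits to iptables.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): bridge_bypass br0 10.58.26.0/24
bridge_bypass() {
    br="$1"    # bridge name, e.g. br0 (assumption)
    lan="$2"   # IPv4 prefix configured on that bridge (assumption)
    # don't filter frames switched between two ports of the bridge (OP's rule)
    $RUN iptables -I FORWARD -i "$br" -o "$br" -j ACCEPT
    # never SNAT/MASQUERADE traffic between two hosts of the same LAN
    $RUN iptables -t nat -I POSTROUTING -s "$lan" -d "$lan" -j ACCEPT
}
```

A matching nat/PREROUTING exception is left out on purpose. It would likely also catch DNAT toward an address of the LAN, such as ports Docker publishes on 10.58.26.6, which is the rare DNAT case mentioned above. These commands don't survive a reboot by themselves. OP's before.rules edit is one way to persist the FORWARD part.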
Don't put an IP address on the bridge. Instead, put it on one end of a veth pair whose other end is attached to the bridge. This should ensure that the bridge won't interact with routing, but that's not what most common container/VM products do. You can even hide the bridge in its own isolated network namespace (that would only help if you also wanted to isolate it from other ebtables rules).
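Here is a sketch of that layout with iproute2, assuming the host address currently sits on the bridge. The helper host_addr_via_veth and the <bridge>-host/<bridge>-port interface names are hypothetical. Moving the address drops the routes that used it, so the default route, for example, would have to be added again on the veth end. A real setup would also have to be made persistent in the network configuration.

```sh
#!/bin/sh
# Hypothetical helper: move the host's address from the bridge to one end of
# a veth pair whose other end becomes a bridge port.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): host_addr_via_veth br0 10.58.26.6/24
host_addr_via_veth() {
    br="$1"     # bridge currently holding the address, e.g. br0
    addr="$2"   # address with prefix length (the /24 is an assumption)
    $RUN ip link add "$br-host" type veth peer name "$br-port"
    $RUN ip link set "$br-port" master "$br" up
    $RUN ip addr del "$addr" dev "$br"
    $RUN ip addr add "$addr" dev "$br-host"
    $RUN ip link set "$br-host" up
}
```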
Switch everything to nftables, which, among its stated goals, will avoid these bridge interaction issues. For now, stateful bridge firewalling isn't available there. It's still a work in progress, but it promises to be cleaner once available, because there won't be any "upcall".
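Whatever option you choose, a read-only check shows where a system stands: whether br_netfilter is loaded and what its bridge-nf-call-* toggles are set to. The sketch below assumes the module shows up under /sys/module when loaded (report_br_netfilter is a hypothetical name). Listing -m physdev rules needs root and only covers the current network namespace.

```sh
#!/bin/sh
# Read-only report on br_netfilter; runs unprivileged except the last part.
report_br_netfilter() {
    if [ -d /sys/module/br_netfilter ]; then
        echo "br_netfilter: loaded"
    else
        echo "br_netfilter: not loaded"
    fi
    for t in iptables ip6tables arptables; do
        f="/proc/sys/net/bridge/bridge-nf-call-$t"
        # these files only exist while br_netfilter is loaded
        [ -r "$f" ] && echo "bridge-nf-call-$t = $(cat "$f")"
    done
    if [ "$(id -u)" -eq 0 ] && command -v iptables-save >/dev/null 2>&1; then
        # -m physdev rules are one known trigger for loading br_netfilter
        iptables-save | grep -e '-m physdev' || echo "no -m physdev rule in this namespace"
    fi
    return 0
}

report_br_netfilter
```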
You should look for what triggers the loading of br_netfilter (e.g. -m physdev) and check whether you can avoid it, to choose how to proceed.

Example with network namespaces
Let's reproduce some effects using network namespaces. Note that no ebtables rule is used anywhere. Also note that this example relies on the usual legacy iptables, not iptables over nftables as enabled by default on Debian buster.

Let's reproduce a simple case similar to many container setups: a router 192.168.0.1/192.0.2.100 doing NAT, with two hosts behind it, 192.168.0.101 and 192.168.0.102, linked by a bridge on the router. The two hosts can communicate directly on the same LAN, through the bridge.
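Here is a sketch of that setup with iproute2. The names router, host1, host2, bridge0 and vethhost1/2, the dummy eth0 standing in for the uplink, and the lab_setup helper are all assumptions. It needs root and legacy iptables. RUN=echo prints the commands instead of running them. The MASQUERADE rule deliberately has no -o.

```sh
#!/bin/sh
# Hypothetical helper building the lab: router + host1 + host2 namespaces.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): lab_setup
lab_setup() {
    for ns in router host1 host2; do
        $RUN ip netns add "$ns"
    done
    # LAN side: bridge0 on the router carries 192.168.0.1
    $RUN ip -n router link add bridge0 type bridge
    $RUN ip -n router addr add 192.168.0.1/24 dev bridge0
    $RUN ip -n router link set bridge0 up
    for h in 1 2; do
        # one veth per host: router end is a bridge0 port, host end is eth0
        $RUN ip -n router link add "vethhost$h" type veth peer name eth0 netns "host$h"
        $RUN ip -n router link set "vethhost$h" master bridge0 up
        $RUN ip -n "host$h" addr add "192.168.0.10$h/24" dev eth0
        $RUN ip -n "host$h" link set eth0 up
        $RUN ip -n "host$h" route add default via 192.168.0.1
    done
    # WAN side: a dummy interface stands in for the uplink (assumption)
    $RUN ip -n router link add eth0 type dummy
    $RUN ip -n router addr add 192.0.2.100/24 dev eth0
    $RUN ip -n router link set eth0 up
    # route and NAT the LAN; no "-o eth0", so the rule can also be hit
    # from the bridge path once br_netfilter is active
    $RUN ip netns exec router sysctl -q -w net.ipv4.ip_forward=1
    $RUN ip netns exec router iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -j MASQUERADE
}
```

Deleting the three namespaces afterwards (`ip netns del router`, then host1 and host2) removes the whole lab.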
Let's load the kernel module br_netfilter (so that it won't get loaded later on) and disable its effects with the toggle bridge-nf-call-iptables. This toggle is not per-namespace and is available only in the initial namespace.
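The two actions fit in a small hypothetical helper, lab_nf_iptables, which takes the toggle value. It must run as root in the initial namespace.

```sh
#!/bin/sh
# Hypothetical helper: load br_netfilter, then set the global
# bridge-nf-call-iptables toggle (0 = off, 1 = on).
# RUN=echo gives a dry run that only prints the commands.
# usage (as root, initial namespace): lab_nf_iptables 0
lab_nf_iptables() {
    $RUN modprobe br_netfilter
    $RUN sysctl -w net.bridge.bridge-nf-call-iptables="$1"
}
```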
Warning: again, this can disrupt iptables rules like -m physdev anywhere on the host, or in containers that rely on br_netfilter being loaded and enabled.

Let's add some ICMP ping traffic counters, then ping.
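In this step, two rules without a target only count ICMP echo requests and replies crossing the router's filter/FORWARD chain. Then host1 pings host2 and the chain is listed with its counters. lab_count_pings is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: add counting rules, ping, show the counters.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): lab_count_pings
lab_count_pings() {
    # rules without -j only count matching packets
    $RUN ip netns exec router iptables -A FORWARD -p icmp --icmp-type echo-request
    $RUN ip netns exec router iptables -A FORWARD -p icmp --icmp-type echo-reply
    $RUN ip netns exec host1 ping -n -c2 192.168.0.102
    # -v: show counters, -x: exact values, -n: numeric output
    $RUN ip netns exec router iptables -v -x -n -L FORWARD
}
```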
The counters won't match.
Let's enable bridge-nf-call-iptables and ping again.
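This step turns the global toggle back on (as root, in the initial namespace), then repeats the ping and the counter listing. lab_enable_and_ping is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: re-enable the iptables upcall, ping, show counters.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root, initial namespace): lab_enable_and_ping
lab_enable_and_ping() {
    $RUN sysctl -w net.bridge.bridge-nf-call-iptables=1
    $RUN ip netns exec host1 ping -n -c2 192.168.0.102
    $RUN ip netns exec router iptables -v -x -n -L FORWARD
}
```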
This time, the switched packets got a match in iptables' filter/FORWARD chain.
Let's set a DROP policy (which zeroes the default counters) and try again.
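This step sets the router's FORWARD policy to DROP, then pings with a short timeout, since the switched frames are now expected to be dropped. lab_drop_and_ping is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: DROP policy on the router's FORWARD chain, then ping.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): lab_drop_and_ping
lab_drop_and_ping() {
    $RUN ip netns exec router iptables -P FORWARD DROP
    # -W1: don't wait long for replies that shouldn't come
    $RUN ip netns exec host1 ping -n -c2 -W1 192.168.0.102
    $RUN ip netns exec router iptables -v -x -n -L FORWARD
}
```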
The bridge code filtered the switched frames/packets via iptables. Let's add the bypass rule (which will zero the default counters again), as in OP, and try again.
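This step inserts OP's bypass rule, adapted to the bridge0 name used in this lab, then pings again. lab_bypass_and_ping is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical helper: accept what enters and leaves through bridge0, ping.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): lab_bypass_and_ping
lab_bypass_and_ping() {
    $RUN ip netns exec router iptables -I FORWARD -i bridge0 -o bridge0 -j ACCEPT
    $RUN ip netns exec host1 ping -n -c2 192.168.0.102
    $RUN ip netns exec router iptables -v -x -n -L FORWARD
}
```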
Let's see what host2 now actually receives during a ping from host1.
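One way to look is to run tcpdump in host2 in the background while host1 sends a single ping. lab_watch_host2 is a hypothetical helper, and the one-second sleeps are arbitrary.

```sh
#!/bin/sh
# Hypothetical helper: capture ICMP on host2 while host1 pings it once.
# RUN=echo gives a dry run that only prints the commands.
# usage (as root): lab_watch_host2
lab_watch_host2() {
    # line-buffered (-l), numeric (-n) capture on host2's eth0
    $RUN ip netns exec host2 tcpdump -l -n -i eth0 icmp &
    pid=$!
    sleep 1            # give tcpdump time to start capturing
    $RUN ip netns exec host1 ping -n -c1 192.168.0.102
    sleep 1            # let the reply be printed
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return 0
}
```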
host2 now sees the pings coming from a translated source address instead of 192.168.0.101: the MASQUERADE rule was also called from the bridge path. To avoid this, either add an exception rule before it (as explained in section 7's example), or specify a non-bridge outgoing interface if at all possible (now that br_netfilter is available, you can even use -m physdev if it has to be a bridge...).

Randomly related:
- LKML/netfilter-dev: br_netfilter: enable in non-initial netns: it would help to enable this feature per namespace rather than globally, thus limiting interactions between hosts and containers.
- netfilter-dev: netfilter: physdev: relax br_netfilter dependency: merely attempting to delete a non-existing physdev rule could create problems.
- netfilter-dev: connection tracking support for bridge: WIP bridge netfilter code to prepare stateful bridge firewalling using nftables, this time more elegantly. I think it's one of the last steps toward getting rid of iptables' kernel-side API.
If the above threads don't solve your problem, here's how I resolved it on my Debian Stretch.
1st, save your current iptables rules
2nd, delete ALL the Docker-created rules
3rd, add iptables rules accepting any traffic on INPUT, FORWARD and OUTPUT
4th, restart Docker
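The four steps might look like the sketch below. The backup path, the reset_docker_rules name and the way Docker's rules are identified (any saved line mentioning "docker", case-insensitively) are assumptions, not necessarily what was run here. Step 3 effectively disables filtering, including ufw's, until those ACCEPT rules are removed.

```sh
#!/bin/sh
# Hypothetical helper for the four steps above (run as root).
# RUN=echo prints the commands instead of running them; file redirections
# still happen, so in a dry run the backup only receives the echoed command.
# usage: reset_docker_rules /root/iptables.backup
reset_docker_rules() {
    backup="${1:-/root/iptables.backup}"
    # 1st: save the current rules
    $RUN iptables-save > "$backup" || return 1
    # 2nd: reload everything except lines mentioning docker (DOCKER* chains,
    # docker0 rules); iptables-restore flushes each table it is fed first
    grep -vi docker "$backup" | $RUN iptables-restore || return 1
    # 3rd: accept any traffic on the three filter chains
    for chain in INPUT FORWARD OUTPUT; do
        $RUN iptables -I "$chain" -j ACCEPT
    done
    # 4th: restart Docker so it recreates its own rules
    $RUN systemctl restart docker
}
```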
Once step 3 is completed, you can ping your blocked libvirt KVM host from another PC, and you will see ICMP responses.
Restarting Docker will add its required iptables rules back, but they will no longer block your bridged KVM hosts.
If the above solution doesn't work for you, you can restore the saved rules with iptables-restore.
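Here is a restore sketch, assuming the rules were saved with iptables-save to /root/iptables.backup (a hypothetical path). restore_rules is a hypothetical name too.

```sh
#!/bin/sh
# Hypothetical helper: put back the rules saved before the cleanup.
# RUN=echo prints the command instead of running it.
# usage (as root): restore_rules /root/iptables.backup
restore_rules() {
    backup="${1:-/root/iptables.backup}"
    if [ ! -r "$backup" ]; then
        echo "no readable backup at $backup" >&2
        return 1
    fi
    $RUN iptables-restore < "$backup"
}
```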