I had network problems lately (running Debian, but this is not specific to any distro; see below, using direct /sys manipulation), and I discovered that the source of my woes was that two servers with bonded network interfaces had the same hardware address on their bonds. This MAC address is NOT one of the hardware interfaces' addresses, though it should be according to most documentation, such as this one:
The bonding interface has a hardware address of 00:00:00:00:00:00 until the first slave is added. If the VLAN interface is created prior to the first enslavement, it would pick up the all-zeroes hardware address. Once the first slave is attached to the bond, the bond device itself will pick up the slave's hardware address, which is then available for the VLAN device.
Furthermore, contrary to what is stated in this documentation, an "empty" bond (without any slaves) doesn't have a 00:00:00:00:00:00 hw address:
# modprobe bonding
# echo +bond0 > /sys/class/net/bonding_masters
# ip link show bond0
3: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether d6:f5:8e:9f:c2:42 brd ff:ff:ff:ff:ff:ff
However, that hardware address changes to the slave's hardware address if and only if the slave is added to the bond immediately and the bond's address isn't read first. I tested with a very simple script (see below) that manipulates the bond.
- First it creates an empty bond without slaves, and displays its hw address (which is apparently pseudo random, but tends to stay the same -- maybe udev plays a role here?).
- Second it creates the bond, sets its mode and adds a slave: the bond takes the slave's hw address as expected.
- Third it creates the bond, reads its hw address, then sets the mode and adds a slave: the bond's hw address doesn't match its slave's hw address (but running the script again and again, from time to time it does! go figure).
- Fourth it creates the bond, waits for one second, then configures it as in step 2. The behaviour is mostly the same as in step 3: the bond has the same pseudo-random hw address as in step 1, which always differs from the slave's hw address and is always the same.
I don't know exactly how the bond pseudo-random hw address is set up, but it happens to be relatively repetitive across several hardware and software configurations (different hardware, different kernel versions, different Debian releases).
Sometimes, at step 3 the bond's MAC address changes to different pseudo random values, sometimes it matches the slave's address. Most of the time however, it stays the same as in step 1. At step 4 the bond's MAC address is always the same as step one.
Apparently the bond's MAC address setting is very time sensitive?
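One way to narrow this down might be to ask the kernel how it thinks the bond's address was assigned. The sketch below assumes the kernel exposes the `addr_assign_type` attribute under /sys/class/net/<device>/; the helper name `describe_addr_assign_type` is made up for illustration, and the value meanings follow the kernel's sysfs ABI description.

```shell
# Hypothetical helper: translate the numeric value of
# /sys/class/net/<dev>/addr_assign_type into a readable description.
describe_addr_assign_type() {
    case "$1" in
        0) echo "permanent address" ;;
        1) echo "randomly generated" ;;
        2) echo "stolen from another device" ;;
        3) echo "set using dev_set_mac_address" ;;
        *) echo "unknown" ;;
    esac
}

# Example call on a live system (not run here):
#   describe_addr_assign_type "$(cat /sys/class/net/bond0/addr_assign_type)"
```

Reading this value at each step of the script would show whether the bond still carries a randomly generated address or one taken from its slave.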
Here is the script:
echo "unload / load bonding"
rmmod bonding
sleep 1
modprobe bonding
sleep 1
echo "create bond0"
echo +bond0 > /sys/class/net/bonding_masters
echo "bond0 hw address, no slaves:"
cat /sys/class/net/bond0/address
sleep 3
echo "################"
echo "unload / load bonding"
rmmod bonding
sleep 1
modprobe bonding
sleep 1
echo "create bond0 and configure it without delay"
echo +bond0 > /sys/class/net/bonding_masters
# cat /sys/class/net/bond0/address
echo 6 > /sys/class/net/bond0/bonding/mode
echo +enp1s0 > /sys/class/net/bond0/bonding/slaves
echo "Bond0 hw address:"
cat /sys/class/net/bond0/address
echo "enp1s0 hw address:"
ethtool -P enp1s0
echo "################"
sleep 3
echo "unload / load bonding"
rmmod bonding
sleep 1
modprobe bonding
sleep 1
echo "create bond0 and configure it, read its hw address first"
echo +bond0 > /sys/class/net/bonding_masters
cat /sys/class/net/bond0/address
echo 6 > /sys/class/net/bond0/bonding/mode
echo +enp1s0 > /sys/class/net/bond0/bonding/slaves
echo "Bond0 hw address:"
cat /sys/class/net/bond0/address
echo "enp1s0 hw address:"
ethtool -P enp1s0
echo "################"
sleep 3
echo "unload / load bonding"
rmmod bonding
sleep 1
modprobe bonding
sleep 1
echo "create bond0 and configure it after 1 second delay"
echo +bond0 > /sys/class/net/bonding_masters
# cat /sys/class/net/bond0/address
echo 6 > /sys/class/net/bond0/bonding/mode
sleep 1
echo +enp1s0 > /sys/class/net/bond0/bonding/slaves
echo "Bond0 hw address:"
cat /sys/class/net/bond0/address
echo "enp1s0 hw address:"
ethtool -P enp1s0
And here's its output:
unload / load bonding
create bond0
bond0 hw address, no slaves:
ea:dc:34:e6:7c:8d
################
unload / load bonding
create bond0 and configure it without delay
Bond0 hw address:
52:54:00:c8:76:09
enp1s0 hw address:
Permanent address: 52:54:00:c8:76:09
################
unload / load bonding
create bond0 and configure it, read its hw address first
d6:f5:8e:9f:c2:42
Bond0 hw address:
d6:f5:8e:9f:c2:42
enp1s0 hw address:
Permanent address: 52:54:00:c8:76:09
################
unload / load bonding
create bond0 and configure it after 1 second delay
Bond0 hw address:
d6:f5:8e:9f:c2:42
enp1s0 hw address:
Permanent address: 52:54:00:c8:76:09
From time to time, the bond's hw address changes to some other value. However, most of the time it falls back to the same one (here 'd6:f5:8e:9f:c2:42') and seems to cycle across a limited number of MAC addresses across reboots.
However, the very serious problem is that different machines end up with the same pseudo-random hardware address; when they're connected to the same network switch, chaos ensues. Actually, checking across several different machines connected to different networks, at least 4 share the same bond MAC address (though as long as they're not connected to the same switch, it's mostly harmless).
Notice that in this particular example I set up the bond in mode 6, but I had the problem on machines running in mode 4 (802.3ad) and other modes. This doesn't seem related to the bonding mode at all -- changing the mode to 1 or 2 or 4 doesn't change the MAC address.
Of course I could force the bond's MAC address to some meaningful value using an if-up.d script or something similar, but I'd rather have something that works out of the box :)
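To spot such collisions across a fleet, something like the following sketch could help. It assumes you have already collected one "hostname mac" pair per line (for example by reading /sys/class/net/bond0/address on each machine); the function name `find_duplicate_macs` and the host names in the usage example are hypothetical.

```shell
# Hypothetical helper: read "hostname mac" lines on stdin and print every
# MAC that appears on more than one host, followed by the hosts sharing it.
# MACs are lowercased first so that case differences do not hide a match.
find_duplicate_macs() {
    awk '{ mac = tolower($2); hosts[mac] = hosts[mac] " " $1; n[mac]++ }
         END { for (m in n) if (n[m] > 1) print m ":" hosts[m] }'
}

# Example with made-up host names:
#   printf '%s\n' 'srv1 d6:f5:8e:9f:c2:42' 'srv2 ea:dc:34:e6:7c:8d' \
#                 'srv3 D6:F5:8E:9F:C2:42' | find_duplicate_macs
```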
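For reference, the workaround I have in mind would look roughly like this. It is only a sketch: `perm_mac` and `pin_bond_mac` are made-up helper names, bond0 and enp1s0 are just the device names used in this post, and the parsing assumes `ethtool -P` prints a "Permanent address: ..." line as in the output above.

```shell
# Extract the MAC from `ethtool -P` output read on stdin, e.g.
# "Permanent address: 52:54:00:c8:76:09" -> "52:54:00:c8:76:09".
perm_mac() {
    sed -n 's/^Permanent address: *//p'
}

# Hypothetical wrapper: set the bond's MAC to the slave's permanent address.
# Needs root; defined here but not called.
pin_bond_mac() {
    bond=$1
    slave=$2
    mac=$(ethtool -P "$slave" | perm_mac)
    [ -n "$mac" ] || return 1
    ip link set dev "$bond" address "$mac"
}

# Example call from an if-up.d style hook (not run here):
#   pin_bond_mac bond0 enp1s0
```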
There have been issues with the MAC addresses used in Linux bonding (mode 4) for quite some time. In the past, the MAC of the first interface in the bond was used, which was fine when a new active aggregate was determined only upon complete failure/disconnection of the active aggregate.
That all changed when the code was changed to select a new active aggregate on every change on the bonded interfaces, which allowed for selecting the aggregate with the most bandwidth/links/etc on every change.
And this is where the problems start; imagine the following:
All was fine on active aggregate 1, but after switching to aggregate 2, eth0 is still connected with its own MAC address and aggregate 2 is also using this MAC on the other switch! This is because mode 4 (802.3ad/LACP) keeps them connected/monitored for state changes, resulting in around 20% or more packet loss...
I've seen this happen in many locations, even with appliances like the NetApp Filers, and what I tend to end up doing is setting the MAC for the bond to a private MAC address based on the MAC of eth0, by just replacing the first octet with 02. This way the MAC on the bond is unique and traceable/linkable to the actual hardware used.
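The octet replacement can be sketched in shell as follows. The helper name `derive_private_mac` is hypothetical; it only rewrites the string and leaves applying the result to the bond (for example with `ip link set dev bond0 address ...`) to the caller.

```shell
# Hypothetical helper: derive a private (locally administered) MAC from a
# hardware MAC by replacing the first octet with 02.
# $1: MAC address in colon-separated form, e.g. 52:54:00:c8:76:09
derive_private_mac() {
    printf '%s\n' "$1" | sed -E 's/^[0-9A-Fa-f]{2}:/02:/'
}

# Example (not run here; needs root and an existing bond0):
#   ip link set dev bond0 address "$(derive_private_mac 52:54:00:c8:76:09)"
```

Because only the first octet changes, the remaining five octets still identify the physical NIC the address was derived from.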
I did notice that some distributions recently started setting private MACs for bonded interfaces automagically, but I like to keep it under my own control...