I use the boot options biosdevname=1 net.ifnames=1 in order to get consistent, predictable device names. I've started to notice that in some cases the network device names are not consistent. For example, if I drop to a dracut debug shell and look at the contents of rdsosreport.txt, I see this:
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:08 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:09 brd ff:ff:ff:ff:ff:ff
Notice that there is a mix of consistent (p3p1) and legacy-style (eth1) naming. However, if I look at the interfaces from the dracut debug shell, I see this:
initqueue:/run/initramfs# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p3p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:08 brd ff:ff:ff:ff:ff:ff
3: p3p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:b4:56:50:97:09 brd ff:ff:ff:ff:ff:ff
p3p1/p3p2 are the correct, expected names. For some reason, early in the initrd sequence, they come up in the mixed format. My assumption is that there is some sort of race going on here and that, given a bit more time, it (udev?) settles into the correct state, but I'm not sure exactly where the race is. Unfortunately, this is causing problems for some of our automated server builds, because servers come up after (postinstall) first-boot and try to bring up eth1 when the real interface name is p3p2.
I've been digging through the dracut modules to try to figure out where the problem may lie, but haven't been able to determine it conclusively yet, so I'm looking for suggestions.
Also, this behavior doesn't happen all the time. The same server, booting the same image, sometimes works fine and other times gets this mixed naming behavior. That also suggests to me that this is some kind of race - sometimes the race is won, and sometimes it is lost.
Answering my own question here. It turns out, the problem was (partially) self-inflicted.
The part we can't control:
Using the boot option biosdevname=1 has the potential to cause races during the interface renaming phase. If you can live without it, simply using net.ifnames=1 biosdevname=0 might be preferable, even if the resulting names are "less pretty".
The part we CAN control:
Our site uses a custom modified dracut 40network module. One of the main things our version does is that it probes the contents of /sys/class/net/ looking for viable interfaces to automatically add to a bond. (We don't always know the device names in advance, which is why the module needed some logic to identify them on its own.) The race mentioned above can cause a delay in the renaming of files in /sys/class/net/. The solution was simple: add a 5 second sleep to the script prior to probing /sys/class/net/. This gives biosdevname (hopefully more than enough) time to finish renaming devices. Testing so far seems A-OK.
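For anyone who wants something concrete, here is a minimal sketch of the delay-then-probe idea described above. This is not our actual module code: the function names and the SETTLE_DELAY variable are made up for illustration, and the "skip lo, keep every other directory entry" filter is only an assumption about what counts as a viable interface.

```shell
#!/bin/sh
# Sketch only. SETTLE_DELAY, list_candidate_ifaces and probe_after_delay are
# hypothetical names, not taken from the real site-specific dracut module.

# Seconds to wait before probing; 5 is the value mentioned in the answer.
SETTLE_DELAY="${SETTLE_DELAY:-5}"

# Print the names of non-loopback interfaces found under a sysfs-style directory.
# $1: directory to scan (defaults to /sys/class/net)
list_candidate_ifaces() {
    _netdir="${1:-/sys/class/net}"
    for _path in "$_netdir"/*; do
        # Interfaces show up as directories (symlinks to directories in sysfs);
        # this also skips plain files and an unmatched glob.
        [ -d "$_path" ] || continue
        _name="${_path##*/}"
        # Loopback is never a bond candidate.
        [ "$_name" = "lo" ] && continue
        printf '%s\n' "$_name"
    done
}

# Give the renaming phase time to finish, then probe.
# $1: directory to scan (passed through to list_candidate_ifaces)
probe_after_delay() {
    sleep "$SETTLE_DELAY"
    list_candidate_ifaces "$1"
}
```

Calling probe_after_delay with no argument scans /sys/class/net/ after the delay and prints one interface name per line. Passing a directory and setting SETTLE_DELAY=0 makes it easy to try against a fake tree without waiting.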