I have a physical SLES 11 SP2 server (a Sun Fire X4140) that has networking problems after every reboot. The NICs are onboard.
Networking appears to come up successfully during boot, but network services such as NFS fail hard. This is because eth0 and eth1 both receive the same configuration and are both brought up with ifup. Once everything times out and I'm at the console, ifconfig shows eth0 and eth1 UP and running with the same IP, and pinging anything in that subnet fails. Restarting the network service fixes the issue.
eth0 is the correct NIC that should be configured as primary, per the MAC address.
Question: What's causing eth1 to be brought up with the same config as eth0?
I do not have a config script set up for eth1:
banjer@harp:~> ls -la /etc/sysconfig/network/
total 104
drwxr-xr-x 6 root root 4096 Jun 11 12:21 .
drwxr-xr-x 6 root root 4096 Apr 10 09:46 ..
-rw-r--r-- 1 root root 13916 Apr 10 09:32 config
-rw-r--r-- 1 root root 9952 Apr 10 09:36 dhcp
-rw------- 1 root root 180 Jun 11 12:21 ifcfg-eth0
-rw------- 1 root root 180 Jun 11 12:21 ifcfg-eth3
-rw------- 1 root root 172 Feb 1 08:32 ifcfg-lo
-rw-r--r-- 1 root root 29333 Feb 1 08:32 ifcfg.template
drwxr-xr-x 2 root root 4096 Apr 10 09:32 if-down.d
-rw-r--r-- 1 root root 239 Feb 1 08:32 ifroute-lo
drwxr-xr-x 2 root root 4096 Apr 10 09:33 if-up.d
drwx------ 2 root root 4096 May 5 2010 providers
-rw-r--r-- 1 root root 25 Nov 16 2010 routes
drwxr-xr-x 2 root root 4096 Apr 10 09:36 scripts
On a side note, eth3 is also configured, with an IP in a different subnet, and it has not caused any problems. FYI, the kernel module in use is forcedeth.
banjer@harp:~> sudo cat /etc/sysconfig/network/ifcfg-eth0
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='172.21.64.25/20'
MTU=''
NAME='MCP55 Ethernet'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ONBOOT="yes"
Here's eth3 in case you need to see it:
banjer@harp:~> sudo cat /etc/sysconfig/network/ifcfg-eth3
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR='172.11.200.4/24'
MTU=''
NAME='MCP55 Ethernet'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='auto'
USERCONTROL='no'
ONBOOT="yes"
Perhaps it's something related to udev? 70-persistent-net.rules looks OK to me, but I may not understand it completely.
banjer@harp:~> cat /etc/udev/rules.d/70-persistent-net.rules
# This file was automatically generated by the /lib/udev/write_net_rules
# program, run by the persistent-net-generator.rules rules file.
#
# You can modify it, as long as you keep each rule on a single
# line, and change only the value of the NAME= key.
# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth2"
# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4a", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4b", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
# PCI device 0x10de:0x0373 (forcedeth)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:18:4f:8d:85:4d", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth3"
# PCI device 0x1077:0x3032 (qla3xxx)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:c1:dd:0e:34:6c", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth4"
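One way to double-check that file against what the kernel actually assigned is to pull the MAC/name pairs out of the rules and compare them with sysfs. This is only a sketch, assuming a POSIX sh, sysfs at /sys/class/net, and the one-rule-per-line format shown above; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the MAC-to-name mapping in the udev rules file with the
# names the running kernel actually assigned. Helper names are hypothetical.

# Print "NAME MAC" for each rule line (one rule per line, as udev requires).
rules_to_names() {
    sed -n 's/.*ATTR{address}=="\([^"]*\)".*NAME="\([^"]*\)".*/\2 \1/p' "$1"
}

# Print "NAME MAC" for each eth* interface currently present in sysfs.
live_names() {
    for dev in /sys/class/net/eth*; do
        [ -r "$dev/address" ] || continue   # glob matched nothing, or no address
        echo "$(basename "$dev") $(cat "$dev/address")"
    done
}

RULES=/etc/udev/rules.d/70-persistent-net.rules
if [ -r "$RULES" ]; then
    want=$(mktemp)
    have=$(mktemp)
    rules_to_names "$RULES" | sort > "$want"
    live_names | sort > "$have"
    # "<" lines exist only in the rules file, ">" lines only on the live system.
    if diff "$want" "$have"; then
        echo "udev rules and live interface names agree"
    fi
    rm -f "$want" "$have"
fi
```

If diff prints nothing but the "agree" message, naming is consistent and the problem is more likely in whatever runs ifup than in udev.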
Any other thoughts on what would cause this?
UPDATE 1
Per suggestions, I gave a config to the other NICs that aren't in use (eth1 and eth2); for example, here is eth1:
banjer@harp:/etc/sysconfig/network> sudo cat ifcfg-eth1
BOOTPROTO='static'
BROADCAST=''
ETHTOOL_OPTIONS=''
IPADDR=''
MTU=''
NAME='MCP55 Ethernet'
NETMASK='255.255.255.0'
NETWORK=''
REMOTE_IPADDR=''
STARTMODE='off'
ONBOOT='no'
USERCONTROL='no'
I also added the specific HWADDR to the NICs that are actually plugged in (eth0 and eth3). During the test reboot, networking comes up as expected, and eth1 and eth2 report "skipped" as expected. However, eth1 is still brought up with eth0's config.
I set udev_log="debug" in /etc/udev/udev.conf, and now I have a bunch of debug messages in /var/log/messages. Here is a paste of grep eth1 /var/log/messages, but I don't see anything that stands out compared with a grep for the other eth interfaces.
UPDATE 2
Thinking this is a udev issue, I edited the device name whitelist in /lib/udev/rules.d/75-persistent-net-generator.rules (original line commented out, new line below it) and ran rm /etc/udev/rules.d/70-persistent-net.rules:
# device name whitelist
#KERNEL!="eth*|ath*|wlan*[0-9]|msh*|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"
KERNEL!="eth[03]|ath*|wlan*[0-9]|msh*|ra*|sta*|ctc*|lcs*|hsi*", GOTO="persistent_net_generator_end"
After rebooting, this did exactly what I wanted (it generated rules only for eth0 and eth3), but it did not solve the problem: eth1 is still brought up. Is there a way to debug the entire boot process, e.g. with strace? I have no idea where this is coming from.
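The effect of the modified whitelist can be checked with the shell's own glob matching. This sketch assumes udev's KERNEL match behaves like shell glob patterns with "|" separating alternatives, which is how I understand udev's pattern syntax; on_whitelist is a hypothetical helper, not part of udev.

```shell
#!/bin/sh
# The modified whitelist from 75-persistent-net-generator.rules, rewritten as
# a shell case pattern. Assumption: udev's KERNEL glob matching accepts the
# same names that this case statement does.
on_whitelist() {
    case "$1" in
        eth[03]|ath*|wlan*[0-9]|msh*|ra*|sta*|ctc*|lcs*|hsi*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which of this box's interface names would get a persistent rule.
for name in eth0 eth1 eth2 eth3 eth4; do
    if on_whitelist "$name"; then
        echo "$name: rule generated"
    else
        echo "$name: skipped by the generator"
    fi
done
```

Only eth0 and eth3 pass, which matches the result above. As far as I understand, the generator only writes naming rules; it does not decide which interfaces ifup configures, which would explain why the change did not stop eth1 from coming up.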
As a band-aid, I'm adding an rc script to restart the network late in the boot process.
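Rather than restarting unconditionally, the late-boot script could restart only when the symptom from the question is present: one IPv4 address assigned to more than one interface. This is a sketch assuming iproute2's `ip -o -4 addr show` one-line output (interface name in field 2, address/prefix in field 4); dup_addrs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ip -o -4 addr show` lines on stdin and print each
# IPv4 address that appears on more than one interface.
dup_addrs() {
    awk '{
        split($4, a, "/")          # "172.21.64.25/20" -> "172.21.64.25"
        ip = a[1]
        if (!(ip in seen))
            seen[ip] = $2          # first interface seen with this address
        else if (seen[ip] != $2)
            print ip               # same address on a different interface
    }' | sort -u
}

# Only report; restarting the network is left to the caller.
if command -v ip >/dev/null 2>&1; then
    dups=$(ip -o -4 addr show | dup_addrs)
    if [ -n "$dups" ]; then
        echo "IPv4 address(es) assigned to more than one interface: $dups" >&2
    fi
fi
```

From a late-boot hook (assumed here to be /etc/init.d/after.local, which SLES runs at the end of boot), the check could gate the restart with something like: [ -n "$(ip -o -4 addr show | dup_addrs)" ] && /etc/init.d/network restart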
You say you don't have a config script for eth1. Why not? Is it supposed to be configured or not? If it is, what IP is it supposed to have: static allocation or DHCP?
Those are questions for you to think about, btw, not necessarily to answer here.
Try creating a config for eth1, even if it's just a minimal one with ONBOOT="no"; SUSE might be doing some insane default automagic crap if there's no config file.
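A small helper can create such a stub for any unused NIC without touching existing configs. To my knowledge, SUSE ifcfg files use STARTMODE='off' for "never bring this up" (ONBOOT is the Red Hat spelling), so this sketch uses the same keys as the eth1 config in Update 1; write_off_cfg is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal ifcfg stub that keeps an unused NIC down.
# The body runs in a subshell, so the umask change does not leak to the caller.
write_off_cfg() (
    dir=$1
    ifname=$2
    cfg="$dir/ifcfg-$ifname"
    if [ ! -e "$cfg" ]; then          # never overwrite an existing config
        umask 077                     # match the -rw------- ifcfg files listed above
        cat > "$cfg" <<EOF
BOOTPROTO='static'
STARTMODE='off'
USERCONTROL='no'
EOF
    fi
)
```

Usage, as root: write_off_cfg /etc/sysconfig/network eth1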
Making the config files more specific should help. Add the following directives to your ifcfg-ethX files:
Rinse, lather, repeat for eth3, etc.
You could (should?) add config files for eth1 etc as well:
Try adding:
to /etc/sysconfig/network-scripts/ifcfg-eth0 (on SLES the file is /etc/sysconfig/network/ifcfg-eth0). You may also want to create an ifcfg-eth1 that contains something like this:
At least on RHEL that will just bring up the interface with no IP configuration, and the networking init scripts look similar on SUSE 11. The other solution for SUSE networking configuration is to clear out 70-persistent-net.rules with something like:
That will clear the udev rules and tell init to use the ifcfg-eth* files for interface identification.
I was unable to determine the cause behind this mystery of two NICs getting configured the same IP and subnet on boot.
The final solution, however, was to move the cable from the first NIC to the second, i.e. from eth0 to eth1. I then configured ifcfg-eth1 and "unconfigured" ifcfg-eth0. Now networking and network-dependent services come up perfectly.
I suspect a forcedeth module or BIOS issue, but I won't be spending any more time on it: we're building servers on totally different hardware these days and moving from SLES to CentOS, so I don't expect the problem to manifest again.