I have a Linux server hosting our bug tracking software (CentOS 5.2, kernel 2.6.18-128.4.1.el5) that is having some strange network problems. The machine is configured with two NICs, one for the public interface and the other for our server back-end network.
The problem is that after doing a service network restart I can ping the public interface, and anywhere from 200 to 500 ICMP packets get through before I suddenly start getting "request timed out" errors. Strangely, as soon as I connect to the private interface, pings to the public interface start working again. I clearly have a routing issue somewhere.
I have a Juniper Router with the following configuration.
Interface 0/0 -- Connects the subnet to the ISP at our co-location
Interface 0/2 -- For our DRAC network
Interface 0/3 -- The server back-end network (plugs directly into a switch that feeds all the NICs on the 10.3.20.x network)
Interface 0/4 -- Plugs directly into another switch that feeds our public interfaces; that interface has all the gateways from our public IP ranges as secondary IP addresses
I hope that someone can ask the right questions that can lead me to check things and figure out what is going on. Has anyone had similar problems and what kind of things should I be checking? Routing issue or something even more complicated?
[root@fogbugz ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth0
BOOTPROTO=static
IPADDR=72.249.134.98
NETMASK=255.255.255.248
BROADCAST=72.249.134.103
HWADDR=00:16:3E:AA:BB:EE
ONBOOT=yes
[root@fogbugz ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
# Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
DEVICE=eth1
BOOTPROTO=static
BROADCAST=10.3.20.255
HWADDR=00:17:3E:AA:BB:EE
IPADDR=10.3.20.25
NETMASK=255.255.255.0
NETWORK=10.3.20.0
ONBOOT=yes
[root@fogbugz ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=fogbugz.dfw.hisg-it.net
GATEWAY=72.249.134.97
[root@fogbugz ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
72.249.134.96   0.0.0.0         255.255.255.248 U     0      0        0 eth0
10.3.20.0       0.0.0.0         255.255.255.0   U     0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth1
10.0.0.0        10.3.20.1       255.0.0.0       UG    0      0        0 eth1
0.0.0.0         72.249.134.97   0.0.0.0         UG    0      0        0 eth0
Check the output of 'arp -an' in both the working and the non-working state and look for MAC addresses or IPs on the wrong interfaces.
If you see something wrong there, you may have bridged some network segments together, or you may have a proxy ARP issue.
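One way to make that ARP check less error-prone is to filter the output for entries that show up on an unexpected interface. This is only a sketch: the function name arp_wrong_iface is made up for illustration, and it assumes the usual net-tools output format (? (IP) at MAC [ether] on IFACE).

```shell
# Flag ARP entries learned on an unexpected interface.
# Reads `arp -an` output on stdin, e.g. lines of the form:
#   ? (10.3.20.1) at 00:11:22:33:44:55 [ether] on eth1
# Args: <ip-prefix> <expected-iface>, e.g. "10.3.20." eth1
# (arp_wrong_iface is a hypothetical helper name, not a standard tool.)
arp_wrong_iface() {
    awk -v prefix="$1" -v want="$2" '
        {
            ip = $2
            gsub(/[()]/, "", ip)    # strip the parentheses around the IP
            iface = $NF             # the interface is the last field
            if (index(ip, prefix) == 1 && iface != want)
                print ip, $4, iface # IP, MAC, interface it appeared on
        }'
}

# Usage: run once while ping works and once after it stops, then compare.
#   arp -an | arp_wrong_iface 10.3.20. eth1
#   arp -an | arp_wrong_iface 72.249.134. eth0
```

Any line it prints is an address from one subnet that was learned on the other NIC, which is what you would expect to see with bridged segments or proxy ARP.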
Please show the output of
route -n
and
arp -an
when the network isn't working. Showing what it looks like when everything is working is not really useful. ;)
Two things. One is that you may be having issues with asymmetric routing through firewalls; this is a common cause of connections that come up for a few seconds and then die.
The other is the Linux RPF (reverse-path filter), which can be turned off with a sysctl.
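Reverse-path filtering on Linux is normally controlled by the net.ipv4.conf.*.rp_filter sysctls, so the sketch below assumes that is the setting meant here. The helper name show_rp_filter is hypothetical; it only reads the current values, and the commands that change them are left as comments because they need root.

```shell
# Print the rp_filter value for every interface, one "iface value" per line.
# The optional argument overrides the directory to read (defaults to the real /proc path).
# (show_rp_filter is a hypothetical helper name.)
show_rp_filter() {
    dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}

# To disable it at runtime (as root); eth0/eth1 are the interfaces from the question:
#   sysctl -w net.ipv4.conf.all.rp_filter=0
#   sysctl -w net.ipv4.conf.eth0.rp_filter=0
#   sysctl -w net.ipv4.conf.eth1.rp_filter=0
# To keep the change across reboots, put the same keys in /etc/sysctl.conf.
```

A value of 1 means the kernel drops packets whose source address would not be routed back out the interface they arrived on, which is why it matters on a dual-homed box like this one.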
What NIC is in that server? I have run into this kind of problem several times with NICs using Marvell chipsets. It is normally a driver problem or a BIOS problem.
We have the same issue. We are on RHEL 5.3, 2.6.18-128.el5. We had an issue with the route to our private network: it kept dropping. We found a workaround. We put this in our crontab and it fixes the issue:
* * * * * /sbin/arp -d IP to storage
The workaround at http://kbase.redhat.com/faq/docs/DOC-17338 may work for you.
I have experienced an issue with the Linux kernel and the Realtek driver on certain boards, where watchdog timeouts would cause network interfaces to stop working. The only fix was to restart networking and/or reload the driver module (the .ko file) with modprobe. Check /var/log/messages for any such messages.
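A quick way to look for those messages is to grep the log for the kernel's transmit-watchdog line. This is a sketch: find_nic_timeouts is a made-up helper name, and the pattern assumes the usual "NETDEV WATCHDOG: ethX: transmit timed out" wording, which may differ between kernels and drivers.

```shell
# Show watchdog/transmit-timeout kernel messages from a syslog file.
# Optional argument: log file path (defaults to /var/log/messages).
# (find_nic_timeouts is a hypothetical helper name.)
find_nic_timeouts() {
    grep -iE 'NETDEV WATCHDOG|transmit timed out' "${1:-/var/log/messages}"
}

# Usage (reading /var/log/messages usually needs root):
#   find_nic_timeouts
#   find_nic_timeouts /var/log/messages.1
```

If the timestamps of any hits line up with the moment the pings stop, the driver is a more likely suspect than routing.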