I have a multihomed host: eth0 is on 172.31.254.0/24 and eth0.10 is on 172.31.253.0/24. eth0.10 is a subinterface on VLAN ID 10.
From this host I can successfully ping hosts on the 172.31.253.0/24 network, but not consistently on the 172.31.254.0/24 network. For example, notice the dropout in pings #5-#25:
[root@pbx1 ~]# ping -I eth0 172.31.254.37
PING 172.31.254.31 (172.31.254.31) from 172.31.254.13 eth0: 56(84) bytes of data.
64 bytes from 172.31.254.37: icmp_seq=1 ttl=128 time=1.03 ms
64 bytes from 172.31.254.37: icmp_seq=2 ttl=128 time=0.247 ms
64 bytes from 172.31.254.37: icmp_seq=3 ttl=128 time=0.236 ms
64 bytes from 172.31.254.37: icmp_seq=4 ttl=128 time=4.00 ms
64 bytes from 172.31.254.37: icmp_seq=26 ttl=128 time=0.237 ms
64 bytes from 172.31.254.37: icmp_seq=27 ttl=128 time=0.299 ms
My interfaces look right:
[root@myhost1 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:22:4D:B2:28:AC
inet addr:172.31.254.13 Bcast:172.31.254.255 Mask:255.255.255.0
inet6 addr: fe80::222:4dff:feb2:28ac/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:121808 errors:0 dropped:0 overruns:0 frame:0
TX packets:120948 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:16685937 (15.9 MiB) TX bytes:23059300 (21.9 MiB)
Interrupt:16 Memory:d0020000-d0040000
eth0.10 Link encap:Ethernet HWaddr 00:22:4D:B2:28:AC
inet addr:172.31.253.4 Bcast:172.31.253.255 Mask:255.255.255.0
inet6 addr: fe80::222:4dff:feb2:28ac/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2732 errors:0 dropped:0 overruns:0 frame:0
TX packets:828 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1394817 (1.3 MiB) TX bytes:417925 (408.1 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:30573 errors:0 dropped:0 overruns:0 frame:0
TX packets:30573 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2225449 (2.1 MiB) TX bytes:2225449 (2.1 MiB)
and the routing table looks right:
[root@myhost1 ~]# route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
172.31.254.0 * 255.255.255.0 U 0 0 0 eth0
172.31.253.0 * 255.255.255.0 U 0 0 0 eth0.10
link-local * 255.255.0.0 U 1002 0 0 eth0
default firewall.mydomain.com 0.0.0.0 UG 0 0 0 eth0
So why are my packets not getting to (or responses not getting back from) hosts on the 172.31.254.0/24 network? I set /proc/sys/net/ipv4/conf/eth0/rp_filter to 0, but it makes no difference.
Update: Routing table for destination host:
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 172.31.254.1 172.31.254.37 266
127.0.0.0 255.0.0.0 On-link 127.0.0.1 306
127.0.0.1 255.255.255.255 On-link 127.0.0.1 306
127.255.255.255 255.255.255.255 On-link 127.0.0.1 306
172.31.252.0 255.255.255.0 172.31.254.2 172.31.254.37 11
172.31.253.0 255.255.255.0 172.31.254.2 172.31.254.37 11
172.31.254.0 255.255.255.0 On-link 172.31.254.37 266
172.31.254.37 255.255.255.255 On-link 172.31.254.37 266
172.31.254.255 255.255.255.255 On-link 172.31.254.37 266
224.0.0.0 240.0.0.0 On-link 127.0.0.1 306
224.0.0.0 240.0.0.0 On-link 172.31.254.37 266
255.255.255.255 255.255.255.255 On-link 127.0.0.1 306
255.255.255.255 255.255.255.255 On-link 172.31.254.37 266
===========================================================================
Persistent Routes:
Network Address Netmask Gateway Address Metric
172.31.252.0 255.255.255.0 172.31.254.2 1
172.31.253.0 255.255.255.0 172.31.254.2 1
0.0.0.0 0.0.0.0 172.31.254.1 Default
===========================================================================
You may have 172.31.254.13 used twice on your network.
I also suspect routing or ARP issues on the receiving side.
Could you check arp -na on host 172.31.254.13 and host 172.31.254.31?
The full routing tables from the ip route command on each host might also help.
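One way to test the duplicate-address theory is to compare the MAC address that another host has cached for 172.31.254.13 against the HWaddr shown in the ifconfig output above (00:22:4D:B2:28:AC). The helper below is only a sketch: check_arp is a hypothetical name, and it assumes the Linux net-tools arp -na line format ("? (IP) at MAC [ether] on IFACE"), so it will not parse Windows arp -a output.

```shell
#!/bin/sh
# check_arp IP EXPECTED_MAC
# Reads `arp -na` output on stdin (Linux net-tools format assumed) and
# compares the cached MAC for IP with EXPECTED_MAC, ignoring case.
# Exit status: 0 = matches, 1 = a different MAC was seen, 2 = no entry found.
check_arp() {
    awk -v ip="$1" -v want="$2" '
        BEGIN { want = tolower(want); status = 2 }
        $2 == ("(" ip ")") && $3 == "at" {
            got = tolower($4)
            if (got == want) {
                print ip " -> " got " (matches)"
                if (status == 2) status = 0
            } else {
                print ip " -> " got " (expected " want ")"
                status = 1
            }
        }
        END { exit status }'
}

# Usage, run on another Linux host in 172.31.254.0/24:
#   arp -na | check_arp 172.31.254.13 00:22:4D:B2:28:AC
```

A mismatch suggests that some other machine is answering ARP for that address. A match does not rule out a duplicate, since the cache only holds the most recent reply.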
I needed to create an iproute2 policy and then disable rp_filter for the interface and subinterface.
After that it worked perfectly.
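The exact policy was not posted, so the following is only a sketch of what source-based routing for this host could look like, using the addresses from the question. The table numbers 100 and 110 and the function name policy_cmds are arbitrary choices for illustration. The script prints the commands rather than running them, so they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Sketch only: prints source-based policy routing commands for one interface.
# Arguments: device, local address, connected network, routing table number.
# The table numbers used below (100, 110) are arbitrary assumptions.
policy_cmds() {
    dev=$1; addr=$2; net=$3; table=$4
    # Connected route in a per-interface table
    echo "ip route add $net dev $dev src $addr table $table"
    # Traffic sourced from this address consults that table
    echo "ip rule add from $addr lookup $table"
    # Turn off reverse-path filtering for this interface
    echo "echo 0 > /proc/sys/net/ipv4/conf/$dev/rp_filter"
}

policy_cmds eth0    172.31.254.13 172.31.254.0/24 100
policy_cmds eth0.10 172.31.253.4  172.31.253.0/24 110
```

Two caveats. The kernel uses the higher of conf/all/rp_filter and the per-interface value, so setting only eth0 to 0 may have no effect while "all" is still 1. And if replies also need to leave via a gateway, each table would need its own default route, which is omitted here because the gateway addresses are not certain from the post.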
You have a VIP on a different network on the same card (same MAC). Try the same setup, but create vNICs that are on separate segments, then set a default route on one and a specific route for the other NIC.