Context
There are 2 servers:
server1 - eth0 10.129.76.16 eth0.2 192.168.0.103
server2 - eth0 10.129.79.1 eth0.2 192.168.62.101
The 192.x.x.x addresses are on the same VLAN (vlan2) and can see each other. The 10.x.x.x addresses are on different VLANs that cannot see each other.
At David Swartz's request, the routing table on server 1 is:
~$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref Use Iface
10.129.76.0     0.0.0.0         255.255.255.0   U     0      0   0   eth0
192.168.0.0     0.0.0.0         255.255.192.0   U     0      0   0   eth0.2
0.0.0.0         192.168.61.254  0.0.0.0         UG    100    0   0   eth0.2
The routing table on server 2 is:
~$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref Use Iface
0.0.0.0         <public IP gw>  0.0.0.0         UG    100    0   0   eth0.11
10.129.79.0     0.0.0.0         255.255.255.0   U     0      0   0   eth0
<public IP>     0.0.0.0         255.255.255.128 U     0      0   0   eth0.11
192.168.0.0     0.0.0.0         255.255.192.0   U     0      0   0   eth0.2
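As a sanity check on these tables: both 192.168.x.x addresses fall inside 192.168.0.0 with netmask 255.255.192.0, so each server should treat the other as directly connected via eth0.2. A minimal sketch of that arithmetic in plain shell follows; the function names ip_to_int and in_subnet are hypothetical helpers written for this illustration, not part of any tool.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Usage: in_subnet ADDRESS NETWORK NETMASK
# Succeeds (exit status 0) if ADDRESS lies inside NETWORK/NETMASK.
in_subnet() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Addresses and netmask taken from the routing tables above.
in_subnet 192.168.0.103  192.168.0.0 255.255.192.0 && echo "server1 VLAN address is on-link"
in_subnet 192.168.62.101 192.168.0.0 255.255.192.0 && echo "server2 VLAN address is on-link"
in_subnet 10.129.76.16   192.168.0.0 255.255.192.0 || echo "10.129.76.16 is not in that subnet"
```

The third octet is the interesting one: 62 AND 192 is 0, so 192.168.62.101 masks down to 192.168.0.0 just like 192.168.0.103 does.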
Problem:
When I ping from server 1 to server 2, or vice versa, no packets seem to arrive. When I check the routes (route -n), I see the default gw uses eth0.2 on both servers. But when I use arping, I get a response one way (from server 2 to server 1) and none the other way.
arping 192.168.62.101
ARPING 192.168.62.101 from 10.129.76.16 eth0
^CSent 2 probes (2 broadcast(s))
Received 0 response(s)
As you can see, it uses the 10.x.x.x address instead of the 192.x.x.x one. As I said before, the 10.x.x.x address is unreachable from the other server.
When I force arping to use eth0.2, it does work.
I don't have any problems pinging other servers from either of these two servers.
I did see this in the arp tables:
~# arp -n | grep 192.168.0.103
192.168.0.103 (incomplete) eth0
and
~# arp -n | grep 192.168.62.101
Question
Quite obvious: how can I make these servers see each other again?
Things I've tried
I cleared the appropriate entries in the ARP table and tried to get rid of the (incomplete) entry, but I think the biggest problem is that eth0 is used instead of eth0.2 for packets from server 1 to server 2.
Because of David Swartz's remark about the routing tables, I added a host route on each server. I added
192.168.0.103 0.0.0.0 255.255.255.255 UH 0 0 0 eth0.2
and
192.168.62.101 0.0.0.0 255.255.255.255 UH 0 0 0 eth0.2
to the appropriate servers, but this didn't solve the problem, so I presume the problem is not in the routing.
My guess
I guess the problem lies in the following.
~$ arp -n | grep 192.168.0.103
192.168.0.103 (incomplete) eth0
But I'm unable to remove this entry (arp -d 192.168.0.103 has no effect).
Thanks for reading and even more thanks for answering!
Here's a snippet:
arping doesn't respect the local routing table.
Use ICMP to test instead.
Are your VLANs okay?
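A minimal sketch of the first two points, assuming the iputils versions of arping and ping (the output format in the question matches iputils arping). The interface and address come from the question; the probe function and the RUN variable are hypothetical names chosen for this example. Since arping does not consult the routing table, the sketch pins the interface with -I for both tools.

```shell
#!/bin/sh
# Values from the question; adjust for your own hosts.
IFACE="eth0.2"            # VLAN subinterface both servers share
PEER="192.168.62.101"     # server2's address on that VLAN

# Print each probe command; set RUN=1 to execute them as well
# (arping normally needs root).
probe() {
    for cmd in "arping -I $IFACE -c 2 $PEER" "ping -I $IFACE -c 2 $PEER"; do
        echo "+ $cmd"
        if [ "${RUN:-0}" = "1" ]; then
            # Intentionally unquoted so the string splits into command + args.
            $cmd || echo "probe failed: $cmd"
        fi
    done
}
```

Calling probe only prints the two commands; RUN=1 probe also runs them. If the arping probe succeeds with -I but fails without it, the layer-2 path is fine and only arping's interface selection is at fault.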
You didn't mention what kernel you're using or what version of arping, and there is the possibility of a bug in one or the other. The fact that you can successfully arping when you specify the subinterface does indicate that all of your layer-2 networking is behaving correctly.
Try using ip route get 192.168.62.101 on server1 to ask the kernel directly how it would send your traffic. Based on the routing tables you've posted, sending via eth0.2 is clearly the correct behavior, and if ip route get returns a different answer, you may be looking at a kernel bug. If it returns the correct answer, then arping is to blame, but that seems unlikely.
The (incomplete) entry doesn't need to be removed; it is a placeholder that lets the kernel know that it did try to ARP that IP, so that an ARP reply should be considered valid and not an ARP-poison attack, but that it didn't get an answer. It'll time out.
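The ip route get check mentioned above can be scripted roughly as follows. This is only a sketch: route_dev and check_route are hypothetical helper names, the defaults are the address and interface from the question, and the parsing assumes the usual iproute2 output in which the outgoing interface follows the word dev.

```shell
#!/bin/sh
# Extract the outgoing interface (the word after "dev") from
# `ip route get` output supplied on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Compare the kernel's choice with the interface we expect.
# Both arguments default to the values from the question.
check_route() {
    target=${1:-192.168.62.101}
    expected=${2:-eth0.2}
    actual=$(ip route get "$target" | route_dev)
    if [ "$actual" = "$expected" ]; then
        echo "kernel routes $target via $actual, as expected"
    else
        echo "kernel routes $target via ${actual:-nothing}, expected $expected"
        return 1
    fi
}
```

Run check_route on server1, and check_route 192.168.0.103 eth0.2 on server2. A mismatch points at the kernel's routing decision; a match means the kernel would pick the right interface and the tool is choosing a different one.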