I have a few Linux test boxes on Scaleway, each with two NICs that are all connected to the same network, 10.0.0.0/8, but each NIC has its own gateway.
I want to be able to use both NICs (eth0/eth1) and their IPs for communication. If an application is bound to IP .187, then eth0 should be used; if it is bound to IP .189, then eth1 should be used.
Right now only interface eth0 with IP .187 responds to requests of any kind (that's why I use ping and ssh for testing). However, if I change the default route from eth0 to eth1 (IP .189), outgoing traffic is routed through eth1 correctly, but eth0 is then no longer usable.
So how do I configure the box so that both interfaces are usable?
Given
Box 1:
eth0_ip = 10.5.68.187/31
eth0_gw = 10.5.68.186
eth1_ip = 10.5.68.189/31
eth1_gw = 10.5.68.188
Approach
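Based on my research (here and here), I created a bash script that should add static routes in separate tables so that both NICs can be used.
#!/bin/bash
# IP address and gateway for eth0 and eth1.
# The gateway is read from the first field of the last route listed for each
# interface (the connected /31 network, whose address is also the gateway here).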
eth0_ip=$(ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)
eth0_gw=$(ip route list dev eth0 | awk '{print $1}' | tail -1 | cut -d'/' -f1)
eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)
#ip route add 10.0.0.0/8 dev eth0 table 1 priority 100
#ip route add ${eth0_ip} dev eth0 table 1
ip route add default via ${eth0_gw} dev eth0 table 1
ip rule add from ${eth0_ip}/32 table 1
#ip route add 10.0.0.0/8 dev eth1 table 2 priority 110
#ip route add ${eth1_ip} dev eth1 table 2
ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip}/32 table 2
ip route flush cache
I tried several variations of the script, but none of them worked.
Output
[node]# ip route
default via 10.1.229.186 dev eth0
10.1.229.186/31 dev eth0 proto kernel scope link src 10.1.229.187
10.1.229.188/31 dev eth1 proto kernel scope link src 10.1.229.189
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1
[node]# ip route show table 1
10.1.229.187 dev eth0 scope link
[node]# ip route show table 2
10.1.229.189 dev eth1 scope link
Testing
[]# ip route get 10.5.68.187 from 10.1.229.187
10.5.68.187 from 10.1.229.187 via 10.1.229.186 dev eth0
cache
[]# ip route get 10.5.68.187 from 10.1.229.189
10.5.68.187 from 10.1.229.189 via 10.1.229.188 dev eth1
cache
From another machine.
ping 10.1.229.187 # OK
ping 10.1.229.189 # NOK
nmap 10.1.229.187 -p 22 # OK
nmap 10.1.229.189 -p 22 # NOK
So how can I set up routing so that I can communicate with .187 and .189 at the same time?
Update 2:
With this setup I had some success.
eth0_ip=$(ip -o -4 addr list eth0 | awk '{print $4}' | cut -d/ -f1)
eth0_gw=$(ip route list dev eth0 | awk '{print $1}' | tail -1 | cut -d'/' -f1)
eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)
ip route add default via ${eth0_gw} dev eth0 table 1
ip rule add from ${eth0_ip} table 1
ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip} table 2
After applying the above script, I modified the default route, switching it to eth1 and then back; after that I was able to ping both .187 and .189. (In another attempt I also removed it completely.) I'm not sure what the problem is here.
# switch the default route to eth1 and back to eth0
ip route change default via ${eth1_gw} dev eth1
ip route change default via ${eth0_gw} dev eth0
ip route flush cache
Update 3:
From various attempts, it seems to me that table 2 is completely ignored. As the ISP has a custom kernel, is it possible that routing tables are disabled in the kernel? How can I test that?
Update 4:
Once again I made a little progress, but I am still far from a working solution. While experimenting with different options, I stumbled across this strange situation: for eth1 to work, I first have to use the interface once, e.g. by pinging from IP .189 (Node 1) to another node on the network:
Node 1 -> Node 2: ping -I 10.1.229.189 10.5.68.187
This works, and then suddenly the ping in the other direction,
Node 2 -> Node 1: ping 10.1.229.189
works as well. If I don't make the initial connection/ping (Node 1 -> Node 2), then Node 2 -> Node 1 doesn't work.
The problem, however, is that if I restart the machine or wait some time (10-60 minutes), it goes back to the initial state.
The minimal setup that partly works is this (I successively removed everything that didn't make a difference):
eth1_ip=$(ip -o -4 addr list eth1 | awk '{print $4}' | cut -d/ -f1)
eth1_gw=$(ip route list dev eth1 | awk '{print $1}' | tail -1 | cut -d'/' -f1)
ip route add default via ${eth1_gw} dev eth1 table 2
ip rule add from ${eth1_ip} lookup 2
This is the output requested by @Anton Danilov:
[root@cluser-node-1 ~]# ip -4 r ls table all
default via 10.1.229.188 dev eth1 table 2
default via 10.1.229.186 dev eth0
10.1.229.186/31 dev eth0 proto kernel scope link src 10.1.229.187
10.1.229.188/31 dev eth1 proto kernel scope link src 10.1.229.189
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev docker_gwbridge proto kernel scope link src 172.18.0.1
local 10.1.229.187 dev eth0 table local proto kernel scope host src 10.1.229.187
broadcast 10.1.229.187 dev eth0 table local proto kernel scope link src 10.1.229.187
local 10.1.229.189 dev eth1 table local proto kernel scope host src 10.1.229.189
broadcast 10.1.229.189 dev eth1 table local proto kernel scope link src 10.1.229.189
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 172.17.0.0 dev docker0 table local proto kernel scope link src 172.17.0.1
local 172.17.0.1 dev docker0 table local proto kernel scope host src 172.17.0.1
broadcast 172.17.255.255 dev docker0 table local proto kernel scope link src 172.17.0.1
broadcast 172.18.0.0 dev docker_gwbridge table local proto kernel scope link src 172.18.0.1
local 172.18.0.1 dev docker_gwbridge table local proto kernel scope host src 172.18.0.1
broadcast 172.18.255.255 dev docker_gwbridge table local proto kernel scope link src 172.18.0.1
[root@cluser-node-1 ~]# ip rule list
0: from all lookup local
32765: from 10.1.229.189 lookup 2
32766: from all lookup main
32767: from all lookup default
[root@cluser-node-1 ~]# ip n ls dev eth1
10.1.229.188 lladdr 00:07:cb:0b:0d:93 REACHABLE
[root@cluser-node-1 ~]# tcpdump -ni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:36:17.237182 ARP, Request who-has 10.1.229.188 tell 10.1.229.189, length 28
16:36:17.237369 ARP, Reply 10.1.229.188 is-at 00:07:cb:0b:0d:93, length 46
2 packets captured
4 packets received by filter
0 packets dropped by kernel
This is the output after the system is restarted or after the 15-30 minute timeout:
[root@cluser-node-1 ~]# tcpdump -ni eth1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
[root@cluser-node-1 ~]# ip n ls dev eth1
10.1.229.188 lladdr 00:07:cb:0b:0d:93 REACHABLE
Check whether replies are being sent (they may be going out through the other interface) or whether they are missing altogether.
Check the reverse path filter settings (look at the counters in the output of 'nstat -az' or 'netstat -s'; TcpExtIPReversePathFilter counts packets dropped by rp_filter). Disable it or set it to loose mode (see the description of the sysctl settings). Look up the reverse route for the incoming packets to confirm the assumption.
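As a rough illustration of the rp_filter check, here is a minimal sketch. It assumes the interface names eth0 and eth1 from the question; the helper names rp_filter_mode and effective_rp_filter are made up for this example. According to the kernel's ip-sysctl documentation, the values are 0 (off), 1 (strict) and 2 (loose), and the larger of the "all" and per-interface settings is the one applied to the interface.

```shell
#!/bin/sh
# Hypothetical helpers for this sketch; they are not part of any tool.

# Translate an rp_filter value into its documented meaning.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# The kernel applies the maximum of conf/all and conf/<iface>.
# Arguments: value of "all", value of the interface.
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then echo "$1"; else echo "$2"; fi
}

conf=/proc/sys/net/ipv4/conf
for dev in eth0 eth1; do
    # Skip interfaces that do not exist on this machine.
    if [ -r "$conf/$dev/rp_filter" ] && [ -r "$conf/all/rp_filter" ]; then
        eff=$(effective_rp_filter "$(cat "$conf/all/rp_filter")" "$(cat "$conf/$dev/rp_filter")")
        echo "$dev: rp_filter=$eff ($(rp_filter_mode "$eff"))"
    fi
done
```

If an interface reports strict, running `sysctl -w net.ipv4.conf.eth1.rp_filter=2` as root would switch eth1 to loose mode, and `nstat -az TcpExtIPReversePathFilter` shows whether the drop counter increases while the other machine pings .189.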
I think you should add routes for the directly connected networks to the route tables, because they are required for ARP resolution of the corresponding gateways and for communication with other hosts in the directly connected networks. These settings should be enough to solve your case:
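The commands that belonged here are not shown. As a sketch only, assuming the addresses from the `ip route` output in the question (eth0 10.1.229.187 with gateway 10.1.229.186, eth1 10.1.229.189 with gateway 10.1.229.188) and table numbers 1 and 2, the description above might translate into the following. The function name pbr_commands is made up for this example; it only prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print (not run) the policy-routing commands for one interface.
# Arguments: table number, device, local IP, connected subnet, gateway.
# pbr_commands is a hypothetical helper name used only in this sketch.
pbr_commands() {
    table=$1 dev=$2 ip=$3 subnet=$4 gw=$5
    # Connected network first, so the gateway is reachable inside the table.
    echo "ip route add $subnet dev $dev src $ip table $table"
    # Default route of this table through the interface's own gateway.
    echo "ip route add default via $gw dev $dev table $table"
    # Packets sourced from this IP consult the table above.
    echo "ip rule add from $ip lookup $table"
}

# Addresses assumed from the output in the question.
pbr_commands 1 eth0 10.1.229.187 10.1.229.186/31 10.1.229.186
pbr_commands 2 eth1 10.1.229.189 10.1.229.188/31 10.1.229.188
```

Piping the output to `sh` as root applies the commands; `ip route show table 2` and `ip rule list` should then show the new entries.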
Also, you should know that this setup only works when the IP addresses on these interfaces with overlapping addressing are different. Otherwise you should use a more complex scheme with CONNMARK and policy-based routing (PBR) by firewall marks.
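For reference, a minimal sketch of what such a CONNMARK scheme could look like for eth1. The mark value 2 and table 2 are arbitrary choices for this example, table 2 is assumed to already hold the default route via eth1's gateway, and connmark_commands is a made-up helper that only prints the commands.

```shell
#!/bin/sh
# Print (not run) a CONNMARK-based setup for one interface.
# Arguments: device, firewall mark, routing table.
# connmark_commands is a hypothetical helper name used only in this sketch.
connmark_commands() {
    dev=$1 mark=$2 table=$3
    # Tag every new connection that arrives on this device.
    echo "iptables -t mangle -A PREROUTING -i $dev -m conntrack --ctstate NEW -j CONNMARK --set-mark $mark"
    # Copy the connection's mark onto locally generated reply packets.
    echo "iptables -t mangle -A OUTPUT -m connmark --mark $mark -j CONNMARK --restore-mark"
    # Route packets carrying the mark through the interface's own table.
    echo "ip rule add fwmark $mark lookup $table"
}

connmark_commands eth1 2 2
```

Because the routing decision is keyed on the connection rather than on the source address, replies leave through the interface the request arrived on even when both interfaces carry the same address.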
If you're trying to ping the host from the host itself, you should use these commands: