I have a few hosts connected to the same switch, which are all on the same subnet (10.0.0.0/16). Two of these hosts have faster network interfaces so I have connected them together, meaning these two machines now have a direct link with each other without going through a switch.
I now need to set up the routing such that when these two machines try to talk to each other, the packets go over this faster direct link in preference to the slower link via the switch.
The easiest way would be to configure the direct link to be on a different subnet. However, I would then need to use different IPs or hostnames depending on which interface to use. I would like to be able to deploy standard configs to all machines (e.g. NFS mounts using hostnames) and not have to maintain custom IP overrides in /etc/hosts, so I feel that with this solution it would be too easy to get a hostname wrong and send traffic over the wrong interface.
What I am looking for is a way to tell the two Linux machines that even though eth0 handles 10.0.0.0/16, when you want to communicate with 10.0.0.5, even though it's in eth0's subnet, send the packets through eth1 instead.
I tried adding a host routing rule with route add -host 10.0.0.5 dev eth1, which does send the packet out on the correct interface; however, it comes from the wrong IP address (the direct link's subnet rather than the original subnet).
I guess the only way to fix this is to set the same IP address on both interfaces, but will this cause any problems? Can a machine correctly have the same IP on multiple NICs without causing problems? I'm assuming I'll need to set routing metrics properly so that the NIC connected to the switch is given priority (to avoid all traffic for the subnet being sent to the other host by mistake), but is there anything else I need to be aware of with this set up? Can it lead to any other issues or difficult-to-resolve problems?
Or is there a better, more robust way to achieve this?
EDIT: Here is the extra info requested by @A.B:
$ ip -br link
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>
eth0 UP ec:f4:xx:xx:xx:a4 <BROADCAST,MULTICAST,UP,LOWER_UP>
eth1 UP ec:f4:xx:xx:xx:a5 <BROADCAST,MULTICAST,UP,LOWER_UP>
$ ip -br address
lo UNKNOWN 127.0.0.1/8
eth0 UP 10.0.0.4/16
eth1 UP 10.0.99.4/24
$ ip route
default via 10.0.0.1 dev eth0 proto static
10.0.0.0/16 dev eth0 proto kernel scope link src 10.0.0.4
10.0.0.5 dev eth1 scope link
10.0.99.0/24 dev eth1 proto kernel scope link src 10.0.99.4
I set up the direct link (eth1) on a separate subnet and then tried the route command in my original post, and this is where things are now. It looks like perhaps I need to get the src attribute set for my direct route.
As 10.0.99.4 is part of 10.0.0.0/16, this IP address should be avoided. Otherwise there would be a conflict with any host actually using 10.0.99.4/16 on the LAN. The conflict exists even on eth0: Linux uses the Weak Host Model, so by default it also answers ARP requests for 10.0.99.4 that arrive on eth0, creating ARP conflicts. Don't use conflicting IP addresses.
cleanup:
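A minimal sketch of such a cleanup, assuming the state shown in the question's ip route and ip -br address output (the host route to 10.0.0.5 and the 10.0.99.4/24 address, both on eth1). The function name cleanup_first_attempt is hypothetical; wrapping the commands in a function means nothing changes until it is called from a root shell.

```shell
# Hypothetical helper for host 10.0.0.4; values come from the ip output in the question.
cleanup_first_attempt() {
    # Remove the host route that was added with the old route command.
    ip route del 10.0.0.5 dev eth1
    # Remove the address that overlaps 10.0.0.0/16 (its 10.0.99.0/24 route goes with it).
    ip address del 10.0.99.4/24 dev eth1
}
```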
The standard method with glue IP addresses
Let's choose two unrelated addresses to be used by the two hosts. They must not clash with anything else in use on your network, but as they are point-to-point /32 addresses, anything will do: they won't be used as part of a LAN, only as point-to-point/peer addresses. I'll arbitrarily use 192.168.100.4/32 and 192.168.101.5/32. Should more than two of those hosts later get a faster switch and be connected together through it, this setup can be slightly amended, and having related IP addresses in the same block then becomes easier again.
configuration for host 10.0.0.4:
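As a sketch of what this could look like with the glue addresses chosen above: one command assigns the local /32 address to eth1, and a second adds an on-link route to the peer's glue address. The function name setup_glue is hypothetical, and eth1 is assumed to be the direct link as in the question.

```shell
# Hypothetical helper, to be called as root on host 10.0.0.4.
setup_glue() {
    # Local point-to-point glue address, deliberately a /32.
    ip address add 192.168.100.4/32 dev eth1
    # On-link route to the peer's glue address through the direct link.
    ip route add 192.168.101.5/32 dev eth1
}
```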
Actually the two commands above have a shortcut: you can replace both of them with this single command below:
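A sketch of that single-command form, assuming the same glue addresses: the peer keyword of ip address add sets the local address and lets the kernel install the route to the peer address itself. The function name setup_glue_peer is hypothetical.

```shell
# Hypothetical helper, to be called as root on host 10.0.0.4.
setup_glue_peer() {
    # Local glue address plus the kernel-created route to the peer's glue address.
    ip address add 192.168.100.4 peer 192.168.101.5/32 dev eth1
}
```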
Now tell the host that to reach 10.0.0.5/32 (which is more specific than 10.0.0.0/16) there's a route using the peer IP address, but preferring a different source IP address than what would be chosen by default (the obsolete route command can't do this):
With this in place you get:
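The route and a lookup to confirm it might look like the following minimal sketch, assuming the glue addresses above are already configured on eth1. Both function names are hypothetical.

```shell
# Hypothetical helpers, to be called as root on host 10.0.0.4.
add_fast_route() {
    # Reach the peer's LAN address via its glue address, keeping 10.0.0.4 as source.
    ip route add 10.0.0.5/32 via 192.168.101.5 src 10.0.0.4
}

show_fast_route() {
    # Ask the kernel which route it would now pick for the peer.
    ip route get 10.0.0.5
}
```

After add_fast_route, the lookup from show_fast_route should name eth1 as the device and 10.0.0.4 as the source address.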
There's one minor drawback: IP broadcasts are still sent to eth0, and if Strict Reverse Path Forwarding is active (either sysctl net.ipv4.conf.eth0.rp_filter or sysctl net.ipv4.conf.all.rp_filter gives 1 rather than 0 or 2), those broadcasts, when sent by the peer (e.g. running on peer host 10.0.0.5 something similar to echo test | socat udp4-datagram:10.0.255.255:5555,broadcast -), will be ignored because they are received on what is now the wrong interface. So if you are using protocols relying on this and already apply Strict Reverse Path Forwarding, switch eth0 to Loose mode if needed by setting net.ipv4.conf.eth0.rp_filter to 2.
The equivalent configuration for host 10.0.0.5:
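A sketch of the mirrored setup, with the roles of the two hosts swapped. It assumes the direct link is also named eth1 on 10.0.0.5, which the question does not state, and the function name setup_peer_side is hypothetical.

```shell
# Hypothetical helper, to be called as root on host 10.0.0.5.
setup_peer_side() {
    # Glue address plus route to the other side's glue address, in one command.
    ip address add 192.168.101.5 peer 192.168.100.4/32 dev eth1
    # Reach 10.0.0.4 over the direct link while keeping 10.0.0.5 as source address.
    ip route add 10.0.0.4/32 via 192.168.100.4 src 10.0.0.5
    # Only needed if strict reverse path filtering would drop the peer's broadcasts on eth0.
    sysctl -w net.ipv4.conf.eth0.rp_filter=2
}
```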
For example, in Debian-like ifupdown interface configuration files you can use the pointopoint keyword and a few additional up commands for any command that doesn't have a direct configuration equivalent (sysctl settings would rather be put in /etc/sysctl.d).
Simplified method without additional (nor duplicate) IP addresses
Actually the only role of 192.168.100.4 and 192.168.101.5 is to resolve link-layer addresses to know the route for 10.0.0.4 and 10.0.0.5: they are used as some kind of glue that doesn't play any other role. Those IP addresses will be completely invisible: no IP packet will ever use 192.168.100.4 or 192.168.101.5 in its content (except if explicitly using them), only ARP requests and answers will. There's no need to use such glue IP addresses at all.
For example, the host provider Hetzner gives an example of reaching an IP address on an interface without configuring an IP address on this interface (nor having this interface used as bridge port). In this example the peer on tap0 (which is a tun/tap device in Ethernet mode linked to a VM on the other side) has to answer ARP requests to resolve link-layer addresses.
But then again, for symmetrical reasons, the peer doesn't need an IP address configured there either if it already has it configured elsewhere, in order to properly answer an ARP request made through eth1: that's again part of Linux' implementation of the Weak Host Model.
So this can simply be used for host 10.0.0.4, without involving any extra IP address using only a single command:
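A sketch of that single command: a host route for the peer's LAN address that points straight at eth1, with no address configured on eth1 at all. The function name add_direct_route is hypothetical.

```shell
# Hypothetical helper, to be called as root on host 10.0.0.4.
add_direct_route() {
    # Host route: send traffic for the peer's LAN address straight out of eth1.
    ip route add 10.0.0.5/32 dev eth1
}
```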
Or to specify the source (to avoid ambiguity in case the host has more than one):
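With an explicit source, the same route could be written as in this sketch (the function name add_direct_route_src is hypothetical):

```shell
# Hypothetical helper, to be called as root on host 10.0.0.4.
add_direct_route_src() {
    # Same host route, pinning 10.0.0.4 as the preferred source address.
    ip route add 10.0.0.5/32 dev eth1 src 10.0.0.4
}
```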
And for host 10.0.0.5:
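The peer side is the same route with the two LAN addresses swapped. The sketch below uses a small parameterized helper to make the symmetry explicit; direct_route is a hypothetical name, and the direct link is assumed to be called eth1 on 10.0.0.5 as well.

```shell
# Hypothetical helper: direct_route LOCAL_IP PEER_IP DEV (call as root).
direct_route() {
    # Host route to the peer over DEV, with LOCAL_IP as preferred source.
    ip route add "$2/32" dev "$3" src "$1"
}

# On host 10.0.0.5 this would be called as:
#   direct_route 10.0.0.5 10.0.0.4 eth1
```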
To accept "slow" broadcasts on eth0 from the peer, both hosts still require, as before:
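A sketch of the Loose mode setting, only relevant where strict mode (value 1) is currently in effect. The function name loosen_rp_filter is hypothetical, and making the value persistent (for example through /etc/sysctl.d) is left out.

```shell
# Hypothetical helper, to be called as root on each of the two hosts.
loosen_rp_filter() {
    # The kernel applies the higher of conf.all and conf.eth0, so 2 here gives loose mode.
    sysctl -w net.ipv4.conf.eth0.rp_filter=2
}
```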
ARP requests to resolve their IP addresses can be answered on both interfaces (as linked above, Linux does this by default). Here, however, resolutions or entries on the usual (old) side eth0, if any (e.g. from before those settings were put in place), won't trigger effects such as ARP flux, because both peers are configured together to use eth1, leaving no other possible interpretation for the routes.
Choose whichever method you prefer. The first is more classical; the second has a simpler setup (but you might get a few "this can't work" from your peers). Remember that manually added routes are lost when an interface is administratively brought down then up, so those settings must be put in the appropriate network configuration to stay in effect.
Using A.B's excellent explanation, this is what I ended up doing to get it to work, for the benefit of anyone else using systemd:
Of course the source and destination IP addresses (but not the netmasks) are flipped for the other host.