Under Linux you can combine multiple network interfaces into a "bonded" network interface to provide failover.
There are several modes, some of which do not require switch support. My switch doesn't constrain me, so I can use any of the modes.
However, in reading about the different modes, it's not immediately clear what the pros and cons of each one are.
- Do some modes provide a faster failover?
- What about CPU load impact for each mode?
- Which modes can combine the bandwidth rather than just provide redundancy?
- Are there limitations to that?
- Does balance-rr require switch support?
- Reliability? What are your experiences running long term?
Most of these points are quite thoroughly described in the documentation file /usr/src/linux/Documentation/networking/bonding.txt from the Linux source package of your favorite distro. Speed of failover is controlled by the "miimon" parameter (the link-monitoring interval, in milliseconds) for most modes, but it shouldn't be set too low; normal values are under one second anyway.
Here are the best parts, completed by me:
balance-rr, active-backup, balance-tlb and balance-alb don't need switch support.
balance-rr augments performance at the price of packet reordering (out-of-order delivery), and performs poorly with some protocols (CIFS) and with more than 2 interfaces.
balance-alb and balance-tlb may not work properly with all switches; there are often some ARP problems (some machines may fail to connect to each other, for instance). You may need to tweak various settings (miimon, updelay) to get stable networking.
balance-xor may or may not require switch configuration. You need to set up an interface group (not LACP) on HP and Cisco switches, but apparently it's not necessary on D-Link, Netgear and Fujitsu switches.
802.3ad absolutely requires an LACP group on the switch side. It's the best supported option overall for augmenting performance.
Note: whatever you do, one network connection always goes through one and only one physical link. So when aggregating GigE interfaces, a file transfer from machine A to machine B can't top 1 gigabit/s, even if each machine has 4 aggregated GigE interfaces. The one partial exception is balance-rr, which stripes the packets of a single connection across links, but at the cost of out-of-order delivery.
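To illustrate why a single transfer stays on one link in the hash-based modes, here is a minimal Python sketch. It assumes a simplified layer2-style hash (XOR of the last byte of each MAC address, modulo the number of links); the real driver's policies differ in detail (see xmit_hash_policy in the kernel bonding documentation). The function name `pick_link` and the MAC addresses are made up for the example.

```python
def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified layer2-style hash (an assumption, not the exact kernel formula):
    XOR the last byte of each MAC address, then take it modulo the link count."""
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % n_links


# Hypothetical addresses for three hosts
host_a = "00:11:22:33:44:0a"
host_b = "00:11:22:33:44:0b"
host_c = "00:11:22:33:44:0c"

# Every packet of the A -> B transfer hashes to the same link,
# so that transfer can never exceed the speed of one link.
links_ab = {pick_link(host_a, host_b, 4) for _ in range(1000)}

# A transfer to a different peer may land on a different link,
# which is where the aggregate bandwidth gain comes from.
link_ac = pick_link(host_a, host_c, 4)
```

With these inputs the A to B flow is always pinned to one link, while A to C happens to hash to another. The aggregate gain only shows up when there are several peers or flows to spread.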
The biggest factor in fail-over is the speed with which a link failure is detected. Unplug the cable from the host and they'll all work pretty well. Leave a live link on an otherwise dead switch and most of the modes (except for those that support beacons/keepalives) are going to send part of your traffic nowhere.
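As a rough way to see what the bonding driver currently believes about each link, you can parse the status text the driver exposes under /proc/net/bonding/ (for example /proc/net/bonding/bond0). The sketch below works on a shortened, made-up sample rather than a live file, and the helper name `parse_slaves` is my own.

```python
# Shortened, made-up sample in the style of /proc/net/bonding/bond0
SAMPLE_STATUS = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
"""


def parse_slaves(text: str) -> dict:
    """Return {interface: {"mii_status": str, "failures": int}} for each slave."""
    slaves = {}
    current = None  # stays None until the first "Slave Interface" line
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key == "MII Status":
            current["mii_status"] = value
        elif current is not None and key == "Link Failure Count":
            current["failures"] = int(value)
    return slaves


status = parse_slaves(SAMPLE_STATUS)
down = [name for name, info in status.items() if info["mii_status"] != "up"]
print(down)
```

Keep in mind that the MII status only reflects carrier on the local port. That is exactly the limitation described above: a link that stays up into an otherwise dead switch would still be reported as up.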
Generally speaking, network traffic is interrupt driven, so the various hashing algorithms aren't going to make a meaningful difference to CPU load.
Any mode that isn't active/standby or broadcast-all will share traffic to varying degrees. Some modes can balance on a per-packet basis, others work on a per-flow basis. The former will spread load more evenly, while the latter is far more useful (read: functional/stable) in actual networks.
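A toy Python simulation shows why per-packet balancing tends to be less stable than per-flow balancing. It assumes two links with different one-way delays (the delay values are invented) and one packet sent per time unit; it is not a model of any particular driver or switch.

```python
def arrival_order(n_packets, link_delays, per_packet):
    """Return packet sequence numbers in the order they would arrive.

    Packet i is sent at time i. With per_packet=True the packets rotate
    across the links (round-robin); otherwise the whole flow stays on link 0.
    """
    arrivals = []
    for seq in range(n_packets):
        link = seq % len(link_delays) if per_packet else 0
        arrivals.append((seq + link_delays[link], seq))
    # Sort by arrival time to get the order seen by the receiver
    return [seq for _, seq in sorted(arrivals)]


delays = [5, 1]  # hypothetical one-way delays per link, in packet-times

striped = arrival_order(6, delays, per_packet=True)   # per-packet balancing
pinned = arrival_order(6, delays, per_packet=False)   # per-flow balancing
```

With per-packet striping the receiver sees the packets out of order, which TCP may interpret as loss. With the flow pinned to one link the order is preserved, at the cost of using only that link for the flow.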
Yes - there are limitations to each mode, but we need to know a lot more about your application to speak to them.
Only LACP/802.3ad (mode 4) explicitly requires support on the switch. That said, just because you send to the switch with a particular pattern doesn't mean the switch will send -back- to you in the same manner.
The only mode I tend to trust in production is 802.3ad which, with an appropriately configured switch, will ensure that only the correct links end up in the channel, as well as providing some measure of symmetry in traffic sharing and a predictable response when a link is down. This mode also avoids some common-but-nasty problems (e.g. unicast flooding). Active/standby is also quite common. The other modes may be required for certain circumstances but, IMO, tend to be more painful.
Other flow/MAC/IP based balancing modes or active/standby can be fine, too, and may be required when dealing with unmanaged switches.
The kernel docs answer some of those questions:
Ethernet bonding