I have a requirement for a system with no Single Point of Failure. The client have offered two Ethernet cables per server.
Each of the servers will connect to their network; however, I want to set up a separate network exclusively for PostgreSQL replication traffic (using Streaming Replication). The client are sensitive to high traffic volumes on their network, and I also want to ensure that replication happens as quickly as possible without being affected by other systems on their network.
The plan is to have two separate dual-port NICs in each server, so I end up with two connections into each network, teamed using Network Card Bonding and a Link Aggregation switch. This way either NIC can fail and there is still an active connection to both networks.
My problem is that with Network Card Bonding (Teaming/Trunking) both network connections go to the same network switch, so the switch for my database replication network becomes a single point of failure.
How can I avoid a Single Point of Failure between the database cluster nodes?
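Obviously you need two switches. A standard LAG, whether configured statically or negotiated with LACP, has to terminate on a single switch (or on something that behaves like one). To spread one bond across two physical switches you have two options. Either use switches that support multi-chassis link aggregation (MLAG, stacking and similar features, which are usually vendor-specific), in which case the two switches coordinate the link and appear to the server as a single LACP partner. Or use an active/backup (failover) bonding mode with one NIC cabled to each switch, which needs no cooperation from the switches at all.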
http://en.wikipedia.org/wiki/Link_aggregation
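To make the reasoning concrete, here is a minimal sketch that models the replication network as a graph and reports which single component, if removed, cuts the two database nodes off from each other. The names (`db1`, `db1_nicA`, `sw1`, and so on) are made up for illustration, and the model assumes the bond can use any surviving path (for example active/backup, or LACP against an MLAG pair).

```python
from collections import deque


def connected(links, start, goal, removed):
    """Breadth-first search from start to goal, ignoring the removed component."""
    adjacency = {}
    for a, b in links:
        if removed in (a, b):
            continue  # a failed component takes all of its links down with it
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, start, goal):
    """Return every component whose failure alone disconnects start from goal."""
    components = {name for link in links for name in link} - {start, goal}
    return sorted(c for c in components if not connected(links, start, goal, c))


# Hypothetical layout 1: both NICs of each server go to the same switch.
one_switch = [
    ("db1", "db1_nicA"), ("db1", "db1_nicB"),
    ("db2", "db2_nicA"), ("db2", "db2_nicB"),
    ("db1_nicA", "sw1"), ("db1_nicB", "sw1"),
    ("db2_nicA", "sw1"), ("db2_nicB", "sw1"),
]

# Hypothetical layout 2: NIC A of each server goes to sw1, NIC B to sw2.
two_switches = [
    ("db1", "db1_nicA"), ("db1", "db1_nicB"),
    ("db2", "db2_nicA"), ("db2", "db2_nicB"),
    ("db1_nicA", "sw1"), ("db1_nicB", "sw2"),
    ("db2_nicA", "sw1"), ("db2_nicB", "sw2"),
]

print(single_points_of_failure(one_switch, "db1", "db2"))    # ['sw1']
print(single_points_of_failure(two_switches, "db1", "db2"))  # []
```

The check only covers single failures. If NIC A fails on one server and NIC B on the other, the two-switch layout is also cut unless the switches are linked to each other.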