If you want failure resilience, you can build several servers, each of which can serve any client, and put the data in a SAN or shared DB cluster. But if the clients connect directly to one server and that server fails, they get a network error. You can solve that by putting a load-balancer in front of the servers, either software-based on a PC or specialized hardware. In that case, the death of an individual server does not have to cause a network error in the client.
But then the load-balancer itself could die. So you're not really any better than before, right?
If I assume the DC makes sure the packets get safely to the switch(es) my servers are on, is there any way to route traffic on one IP such that a hardware failure doesn't cause a lost connection? And could this be done without specialized hardware? Or am I missing some reason why it's not possible?
For the sake of argument, we'll assume the (non-web) traffic is broken into messages, and that any message that was received but not acked by the failing server can be buffered and replayed safely.
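To make the buffering assumption concrete, here is a minimal sketch of what the sending side might keep track of. The `ReplayBuffer` class, its method names, and the sequence-number scheme are all hypothetical, used only for illustration; no real protocol is implied.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Holds messages that were sent but not yet acknowledged (hypothetical helper)."""

    def __init__(self):
        self._pending = OrderedDict()  # seq -> payload, kept in send order
        self._next_seq = 0

    def record(self, payload):
        """Assign a sequence number to a message and keep it until it is acked."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        return seq

    def ack(self, seq):
        """Drop a message once the server has confirmed it."""
        self._pending.pop(seq, None)

    def unacked(self):
        """Messages to replay, oldest first, after a server failure."""
        return list(self._pending.items())


buf = ReplayBuffer()
first = buf.record(b"msg-1")
second = buf.record(b"msg-2")
third = buf.record(b"msg-3")
buf.ack(first)           # the server confirmed msg-1, then died
replay = buf.unacked()   # msg-2 and msg-3 still have to be resent
print(replay)
```

Replay is only safe if the surviving server can tolerate seeing a message twice, which is the assumption stated above.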
Yes, you are right that the load balancer could die too. For this reason, you need two load balancers instead of one. The simplest setup is to run them in active-standby mode, where clients reach the active one through a configured virtual IP (VIP). Any Linux box with a package like keepalived or Heartbeat can do this, along with HAProxy as the HTTP/TCP load balancer.
You will lose only the current connections when the active load balancer and/or the accessed server dies, but subsequent connections should be OK as you have redundant live servers.
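From the client side, this means a dropped connection can simply be retried against the same VIP until the standby has taken over. A rough sketch, assuming a plain TCP client: the function name `connect_with_retry`, the retry counts, and the address `192.0.2.10:5000` are placeholders, and the `connect` parameter exists only so the demo runs without a real network.

```python
import socket
import time


def connect_with_retry(address, attempts=5, delay=0.5,
                       connect=socket.create_connection):
    """Try to (re)connect to the VIP, pausing between attempts during failover."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(address)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)  # give the standby time to claim the VIP
    raise ConnectionError(f"nothing answered at {address}") from last_error


# Simulated failover: the first two attempts are refused, the third succeeds.
calls = []


def fake_connect(address):
    calls.append(address)
    if len(calls) < 3:
        raise ConnectionRefusedError("standby not promoted yet")
    return "connected"


result = connect_with_retry(("192.0.2.10", 5000), delay=0, connect=fake_connect)
print(result, "after", len(calls), "attempts")
```

In real use you would drop the `connect` argument so that `socket.create_connection` is called, and pick a delay somewhat longer than the failover time of your balancer pair.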
pfSense doesn't require special hardware and can actually run in a virtualised environment. It supports load balancing and the synchronisation of states. But there will always be a failover time, however brief, and as such there will always be a few lost acks or the need for a retransmission. I'm not sure any high-availability technology (CARP/VRRP/HSRP) can entirely prevent packet loss. But with TCP/IP the lost packets will be retransmitted anyway, so it shouldn't be an issue.