Imagine a load-balancer facing the internet, dispatching requests to several worker-servers in a local network.
Is there a way for the worker-servers to respond directly on the original socket/connection/IP, or do they have to respond through the load-balancer so it can forward the response (making it a single point of failure, SPOF)?
Generally the response to the load-balanced request will go straight to the gateway, not back through the load balancer.
So the load balancer is a SPOF for inbound requests. The outbound response can bypass it by going straight to the gateway, but since the client's ACKs still have to come back in through the load balancer, that bypass doesn't really remove the SPOF.
Generally you would have two load balancers. Only one of them owns the public IP address at any given time, and they run a heartbeat between them. As soon as the live load balancer loses its heartbeat (i.e. it crashes, goes offline, etc.), the second load balancer takes over the IP address.
There are some load balancers that will transparently load balance using destination NAT, but the load balancer still needs to see the return traffic so it can NAT it back to the original IP that the client opened a connection to. If I open a connection to a load balancer at 10.10.10.10 and a server in the pool sends back a reply from its own IP of 192.168.100.100, I'm going to drop that packet, because I don't know who 192.168.100.100 is; I'm trying to talk to 10.10.10.10.
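As a concrete sketch, the destination-NAT setup described above might look like the following firewall rules on the load balancer (the addresses are the hypothetical ones from the example, with a single-server pool for brevity; the SNAT rule is what forces replies back through the load balancer even when the pool server's default route points elsewhere):

```
# Rewrite the destination of inbound connections from the public VIP
# to a pool server (hypothetical addresses).
iptables -t nat -A PREROUTING  -d 10.10.10.10 -p tcp --dport 80 \
         -j DNAT --to-destination 192.168.100.100:80

# Rewrite the source so the pool server replies via the load balancer;
# connection tracking then un-NATs the reply so the client sees it
# coming from 10.10.10.10, the address it actually talked to.
iptables -t nat -A POSTROUTING -d 192.168.100.100 -p tcp --dport 80 \
         -j SNAT --to-source 10.10.10.10
```

Without the second rule, the server would answer from 192.168.100.100 directly and the client would drop the packet, exactly as described above.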
You can avoid a single point of failure in the load balancer tier by setting up a redundant (active/passive) pair with VRRP, so that if one load balancer fails the other takes over all of its traffic. VRRP uses a shared virtual MAC address and IP address to accomplish seamless failover.
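In practice, VRRP failover like this is often configured with keepalived. A minimal illustrative keepalived.conf fragment might look like the following (the interface name, router ID, priorities, and addresses are assumptions for the sketch, not taken from the question):

```
# Illustrative keepalived.conf fragment for the active node.
vrrp_instance VI_1 {
    state MASTER            # the standby node would use "state BACKUP"
    interface eth0          # interface carrying the shared address
    virtual_router_id 51    # must match on both nodes
    priority 100            # standby node uses a lower value, e.g. 90
    advert_int 1            # heartbeat (advertisement) interval, seconds
    virtual_ipaddress {
        10.10.10.10/24      # the shared public-facing VIP
    }
}
```

When the master stops sending advertisements, the backup promotes itself and starts answering for 10.10.10.10.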
IP networks don't allow that directly; you need some software to help, so that one worker-server acts as the load balancer and, if it fails, another automatically takes over the load-balancing responsibility and starts handling the requests arriving at the load balancer's IP address. That means they all have to run the load-balancing software. You should be able to do this with software called HAProxy; searching for "shared IP" will take your research further. This solution naturally reduces one worker-server's serving capacity, because it also has to act as the load balancer for the whole cluster.
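For reference, a minimal HAProxy configuration fronting a pool of workers might look like this (the addresses, server names, and timeouts are illustrative assumptions, not taken from the question):

```
# Illustrative haproxy.cfg sketch.
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind 10.10.10.10:80          # the shared/public IP
    default_backend workers

backend workers
    balance roundrobin           # rotate requests across the pool
    server worker1 192.168.100.101:80 check
    server worker2 192.168.100.102:80 check
```

The `check` keyword makes HAProxy health-check each worker and stop sending traffic to ones that go down.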
And by the way, you would probably get more answers for this on Server Fault.
It depends on the method of load balancing you are using. If you're simply doing round-robin DNS, then yes, your worker-servers will be making their connections "directly" with their clients.
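Round-robin DNS here just means publishing several A records for the same name, so different clients resolve to different workers and connect to them with no proxy in the path. A hypothetical zone fragment (the name and addresses are placeholders):

```
; Several A records for one name. Resolvers rotate among them,
; spreading client connections directly across the workers.
www   IN  A   192.0.2.10
www   IN  A   192.0.2.11
www   IN  A   192.0.2.12
```

The trade-off is that DNS has no health checks: a dead worker keeps receiving its share of clients until its record is removed and caches expire.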
If you're actually using a reverse proxy such as HAProxy, then no: your worker-servers will use the load balancer as an intermediary. From the perspective of your worker-servers, the client is the load balancer, not the actual client.
If you're concerned about creating a single point of failure by using a reverse proxy in this way, there are strategies you can employ to mitigate the risk, such as running redundant proxies. Details will vary based upon your implementation.
You can set up the loopback interface of several servers with the same IP address (it's called IP aliasing). This IP is known to the outside world, and requests arrive with it as the destination address. The load balancer can then direct each request to one of these servers according to its load-balancing criteria, and the chosen server accepts the packet because it owns that address on its loopback.
So yes, you can have a situation where several servers respond to a single IP.
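On Linux, the loopback-aliasing setup described above (the direct-routing style used by, e.g., LVS/DR) might be sketched like this on each worker; the VIP is a hypothetical address and the sysctl values are the usual ones for this technique:

```
# Add the shared VIP to the loopback so this worker accepts
# packets whose destination address is the VIP.
ip addr add 10.10.10.10/32 dev lo

# Keep the workers from answering ARP for the VIP, so that on the
# LAN only the load balancer claims it and can steer requests.
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```

With this arrangement the worker can reply straight to the client from the VIP, which is exactly the "respond directly on the original IP" behaviour the question asks about.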