I often see web app architectures with an SLB / reverse proxy in front of a bunch of app servers.
What happens when the number of connections to the SLB requires too many resources for a single SLB to handle effectively? For a concrete yet over-the-top example, consider 2 million persistent HTTP connections. Clearly a single SLB cannot handle this.
What is the recommended configuration for scaling out an SLB?
Is it typical to create a group/cluster of LBs? If so, how is client load spread out between the group of LBs?
OK, there is already an accepted answer, but there is something to add... The most common 'classical' ways of scaling the load balancer tier are (in no particular order):
DNS Round Robin to publish multiple IP addresses for the domain (see the quick resolution sketch after this list). For each IP address, implement a highly available server pair (2 servers cooperating on keeping one IP address working at all times). Each IP corresponds to one load balancer cluster, either using appliances or servers with load balancing software. Scale horizontally by adding more load balancer pairs as needed.
Routing or firewall tweaks to spread load to multiple load balancers. Have the front router or front firewall spread the incoming connections to several IP addresses (each representing one load balancer pair) by hashing the source IP address, having multiple equal-cost routes to the load balancers, or similar.
A layer of IP-level load balancers in front of a layer of HTTP-level load balancers. IP-level load balancing can be implemented in ASICs / silicon, and can be wicked fast for some things. Thus a single IP load balancer pair can often 'keep up' with several HTTP/HTTPS-level load balancers, and provide multi-gigabit performance levels while keeping the architecture nice and simple.
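Here is that quick resolution sketch for the DNS Round Robin item above: a minimal look at what a client sees when a name publishes several A records. The hostname is a placeholder, and real clients (browsers, resolvers) rotate through or pick among the returned addresses on their own.

```python
import socket

# Placeholder name; imagine it publishes one A record per load balancer pair.
hostname = "www.example.com"

# getaddrinfo returns every published address; clients typically rotate
# through (or randomly pick from) this list, spreading load across LB pairs.
addresses = sorted({info[4][0] for info in
                    socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)})
print(f"{hostname} -> {addresses}")
```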
Going completely in-depth on the different ways of doing the above would require a very long answer. But in general, it is not that hard to scale the load balancer tier; it is far harder to scale the application server tier and especially the database tier.
Whether you choose an appliance form factor (F5, Cisco, A10) or a generic server (Windows / Linux + software) matters less. The major considerations when scaling out the load balancer layer are:
Generally, you don't need to worry about this before your website gets very large -- a single modern server with e.g. nginx will handle tens of thousands of plain HTTP requests per second. So don't optimize prematurely; don't deal with this before you have to.
Load balancers can't easily be scaled by other load balancers since there will inherently be a single load balancer on the chain somewhere maintaining the connections. That said, balancers such as LVS or HAProxy have absurd capacity in the Gbps range. Once you get beyond the capabilities of a single load balancer (software, hardware, whatever), then you'll need to move into other techniques such as round robin DNS.
The key to scale an HTTP load balancing layer is to add another layer of lower-level (IP or TCP) load balancing first. This layer can be built entirely with open-source software, although you'll get better results if you have modern routers.
The flows (TCP sessions) should be hashed using headers such as source/destination IP and TCP ports, to decide which frontend they go to. You also need a mechanism to make sure that when a frontend dies, it stops getting used.
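As a rough illustration of that hashing (a sketch only; the tuple format and the SHA-256-based hash are arbitrary choices, not what any particular router or load balancer actually uses):

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def pick_frontend(flow: Flow, frontends: list[str]) -> str:
    """Map every packet of a TCP flow to the same frontend.

    Because the hash depends only on the flow's addresses and ports, the
    whole TCP session lands on one frontend. Note that with this simple
    modulo scheme, changing the frontend list remaps many existing flows;
    consistent hashing reduces that if it matters to you.
    """
    key = f"{flow.src_ip}:{flow.src_port}->{flow.dst_ip}:{flow.dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return frontends[digest % len(frontends)]

frontends = ["fe1", "fe2", "fe3"]  # hypothetical healthy frontends
flow = Flow("203.0.113.7", 51312, "192.0.2.1", 443)
assert pick_frontend(flow, frontends) == pick_frontend(flow, frontends)
print(pick_frontend(flow, frontends))
```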
There are various strategies; I'm going to outline a couple that I've used in production on sites serving millions of users, so you can get the idea. It would be too long to explain everything in detail, but I hope this answer will give you enough information/pointers to get started. In order to implement these solutions you're going to need someone who is really knowledgeable about networking.
Admittedly what I'm describing here is much harder to implement than what is described in other answers, but this is really the state of the art if you have a high-traffic website with big scalability issues and availability requirements over 99.9%. Provided you already have a network engineer type on board, it costs less to set up and run (both in capex and opex) than load balancer appliances, and it can be scaled further at almost no additional cost (vs. buying a new, even more expensive appliance when you outgrow your current model).
First strategy: with a firewall
Presumably you have a couple of routers on which your ISP uplinks are connected. Your ISP provides 2 links (active/passive, using VRRP). On your routers, you also use VRRP, and you route the traffic going to your public network to a firewall. The firewalls (FW 1 and FW 2 below) are also active/passive; they filter the traffic and send each flow to a healthy frontend server (your HTTP load balancers, FE 1 and FE 2 below).

The goal is to have a flow look like this:

1. The ISP routes traffic destined to your public IPs to its active router.
2. Your routers route the traffic to the active firewall (VRRP).
3. The firewall filters the traffic.
4. The firewall forwards each flow to a healthy frontend.
5. The frontend terminates the TCP connection.
Now the magic happens in steps 4 and 5, so let's see in more detail what they do.
Your firewall knows the list of frontends (FE 1 and FE 2), and it will pick one of them based on a particular aspect of the flow (e.g. by hashing the source IP and port, among other headers). But it also needs to make sure that it's forwarding traffic to a healthy frontend, otherwise you will blackhole traffic. If you use OpenBSD, for instance, you can use relayd. What relayd does is simple: it health-checks all your frontends (e.g. by sending them a probe HTTP request), and whenever a frontend is healthy it adds it to a table that the firewall uses to select the next hop for the packets of a given flow. If a frontend fails health checks, it is removed from the table and no packets are sent to it anymore. When forwarding a packet to a frontend, all the firewall does is swap the destination MAC address of the packet to that of the chosen frontend.

In step 5, the packets from the user are received by your load balancer (be it Varnish, nginx, or whatever). At this point, the packet is still destined to your public IP address, so you need to alias your VIP(s) on the loopback interface. This is called DSR (Direct Server Return), because your frontends terminate the TCP connection and the firewall in between only sees simplex traffic (only incoming packets). Your router will route outgoing packets directly back to the ISP's routers. This is especially good for HTTP traffic, because requests tend to be smaller than responses, sometimes significantly so. Just to be clear: this isn't an OpenBSD-specific thing; it's widely used on high-traffic websites.
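This is not relayd itself, but here is a rough sketch of the health-check-and-table loop it performs; the frontend addresses, probe URL, and interval are made-up placeholders.

```python
import time
import urllib.request

# Hypothetical frontends; the `healthy` set stands in for the pf table
# that relayd would keep up to date for the firewall's next-hop selection.
FRONTENDS = ["10.1.2.101", "10.1.2.102"]
PROBE_URL = "http://{ip}/healthcheck"   # assumed probe endpoint
INTERVAL = 10                           # seconds between probe rounds

def probe(ip: str, timeout: float = 2.0) -> bool:
    """Return True if the frontend answers the HTTP probe with a 2xx status."""
    try:
        with urllib.request.urlopen(PROBE_URL.format(ip=ip), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def health_check_loop() -> None:
    healthy: set[str] = set()
    while True:
        for ip in FRONTENDS:
            if probe(ip):
                healthy.add(ip)      # relayd: insert host into the firewall's table
            else:
                healthy.discard(ip)  # relayd: drop it, no more flows go there
        print("healthy frontends:", sorted(healthy))
        time.sleep(INTERVAL)

if __name__ == "__main__":
    health_check_loop()
```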
Gotchas:
Second strategy: without a firewall
This strategy is more efficient but harder to set up because it depends more on the specifics of the routers you have. The idea is to bypass the firewall above and have the routers do all the work the firewalls were doing.
You'll need routers that support per-port L3/L4 ACLs, BGP and ECMP, and Policy Based Routing (PBR). Only high-end routers support these features, and they often have extra licensing fees to use BGP. This is typically still cheaper than hardware load balancers, and is also far easier to scale. The good thing about these high-end routers is that they tend to be line-rate (e.g. they can always max out the link, even on 10GbE interfaces, because all the decisions they make are done in hardware by ASICs).
On the ports on which you have your ISP uplinks, apply the ACL that used to be on the firewall (e.g. "only allow 80/tcp and 443/tcp going to this particular IP address"). Then have each one of your frontends maintain a BGP session with your router. You can use the excellent OpenBGPD (if your frontends are on OpenBSD) or Quagga. Your router will ECMP the traffic to the frontends that are healthy (because they're maintaining their BGP sessions). The router will also route the traffic out appropriately using PBR.
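The answer names OpenBGPD and Quagga; purely as an illustration of the same pattern (announce the service prefix only while the local load balancer is healthy), here is a sketch of a health-check process in the style that a BGP speaker such as ExaBGP can consume on its process API. The VIP prefix and health URL are placeholders, and the exact announce/withdraw syntax should be checked against whichever speaker you actually run.

```python
import sys
import time
import urllib.request

# Placeholders: the VIP your routers ECMP toward, and the local health URL.
VIP_PREFIX = "192.0.2.1/32"
HEALTH_URL = "http://127.0.0.1/healthcheck"

def local_lb_healthy() -> bool:
    """Probe the load balancer running on this frontend."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def main() -> None:
    announced = False
    while True:
        healthy = local_lb_healthy()
        if healthy and not announced:
            # Tell the BGP speaker to start attracting traffic for the VIP.
            sys.stdout.write(f"announce route {VIP_PREFIX} next-hop self\n")
            announced = True
        elif not healthy and announced:
            # Stop announcing so the router's ECMP group drops this frontend.
            sys.stdout.write(f"withdraw route {VIP_PREFIX} next-hop self\n")
            announced = False
        sys.stdout.flush()
        time.sleep(5)

if __name__ == "__main__":
    main()
```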
Refinements

Use pfsync to keep the firewall state tables synchronized between the two firewalls, so established flows survive a failover; be aware that pfsync will typically double the packet rate on your firewalls.

Sample relayd config

See also the HOWTO at https://calomel.org/relayd.html
Personally I'd go for simpler, less-configurable hardware load balancers at that point - things like Cisco's ACE/ASAs, Foundry ServerIrons, maybe even Zeus ZXTMs (a software LB that's designed for very heavy loads).
Perhaps instead of constantly keeping so many open connections to send replies, code your application in such a way that clients periodically poll your servers as often as necessary?

Does whatever you are doing actually require a response this very millisecond, or can a client wait 15-20 seconds until the next polling period?
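A trivial sketch of what such a polling client could look like (the URL and interval are placeholders; tune the interval to how stale the client can tolerate being):

```python
import time
import urllib.request

POLL_URL = "https://example.com/api/updates"  # placeholder endpoint
POLL_INTERVAL = 15                             # seconds between polls

def handle(payload: bytes) -> None:
    print(f"got {len(payload)} bytes")

def poll_forever() -> None:
    # Each poll is a short-lived request, so the server never has to hold
    # millions of idle connections open between updates.
    while True:
        try:
            with urllib.request.urlopen(POLL_URL, timeout=5) as resp:
                handle(resp.read())
        except OSError:
            pass  # transient errors: just try again next cycle
        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    poll_forever()
```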
A typical approach would be to create a cluster large enough to handle the required load and to use an SLB that can do deterministic load balancing (for the case of persistent connections).
Something like CARP uses a hash of the requesting IP to determine which backend web server handles the request. This is deterministic, but it is not very useful if there is a firewall or NAT in front of your load balancer, since all clients then appear to come from the same few addresses.
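To illustrate that kind of deterministic, per-client-IP mapping, here is a sketch using rendezvous (highest-random-weight) hashing; this is an illustration of the idea, not CARP's actual algorithm, and the backend names are made up.

```python
import hashlib

def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Deterministically map a client IP to a backend.

    Rendezvous (highest-random-weight) hashing: the same client IP always
    maps to the same backend, and removing one backend only remaps the
    clients that were on it. Behind NAT, many clients share one IP, so
    they all land on the same backend - the caveat mentioned above.
    """
    def weight(backend: str) -> int:
        key = f"{client_ip}|{backend}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return max(backends, key=weight)

backends = ["web1", "web2", "web3"]  # hypothetical backend pool
print(pick_backend("203.0.113.7", backends))    # same IP -> same backend every time
print(pick_backend("198.51.100.23", backends))
```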
You may also find something like IPVS useful if you are running on Linux.