I currently have one web server that runs my websites; all requests go straight to it. The server runs Ruby on Rails and everything is going fine, but as my sites grow I will need to either get a bigger server or scale out with more servers handling the extra load.
I would like to go with the second scenario. Instead of having one huge server, I would like two or three smaller, cheaper servers. This is how I think this should be done:
All domains point to x.x.x.30 (HAProxy). When HAProxy receives a request, it sends it to the least busy web server available, and that web server responds directly to the client. With this setup I could easily grow by adding web servers at any time, and quickly fix problems by pulling a misbehaving web server out of the cluster.
x.x.x.30 <-- HAProxy
x.x.x.31 <-- webserver1: Rails/Passenger3
x.x.x.32 <-- webserver2: Rails/Passenger3
Am I correct in my understanding of this setup?
Yup - I think you have the basics down pat:
What is HAProxy:
It's a proxy, and only a proxy. It works with anything TCP-based, not just HTTP. It does not serve files; it simply proxies.
Why HAProxy:
HAProxy comes with plenty of load-balancing algorithms, including a "least connections" strategy that picks the backend with the fewest pending connections.
Backends can be sanity- and health-checked by URL to avoid routing requests to brain-damaged backends. (It can even stagger these checks to avoid spikes.)
A dedicated status page gives you backend status, uptime and lots of really nice metrics. Requests can be routed based on all sorts of things: cookies, URL substrings, client IP, etc.
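For instance, content-based routing is just a rule in the frontend. This fragment (the /api path and the api_servers pool are hypothetical names, purely for illustration) would sit inside the frontend section of the config sketched below:

    acl is_api path_beg /api            # match requests whose URL starts with /api
    use_backend api_servers if is_api   # send them to a separate backend pool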
How to setup HAProxy:
Check here for some additional examples and details
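To make this concrete, here is a minimal sketch of a haproxy.cfg for the layout in your question. The IPs come from your question; the /up health-check path is an assumption, so point it at any cheap URL your Rails app actually serves:

    global
        maxconn 4096

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind x.x.x.30:80
        default_backend rails

    backend rails
        balance leastconn              # pick the server with the fewest pending connections
        option httpchk GET /up         # health-check each backend by URL (assumed path)
        server web1 x.x.x.31:80 check
        server web2 x.x.x.32:80 check

    listen stats
        bind x.x.x.30:8080
        stats enable
        stats uri /haproxy             # the status page with backend state and metrics

With this in place, pulling a problem server out of rotation is just commenting out its server line and reloading, or letting the health check fail it out automatically.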
You mostly have it right, but one detail you kind of glossed over is the response part. The setup you describe is called "Direct Routing" meaning that the packets come in to the load-balancer, get forwarded on to a back-end server, and that server replies directly to the client without passing the traffic back through the load-balancer.
In order for DR to work, the load-balancer IP address needs to ALSO exist on all the web servers. However, you need to disable ARP responses for those addresses on the other servers. Here is a reference to a discussion of this on CentOS.
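The usual recipe for that (the same one LVS direct-routing setups use; treat it as a sketch to adapt, since sysctl details vary by distribution) looks like this on each web server:

    # Add the shared IP (x.x.x.30) as an alias on the loopback interface,
    # so the web server accepts traffic addressed to it...
    ip addr add x.x.x.30/32 dev lo
    # ...but never answers or advertises ARP for it, leaving the
    # load balancer as the only machine that claims the address
    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2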
The other thing to remember is that the proxy can now become a bottleneck or a single point of failure. Usually these load-balancing proxies are set up as a high-availability cluster of machines, so that one can be taken down for maintenance, or fail outright, without taking the whole site down.
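A common way to build that pair (one option among several; the interface name and priorities here are examples) is keepalived, with the two balancers sharing x.x.x.30 via VRRP so the backup grabs the address if the master dies:

    vrrp_instance VI_1 {
        state MASTER          # the standby machine says BACKUP here
        interface eth0        # whichever NIC faces your clients
        virtual_router_id 51
        priority 100          # give the backup a lower value, e.g. 50
        virtual_ipaddress {
            x.x.x.30          # the floating address all your domains point at
        }
    }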
Beyond DR, you can also do NAT: the web servers get private IPs and use the load balancer as their default gateway, and the balancer translates between their private addresses and the public IP. This is generally easier to configure, because you don't need to worry about the ARP issues, asymmetric routing, etc.
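As a sketch of NAT mode, here is what that looks like with LVS (ipvsadm), assuming the balancer holds a private address of 10.0.0.1 and the web servers sit at 10.0.0.31 and 10.0.0.32 with 10.0.0.1 as their default gateway; the private addresses are invented for illustration:

    # The balancer must forward packets between the two networks
    sysctl -w net.ipv4.ip_forward=1
    # Define the virtual service on the public IP with least-connection scheduling
    ipvsadm -A -t x.x.x.30:80 -s lc
    # Add the private web servers in masquerading (NAT) mode
    ipvsadm -a -t x.x.x.30:80 -r 10.0.0.31:80 -m
    ipvsadm -a -t x.x.x.30:80 -r 10.0.0.32:80 -m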
Finally, one method not frequently spoken about is the iptables CLUSTERIP module. You configure the shared IP address as an alias on all the machines in the cluster, then set up CLUSTERIP on each. The module hashes each connection's remote IP address (or IP address and port number), so every machine in the cluster independently agrees on which node should handle it: packets are blocked on the nodes that don't own that hash bucket and accepted by the node that does. At the low level this works by answering ARP with a multicast MAC address, so every node sees every packet.
This works great because you don't have a dedicated load-balancer to fail. However, it is somewhat primitive and obviously cannot do load-balancing based on URLs or other "layer 7" information, only based on IP address.
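For a two-node cluster, the setup is a couple of commands run on every node; the multicast MAC below is an arbitrary example, and only --local-node differs between machines:

    # Bring up the shared address on each node
    ip addr add x.x.x.30/24 dev eth0
    # Accept only the connections this node's hash bucket owns;
    # use --local-node 2 on the second machine
    iptables -A INPUT -d x.x.x.30 -p tcp --dport 80 -j CLUSTERIP --new \
             --hashmode sourceip --clustermac 01:00:5e:00:00:20 \
             --total-nodes 2 --local-node 1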
Here's an article I wrote last year on how to set up CLUSTERIP.
There are many choices, and we use them all, depending on the situation. Each has its strengths and weaknesses; which fits best depends on your exact situation, goals, and level of experience.