I have provisioned a server with 8 cores and plan to deploy a network service. To spread out the request load, I'd like to run 8 instances of my service. Nothing shocking here. I do not have access to a hardware load balancer. I should mention that I currently have 5 public IP addresses allocated (but I can get more).
Thus, I would like to hear your recommendations for structuring a software load balancing solution.
The obvious choices would be to:
- use HAProxy;
- pre-fork my application (like Facebook Tornado and Unicorn both do); or
- insert your idea here.
My goals are to:
- spread request load between the service instances; and
- allow for rolling restarts of my service (code upgrades).
I should mention that this is not an HTTP-based service, so NGiNX and the like are out.
I do not love HAProxy because of its memory requirements; it seems to require a read and a write buffer per client connection. Thus, I would have buffers at the kernel level, in HAProxy, and in my application. This is getting silly! Perhaps I'm missing something here, though?
Thanks!
Whatever the solution, if you install a process to forward stream data, it will require per-connection buffers. This is because you can't always send everything you received, so you have to keep the excess in a buffer. That said, the memory usage will depend on the number of concurrent connections. One large site is happily running haproxy with default settings at 150000 concurrent connections (4 GB RAM). If you need more than that, version 1.4 lets you adjust the buffer size without recompiling. However, keep in mind that the per-socket kernel buffers will never go below 4 kB per direction and per socket, so at least 16 kB per connection. That means it's pointless to make haproxy run with less than 8 kB per buffer, as at that point it already consumes less than the kernel does.
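To make the buffering point concrete, here is a minimal sketch in Python of a TCP forwarder (nothing like haproxy's actual C implementation): each connection needs one userspace buffer per direction, because the destination won't always accept everything that was just read. The addresses and the 8 kB limit are made-up values for illustration.

```python
import select
import socket

FRONT = ("0.0.0.0", 9000)    # hypothetical listen address
BACK = ("127.0.0.1", 9001)   # hypothetical backend instance
BUFSZ = 8192                 # per-direction buffer, cf. the 8 kB point above

def relay(client):
    """Forward bytes both ways between the client and one backend socket."""
    backend = socket.create_connection(BACK)
    client.setblocking(False)
    backend.setblocking(False)
    peer = {client: backend, backend: client}
    buf = {client: bytearray(), backend: bytearray()}  # bytes pending to write TO that socket
    alive = True
    while alive:
        # Read a side only while its peer's buffer has room (flow control);
        # ask to write only where bytes are pending.
        rlist = [s for s in (client, backend) if len(buf[peer[s]]) < BUFSZ]
        wlist = [s for s in (client, backend) if buf[s]]
        readable, writable, _ = select.select(rlist, wlist, [])
        for s in readable:
            data = s.recv(BUFSZ - len(buf[peer[s]]))
            if not data:            # one side closed: stop (pending bytes dropped for brevity)
                alive = False
                break
            buf[peer[s]] += data    # the unsent excess lives here, in userspace
        for s in writable:
            sent = s.send(buf[s])   # may accept fewer bytes than we have buffered
            del buf[s][:sent]
    client.close()
    backend.close()

def main():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(FRONT)
    srv.listen(128)
    while True:
        conn, _ = srv.accept()
        relay(conn)                 # serial for brevity; a real proxy multiplexes connections

if __name__ == "__main__":
    main()
```

Even in this toy version, every connection costs two userspace buffers on top of the kernel's socket buffers, which is exactly the overhead you were asking about.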
Also, if your service is pure TCP and a proxy adds no value, take a look at network-level solutions such as LVS. It's a lot cheaper because it operates on packets and does not need to maintain buffers; the socket buffers simply drop packets when they are full. It can also be installed on the same machine as the service.
Edit: Javier, preforked processes relying on the OS to do the load balancing do not scale that well at all. The OS wakes every process up when it gets a connection; only one of them gets it and all the others go back to sleep. Haproxy in multi-process mode shows its best performance around 4 processes. At 8 processes, performance already starts to drop. Apache uses a nice trick against this: it takes a lock around the accept() so that only one process is waiting in accept. But that kills the load-balancing feature of the OS and stops scaling between 1000 and 2000 processes. It should use an array of a few locks so that only a few processes wake up, but it does not do that.
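For illustration, here is a minimal Python sketch of that accept-serialization trick (the idea behind Apache's accept mutex, not its actual code). It assumes a Unix host, and the address and worker count are hypothetical.

```python
import multiprocessing
import os
import socket

ADDR = ("0.0.0.0", 9000)     # hypothetical service address
WORKERS = 8

def worker(srv, accept_lock):
    while True:
        with accept_lock:            # only the lock holder sleeps in accept(),
            conn, _ = srv.accept()   # so the kernel never wakes all workers at once
        conn.sendall(b"handled by pid %d\n" % os.getpid())
        conn.close()

def main():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(ADDR)
    srv.listen(128)
    lock = multiprocessing.Lock()    # shared across forks (POSIX semaphore)
    for _ in range(WORKERS):
        if os.fork() == 0:           # child inherits the listening socket
            worker(srv, lock)
    for _ in range(WORKERS):
        os.wait()

if __name__ == "__main__":
    main()
```

The single lock is what kills the OS's own balancing, as described above: connection hand-off is serialized through whoever happens to grab the lock next.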
Without any details on your service it's very hard to say, but in general I'd lean toward preforking. It's a tried-and-true server strategy (and not a newfangled trick, as some people think after reading the Tornado/Unicorn fan sites).
Beyond that, a few tips:
- each preforked process can use modern non-`select` strategies (libevent, mostly) to handle huge numbers of clients; see the sketch after this list.
- it's very rare that a 1:1 relationship between cores and processes gives optimal performance; it's usually far better to apply some dynamic adaptation to the load.
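As a rough sketch of the first tip, here is a preforked echo server where every worker runs a readiness-based event loop; Python's selectors module picks epoll or kqueue where available, the same mechanisms libevent wraps. It assumes a Unix host, and the address and worker count are illustrative.

```python
import os
import selectors
import socket

ADDR = ("0.0.0.0", 9000)     # hypothetical service address
WORKERS = 4                  # tune to measured load, not blindly one per core

def serve(srv):
    sel = selectors.DefaultSelector()      # epoll/kqueue where available
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ)
    while True:
        for key, _ in sel.select():
            s = key.fileobj
            if s is srv:
                try:
                    conn, _addr = srv.accept()
                except BlockingIOError:
                    continue               # another worker won the wakeup (the herd effect above)
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = s.recv(4096)
                if data:
                    s.send(data)           # echo; real code would buffer partial writes
                else:
                    sel.unregister(s)
                    s.close()

def main():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(ADDR)
    srv.listen(128)
    for _ in range(WORKERS):
        if os.fork() == 0:                 # child inherits the listening socket
            serve(srv)
    for _ in range(WORKERS):
        os.wait()

if __name__ == "__main__":
    main()
```

Each worker can juggle thousands of idle connections this way, and the worker count can be tuned (or workers respawned) independently of the core count.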