Currently, I have two load balancers running HAProxy, which forward requests to backends, each of which runs Varnish cache in front of nginx. I thought this would distribute most of the bandwidth across the backend servers, but my load balancers appear to be using hundreds of gigs of bandwidth a month, which is close to what the backends use. I suppose that makes sense because the traffic is all routed through the load balancers?
My load balancers and backends are located in different parts of the US. Would it be more efficient if I just ran HAProxy and Varnish on frontends, and only nginx on backends? Thanks!
You've got it backwards... put Varnish in front of the load balancers, so that it can answer as many requests as possible early on. If you've got too much traffic for one Varnish to handle, load balance the Varnish instances with a low-overhead TCP load balancer like ldirectord. Then have Varnish pass back to the HAProxy instances and go from there. Having Varnish behind the HAProxies just seems totally backwards to me: you want to shed as much traffic as early as you can.
To your first question: Yes, in the normal HAProxy configuration all traffic flows through the load balancer, both when it comes in to your servers and when it goes out again from the servers to the clients. This is more or less always so with load balancers, as they're generally implemented as HTTP proxies or IP-level NAT / routing boxes. The exception is when "direct server return" (DSR) is used; see this inlab.com explanation of what DSR is.
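To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. The function name and the traffic figures are made up for illustration (they are not measurements from your setup), and it only counts client-facing traffic: in proxy or NAT mode both requests and responses cross the load balancer, while with DSR only the requests do.

```python
def lb_client_traffic_gb(request_gb, response_gb, direct_server_return=False):
    """Estimate the monthly client-facing traffic crossing the load balancer, in GB.

    Hypothetical helper for illustration only.
    """
    if direct_server_return:
        # With DSR the backends reply straight to the client,
        # so only the inbound requests pass through the load balancer.
        return request_gb
    # Proxy / NAT mode: requests and responses both pass through.
    return request_gb + response_gb


# Assumed example figures: 20 GB of requests, 480 GB of responses per month.
proxied = lb_client_traffic_gb(20, 480)
with_dsr = lb_client_traffic_gb(20, 480, direct_server_return=True)
print(proxied, with_dsr)
```

Since responses usually dwarf requests for web traffic, the proxied figure ends up close to what the backends serve in total, which matches what you are seeing.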
As for your load balancers and backends being in different parts of the US: ehh, why? If you were using geo-loadbalancing or multicast routing, I would not expect you to be asking these questions. In the normal use case you really should have your servers in the same rack, on a fast, collision-free, low-latency LAN. That would make life easier for your server software and give you more performance from your servers, as well as more consistent / dependable performance characteristics...
The canonical setup for the software you're using would be something like this:
nginx (for HTTP compression) --> Varnish cache (for caching) --> HTTP level load balancer (HAProxy, or nginx, or the Varnish built-in) --> webservers.
Optionally, if your load is high, you could have multiple nginx or Varnish servers at the very front, but that's for sites with thousands of requests per second.
To your second question: when you ask "more efficient", I'm not sure what you mean. More efficient as in less traffic between the servers? Marginally, as the Varnish cache stops some traffic from going further back. More efficient with regard to CPU use? You can just shuffle the services around to less loaded physical servers, as long as you keep the logical structure the same.
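How much traffic the cache stops from going further back depends on its hit ratio. A rough sketch in Python, assuming a hypothetical byte hit ratio (the share of response bytes served from cache) and made-up traffic figures:

```python
def traffic_behind_cache_gb(client_response_gb, byte_hit_ratio):
    """Estimate the response traffic that still has to come from behind the cache, in GB.

    Hypothetical helper; byte_hit_ratio is the assumed fraction of
    response bytes that the cache can answer on its own.
    """
    if not 0.0 <= byte_hit_ratio <= 1.0:
        raise ValueError("byte_hit_ratio must be between 0 and 1")
    # Only cache misses generate traffic between the cache and what sits behind it.
    return client_response_gb * (1.0 - byte_hit_ratio)


# Assumed figures: 480 GB served to clients per month, 75% of bytes from cache.
print(traffic_behind_cache_gb(480, 0.75))
```

The saving only applies to the links behind the cache. Whatever box faces the clients still carries the full response volume, so moving Varnish forward reduces traffic between your sites but not the total you serve.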