I've seen people recommend combining all of these in a flow, but they seem to have lots of overlapping features, so I'd like to dig into why you might want to pass through 3 different programs before hitting your actual web server.
nginx:
- ssl: yes
- compress: yes
- cache: yes
- backend pool: yes
varnish:
- ssl: no (stunnel?)
- compress: ?
- cache: yes (primary feature)
- backend pool: yes
haproxy:
- ssl: no (stunnel)
- compress: ?
- cache: no
- backend pool: yes (primary feature)
Is the intent of chaining all of these in front of your main web servers just to gain some of their primary feature benefits?
It seems quite fragile to have so many daemons strung together doing similar things.
What is your deployment and ordering preference and why?
Simply put:
HAProxy is the best open-source load balancer on the market.
Varnish is the best open-source static-file cache on the market.
Nginx is the best open-source web server on the market.
(Of course this is my and many other people's opinion.)
But generally, not all queries go through the entire stack.
Everything goes through haproxy and nginx (or multiple nginx instances).
The only difference is you "bolt" on varnish for static requests.
Overall, this model fits a scalable and growing architecture (take haproxy out if you don't have multiple servers).
Hope this helps :D
Note: I'd actually also introduce Pound for SSL queries :D
You can have a server dedicated to decrypting SSL requests and passing plain HTTP requests out to the backend stack :D (It keeps the whole stack quicker and simpler.)
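For illustration, a minimal Pound configuration for such a dedicated SSL-terminating box might look like this (the certificate path and backend address are made up for the example):

    # pound.cfg -- hypothetical dedicated SSL terminator
    ListenHTTPS
        Address 0.0.0.0
        Port    443
        Cert    "/etc/pound/example.pem"    # combined key+cert, made-up path
    End
    Service
        BackEnd
            Address 10.0.0.10               # internal haproxy/nginx box (made up)
            Port    80
        End
    End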
Foreword
Update in 2016. Things are evolving, all servers are getting better, they all support SSL and the web is more amazing than ever.
Unless stated, the following is targeted toward professionals in business and start-ups, supporting thousands to millions of users.
These tools and architectures require a lot of users/hardware/money. You can try this in a home lab or to run a blog, but that doesn't make much sense.
As a general rule, remember that you want to keep it simple. Every middleware you append is another critical piece of middleware to maintain. Perfection is not achieved when there is nothing left to add but when there is nothing left to remove.
Some Common and Interesting Deployments
HAProxy (balancing) + nginx (php application + caching)
The web server is nginx running PHP. Since nginx is already there, it might as well handle the caching and redirections.
HAProxy (balancing) + Varnish (caching) + Tomcat (Java application)
HAProxy can redirect to Varnish based on the request URI (*.jpg *.css *.js).
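For instance, a minimal haproxy.cfg sketch of that routing rule (backend names and addresses are invented for the example):

    defaults
        mode http

    frontend www
        bind *:80
        # static assets go to Varnish, everything else to Tomcat
        acl is_static path_end .jpg .css .js
        use_backend varnish_static if is_static
        default_backend tomcat_app

    backend varnish_static
        server varnish1 10.0.0.20:6081    # made-up address

    backend tomcat_app
        server tomcat1 10.0.0.30:8080     # made-up address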
HAProxy (balancing) + nginx (SSL to the host and caching) + Webserver (application)
The webservers don't speak SSL even though EVERYONE MUST SPEAK SSL (especially this HAProxy-WebServer link with private user information going through EC2). Adding a local nginx makes it possible to bring SSL all the way up to the host. Once nginx is there, it might as well do some caching and URL rewriting.
Note: a 443:8080 port redirection is happening, but it is not part of the features. There is no point in the port redirection itself; the load balancer could speak directly to webserver:8080.
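A minimal sketch of what that local nginx looks like, assuming the application listens on 8080 (certificate paths are made up):

    # runs locally on each webserver host; brings SSL all the way to the machine
    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/host.crt;    # made-up paths
        ssl_certificate_key /etc/nginx/host.key;

        location / {
            proxy_pass http://127.0.0.1:8080;       # the local application server
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto https;
        }
    }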
Middleware
HAProxy: THE load balancer
Main Features:
Similar Alternatives: nginx (multi-purpose web-server configurable as a load balancer)
Different Alternatives: Cloud (Amazon ELB, Google load balancer), Hardware (F5, Fortinet, Citrix Netscaler), Other & worldwide (DNS, anycast, CloudFlare)
What does HAProxy do and when do you HAVE TO use it?
Whenever you need load balancing, HAProxy is the go-to solution.
Except when you want very cheap OR quick & dirty OR you don't have the skills available, then you may use an ELB :D
Except when you're in banking/government/similar, required to use your own datacenter with hard requirements (dedicated infrastructure, dependable failover, 2 layers of firewall, auditing, an SLA paying x% per minute of downtime, all in one); then you may put 2 F5s on top of the rack containing your 30 application servers.
Except when you want to go past 100k HTTP(S) connections [and multi-site], then you MUST have multiple HAProxys with a layer of [global] load balancing in front of them (CloudFlare, DNS, anycast). Theoretically, the global balancer could talk straight to the webservers, allowing you to ditch HAProxy. Usually, however, you SHOULD keep HAProxy(s) as the public entry point(s) to your datacenter and tune advanced options to balance fairly across hosts and minimize variance.
Personal Opinion: a small, contained, open-source project, entirely dedicated to ONE TRUE PURPOSE. Among the easiest to configure (ONE file) and the most useful and most reliable open-source software I have come across in my life.
Nginx: Apache that doesn't suck
Main Features:
Similar Alternatives: Apache, Lighttpd, Tomcat, Gunicorn...
Apache was the de-facto web server, also known as a giant clusterfuck of dozens of modules and a thousands-of-lines httpd.conf on top of a broken request-processing architecture. nginx redoes all of that with fewer modules, (slightly) simpler configuration and a better core architecture.
What does nginx do and when do you HAVE TO use it?
A webserver is intended to run applications. When your application is developed to run on nginx, you already have nginx and you may as well use all its features.
Except when your application is not intended to run on nginx and nginx is nowhere to be found in your stack (Java shop, anyone?), then there is little point in adding nginx. The webserver features are likely to exist in your current webserver, and the other tasks are better handled by the appropriate dedicated tool (HAProxy/Varnish/CDN).
Except when your webserver/application is lacking features, hard to configure and/or you'd rather die than look at it (Gunicorn, anyone?), then you may put an nginx in front (i.e. locally on each node) to perform URL rewriting, send 301 redirections, enforce access control, provide SSL encryption, and edit HTTP headers on the fly. [These are the features expected from a webserver.]
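As a sketch of those tasks, assuming a Gunicorn-style app server on local port 8000 (all paths, addresses and URL patterns are invented):

    server {
        listen 443 ssl;                             # SSL encryption
        ssl_certificate     /etc/nginx/node.crt;    # made-up paths
        ssl_certificate_key /etc/nginx/node.key;

        location = /old-page { return 301 /new-page; }   # 301 redirection

        location /admin/ {
            allow 10.0.0.0/8;                       # access control (made-up range)
            deny  all;
            proxy_pass http://127.0.0.1:8000;
        }

        location / {
            rewrite ^/app/(.*)$ /$1 break;          # URL rewriting
            proxy_hide_header X-Powered-By;         # edit headers on the fly
            proxy_pass http://127.0.0.1:8000;       # the local gunicorn
        }
    }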
Varnish: THE caching server
Main Features:
Similar Alternatives: nginx (multi-purpose web-server configurable as a caching server)
Different Alternatives: CDN (Akamai, Amazon CloudFront, CloudFlare), Hardware (F5, Fortinet, Citrix Netscaler)
What does Varnish do and when do you HAVE TO use it?
It does caching, only caching. It's usually not worth the effort and it's a waste of time; try a CDN instead. Be aware that caching is the last thing you should care about when running a website.
Except when you're running a website exclusively about pictures or videos, then you should look into CDNs thoroughly and think about caching seriously.
Except when you're forced to use your own hardware in your own datacenter (a CDN ain't an option) and your webservers are terrible at delivering static files (adding more webservers ain't helping), then Varnish is the last resort.
Except when you have a site with mostly-static-yet-complex-dynamically-generated-content (see the following paragraphs) then Varnish can save a lot of processing power on your webservers.
Static caching is overrated in 2016
Caching is almost configuration-free, money-free, and time-free. Just subscribe to CloudFlare, CloudFront, Akamai or MaxCDN. The time it takes me to write this line is longer than the time it takes to set up caching, AND the beer I am holding in my hand is more expensive than the median CloudFlare subscription.
All these services work out of the box for static *.css, *.js, *.png and more. In fact, they mostly honour the Cache-Control directive in the HTTP header. The first step of caching is to configure your webservers to send proper cache directives. Doesn't matter what CDN, what Varnish, what browser is in the middle.
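In nginx, for example, sending proper cache directives for static assets takes a couple of lines (the max-age value here is an arbitrary choice):

    # anything downstream (CDN, Varnish, browser) can honour this header
    location ~* \.(css|js|png|jpg)$ {
        add_header Cache-Control "public, max-age=86400";   # one day
    }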
Performance Considerations
Varnish was created at a time when the average web server was choking trying to serve a cat picture on a blog. Nowadays a single instance of the average modern multi-threaded asynchronous buzzword-driven webserver can reliably deliver kittens to an entire country, courtesy of sendfile().
I did some quick performance testing for the last project I worked on. A single Tomcat instance could serve 21,000 to 33,000 static files per second over HTTP (testing files from 20 B to 12 kB with varying HTTP/client connection counts). The sustained outbound traffic is beyond 2.4 Gb/s, and production will only have 1 Gb/s interfaces. You can't do better than the hardware, so there is no point in even trying Varnish.
Caching Complex Changing Dynamic Content
CDNs and caching servers usually ignore URLs with parameters like ?article=1843, they ignore any request with session cookies or authenticated users, and they ignore most MIME types, including the application/json from /api/article/1843/info. There are configuration options available, but they are usually not fine-grained, rather "all or nothing".
Varnish can have custom complex rules (see VCL) to define what is cachable and what is not. These rules can cache specific content by URI, headers, current user session cookie, MIME type and content ALL TOGETHER. That can save a lot of processing power on webservers for some very specific load patterns. That's when Varnish is handy and AWESOME.
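A small sketch of such a rule in VCL (Varnish 4+ syntax; the URL pattern, cookie name, backend address and TTL are all invented for the example):

    vcl 4.0;

    backend default {
        .host = "127.0.0.1";          # made-up application server
        .port = "8080";
    }

    sub vcl_recv {
        # cache the article API for anonymous users only
        if (req.url ~ "^/api/article/" && req.http.Cookie !~ "sessionid") {
            unset req.http.Cookie;    # strip cookies so the request is cachable
            return (hash);
        }
    }

    sub vcl_backend_response {
        if (bereq.url ~ "^/api/article/") {
            set beresp.ttl = 5m;      # arbitrary TTL for the sketch
        }
    }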
Conclusion
It took me a while to understand all these pieces, when to use them and how they fit together. Hope this can help you.
This turned out to be quite long (6 hours to write. OMG! :O). Maybe I should start a blog or a book about this. Fun fact: there doesn't seem to be a limit on an answer's length.
It's true that the 3 tools share common features, and most setups are fine with any combination of 2 among the 3; it depends what their main purpose is. It's common to sacrifice some caching if you know your application server is fast on statics (e.g. nginx). It's common to sacrifice some load-balancing features if you're going to install tens or hundreds of servers and don't care about getting the most out of them, nor about troubleshooting issues. It's common to sacrifice some web server features if you're intending to run a distributed application with many components everywhere. Still, some people build interesting farms with all of them.
You should keep in mind that you're talking about 3 solid products. Generally you won't need to load balance them. If you need front SSL, then nginx first as a reverse proxy is fine. If you don't need that, then varnish on the front is fine. Then you can put haproxy to load balance your apps. Sometimes you'll also want to switch to different server farms on the haproxy itself, depending on file types or paths.
Sometimes you'll have to protect against heavy DDoS attacks, and haproxy in front will be better suited than the others.
In general, you should not worry about what compromise to make between your choices. You should choose how to assemble them to get the best flexibility for your needs, now and to come. Even stacking several of them multiple times may sometimes be right, depending on your needs.
Hoping this helps!
All other answers are pre-2010, hence this updated comparison.
Nginx
Varnish
Haproxy
So the best method seems to be implementing all of them in an appropriate order.
However, for general purposes, Nginx is best, as you get above-average performance for all of: caching, reverse proxying, and load balancing, with very little overhead on resource utilization. And then you have SSL and full web server features on top.
Varnish has support for load balancing: http://www.varnish-cache.org/trac/wiki/LoadBalancing
Nginx has support for load balancing: http://wiki.nginx.org/NginxHttpUpstreamModule
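For reference, nginx load balancing is a simple upstream block (server addresses are made up):

    upstream app_pool {
        server 10.0.0.11:8080;    # backend webservers (made-up addresses)
        server 10.0.0.12:8080;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://app_pool;    # round-robin by default
        }
    }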
I would simply configure this with varnish + stunnel. If I needed nginx for some other reason, I would just use nginx + varnish: have nginx accept SSL connections and proxy them to varnish, then have varnish talk to nginx via plain HTTP.
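A sketch of the nginx side of that loop, assuming Varnish listens on its default 6081 and nginx serves the site itself on a local port (ports and paths are assumptions):

    # nginx terminates SSL and hands the request to Varnish...
    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/site.crt;    # made-up paths
        ssl_certificate_key /etc/nginx/site.key;
        location / {
            proxy_pass http://127.0.0.1:6081;       # varnish
        }
    }

    # ...and Varnish's backend points back at nginx speaking plain HTTP
    server {
        listen 127.0.0.1:8080;
        # regular site configuration goes here
    }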
Some people may throw nginx (or Apache) into the mix because these are somewhat more general-purpose tools than Varnish. For example, if you want to transform content (e.g. using XDV or Apache filters) at the proxy layer, you would need one of these, because Varnish can't do that by itself. Some people may just be more familiar with the configuration of these tools, so it's easier to use Varnish as a simple cache and do the load balancing at another layer, because they're already familiar with Apache/nginx/haproxy as a load balancer.