[Updated to be much more concise!]
I'm new to the world of optimising web servers for lots of traffic, but that's now something I'm getting into.
On Monday our web server was 'slashdotted' - we got a massive surge of traffic (85,000 visitors in an hour or so), and even though we run Varnish and nginx (which are doing their job properly), the Apache side of things really struggled, as some requests generate dynamic content.
The server currently has 8GB of RAM and is being upgraded to 32GB very soon, so really I need configuration help for a 32GB system. It's running 64-bit CentOS.
I've investigated the Varnish and nginx setup and they're fine - sensible settings (static content dished out directly by nginx, lots of dynamic stuff dished out by Varnish, and if it's not in Varnish the request is passed through to Apache).
So on to Apache... we're using the prefork MPM, and each Apache process seems to use up a LOT of RAM:
Top 3:
S 48 20961 2965 0 75 0 187128 128307 ? ? 00:05:25 httpd
S 48 20959 2965 0 75 0 249788 143435 ? ? 00:05:55 httpd
S 48 18581 2965 0 75 0 314564 157747 ? ? 00:06:40 httpd
Bottom 3:
S 0 2965 1 0 78 0 15132 89017 stext ? 00:00:00 httpd
S 48 20947 2965 0 75 0 38492 93001 ? ? 00:00:00 httpd
S 48 20945 2965 0 75 0 43300 93897 ? ? 00:00:01 httpd
I'm not entirely sure, but I think that one process = one client = one connection to one person's browser. I guess my first question is: can someone confirm that? Yes, we're running PHP and Zend Framework, with MySQL as the database backend on the same server.
Current configuration (the server currently has 8GB of RAM):
MaxKeepAliveRequests 100
KeepAliveTimeout 2
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 200
MaxClients 200
MaxRequestsPerChild 500
</IfModule>
My thinking is that with this configuration Apache can theoretically try to use about 63GB of RAM in a worst-case scenario, e.g. a 315MB process * 200 MaxClients = lots of RAM. I'm not entirely sure it works like that, but if someone can confirm that too it'd help!
What I'd like to do is get some advice on what sort of things I should look out for - we want the server to be able to handle another surge of requests at any time and make use of all the new RAM we're getting. I'll get onto optimising MySQL in another question if I can't figure it out myself, but here's the conf just in case that makes a difference: http://pastebin.com/GbJU7AxY
Cheers a lot! John.
What exactly is that 85,000 visitors number? Unique visitors, total HTTP hits, something else?
Varnish should be able to handle thousands of hits per second with little CPU and memory, as long as the requests can be served from cache - especially when Slashdotted, since most people will be looking for the exact same content. It does require fine-tuning, though. Varnish is quite conservative by default, since it doesn't know much about the content that passes through; it makes its decisions based on the headers it sees and a simple ruleset. For example, by default Varnish caches objects for 2 minutes, but only if no cookies are present in the request, and the object's TTL is > 0, and... etc. Check the default VCL (specifically vcl_recv and vcl_fetch) to determine the default logic, and make sure you understand it.
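For reference, the decisive part of the builtin vcl_recv boils down to roughly this (Varnish 3 syntax, trimmed for brevity):

sub vcl_recv {
    # Anything that isn't a plain GET/HEAD, or that carries credentials
    # or cookies, is passed straight to the backend and never cached.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
    return (lookup);
}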
So a single Google Analytics cookie set on your domain causes all requests to be passed to the backend, even though GA cookies are not processed by your backend server at all - only by Google's JavaScript. A WordPress application sets all kinds of cookies, most of which are only relevant to dynamic content, and the browser returns them on every single request. If your page contains 49 static assets and 1 dynamic page, none of those static assets will be cached, because every request carries a cookie you don't care about; only the cookie on the dynamic request should get through. A mistake like that essentially disables Varnish.

The various Cache-Control (and related) HTTP headers that your code returns also matter. If your application claims that the object Varnish retrieves from the backend is already expired, for example with an Expires header in the past, Varnish will not cache that object.
In other words, you'll need to adjust your application to emit the correct headers, so the clients (both Varnish and the browser) are allowed to cache the returned content. Anything you can't correct in your application, you can override in Varnish's VCL.
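For example, something along these lines in vcl_fetch forces a sensible TTL onto objects the backend mislabels (the file extensions and the one-hour TTL are just placeholders):

sub vcl_fetch {
    # The backend marks these as already expired, but they rarely change,
    # so ignore its headers and cache them inside Varnish for an hour anyway.
    if (req.url ~ "\.(css|js|png|jpg|gif)$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}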
For example, here is my code to remove various client-side tracking cookies from reaching the server. This belongs in vcl_recv (the cookie names below are examples - adjust them to whatever your pages actually set):
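sub vcl_recv {
    if (req.http.Cookie) {
        # Strip Google Analytics / Quantcast style tracking cookies;
        # only client-side JavaScript ever reads these, never the backend.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__utm[a-z]+|__qc[a-z]+|has_js)=[^;]*", "");
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
        # If nothing meaningful is left, drop the header entirely so the
        # request becomes cacheable.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}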
Similarly, I remove incoming cookies on certain paths, to make those requests cacheable (the path below is just an illustration):
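sub vcl_recv {
    # These areas of the site never need a cookie to render.
    if (req.url ~ "^/(images|css|js|static)/") {
        unset req.http.Cookie;
    }
}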
I use a similar stanza in vcl_fetch, with unset beresp.http.Set-Cookie; instead, to prevent the backend from setting cookies on paths where I don't want them. You could also add some debug headers that tell you how Varnish processed each request; view those with Firebug and you'll understand a lot more about your app. (Both are sketched below.) Another good source of information is the Varnish Book - see, for example: https://www.varnish-software.com/static/book/VCL_Basics.html
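Something along these lines (again, the path is an example, and the X-Cache headers are purely for debugging):

sub vcl_fetch {
    # Stop the backend from setting cookies on purely static paths.
    if (req.url ~ "^/(images|css|js|static)/") {
        unset beresp.http.Set-Cookie;
    }
}

sub vcl_deliver {
    # Debug headers: did this response come from Varnish's cache, and how often?
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
    } else {
        set resp.http.X-Cache = "MISS";
    }
}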
Most of our dynamic content is cached for 60s, which is enough to fend off a stampede. If some of your content needs to be individual, but most of what's on the page is quite static, look into Varnish's ESI (Edge Side Includes), which lets you specify different cache TTLs for different parts of the page.
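A minimal sketch of the Varnish side, assuming the HTML pulls in its personalised part with an <esi:include src="/userbar"/> tag (the URL is hypothetical):

sub vcl_fetch {
    # Let Varnish process ESI tags in HTML responses, so an included
    # fragment can have a different (or zero) TTL than the page around it.
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}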
Now that you've reduced the backend requests to the bare minimum, optimize those requests. Profile your application, find and fix problems.
You're correct that:
MaxClients x (maximum physical memory per Apache process) = (total memory Apache can use)
That's physical memory, though, not the virtual memory you quoted. In top, the RES column shows the physical memory used by each process, and each Apache process grows towards the size of the biggest script your site runs. Limit MaxClients to the number of processes your server can actually handle: it makes no sense to accept requests that you don't have the resources for, and as soon as you start swapping you've lost. Increase the number of processes that Apache preforks (the StartServers / MinSpareServers settings), because forking is a heavy operation that you don't want happening while the server is already busy. The ServerLimit line is redundant. Either disable KeepAlive or set it to 1-2 seconds.
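As a rough worked example for the 32GB box - the figures are assumptions, not measurements: say ~12GB stays reserved for MySQL, Varnish, nginx and the OS, and each Apache/PHP process peaks around 250MB resident, which leaves room for roughly 20GB / 250MB ≈ 80 workers:

<IfModule prefork.c>
StartServers 40
MinSpareServers 20
MaxSpareServers 40
MaxClients 80
MaxRequestsPerChild 500
</IfModule>
# Either disable KeepAlive entirely, or keep the timeout at 1-2 seconds:
KeepAlive Off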
If you serve many static assets, consider switching from mod_php to PHP-FPM. This keeps your Apache processes light-weight.