I have a pair of servers hosting a single Magento ecommerce site with moderate traffic (60k page views per day according to Google Analytics, and I think about 80k as counted on the server itself). The database server runs smoothly and quickly, aside from the occasional hiccup, but the Apache server has been falling over every so often.
I have set up Magento to use the recommended PHP caching (APC), and it also keeps its own cache files in a 1.5 gig tmpfs (this tmpfs regularly gets pretty full, and I have a script running to clear cache files when the tmpfs is more than 80% full). I serve most imagery from Amazon CloudFront. I recently set up nginx as a reverse proxy to Apache (nginx also serves the static files). I have configured Apache to the best of my ability: KeepAlive and HostnameLookups are off, and prefork is configured as follows:
<IfModule prefork.c>
StartServers 50
MinSpareServers 50
MaxSpareServers 100
ServerLimit 512
MaxClients 256
MaxRequestsPerChild 400
</IfModule>
I've not turned off .htaccess files, and access logging is on. I know there are some modules I can turn off. I'm not sure what effect any of those three changes would have, if any.
The Apache server is a VPS with 6 gig of RAM. At the time of writing the server is reporting load average: 17.77, 18.27, 49.76, but there's about 2 gig of RAM free. When it goes really bad, the load goes to 120+ and stays there; restarting Apache brings the site back up and the load back down.
vmstat is (while the server is reporting the load above), I think, showing a CPU idle value fluctuating between 0 and 70 or so. iostat is showing an iowait value between 0 and 0.2%.
I'm a bit stuck. What little I know tells me that the CPU is overloaded as a result of a combination of the code being run and the number of users. But I'm not experienced enough to be certain that that is the problem. If it is, I think the solutions are either to improve the code or to split the site hosting over two VPSes with a load balancer.
So, I guess my questions are:
- What else can I do to find problems or bottlenecks on the server?
- Are there any obvious changes I can make to the server config to improve this?
- Is it a good idea to set an automated system to restart apache when the load goes above a certain level?
- From the above, how likely is it that the site has outgrown the server?
Edit:
I found something weird - /var/spool/mail/root was large ... 38 gig. That sounds ... unhealthy. Could that be the problem?
I normally set MaxRequestsPerChild in the thousands, usually nearer 10,000.
You say that you have "the recommended PHP caching" - but do you have APC installed? Finally, how many users do you see hitting the website at the same time? If you have Apache extended stats, you will be able to see how many of the Apache processes are actually in the Running state at a time.
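To count the workers that are actually busy, the machine-readable mod_status page can be parsed. This is only a minimal sketch: it assumes mod_status is enabled and that its "?auto" output has been fetched into a string (the sample text below is made up for illustration). The BusyWorkers, IdleWorkers and Scoreboard field names and the scoreboard characters are the ones mod_status emits.

```python
from collections import Counter

# Meaning of the scoreboard characters, as listed on the mod_status page.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}

def summarise_status(text):
    """Summarise the 'key: value' lines of mod_status '?auto' output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    states = Counter(fields.get("Scoreboard", ""))
    return {
        "busy": int(fields.get("BusyWorkers", 0)),
        "idle": int(fields.get("IdleWorkers", 0)),
        "states": {SCOREBOARD_KEYS.get(c, c): n for c, n in states.items()},
    }

# Made-up sample of what the '?auto' output looks like.
SAMPLE = """Total Accesses: 1200
BusyWorkers: 3
IdleWorkers: 5
Scoreboard: _W_R__W_....
"""
summary = summarise_status(SAMPLE)
print(summary["busy"], summary["idle"], summary["states"])
```

If the busy count sits at or near MaxClients whenever the load spikes, requests are queueing behind slow PHP rather than Apache being short of processes.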
800 APC file hits per second, plus another 200 user-cache hits, is a lot. If that is a dual- or quad-core machine, I'd expect it to be keeping up OK though. If the database is genuinely keeping up, then getting a bigger machine with more CPUs may be the best thing for it, at least right now.
Magento and Zend Framework are quite CPU-heavy, as you noticed. The best way to avoid the CPU load is simply by rendering any content only once, until it changes. Most parts of your catalog don't change that often, and often the shopping cart block or the 'most popular items' block are the only dynamic parts of a page.
I would suggest putting a Varnish cache in front of Apache. This gives you high-performance page-caching that can seriously offload your LAMP stack. We recently survived a very public launch of a website thanks to Varnish and I was seriously impressed by the speed and low cpu-load. Varnish is free, and flexible enough to cache entire pages, or cache only the relatively static parts and include the cart dynamically.
However, Varnish will not cache much on a default Magento installation, since there's a lot of per-user dynamic content, cookies, etc. A Magento module such as 'PageCache powered by Varnish' modifies Magento to work well with Varnish. It also provides a Varnish configuration file that matches the Magento setup. These two together make for a very efficient setup. It's a commercial module, but much more affordable than a more powerful server would be.
The parts you're offloading to a CDN or nginx are not your real problem, although offloading them does help. Even Apache can handle quite a number of static requests. You need to cache the stuff that takes effort to generate again and again, i.e. your dynamic parts.
Your average load is entirely too high for a dual core VPS. 8 should be the max.
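To put the reported figures in context: on Linux the load average roughly counts processes that are runnable or in uninterruptible sleep, so dividing it by the number of cores gives a rough per-core queue length. A small sketch follows; the core count of 2 is an assumption (the question does not state it), and the load figures are the ones quoted in the question.

```python
def load_per_core(load_avg, cores):
    """Rough number of runnable (or blocked) tasks competing for each core."""
    return load_avg / cores

reported = (17.77, 18.27, 49.76)  # 1, 5 and 15 minute averages from the question
cores = 2                         # assumption: dual-core VPS

for label, value in zip(("1 min", "5 min", "15 min"), reported):
    print(f"{label}: {load_per_core(value, cores):.2f} tasks per core")
```

On the server itself, os.getloadavg() and os.cpu_count() from the standard library supply the live values.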
I've had good success using mod_pagespeed and the event MPM for Magento. I would recommend switching to the event MPM and installing mod_pagespeed.
More info about Event MPM: Apache event MPM documentation
And mod_pagespeed: Google Code: mod_pagespeed
If you continue to have load issues even after making the above changes, you may want to consider switching to a different, better VPS plan.
As Alister hints, a MaxRequestsPerChild value of 400 is absurdly low.
The load average is very high - but 60k page views per day is not a lot of traffic.
How many processes do you normally have serving requests?
I'm not familiar with Magento but it looks like there's something wrong with this setup. I would expect that you could get significantly more throughput at a lower load level.
Go get a copy of Steve Souders' book and read it. Enable compression for all outgoing HTML content (static and dynamic), and make sure you've got a good caching config. Start logging %D in your access_log file and build some tools for analysing the data and isolating the slowness. Do something similar for MySQL.
Try mysqltuner.pl and see if it flags up any problems.
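As a starting point for analysing %D data, here is a minimal sketch that ranks request paths by mean response time. It assumes the common log format with %D (time taken to serve the request, in microseconds) appended as the last field; the log lines in SAMPLE are invented for illustration, and the regular expression would need adjusting for any other LogFormat.

```python
import re
from collections import defaultdict

# Assumed format: LogFormat "%h %l %u %t \"%r\" %>s %b %D"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ (?P<usec>\d+)$'
)

def slowest_paths(lines, top=5):
    """Return (path, hits, mean_ms) tuples, slowest mean response time first."""
    totals = defaultdict(lambda: [0, 0])  # path -> [hits, total microseconds]
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?", 1)[0]  # group URLs without query strings
        entry = totals[path]
        entry[0] += 1
        entry[1] += int(match.group("usec"))
    report = [(p, hits, usec / hits / 1000.0) for p, (hits, usec) in totals.items()]
    return sorted(report, key=lambda row: row[2], reverse=True)[:top]

# Invented sample lines in the assumed format.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2012:10:00:00 +0000] "GET /checkout/cart/ HTTP/1.1" 200 5120 1800000',
    '10.0.0.2 - - [01/Jan/2012:10:00:01 +0000] "GET /catalog/category/view?id=3 HTTP/1.1" 200 20480 600000',
    '10.0.0.3 - - [01/Jan/2012:10:00:02 +0000] "GET /checkout/cart/ HTTP/1.1" 200 5120 2200000',
]
for path, hits, mean_ms in slowest_paths(SAMPLE):
    print(f"{path}: {hits} hits, mean {mean_ms:.0f} ms")
```

Passing an open log file instead of SAMPLE works the same way, since the function only iterates over lines.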
I run a similar setup, but with nginx/php-fpm/APC (opcode cache and fast_backend) and memcached (slow_backend). I find PHP to be the biggest resource problem, probably because Magento is either insanely big or just badly coded. Take a closer look at what exactly is eating the resources; could it be PHP, as in my case?
In addition to what Martijn Heemels wrote, there's an open source Varnish module you could try. Check out http://moprea.ro/2011/may/6/magento-performance-optimization-varnish-cache-3/ and https://github.com/madalinoprea/magneto-varnish.
I've only tested it in a test environment, and so far so good.
Do you save sessions in the database, or on disk (and if on disk, is it on tmpfs)?
When you use a VPS you are sharing the CPU. I would recommend you talk to your host about moving your VPS to less busy hardware, or go dedicated.
Because the CPU is shared, your applications don't get to run on time; requests queue up, and the growing backlog brings its own overhead. Eventually Apache, PHP or MySQL hits one of its own limits, and that causes problems.
The bottom line is that a VPS is basically a shared CPU. Your host may be putting too many VPS accounts on the same CPU.
If you want to make full use of the allocated CPU, either ask for a better server with fewer VPSes on it if possible (or move host, though that's troublesome), or go dedicated.
You could also move to Amazon completely and, instead of worrying about nginx, use their load balancer, which takes a few clicks to set up for all your servers in their cloud.
The /var/mail.../root file is huge, which means it's collecting a lot of emails, usually ones sent by your applications. For example, a buggy PHP script may be trying to send email, or all cron jobs may be configured to email you their status and output. You can look inside the mail file and see what it contains. I'm guessing it's error messages, so you can find out where they're coming from.
I will add more if you need further info, and maybe I'll need some info too.
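With a spool that large, opening it in a mail client is impractical, so tallying the Subject headers in a single streaming pass is one way to see what is filling it. This is a rough sketch that assumes the spool is a plain mbox file (messages separated by lines starting with "From "); the sample messages and the job name in them are invented.

```python
from collections import Counter

def tally_subjects(lines, top=10):
    """Count Subject: header lines in an mbox-style stream, most frequent first."""
    counts = Counter()
    in_headers = False
    for line in lines:
        if line.startswith("From "):
            in_headers = True    # mbox separator line: a header block follows
        elif in_headers and line.strip() == "":
            in_headers = False   # a blank line ends the header block
        elif in_headers and line.lower().startswith("subject:"):
            counts[line.split(":", 1)[1].strip()] += 1
    return counts.most_common(top)

# Invented sample; "example-job" is a hypothetical cron job name.
SAMPLE = [
    "From root@localhost  Mon Jan  2 03:00:01 2012\n",
    "Subject: Cron <root@web> example-job\n",
    "\n",
    "Subject: this line is in the body and is not counted\n",
    "From root@localhost  Mon Jan  2 03:05:01 2012\n",
    "Subject: Cron <root@web> example-job\n",
    "\n",
    "output of the job\n",
]
print(tally_subjects(SAMPLE))
```

Because the function reads line by line, it can be given an open file handle (opened with errors="replace" to tolerate odd bytes) without loading the whole spool into memory.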