I've got a webserver on a Linode 1024 VPS based on
- Ubuntu 11.10
- Nginx 1.0.5
- PHP 5.3.6 (with PHP-FPM, APC)
- Varnish 3.0.2
And a couple of blogs there based on WordPress 3.3.1. One of them is a plain blog with the default config, theme, and just the "Hello World" post, to test the server. The other is a blog cloned from another server, with almost 10k posts and over 10k comments. This blog gets around 5k uniques per day.
The server gives good numbers on an ab test for the test blog, but the same test with the cloned blog is impossible to run: the ab test loads the server too much, and I have to stop the process, which in any case makes ab show this really poor result.
htop also shows a "normal" load during normal operation, but an abnormally high load during the ab test.
There's another strange thing happening (the most important one for me): the Time To First Byte is extremely high, but after that wait the site loads really fast. This can easily be tested with services such as tools.pingdom.com, which gives this result. Pay attention to the yellow region, which means "Wait time".
Why is this happening? Possible ideas:
- Bad PHP-FPM config
- Linode DNS response time is awful. Nonsense: the test blog resolves DNS fine, and its TTFB is fantastic
- Bad Nginx config
In case someone needs more info:
- Here you've got the current cloned blog nginx config file (/etc/nginx/sites-available/muycomputerpro.com)
- Here you've got the current my.cnf config (/etc/mysql/my.cnf) (I know, no caching for the moment; this hasn't made a difference to TTFB in the past)
- Here you've got the current PHP-FPM config (/etc/php5/fpm/pool.d/www.conf)
Firstly, this is not an answer so much as a diagnostic approach.
This is by no means comprehensive (or even anything close); it is just a starting point.
Time to First Byte
Time to first byte (TTFB) has a number of components:
- DNS lookup: find the IP address of the domain
- Connect: establish a connection to the server
- Waiting: send the request and wait for the first byte of the response - this is where backend processing and network latency show up
When you look at ApacheBench output, you also see a similar breakdown: Connect, Processing, and Waiting times.
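For illustration, a run might look something like the following; the hostname and the numbers here are invented purely to show the format. A high mean Waiting time is the ab equivalent of a high TTFB:

```
$ ab -n 100 -c 10 http://blog.example.com/

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.7      1       3
Processing:   314  851 210.3    830    1400
Waiting:      312  847 209.8    826    1395
Total:        315  852 210.4    831    1402
```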
Comparisons to Eliminate Components
With few exceptions, your problem is going to lie in the backend processing, which usually comes down to overly complex/inefficient code, or poorly configured MySQL.
A good way to approach this problem is through a series of comparisons that will eliminate various aspects of your setup. A good comparison should keep as much constant as possible to help narrow down the problem. Currently, you have provided the following comparison: the minimal test blog against the content-heavy cloned blog, on the same server and the same stack - which means the content, theme, and plugins are the variables.
The ideal test would have you duplicate your full site, but then delete all the content except for one article and the associated comments. The point of this test would be to conclusively determine whether the large amount of content is the problem or whether other aspects of your setup (WordPress plugins, theme, etc.) are the cause. You would essentially compare the performance of identical sites, on the same (new) server, loading the same page (same length, etc.), with the only difference being the total site content (e.g. there is a good chance that some plugin does not scale well with increased content).
Without changing anything, there are some other comparisons you can do (see the curl sketch below):
- Compare a static file (e.g. an image or stylesheet) with a WordPress page - this takes PHP and MySQL out of the picture
- Compare a trivial PHP script (e.g. one that just calls phpinfo()) with a WordPress page - this separates PHP itself from the application
- Compare requests that go through Varnish with requests sent directly to Nginx's backend port - this tells you whether the cache layer is helping or hurting
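An easy way to run these comparisons is curl's timing variables; a minimal sketch, with the hostname and URL as placeholders for your own:

```
# Print where the time goes for a single request:
# DNS lookup, TCP connect, time to first byte, and total time
curl -o /dev/null -s \
     -w 'dns=%{time_namelookup}  connect=%{time_connect}  ttfb=%{time_starttransfer}  total=%{time_total}\n' \
     http://blog.example.com/

# Repeat for a static file, a plain PHP script, and a WordPress page,
# then compare the ttfb values
```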
Tuning your Backend
By this point you should have either found the problem or concluded that it lies in your backend. That leaves you with Nginx, PHP, or MySQL.
(I should mention here that it is always handy to know whether your bottleneck is CPU, RAM, or I/O; between sar, top, iostat, vmstat, free, etc. you should be able to come to some conclusion on this.)
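To see which of those resources is under pressure, watch the server in one terminal while driving load from another; a minimal sketch (the hostname is a placeholder):

```
# Terminal 1 (on the server): one-second samples of CPU, memory, swap and I/O
vmstat 1

# Terminal 2: generate load against the slow page
ab -n 200 -c 10 http://blog.example.com/
```

Roughly: a high wa column means you are waiting on I/O, id pinned at zero means you are CPU bound, and sustained si/so activity means you are swapping and short on RAM.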
Nginx
Nginx is just taking requests and either serving static content or shifting the requests to PHP-FPM - there usually isn't much to optimize with Nginx.
Ideally, your test blog and cloned blog have identical configs, in which case, you have effectively eliminated Nginx as the problem.
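That said, Nginx can tell you exactly how long the backend takes per request. A minimal sketch (the log format name and path are my own choice, not from your config):

```
# /etc/nginx/nginx.conf, inside the http { } block
log_format timing '$remote_addr "$request" $status '
                  'req=$request_time upstream=$upstream_response_time';
access_log /var/log/nginx/timing.log timing;
```

If the upstream time accounts for almost all of the request time, the wait is in PHP-FPM/MySQL, not in Nginx.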
Application
In the case where you are trying to identify a problem in your code (for instance a slow plugin, etc.), the slow logs (PHP-FPM's slowlog and MySQL's slow query log, both set up below) are the place to start.
MySQL
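A good first step on the MySQL side is the slow query log, which will show whether the 10k-post blog generates expensive queries. A minimal sketch for /etc/mysql/my.cnf (the log path is the Ubuntu default, and the 1-second threshold is just a starting value):

```
# [mysqld] section of /etc/mysql/my.cnf
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time     = 1
```

After an ab run, mysqldumpslow /var/log/mysql/mysql-slow.log will summarize the worst offenders.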
PHP
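On this stack, the main PHP-level check is that APC is actually enabled and sized generously enough for a 10k-post WordPress install; a minimal sketch with illustrative values:

```
; /etc/php5/conf.d/apc.ini
apc.enabled  = 1
apc.shm_size = 64M   ; older APC builds expect a bare number of megabytes: 64
```

The apc.php status page (usually shipped under /usr/share/doc/php-apc/) will show the cache hit rate and fragmentation.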
PHP-FPM
It is worth noting that your htop results show php-fpm as consuming the bulk of the CPU - and your problem does appear to be directly related to this.
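That makes PHP-FPM's slowlog the most direct tool here: it dumps a PHP backtrace whenever a request exceeds a threshold, pointing straight at the slow plugin or function. A minimal sketch for the pool config (the 5s threshold and log path are my own choice; the existing pm.* lines in www.conf stay as they are):

```
; /etc/php5/fpm/pool.d/www.conf - relevant lines only
request_slowlog_timeout = 5s
slowlog                 = /var/log/php5-fpm.slow.log

; keep the worker count low enough that all children fit in RAM on a 1 GB Linode
pm.max_children = 8
```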
Caching
Once you have optimized each likely bottleneck, start caching.
Sometimes, given the limitations of your application and hardware, you may not be able to improve backend performance that much - however, that is the point of caching - to minimize the use of the backend.
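Since Varnish is already in front of this stack, it is worth confirming that it is actually caching WordPress pages - cookies usually prevent it. A quick check, with the hostname as a placeholder:

```
# Two identical requests; on a cache hit the second should report Age > 0
curl -sI http://blog.example.com/ | egrep -i 'age:|x-varnish|set-cookie'
curl -sI http://blog.example.com/ | egrep -i 'age:|x-varnish|set-cookie'
# Age: 0 on every response (or a Set-Cookie on every page) means requests
# are passing straight through to the backend on every hit
```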
Further reading