I need to find a solution for a website which is struggling under load. The site gets ~500 simultaneous connections during peak time, and counts around 42k hits per day.
It's a WordPress-based site bridged with a vBulletin forum, with a lot of content and a fairly complex structure that makes intensive use of the database. I already implemented code-level full-page caching (without it the server just crashes), configured all other caching directives, and combined CSS files and the like to limit HTTP requests as much as possible.
I need to understand if there is more that can be done via software or if the load is just too much for the server to handle and it needs to be upgraded, because the server goes down occasionally during peak times.
I can't access the server right now, but it's a dedicated CentOS machine (I think 4 GB RAM, can't say what CPU) running Apache/MySQL.
So back to the main question: how can I know when the users are just too many?
EDIT
I got access to the logs. According to error.log, during yesterday's outage it was Apache segfaulting:
[Mon Apr 19 18:26:51 2010] [notice] child pid 4825 exit signal Segmentation fault (11)
[Mon Apr 19 18:26:53 2010] [notice] child pid 4794 exit signal Segmentation fault (11)
[Mon Apr 19 18:27:08 2010] [notice] child pid 4595 exit signal Segmentation fault (11)
[Mon Apr 19 18:27:11 2010] [notice] child pid 4826 exit signal Segmentation fault (11)
.....
How can I tell what's the cause of this segfault?
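One thing that may help narrow it down is checking whether the segfaults cluster around the traffic peaks or also happen when the site is quiet. Below is a minimal sketch that counts segfault lines per minute. It assumes the log format shown in the lines quoted above; `SAMPLE_LINES` and the function name are made up for illustration, and in practice you would feed it the lines of the real error log instead.

```python
import re
from collections import Counter
from datetime import datetime

# Matches Apache error-log lines in the format quoted above.
SEGFAULT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) "
    r"exit signal Segmentation fault \(11\)"
)

# Sample input copied from the log excerpt above.
SAMPLE_LINES = [
    "[Mon Apr 19 18:26:51 2010] [notice] child pid 4825 exit signal Segmentation fault (11)",
    "[Mon Apr 19 18:26:53 2010] [notice] child pid 4794 exit signal Segmentation fault (11)",
    "[Mon Apr 19 18:27:08 2010] [notice] child pid 4595 exit signal Segmentation fault (11)",
    "[Mon Apr 19 18:27:11 2010] [notice] child pid 4826 exit signal Segmentation fault (11)",
]


def segfaults_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:MM' to the number of segfaults."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.match(line.strip())
        if not match:
            continue  # not a segfault notice, ignore it
        # Timestamp layout in the log: "Mon Apr 19 18:26:51 2010"
        when = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        counts[when.strftime("%Y-%m-%d %H:%M")] += 1
    return counts


for minute, count in sorted(segfaults_per_minute(SAMPLE_LINES).items()):
    print(minute, count)
```

If the counts only spike at peak time, resource exhaustion is a more likely suspect; if they are spread evenly over the day, a buggy module or extension becomes more plausible. This only shows when the crashes happen, not why.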
It's too many when you can't find anything left to optimize. Try to see whether you're CPU-bound or I/O-bound when the load is high, since this determines where to look next. If it's MySQL slowing you down, you might gain something by carefully examining the database, e.g. creating indexes or reorganizing how and where the data is stored. Ideally, though, database content is served from memory.
When you're CPU-bound, determine which process is maxing out the CPU. If it's Apache/PHP, determine which part of your application creates the highest load. It might be the bulletin board, the blog, etc.
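To make the CPU-bound versus I/O-bound check concrete, here is a rough sketch that compares two samples of the aggregate `cpu` line in `/proc/stat` on Linux and reports how the time in between was split. The function names and the 5-second interval are my own choices for illustration; tools like `top` or `vmstat` report the same counters.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Return percentages of time spent busy, waiting on I/O, and idle."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    # "busy" here means everything that is neither idle nor I/O wait.
    busy = total - delta["idle"] - delta["iowait"]
    return {
        "busy": 100.0 * busy / total,
        "iowait": 100.0 * delta["iowait"] / total,
        "idle": 100.0 * delta["idle"] / total,
    }


def read_cpu_sample(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


def sample(interval=5.0):
    """Take two samples `interval` seconds apart and return the breakdown."""
    first = read_cpu_sample()
    time.sleep(interval)
    return cpu_breakdown(first, read_cpu_sample())
```

Calling `sample()` during peak time gives a quick hint: a high `iowait` share suggests the disks (often the database) are the bottleneck, while a high `busy` share with little `iowait` points at CPU work such as PHP.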
You might also want to look at things like open connections and network throughput. Also see whether you gain something by serving static content from a different location or in a different way.
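For the open-connections side, a back-of-the-envelope check is how many Apache children fit in RAM before the machine starts swapping. The sketch below is just that arithmetic. The 4 GB figure comes from the question; the memory reserved for MySQL and the OS and the per-child size are assumptions picked for illustration and need to be measured on the real server.

```python
def max_apache_children(total_ram_mb, reserved_mb, per_child_mb):
    """Estimate how many Apache children fit in RAM without swapping."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(available // per_child_mb, 0)


# Illustrative figures only; measure the real ones on the server.
TOTAL_RAM_MB = 4096   # the question mentions roughly 4 GB
RESERVED_MB = 1536    # assumption: MySQL, the OS and filesystem cache
PER_CHILD_MB = 40     # assumption: resident size of one Apache/PHP child

children = max_apache_children(TOTAL_RAM_MB, RESERVED_MB, PER_CHILD_MB)
print(children)  # 64
```

With these assumed numbers only 64 children fit, far fewer than 500 simultaneous connections, so requests would either queue or push the box into swap. If the measured numbers look similar, that is a sign the connection handling (keep-alive settings, a lighter front end for static files) or the hardware needs attention.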
It depends entirely on the software and the configuration of the system. For a long time stackoverflow.com served many times those view/connection numbers from a single server. I believe the entire trilogy is still running on only 2-3 servers.
If you can modify the application, you may be able to optimize it. You may also need to optimize the configuration of the database server. If you can't do either, it may be easier to just throw hardware at the problem. Additional hardware won't help, though, if your application isn't designed to scale to multiple servers.
I have a vBulletin 4.0 based forum with 100k hits per day on a CentOS machine. It's 2x Intel(R) Xeon(R) CPU E5205 @ 1.86GHz with 8 GB RAM. Server load with 1k users online is around 2.0-3.0, and I haven't bothered to optimize it much.
The first thing to try is to DISABLE all vBulletin and WordPress plugins. You can do that via the admin control panel in both WP and VB. Then see whether the server runs OK. If it does, I'd guess one or more of your plugins are badly written, most likely containing unoptimized queries.
If you have vBulletin 4.x with many users and concurrent database access, consider moving to InnoDB. This will improve concurrency. Install memcached and connect it to VB. This will reduce the load on the database.
Try installing WP Super Cache so that cached pages are served as static files. Also consider moving from Apache to nginx, lighttpd or something else more lightweight.