Our production web server has gone down a few times over the last half year. Each time, we've had to contact the web host and have them restart it, as I'm unable to even SSH in. This appears to affect only the web server and not the MySQL database server, which is a separate machine. When it happens, all hosted websites time out.
I'd like to examine web server optimization/corrections to get to the root of this issue. Any recommendations on how to proceed with that? I'm sure log files would play a role. I'm able to find my way around a Linux-based server and make needed changes, but would be interested in any tips I may not have thought of yet. It may be best for us to speak with an outside consultant as another option.
Thanks.
How about having more than one web server and balancing the load between them, so that if one fails, the backup can take over? I'd say that such a large single point of failure should be one of the first places to start looking.
This sounds like a classic case of swapping. If you have any metrics/monitoring system available at all, check the memory reports (sar, cacti, munin, etc.). If not, it's time to pick one and set it up.
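If no monitoring is installed yet, a quick snapshot is possible by reading /proc/meminfo directly. Here is a minimal Python sketch, assuming a Linux box; the function names are my own, and MemAvailable only exists on newer kernels. A single snapshot of swap usage is only a hint, so the history from sar or munin is still what you want for the moments just before a crash.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_fraction(info):
    """Return the fraction of swap currently in use (0.0 if no swap is configured)."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"swap used: {swap_used_fraction(info):.0%}")
    print(f"MemAvailable: {info.get('MemAvailable', 'n/a')} kB")
```

Run from cron every minute and appended to a log, this would at least show whether swap fills up before the server becomes unreachable.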
Odds are it's the simple case of (number of Apache children) x (average memory size of an Apache child) > available memory. You can attack this in several ways. First, see if you can trim down your PHP scripts. Don't go nuts, but if there are some simple include/require/classloader fixes you can make, you might be able to chop their footprint in half with a quick afternoon's worth of profiling work. After that, take whatever your average Apache child size is and do the math to figure out how many children would fill up all available RAM, then back off ~20% and make that your MaxClients setting.
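The MaxClients arithmetic above can be sketched in a few lines of Python. The function name and the example numbers are made up for illustration; plug in your own measurements (for instance the average RSS of your Apache children, and the RAM left over after the OS and anything else on the box). Note that Apache 2.4 renamed this directive to MaxRequestWorkers.

```python
import math


def estimate_max_clients(available_mb, avg_child_mb, headroom=0.20):
    """Estimate a MaxClients value.

    available_mb: RAM you are willing to give to Apache children, in MB
    avg_child_mb: average memory size of one Apache child, in MB
    headroom:     fraction to back off from the "fills all RAM" count (~20%)
    """
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    # Number of children that would fill all available RAM.
    fills_ram = available_mb / avg_child_mb
    # Back off by the headroom and round down; never return less than 1.
    return max(1, math.floor(fills_ram * (1 - headroom)))


if __name__ == "__main__":
    # Hypothetical numbers: 3584 MB available, 50 MB per child.
    print(estimate_max_clients(3584, 50))
```

With those hypothetical numbers, about 71 children would fill the RAM, and backing off 20% gives a MaxClients of 57.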
PRM (Process Resource Monitor) from R-fx Networks would work wonders for you.
In short (it's free): it watches the system, and when it sees a process using too many resources, it will restart it, or take that process down, hold it, and then restart it.
Really nice :-)
Saves most of the webhosts out there much of the time