I've been doing web development for some time, but server admin has never been part of my job. I'm finding that more and more I need to learn about what is going on under the hood. Looking at server logs and stuff, I'm seeing some interesting things and learning a lot, but I just don't know what a "normal" server looks like, so it's hard for me to judge how our machine is doing by comparison.
We are running Apache (2.0.54) on Red Hat with JRun (4.0). Most of our content is ColdFusion, with a little PHP thrown in. Google Analytics says we have about 1,500,000 page views a month, and Apache typically reports 5-7 requests per second.
If I run top on the server, it says the machine has been up for over 700 days. But our Apache instance crashes regularly. Three days without a crash would be a long time for our server; I'd say 48 hours is normal before Apache needs to be restarted.
I'm keen to hear others say from their experience if that is good or bad. I suspect bad, but I don't have anything to compare it to.
And if this is bad, can anyone point me in the direction of some online resources where I can start learning how to fix this?
Apache does not normally crash or need to be restarted. Regular deaths suggest a problem with a module, the configuration, or resources. A good place to start is Apache's error logs, which usually reside in /var/log/apache or similar (see the server config).
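As a rough way to get an overview of an error log before reading it line by line, the sketch below counts entries per severity level. It assumes the classic Apache 2.0 error-log layout of "[timestamp] [level] message"; the function names and the log path in the comment are illustrative assumptions, so use whatever ErrorLog points to in your config.

```python
import re
from collections import Counter

# Assumed line layout (classic Apache 2.0 error log):
# [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] message...
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\]")


def count_levels(lines):
    """Count log lines per severity level; lines that don't match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts


def count_levels_in_file(path):
    """Open a log file and count its entries per severity level."""
    with open(path, errors="replace") as handle:
        return count_levels(handle)


# Hypothetical usage (the path is an assumption, check your ErrorLog setting):
# print(count_levels_in_file("/var/log/httpd/error_log"))
```

A sudden jump in "error" or "crit" entries shortly before each crash would be a good place to start reading in detail.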
Also, Apache 2.0.54 is quite outdated; you should update it as soon as possible. Check the changelog to see whether one of the fixes might apply to your environment.
If your server uptime is 700 days, then you probably haven't updated your kernel in quite a while. Part of a server admin's job (not sure if that's you or somebody else) should be installing necessary updates, so you might want to look into that, too.
As Jan said, Apache should not be crashing regularly. 5-7 requests/second is no big load. Besides error logs, you might also want to look at /var/log/messages at the time of your crash - sometimes I have seen segfaults show up there.
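To check /var/log/messages for segfaults without scrolling through the whole file, something like the following sketch might help. It simply keeps lines that mention both "segfault" and a process name; the default name "httpd" and the function names are assumptions, and the exact wording of segfault messages varies between kernels.

```python
def find_segfaults(lines, process="httpd"):
    """Return syslog lines that mention both the process name and 'segfault'."""
    wanted = process.lower()
    hits = []
    for line in lines:
        lowered = line.lower()
        if "segfault" in lowered and wanted in lowered:
            hits.append(line.rstrip("\n"))
    return hits


def find_segfaults_in_file(path, process="httpd"):
    """Scan a syslog-style file for segfault lines of the given process."""
    with open(path, errors="replace") as handle:
        return find_segfaults(handle, process)


# Hypothetical usage (reading this file usually requires root):
# for hit in find_segfaults_in_file("/var/log/messages"):
#     print(hit)
```

Comparing the timestamps of any hits with the times Apache went down should show whether the two are related.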
700 days of uptime is a lot, and as Jan says, Apache 2.0.54 isn't exactly recent. Before tracing any Apache issues, I'd upgrade the system to whatever the current packages for your RHEL release are, and reboot. It's quite possible that the bug causing this problem has already been fixed.