A remote Linux web/DB server stopped responding, and the hosting company just rebooted it. What steps should I take to find out what went wrong?
There are many reasons a remote server might stop responding.
1- The server may be overloaded and too slow to respond.
2- The server may have crashed (a system crash caused by a kernel bug, or an application crash).
3- For a remote server, the problem can be related to network access. The server is still working, but you cannot reach it.
First narrow your search by gathering more information and ruling out the obvious causes.
In what way did it stop responding? Did sshd and Apache go down while the server still responded to ping? Or was it a total blackout, with not even ping answered? Is this a virtual server or a physical one?
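If it happens again, those questions can be answered from another machine while the server is still down. Here is a minimal Python sketch of that triage, not a finished tool. The function names and the hostname in the usage comment are made up for illustration, and the ping result is passed in by hand because sending ICMP from Python normally needs extra privileges.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def classify_outage(ping_ok, ssh_ok, http_ok):
    """Turn three reachability observations into a rough first guess."""
    if not (ping_ok or ssh_ok or http_ok):
        return "total blackout: suspect network, power, or a hard crash"
    if ping_ok and not (ssh_ok or http_ok):
        return "host answers ping but services are down: suspect overload or OOM"
    if ssh_ok and http_ok:
        return "services reachable: the problem is intermittent or already over"
    return "partial outage: one service is down, check that service's logs"


# Hypothetical usage (server.example.com is a placeholder hostname):
#   ssh_ok = tcp_port_open("server.example.com", 22)
#   http_ok = tcp_port_open("server.example.com", 80)
#   print(classify_outage(ping_ok=True, ssh_ok=ssh_ok, http_ok=http_ok))
```

The result is only a starting point. A firewall that drops ICMP, for example, makes a healthy server look like it fails the ping test.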
First, if you have load average, memory, or CPU usage graphs available, check whether anything odd happened around the time of the crash. Then read the logs.
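One way to find where to start reading is to look for the longest silence in the system log. The last line before the gap is roughly when the server stopped, and the first line after it is the reboot. Below is a rough Python sketch. It assumes the traditional syslog timestamp format ("Mar  3 14:22:40") and English month names, and it takes the year as a parameter because that format does not record one. The sample lines are invented for illustration, and the log file location depends on your distribution.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if there is none."""
    try:
        # The traditional stamp is exactly 15 characters wide.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest silence.

    Returns None when fewer than two lines carry a parsable timestamp.
    """
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best


# Invented sample lines, for illustration only.
sample_log = [
    "Mar  3 14:20:01 web1 CRON[811]: (root) CMD (example job)",
    "Mar  3 14:22:40 web1 kernel: example message before the outage",
    "Mar  3 15:05:12 web1 kernel: example message after the reboot",
]

gap, before, after = largest_gap(sample_log, 2024)
print("silent for", gap)
print("last line before:", before)
print("first line after:", after)
```

On a real server, pass in the lines of the actual log file instead of `sample_log`, then read the entries just before the gap.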
If the problem was software-related, there may be something telling in one of the log files. Maybe a botnet attacked your web server and flooded it with HTTP requests -- or maybe some other process, say one run from cron, went bonkers. For example, if you see that the kernel has logged "Out of memory" messages and mentions the OOM killer, then some process tried to eat all the available memory and the kernel shot a process down. Most of the time the OOM killer shoots only the actual culprit, but occasionally processes like sshd get shot down, too.
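Scanning for those OOM killer messages is easy to script. The Python sketch below looks for the two phrases the kernel typically logs, "invoked oom-killer" and "Killed process". The exact wording varies a little between kernel versions, so treat the patterns as a starting point. The sample lines are invented, and the function name is mine.

```python
import re

# Typical kernel phrases; the exact wording can differ between kernel versions.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")
INVOKED = re.compile(r"(\S+) invoked oom-killer")


def find_oom_events(lines):
    """Return a list of (event, process_name, pid_or_None) tuples."""
    events = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            events.append(("killed", match.group(2), int(match.group(1))))
            continue
        match = INVOKED.search(line)
        if match:
            # The process that triggered the OOM killer; no pid on this line.
            events.append(("invoked", match.group(1), None))
    return events


# Invented sample lines, for illustration only.
sample_lines = [
    "Mar  3 14:22:38 web1 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0",
    "Mar  3 14:22:40 web1 kernel: Killed process 2211 (apache2) total-vm:1024000kB",
    "Mar  3 14:22:41 web1 sshd[900]: unrelated example line",
]

for event in find_oom_events(sample_lines):
    print(event)
```

Note that the process that invoked the OOM killer is not necessarily the one eating the memory. The "Killed process" line tells you which victim the kernel picked.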
On the other hand, if the server just suddenly stopped working and there were no warnings whatsoever, it might have been a hardware hiccup. Servers can sometimes crash, too. If this was the first time and your server has been very reliable up to this point, don't lose sleep over it just yet.
But if this happens again soon, you'll need to take action. If there is some kind of interface for monitoring the server hardware, or if your hosting company can check such an interface, verify that all the fans are operating correctly, that the server is running at a tolerable temperature, and that there are no hardware error messages.
If the hardware is OK but you see a kernel crash in the logs, make sure your Linux distribution is up to date.
Sorry, I'm unable to help you any further. A one-and-a-half-line question doesn't give much to go on.