I have an Ubuntu VPS that has recently started becoming unresponsive because the CPU is pegged at 100% utilisation.
Unfortunately, I'm at a loss as to what is causing this, and I'm looking for some pointers on how to track down the cause so I can fix it.
- I don't know what was running at the time it locked up, but are there some ways I can figure that out?
- What procedures/logging can I put into place to be able to diagnose the issue the next time it happens?
At my company, we have a simple cron script on each server to check on the load averages. If the load average starts to climb past a certain point, it sends us an email, so we can then log into the server and look for the offending process(es).
top
would be the first command I'd run after logging in. I believe we check the 5-minute load average, but if your server gets slammed quickly, you may need your script to watch the 1-minute load average instead.
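
Here is a rough sketch of what such a cron check could look like. It is not our actual script: the threshold of 4.0, the function names, and the choice to include a process snapshot are all assumptions for illustration. It reads /proc/loadavg (so Linux only) and relies on cron's usual behaviour of mailing any output a job produces, which needs a working mail setup and MAILTO on the server.

```shell
#!/bin/sh
# Sketch of a load-average check meant to be run from cron.
# THRESHOLD is a hypothetical value; tune it to your core count.
THRESHOLD="${THRESHOLD:-4.0}"
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"

# Print field N of the load file: 1 = 1-minute, 2 = 5-minute, 3 = 15-minute.
read_load() {
    awk -v f="$1" '{ print $f }' "$LOADAVG_FILE"
}

# Exit 0 when load $1 is strictly greater than threshold $2 (float compare via awk).
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l > t) }'
}

check_load() {
    load=$(read_load 2)   # use 1 here to watch the 1-minute average instead
    if load_exceeds "$load" "$THRESHOLD"; then
        # Anything printed here is what cron would mail out.
        printf 'Load average %s exceeds %s on %s\n' "$load" "$THRESHOLD" "$(hostname)"
        # Snapshot of the busiest processes at the moment of the alert.
        ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n 10
    fi
}

check_load
```

Saved somewhere like /usr/local/bin/check_load.sh (a made-up path) and run from cron every minute, it stays silent while the load is normal, so mail only goes out when the threshold is crossed. The process snapshot also helps with the "what was running at the time" question, since it is captured even if nobody can log in before the server locks up.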