We have a Red Hat Enterprise Linux server (not set up by us, but we have no reason to assume it's not a stock installation) that restarts every few weeks for no apparent reason. On previous occasions there was nothing in the log files, which suggested it had simply died without warning.
We just discovered that the server has been down since Saturday, and the logs seem to indicate an orderly shutdown:
Dec 19 14:23:38 SKUNK1 shutdown: shutting down for system halt
The problem is that we have no idea why it shut down, and we are pretty sure that nobody did it deliberately.
Can anyone suggest why this might be occurring, and how we can diagnose it?
See this question. The last time I saw something like this, a broken motherboard sensor was falsely reporting an overtemperature condition, and the machine was shutting itself down to protect the hardware.
https://bugzilla.redhat.com/show_bug.cgi?id=459043
Alternatively, check /etc/inittab for entries that invoke shutdown.
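As a rough sketch of what checking /etc/inittab could look like, the Python snippet below scans text in the sysvinit `id:runlevels:action:process` format and lists entries whose command mentions shutdown, halt or poweroff. The function name `find_shutdown_entries` and the sample lines are made up for illustration; they are not taken from the affected server.

```python
def find_shutdown_entries(inittab_text):
    """Return (id, action, process) for inittab entries that run a shutdown-like command."""
    keywords = ("shutdown", "halt", "poweroff")
    hits = []
    for line in inittab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # sysvinit format: id:runlevels:action:process
        fields = line.split(":", 3)
        if len(fields) < 4:
            continue
        entry_id, _runlevels, action, process = fields
        if any(word in process for word in keywords):
            hits.append((entry_id, action, process))
    return hits


# Hypothetical sample for illustration only; read the real file on the server instead.
SAMPLE_INITTAB = """\
id:3:initdefault:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"
1:2345:respawn:/sbin/mingetty tty1
"""

for entry in find_shutdown_entries(SAMPLE_INITTAB):
    print(entry)
```

On the server itself you would pass in the contents of /etc/inittab. Any entry it reports whose action is something like powerfail is worth a closer look, since a spurious power-failure signal could trigger it.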
One possibility worth looking at is the 'action' settings in /etc/auditd.conf (specifically admin_space_left_action, space_left_action and disk_full_action). If any of these is set to 'halt', you may be hitting a disk space threshold at which the box halts itself because it is running low on space for the audit logs.
If this is indeed the problem, you will need to free up additional space, alter the threshold values in auditd.conf, or change the action to something other than halt.
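To make the check concrete, here is a minimal Python sketch that reports which of the three settings named above are set to halt. It assumes the usual `key = value` layout of auditd.conf and compares values case-insensitively. The function name `find_halt_actions` and the sample text are hypothetical, not taken from the affected server.

```python
# The three settings named in the answer above.
ACTION_KEYS = ("admin_space_left_action", "space_left_action", "disk_full_action")


def find_halt_actions(conf_text):
    """Return the names of the action settings whose value is 'halt'."""
    halting = []
    for line in conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip().lower() for part in line.split("=", 1))
        if key in ACTION_KEYS and value == "halt":
            halting.append(key)
    return halting


# Hypothetical excerpt for illustration only; read the real auditd.conf on the server.
SAMPLE_CONF = """\
# sample settings
space_left = 75
space_left_action = SYSLOG
admin_space_left = 50
admin_space_left_action = SUSPEND
disk_full_action = HALT
"""

print(find_halt_actions(SAMPLE_CONF))
```

If the list is non-empty for the real file, compare the free space on the audit log partition against the configured thresholds before changing anything.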