Assume the system is a Red Hat variant on x86 architecture, and assume no cooling issues.
Is it possible for a very high load on the machine to cause it to reboot?
I understand that a machine may become unresponsive, certainly. But can it actually reboot?
If so, how does this occur?
Maybe, maybe not... there is not enough detail here to say for sure. It depends on the nature of the load and what's producing it. A high load on its own will not cause a system to reboot, but it may be indicative of some other major issue that could cause an unplanned shutdown. E.g. a high-transaction mailserver or database server running a load of 80 is far different from a system whose RAID controller locks up.
The easiest example could be storage. An instant rise in load following the loss of storage connectivity or a RAID controller malfunction could easily push the system load to 100+ on a busy system. The system may remain pingable and usable to some extent, but I/O operations could fail. Certain commands may stop working even though the TCP/IP stack is in memory and available.
So it's possible for the system to kernel panic in this condition, or for the system or applications to stall. On quality hardware, there may be a watchdog timer that warm-boots the server. HP's Automatic Server Recovery (ASR) feature or VMware HA's virtual machine monitoring could take this action.
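On Linux you can get the same effect from a software watchdog: the stock `watchdog` daemon can even be told to reboot the box when the load average itself crosses a threshold, which is the one case where high load directly triggers a reboot by design. A rough setup sketch for a Red Hat-style system (package, service, and option names are those of the standard `watchdog` daemon; the threshold value is purely illustrative):

```
# install and load a software watchdog (run as root)
yum install watchdog
modprobe softdog

# /etc/watchdog.conf -- illustrative values
#   watchdog-device = /dev/watchdog
#   max-load-1      = 40    # reboot if the 1-minute load average exceeds 40
#   interval        = 10    # seconds between checks

service watchdog start
```

If a box with something like this configured "reboots under high load", the watchdog is doing exactly what it was told to.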
It certainly could, but an automatic reboot is usually associated with hardware/software issues such as overheating or kernel bugs. So it is possible that heavy load causes overheating, which leads to a reboot. In any event, you should investigate the logs or kernel crash dumps to find the exact cause.
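A caveat on the kernel-dump part: on a Red Hat-family box you normally have to set up kdump ahead of time, or there will be no dump to investigate after the next crash. A setup sketch (package and service names as shipped with RHEL-era releases; the reserved memory size is just an example):

```
yum install kexec-tools
# reserve memory for the crash kernel, e.g. add crashkernel=128M to the
# kernel command line in your bootloader config, then reboot once
chkconfig kdump on
service kdump start
# after a panic, the vmcore lands under /var/crash/ by default
```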
I think the answer is that no, high load on its own will NOT cause a system reboot. It will be a hardware issue or software issue of some description. Even if it always fails under high load it will be the high load triggering some other issue. I know this isn't much of a help but it does answer the question I guess :-)
A system soft-reboots when 'reboot' is executed or the equivalent syscall is called. If you don't have some kind of watchdog in place that triggers a reboot, it won't happen.
But certainly any kind of hard reboot/reset can happen because of hardware issues.
On a sane system a reboot won't happen because of high load. Take a look at dmesg and /var/log/messages to track the problem down.
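To make that last step concrete, here is a minimal sketch: a helper that counts log lines matching the usual suspects (panic, OOM kill, machine-check, thermal, watchdog). `scan_log` is my own name for it; point it at /var/log/messages or at saved dmesg output.

```shell
# scan_log FILE -- count log lines that match common causes of an
# unplanned reboot; a non-zero count tells you where to start reading.
scan_log() {
    grep -icE 'kernel panic|out of memory|mce:|thermal|watchdog' "$1"
}
```

Typical use: `scan_log /var/log/messages`, then re-run the grep without `-c` to read the matching lines in context.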
I have had that happen to me, several times.
I have run into three different categories of problems when the load is higher than what the machine is designed for:
1. Swapping: the system is bogged down because it has to swap memory to disk, quickly, back and forth. This will make the server unresponsive until the issue is resolved. If you do not need the server for a while, it may eventually come back to normal. If it runs out of memory, you may enter problem #2, or the kernel finally decides to kill a process (maybe because a malloc() returned NULL, the programmer did not check that case, and you get a SEGV...).

2. Out of memory: this is what I would call the usual result for a load that reaches the critical point of your kernel, a point where the kernel cannot even allocate a buffer of memory for itself. This is rare if you have a large (enough) swap file, but it could be that your processes allocate more and more memory non-stop. (As a developer, that happens once in a while in my own code; if I do not catch it soon enough, I have to force a reboot because I won't be able to stop the process and release the memory... IRIX had something to auto-kill such rogue processes, which I thought was really cool.)
3. Auto-reboot: I have had two cases of this. In one case I was using a VPS at some company (a while back), and when you tried to use too much memory, the VPS system would kill the whole machine! Your computer would forcibly be turned off. I still see similar behavior on other VPSes. However, modern ones are more likely to have the kernel kill the offending process because it requested too much memory. That process would be down, but the VPS itself would still be running... just rather useless (no daemons running on it...).
On my own hardware, I have had that auto-reboot problem, usually for one of two reasons: overload, or accessing a piece of hardware either incorrectly (bogus software) or too quickly (which could be viewed as incorrectly too, I guess...). So I had a computer that would just reboot once in a while if my load got too high for too long. I have no clue why it happened, but I got a different computer since then and have not experienced the problem again.
And I have also had other auto-reboots where accessing the video board "incorrectly" would somehow send a "hardware" reset to the motherboard. That also results in an auto-reboot. If anything on your computer does something similar (maybe because of a "slight" incompatibility with a driver), then it could auto-reboot that way too...
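One defensive trick against the runaway-allocation case described above is to cap a process's address space with `ulimit -v`, so a leak fails fast with a NULL malloc() instead of swapping the whole box to death. A sketch (`run_capped` is my own helper name, and the 1 GiB figure and the daemon name are only examples):

```shell
# run_capped KIB CMD... -- run CMD in a subshell whose virtual address
# space is capped at KIB kibibytes; past the cap, allocations fail
# instead of pushing the machine into swap.
run_capped() {
    limit_kib=$1
    shift
    ( ulimit -v "$limit_kib"; "$@" )
}

# example (hypothetical daemon): run_capped 1048576 ./leaky-daemon   # 1 GiB cap
```

Because the `ulimit` happens in a subshell, only the capped command is affected, not your login shell.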
On a system with no 'watchdog' software, the most likely causes of a spontaneous reboot are hardware related, namely heat issues or power issues in a component. Modern hardware often has an emergency power-off if the internal sensors report temperatures past a certain point. Power issues in a component can trigger the power supply to reset (more likely you'll blow a fuse or capacitor), or they can cause heat issues, which brings us back to the first cause.
Like other answers have noted, high load can trigger these situations. Most likely the source of the problem will be a component which has not completely failed but does not perform to full specifications. E.g. a CPU cooler that doesn't cool enough. (You had one job....)
If the issue is software, Linux tends to panic rather than spontaneously reboot, leaving you with a nice screen full of data that you can search on to get a clue as to where the problem might be. Check all your logs.
My experience says to check hardware, specifically anything heat related. Find monitoring software for your hardware. Make sure the software writes a log. Run a heavy load. Look for spikes coinciding with the shutdown. More than likely the temperature will be peaking just before the reset, or still rising at the reset.
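A minimal sketch of that monitoring loop, reading the kernel's own thermal zones from sysfs (`log_temps` is my own name; `/sys/class/thermal` is the standard Linux path, made overridable here so the function can be exercised against a fake tree; readings are in millidegrees Celsius):

```shell
# log_temps [DIR] -- print a timestamped reading (millidegrees C) for
# every thermal zone under DIR (default: the live sysfs tree). Run it
# from cron or a loop while applying load, then look for a spike just
# before the reset.
log_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        printf '%s %s %s\n' "$(date '+%F %T')" "$zone" "$(cat "$zone")"
    done
}
```

Redirect its output to a file (`log_temps >> /var/log/temps.log`) so the last reading survives the reboot.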