I have 5 servers, all with similar hardware (i7, four 2 TB 7200 rpm drives, two 4 TB 5400 rpm drives, 430 watt power supply), and lately the machines have been freezing up. This has gotten worse in the last day or so, and I can't pinpoint an explanation. One recent change was adding the two 4 TB hard drives. The crashes happen most often while running a large Hadoop job, so I originally thought the load was causing the problem. But last night one server froze without any heavy load on the box (or so I think), although HDFS (Hadoop's distributed file system) was probably rebalancing itself since two of the five nodes were offline.
If I plug a monitor and keyboard into one of these frozen machines, I get no response or output on the screen.
Any ideas on possible points of failure and/or different logs I can look at to investigate? Thanks
Edit: The systems are running Ubuntu 10.04
Edit 2: More on hardware:
- Intel Core i7-930 Bloomfield 2.8 GHz processor (quad core)
- 12 GB (6 x 2 GB) Kingston DDR3 1333 RAM
- Antec EarthWatts Green 430 power supply
- MSI X58M LGA 1366 motherboard
Edit 3: I pulled the two 4 TB hard drives out temporarily to see if it helped with the crashing, and so far the servers are staying up, even under heavy Hadoop load. I will try the power meter soon to confirm whether the machines are drawing too much power with those drives installed.
How much horsepower does my car need? I recently added 50 kg of weight.
You see the problem? We do not know. YOU tell us.
For example, you talk a lot about hard disks and Ubuntu, but say nothing about memory (which uses power) or the PROCESSOR. 430 watts is WAY too little for a high-end processor, but may work for an Atom. It may even work for a single-processor system, but not a dual one; you never tell us what you have.
Also, have you thought of just plugging a server into a power meter? They are cheap and will tell you how much power you draw. I would just get a 15 USD power meter and find out.
And yes, an overtaxed power supply can destabilize your server.
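Until the meter arrives, a back-of-the-envelope budget can at least show where the watts go. The sketch below is only that: apart from the 130 W TDP Intel rates the i7-930 at, every wattage in it is an assumed ballpark figure, and the names `ASSUMED_WATTS`, `estimate_load`, and `headroom` are made up for illustration. The question does not list a graphics card or other add-in cards, so those go in through the `other_watts` parameter. Drives draw noticeably more while spinning up than while running, which is why the two cases are computed separately.

```python
# Rough power-budget sketch. All figures are ASSUMED ballpark values
# (except the CPU TDP); replace them with datasheet or power-meter numbers.
ASSUMED_WATTS = {
    "cpu": 130,          # Intel's rated TDP for the Core i7-930
    "board": 40,         # assumption: X58 motherboard and chipset
    "dimm": 3,           # assumption: per DDR3 module
    "hdd_7200": 8,       # assumption: per 7200 rpm drive while active
    "hdd_5400": 6,       # assumption: per 5400 rpm drive while active
    "hdd_spinup": 25,    # assumption: per drive, brief peak at spin-up
}


def estimate_load(n_dimms, n_7200, n_5400, other_watts=0, w=ASSUMED_WATTS):
    """Return (running_watts, spinup_watts) for the given component counts.

    other_watts covers anything not modelled here, e.g. a graphics card.
    """
    base = w["cpu"] + w["board"] + n_dimms * w["dimm"] + other_watts
    running = base + n_7200 * w["hdd_7200"] + n_5400 * w["hdd_5400"]
    spinup = base + (n_7200 + n_5400) * w["hdd_spinup"]
    return running, spinup


def headroom(psu_watts, load_watts):
    """Watts left between the PSU's rated output and the estimated load."""
    return psu_watts - load_watts


if __name__ == "__main__":
    # Compare the box with and without the two added 4 TB drives.
    for label, n_5400 in (("without 4 TB drives", 0), ("with 4 TB drives", 2)):
        running, spinup = estimate_load(n_dimms=6, n_7200=4, n_5400=n_5400)
        print(f"{label}: running ~{running} W, spin-up ~{spinup} W, "
              f"spin-up headroom {headroom(430, spinup)} W")
```

An estimate like this cannot replace the meter: it adds up nominal figures and says nothing about what the supply can actually deliver in practice, so treat the result as a reason to measure, not as an answer.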