We have a (quite old, yeah) AMD-based HP ML115 G5 server that shuts itself off 10-15 s after the power button is pressed (during the fan test, I suppose) and before the single BIOS POST beep.
We need some help diagnosing this hardware failure remotely (the server is 200 km away). Our hardware specification is as follows:
root@linux:~/# dmidecode -t1
# dmidecode 2.12
SMBIOS 2.5 present.
System Information
Manufacturer: HP
Product Name: ProLiant ML115 G5
Serial Number: CZC94743QJ
SKU Number: 470064-894
root@linux:~/# head -n 30 dmidecode.txt
# dmidecode 2.12
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: HP
Version: O18
Release Date: 07/06/2009
At this point it works stably. I've managed to turn it on by:
- turning the server off,
- removing the power cord for five minutes,
- laying it on the floor on its longer edge, with the CPU heatsink facing the ceiling.
If we put it back into its standard upright position, it fails to turn on exactly as I described at the beginning. Totally reproducible.
Voltage/Temp/Fans stats look okay to me:
root@linux:~/# ipmitool sdr
POST Error | Not Readable | ns
Memory ECC | Not Readable | ns
ACPI State | 0x01 | ok
PCI Reset | 0x00 | ok
CPU Fan | 1048.88 RPM | ok
Rear Fan | 2107.04 RPM | ok
CPU Diode | 26.50 degrees C | ok
Front Ambient | 19 degrees C | ok
System 12V | 11.93 Volts | ok
System 5V | 5.12 Volts | ok
System AUX 5V | 4.98 Volts | ok
System 3.3V | 3.39 Volts | ok
System AUX 3.3V | 3.33 Volts | ok
CPU Vcore | 1.07 Volts | ok
CPU 12V | 11.82 Volts | ok
HT 1.2V | 1.20 Volts | ok
Mem Vcore | 1.81 Volts | ok
MEM VTT | 0.90 Volts | ok
MCP55 1.5V | 1.50 Volts | ok
MCP55 1.4V | 1.40 Volts | ok
Therm-Trip | 0x00 | ok
CPU Prochot | 0x00 | ok
System Reset | 0x00 | ok
NMI | 0x00 | ok
PCI Error | Not Readable | ns
CPU Socket | 0x01 | ok
LO100 Present | 0x00 | ok
Watchdog | Not Readable | ns
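To back up the "looks okay to me" with numbers, here is a minimal Python sketch that parses readings in the `name | reading | status` layout shown above and reports how far each main rail sits from its nominal value. The nominal voltages are guessed from the sensor names and the ±5 % band is only an assumed rule of thumb, not a figure from HP documentation; `parse_sdr` and `rail_deviations` are hypothetical helper names.

```python
import re

# Nominal voltages guessed from the sensor names (assumption, not from HP docs).
NOMINAL = {
    "System 12V": 12.0,
    "System 5V": 5.0,
    "System AUX 5V": 5.0,
    "System 3.3V": 3.3,
    "System AUX 3.3V": 3.3,
    "CPU 12V": 12.0,
}
TOLERANCE = 0.05  # assumed +/-5 % band


def parse_sdr(text):
    """Split 'name | reading | status' lines into (name, reading, status) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def rail_deviations(rows):
    """Return {sensor name: relative deviation from nominal} for known rails."""
    result = {}
    for name, reading, _status in rows:
        match = re.match(r"([\d.]+) Volts$", reading)
        if name in NOMINAL and match:
            measured = float(match.group(1))
            result[name] = (measured - NOMINAL[name]) / NOMINAL[name]
    return result


# A few lines copied from the output above.
SAMPLE = """\
POST Error | Not Readable | ns
System 12V | 11.93 Volts | ok
System 5V | 5.12 Volts | ok
System AUX 5V | 4.98 Volts | ok
System 3.3V | 3.39 Volts | ok
System AUX 3.3V | 3.33 Volts | ok
CPU 12V | 11.82 Volts | ok
"""

deviations = rail_deviations(parse_sdr(SAMPLE))
out_of_band = [name for name, dev in deviations.items() if abs(dev) > TOLERANCE]

for name, dev in deviations.items():
    print(f"{name:16s} {dev:+.1%}")
print("outside assumed band:", out_of_band)
```

With the readings above, none of the rails falls outside the assumed band. Keep in mind these values were taken while the server was already running on its side, so they say little about what happens in the first seconds after power-on.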
IPMI events:
18 | 03/18/2015 | 09:29:46 | Temperature #0x20 | Upper Non-critical going high | Asserted
30 | 03/18/2015 | 09:30:08 | Temperature #0x20 | Upper Critical going high | Asserted
48 | 03/18/2015 | 10:38:59 | Temperature #0x20 | Upper Non-critical going high | Asserted
60 | 03/18/2015 | 10:39:20 | Temperature #0x20 | Upper Critical going high | Asserted
78 | 03/18/2015 | 10:45:26 | Temperature #0x20 | Upper Non-critical going high | Asserted
90 | 03/18/2015 | 10:45:30 | Temperature #0x20 | Upper Non-critical going high | Deasserted
a8 | 03/18/2015 | 10:45:56 | Temperature #0x20 | Upper Non-critical going high | Asserted
c0 | 03/18/2015 | 10:46:12 | Temperature #0x20 | Upper Critical going high | Asserted
d8 | 03/18/2015 | 10:48:42 | Temperature #0x20 | Upper Non-critical going high | Asserted
f0 | 03/18/2015 | 10:48:46 | Temperature #0x20 | Upper Non-critical going high | Deasserted
108 | 03/18/2015 | 10:49:04 | Temperature #0x20 | Upper Non-critical going high | Asserted
120 | 03/18/2015 | 10:49:18 | Temperature #0x20 | Upper Critical going high | Asserted
138 | 03/18/2015 | 10:50:24 | Temperature #0x20 | Upper Non-critical going high | Asserted
150 | 03/18/2015 | 10:50:25 | Temperature #0x20 | Upper Critical going high | Asserted
168 | 03/18/2015 | 10:57:53 | Temperature #0x20 | Upper Non-critical going high | Asserted
180 | 03/18/2015 | 10:57:57 | Temperature #0x20 | Upper Non-critical going high | Deasserted
198 | 03/18/2015 | 10:58:24 | Temperature #0x20 | Upper Non-critical going high | Asserted
1b0 | 03/18/2015 | 10:58:41 | Temperature #0x20 | Upper Critical going high | Asserted
1c8 | 03/18/2015 | 11:14:23 | Temperature #0x20 | Upper Non-critical going high | Asserted
1e0 | 03/18/2015 | 11:15:06 | Temperature #0x20 | Upper Non-critical going high | Deasserted
1f8 | 03/18/2015 | 11:16:33 | Temperature #0x20 | Upper Non-critical going high | Asserted
210 | 03/18/2015 | 11:16:33 | Temperature #0x20 | Upper Critical going high | Asserted
228 | 03/18/2015 | 11:49:12 | Temperature #0x20 | Upper Non-critical going high | Asserted
240 | 03/18/2015 | 11:49:18 | Temperature #0x20 | Upper Non-critical going high | Deasserted
258 | 03/18/2015 | 11:55:45 | Temperature #0x20 | Upper Non-critical going high | Asserted
270 | 03/18/2015 | 11:55:46 | Temperature #0x20 | Upper Non-critical going high | Deasserted
288 | 03/18/2015 | 11:56:32 | Temperature #0x20 | Upper Non-critical going high | Asserted
2a0 | 03/18/2015 | 11:57:06 | Temperature #0x20 | Upper Critical going high | Asserted
2b8 | 03/18/2015 | 12:00:11 | Temperature #0x20 | Upper Non-critical going high | Asserted
2d0 | 03/18/2015 | 12:00:14 | Temperature #0x20 | Upper Non-critical going high | Deasserted
2e8 | 03/18/2015 | 12:00:59 | Temperature #0x20 | Upper Non-critical going high | Asserted
300 | 03/18/2015 | 12:01:34 | Temperature #0x20 | Upper Critical going high | Asserted
318 | 07/06/2009 | 00:00:22 | Fan #0x42 | Upper Critical going high | Asserted
330 | 11/13/2016 | 13:25:47 | Fan #0x41 | Upper Critical going high | Asserted
348 | 11/13/2016 | 13:33:00 | Fan #0x41 | Upper Critical going high | Asserted
360 | 11/13/2016 | 13:33:47 | Fan #0x41 | Upper Critical going high | Asserted
378 | 11/13/2016 | 13:44:58 | Fan #0x41 | Upper Critical going high | Asserted
390 | 11/13/2016 | 13:45:48 | Fan #0x41 | Upper Critical going high | Asserted
3a8 | 11/13/2016 | 13:47:45 | Fan #0x41 | Upper Critical going high | Asserted
3c0 | 12/01/2016 | 17:00:29 | Fan #0x41 | Upper Critical going high | Asserted
3d8 | 12/01/2016 | 17:01:53 | Fan #0x41 | Upper Critical going high | Asserted
3f0 | 12/01/2016 | 17:04:02 | Fan #0x41 | Upper Critical going high | Asserted
408 | 12/01/2016 | 17:31:34 | Fan #0x41 | Upper Critical going high | Asserted
420 | 12/01/2016 | 17:43:42 | Fan #0x41 | Upper Critical going high | Asserted
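For anyone who wants to see how fast the temperature sensor goes from warning to critical, here is a rough Python sketch. It assumes the six-column `id | date | time | sensor | event | state` layout of the listing above (which looks like `ipmitool sel list` output, though that is my assumption); `parse_sel` and `warn_to_critical_seconds` are hypothetical helper names.

```python
from datetime import datetime


def parse_sel(text):
    """Parse 'id | date | time | sensor | event | state' lines into tuples."""
    events = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 6:
            continue  # skip anything that is not a six-column event line
        _rec_id, date, time, sensor, event, state = parts
        when = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
        events.append((when, sensor, event, state))
    return events


def warn_to_critical_seconds(events, sensor):
    """Seconds between a non-critical assertion and the following critical one."""
    gaps = []
    warned_at = None
    for when, name, event, state in events:
        if name != sensor:
            continue
        if event.startswith("Upper Non-critical"):
            # A deassertion cancels the pending warning.
            warned_at = when if state == "Asserted" else None
        elif (event.startswith("Upper Critical")
              and state == "Asserted" and warned_at is not None):
            gaps.append(int((when - warned_at).total_seconds()))
            warned_at = None
    return gaps


# The first eight events copied from the listing above.
SAMPLE = """\
18 | 03/18/2015 | 09:29:46 | Temperature #0x20 | Upper Non-critical going high | Asserted
30 | 03/18/2015 | 09:30:08 | Temperature #0x20 | Upper Critical going high | Asserted
48 | 03/18/2015 | 10:38:59 | Temperature #0x20 | Upper Non-critical going high | Asserted
60 | 03/18/2015 | 10:39:20 | Temperature #0x20 | Upper Critical going high | Asserted
78 | 03/18/2015 | 10:45:26 | Temperature #0x20 | Upper Non-critical going high | Asserted
90 | 03/18/2015 | 10:45:30 | Temperature #0x20 | Upper Non-critical going high | Deasserted
a8 | 03/18/2015 | 10:45:56 | Temperature #0x20 | Upper Non-critical going high | Asserted
c0 | 03/18/2015 | 10:46:12 | Temperature #0x20 | Upper Critical going high | Asserted
"""

events = parse_sel(SAMPLE)
gaps = warn_to_critical_seconds(events, "Temperature #0x20")
print(gaps)
```

On these eight events the gaps come out as 22, 21 and 16 seconds. The same function can be pointed at the full listing to check whether the later entries follow the same pattern.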
It first happened to me on 11/13/2016. I thought it could be the hardware watchdog, so we disabled it in the BIOS.
The server has 2x1TB and 2x3TB disks and no optical drive. The power supply is a 365 W non-hot-plug, non-redundant unit.
We have already recommended replacing the box, but I still cannot explain why this is happening (I assume it's some sort of mechanical mainboard failure). I wonder if you have any other ideas.
** Update: Mr Chopper3 asked what I meant by "but CPU one is not standard".
So, the original heatsink was damaged like this:
Time and a bad choice of material: plastic was not meant to hold up under constant pressure. I have never seen a plastic mount like that in any other box since...
The server has been kept in fair conditions: never overheated, not in direct sunlight, and nobody touched it while it was running.
That was about 1.5 years ago. We couldn't find the original HP part on the market anymore, so we replaced it with a heatsink three times larger, because AM2 sockets were not so popular at that time. I cannot remember now whether it has two signaling wires plus VCC and GND (4 wires) like the stock one posted above. It could have just three: VCC, GND and rotation signaling. Since then we have had multiple power outages, and a situation like this never happened.
I vote for a fault on the motherboard, such as a failed solder joint or a marginal component. I experienced a similar failure where pushing the motherboard just so would allow the server to boot, but as soon as I released the pressure the server powered off with a fan failure or hung with an ECC error.
You probably have a fan failure, and the server is configured to halt on critical fan failure.