I am experiencing rare but real unrecoverable machine checks on an HP DL370 G6 dual-core Xeon server. I have already run memtest86+ and CPU-intensive workloads without any problems.
In your opinion, does this indicate a real problem, or is it normal and expected behavior?
How would you approach this problem?
EDIT: After some troubleshooting, it seems that these machine checks, as well as problems when opening Device Manager, can be traced back to the NC375i NICs. All is well when the NICs are not in the server.
Stability of HP Gen6 servers with Intel Xeon CPUs improved further with the BIOS update on the September 2013 HP Update DVD. Intel's newer microcode makes these CPUs much more stable. We haven't seen hardware-related BSODs since applying the update in September.
Machine check exceptions (MCEs) that show up in the system IML log usually point to issues with the system board. A DL370 G6 is still covered by manufacturer warranty support, so call it in.
Seeing the error affect both CPU modules points to a system board replacement rather than an individual CPU or socket.
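As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes you have already pulled the CPU/socket number out of each machine check entry by hand; the function name and the input format are hypothetical and not part of any HP tool or IML export format.

```python
from collections import Counter

def suspect_component(mce_cpus):
    """Guess the likely faulty part from the CPU numbers seen in machine checks.

    mce_cpus: iterable of CPU/socket numbers, one per logged machine check
    (hypothetical input, transcribed manually from the log).
    """
    counts = Counter(mce_cpus)
    if not counts:
        return "no machine checks logged"
    if len(counts) > 1:
        # Errors spread across several CPUs: the shared system board is the suspect.
        return "system board"
    # All errors on one CPU: suspect that CPU or its socket first.
    (cpu,) = counts
    return f"CPU or socket {cpu}"

print(suspect_component([1, 2, 1]))
print(suspect_component([2, 2]))
```

This is only a heuristic; as the NIC findings in the question show, an add-in or onboard device can also be the real trigger.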
In the US, 1-800-633-3600 will get you HP. Hit option #2 and say "ProLiant running Windows" at the prompt to get the fastest service.
The NC375 NICs are problem-ridden. I've had a LOT of bad luck with them across the various customers we work with: total loss of connectivity across all ports, lockups, etc. HP has also issued multiple updates to its critical advisories for this NIC. Our general rule with these NICs is to replace them with something else if you can; otherwise, ask HP to swap them for a newer hardware revision and grab the latest firmware and drivers as soon as possible.