The past few days I've been setting up a used Dell PowerEdge 2950 server. It's been running fine until just now, when it suddenly stopped while running a fairly heavy task (svnadmin verify). The LCD was and is showing the following message:
PowerEdge 2950 E1422 CPU Machine Chk E2118 Fatal NB Mem CRC
Now, the memory error E2118 seems straightforward. One of the banks is probably broken. I'm running memtest right now. Update: Well, memtest didn't find any errors, so it's not that easy.
E1422, less so. Is this likely a separate problem, or just a result of E2118? Googling for this code turns up "update the BIOS", which isn't very specific as to the cause of the problem.
E1422 CPU Machine Chk means that the CPU detected a hardware error and stopped operations. It can be related to the other error, the memory problem. You can see here and here for more details.
If the CPU or memory is having an issue, such as a voltage regulation problem, it could drop voltages on the shared bus between the memory and the CPU and fault them both. Last week I was reading some release notes for a different Dell server model (maybe an R710) describing issues caused by Intel SpeedStep changing the CPU speed, with a delay before the memory adjusted its speed. That left a window in between where a fault could occur. In that particular case, a BIOS update addressed the issue.
Run a CPU benchmark and a memory benchmark to see if you can recreate the fault. If so, maybe you can narrow it down to one component or the other.
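If you don't have a benchmark tool handy, something like the rough Python sketch below can load the CPU and memory separately. It is only an illustration: the function names, buffer sizes, and round counts are my own placeholders, and it is far less thorough than memtest or a real burn-in tool. Also keep in mind that a genuine machine check will most likely halt the box rather than show up as a mismatch here, so the useful signal is mainly which of the two loads brings the server down.

```python
import hashlib


def cpu_stress(rounds: int = 200_000) -> str:
    """Keep one core busy by chaining SHA-256 hashes; return the final digest.

    The result is deterministic, so two runs with the same `rounds`
    should always agree on healthy hardware.
    """
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def memory_stress(size_mb: int = 64, passes: int = 2) -> int:
    """Fill a buffer with a known byte pattern, read it back, count bad bytes.

    Alternates between 0xAA and 0x55 on successive passes.
    Returns the total number of bytes that did not match the pattern.
    """
    size = size_mb * 1024 * 1024
    mismatches = 0
    for p in range(passes):
        value = 0xAA if p % 2 == 0 else 0x55
        buf = bytearray([value]) * size
        # count() returns how many bytes still hold the expected value
        mismatches += size - buf.count(value)
    return mismatches


if __name__ == "__main__":
    # Run the CPU load twice and compare; a difference means a bad computation.
    first = cpu_stress()
    second = cpu_stress()
    print("CPU digests match:", first == second)
    print("Memory mismatches:", memory_stress())
```

Run the CPU part alone in a loop for a while, then the memory part alone with a larger size_mb, and note which one (if either) triggers the E1422/E2118 messages again.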