I updated this post because I replaced the processor, but the core of my question (and, unfortunately, the results) remains the same.
I built my first FreeNAS box and wanted to use ECC RAM since I want to store critical data. Because I am on a budget, I wanted to go for the most affordable solution that would still support ECC RAM.
After doing some research, I found out that I need a motherboard, memory and a CPU that all support ECC. My motherboard of choice is the "Gigabyte X150M-Pro ECC", which has the C232 chipset, DDR4 support and an LGA1151 socket.
I also bought a kit of two DIMMs made by Kingston with the model number "KVR21E15S8K2/8" (spec sheet). Gigabyte published a list of tested memory modules and my modules seem to be supported with working ECC (list of supported modules).
Since I am on a budget I needed an affordable Skylake CPU that supports ECC. According to Intel the Celeron G3900 does support ECC, so I went with that one.
After building the computer, I wanted to verify that my system is indeed running with ECC memory, so I entered the motherboard's BIOS. From various internet sites, I found out that some motherboards have a special section that shows whether ECC is working, but my motherboard doesn't seem to have one. I checked all menus and couldn't find anything similar.
After doing some more research, I found a post over on the Unix & Linux Stack Exchange, but it didn't solve my problem. I tried the latest memtest86+, which, from what I could tell, doesn't even show an "ECC" value. I tried the older 4.20 version that Puget Systems used, which showed "ECC: off". However, after reading the previously mentioned post, I doubt that it tells the truth (maybe that's why the feature was removed?). Neither version read out the correct speed and latency of the DIMMs, which adds to my doubts about memtest86+.
Another popular way to find out whether ECC is working is to run the dmidecode -t memory command and read out the Total Width and Data Width. My results were 128 bits and 64 bits respectively. One part of the output showed details about the memory array, which had the key-value pair Error Correction Type: Single-bit ECC.
I was expecting 72 bits for the Total Width, so I thought it might be related to dual channel and moved the memory modules into two adjacent slots, which should prevent dual channel, but the result was the same. Here is the full output of dmidecode -t memory.
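The 72-bit expectation comes from how single-error-correcting, double-error-detecting (SECDED) ECC is usually sized: a Hamming code over 64 data bits needs 7 check bits, and one extra overall parity bit adds double-error detection, giving 8 extra bits. A minimal Python sketch of that arithmetic, assuming a plain Hamming-plus-parity code (real memory controllers may use a different code of the same width):

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1, i.e. enough check bits
    for a Hamming code to correct any single-bit error."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_total_width(data_bits: int) -> int:
    """Data bits + Hamming check bits + one overall parity bit (SECDED)."""
    return data_bits + hamming_check_bits(data_bits) + 1


if __name__ == "__main__":
    for data_width in (32, 64):
        print(f"Data Width {data_width} -> Total Width {secded_total_width(data_width)}")
```

For a Data Width of 64 this gives 64 + 7 + 1 = 72, which is why a reported Total Width of 128 does not fit the usual ECC pattern.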
I even tried out the interesting C program that Puget Systems published, but the result was 0, indicating no ECC support.
Now I am starting to doubt that the data on Intel's own website is correct, and I suspect my CPU doesn't actually support ECC. Both the memory and the motherboard are specifically branded with "ECC", so I can rule those out.
Is it possible that the BIOS needs an update (currently there is none) to enable ECC, or is ECC actually already working and I was just not able to verify it? Or is my choice of CPU wrong if I want to run ECC memory, and Intel's website is wrong or misleading?
If my CPU turns out to be the wrong choice, what would be the next best choice for a "budget ECC CPU"?
UPDATE: I saw a new indication that my system might actually be running with ECC enabled and that the dmidecode tool just reports weird data. Over at the FreeNAS forum, the user Dusan is using server-grade hardware (SuperMicro MB, Xeon CPU, Kingston DIMM) and gets a similar output with 128 bits. But he wrote that he is not sure himself whether it actually works.
UPDATE 2: As yagmoth555 mentioned in his answer to this question, it seems that my motherboard only supports ECC with Xeon processors, though I thought that note was a relic from previous manuals that got copied over. I guess that means I need to look into a Xeon processor... :-/
UPDATE 3: I have now bought a Xeon E3-1220 v5, which of course supports ECC and should meet the requirement from the manual. I ran all the tests again to check for ECC functionality, and the results are basically identical.
From the comments at Puget Systems, it also seems that the ecc_check.c program doesn't work on Xeon and Core i7 processors... :-/
I checked out memtest86+ some more this time, and I am fairly certain that it doesn't support DDR4 or the C232 chipset at all, since it reports not only the wrong speed and timings but also DDR3 instead of the installed DDR4. It did detect the processor just fine, but I still got the same end result with both versions of memtest86+.
Version 4.20 doesn't even detect my processor properly.
Any ideas on how else I can test for ECC are very much appreciated.
Today I found out that there is a commercial version of memtest86 (without the +) from PassMark that also offers a free edition, which thankfully includes ECC checks. In addition, it supports DDR4 and all the other features of memtest86+. My results seem to be positive for ECC support, so I will call this done, even though I was hoping to get the same result with "traditional" tools like dmidecode.

If someone stumbles upon this post later and needs further validation and tests, PassMark also offers a paid version that supports ECC error injection for actually testing the ECC capabilities.
Edited: Bad news from your motherboard manual: it appears to support ECC only with Xeon processors.
I see you run BSD/Linux; run this inside the OS (available for FreeNAS):
dmidecode -t 17
You should get output like:
dmidecode 2.12
SMBIOS 2.5 present.

Handle 0x1100, DMI type 17, 28 bytes
Memory Device
    Array Handle: 0x1000
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 2048 MB
    Form Factor: DIMM
    Set: 1
    Locator: DIMM1
    Bank Locator: Not Specified
    Type: DDR2
    Type Detail: Synchronous
    Speed: 667 MHz
    Manufacturer: AD00000000000000
    Serial Number: 00002062
    Asset Tag: 010839
    Part Number: HYMP125P72CP8-Y5
    Rank: 2
The "Total Width: 72 bits" line is the part you are looking for.
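If you want to check this across several DIMMs, a small Python sketch can pull the widths out of each "Memory Device" block. It assumes the text layout shown above (one "Handle ..." line per structure, "Total Width: N bits" and "Data Width: N bits" lines); the helper names are hypothetical, and the 9/8 ratio test is a heuristic, not proof. Note that it would reject the 128/64 reading from the question, even though that system later tested positive for ECC in memtest86.

```python
import re
from typing import Dict, List

# Matches lines such as "Total Width: 72 bits"; "Unknown" widths are ignored.
_WIDTH_RE = re.compile(r"^\s*(Total|Data) Width:\s*(\d+)\s*bits", re.MULTILINE)


def parse_memory_devices(dmidecode_text: str) -> List[Dict[str, int]]:
    """Split `dmidecode -t 17` output into structures and extract the widths."""
    devices = []
    # Every DMI structure in the output starts with a "Handle ..." line.
    for block in re.split(r"^Handle ", dmidecode_text, flags=re.MULTILINE):
        if "Memory Device" not in block:
            continue
        widths = {kind.lower(): int(bits) for kind, bits in _WIDTH_RE.findall(block)}
        # Empty slots often report "Unknown", so require both numeric widths.
        if "total" in widths and "data" in widths:
            devices.append(widths)
    return devices


def looks_like_ecc(total: int, data: int) -> bool:
    """Heuristic: SECDED ECC adds 8 check bits per 64 data bits (72 vs 64)."""
    return data > 0 and total == data + data // 8
```

On a live system you could feed it the stdout of dmidecode -t 17 (run as root), for example captured with Python's subprocess.run.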
On Windows systems, you can run:
wmic MEMORYCHIP get DataWidth,TotalWidth
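The wmic command prints a small table, which a short Python sketch can turn into per-module numbers. It assumes the usual layout of a wmic "get" query: a header row with the property names, then one whitespace-aligned row per memory chip (often with \r\r\n line endings). The function name is hypothetical.

```python
from typing import Dict, List


def parse_wmic_table(output: str) -> List[Dict[str, int]]:
    """Parse the table printed by `wmic MEMORYCHIP get DataWidth,TotalWidth`."""
    # splitlines() also handles the stray carriage returns wmic tends to emit.
    lines = [line.split() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [
        {name: int(value) for name, value in zip(header, row)}
        # Skip rows with missing values (e.g. properties the firmware leaves empty).
        for row in rows
        if len(row) == len(header)
    ]
```

A module reporting a TotalWidth of 72 with a DataWidth of 64 has the extra 8 bits expected for ECC.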
Answer for FreeBSD and Windows taken from there.
Using a Ryzen 7 processor, none of the mentioned tools worked for me either. However, with a recent enough Linux kernel, the tools in the edac-utils package (edac-ctl and edac-util) can read out the ECC status and also details like the number of corrected errors. The kernel log (dmesg) will also contain lines with "EDAC", which should give some information as well. This functionality can be tested further by overclocking the RAM and checking that errors are reported (if you go high enough); that is about as much proof as you can get that it really works. However, if these tools fail or report nothing, that only means that reading ECC status information is not supported. There seems to be no 100% reliable way to prove that ECC is NOT working...
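The kernel's EDAC subsystem exposes its counters in sysfs, which edac-util summarizes. A minimal Python sketch that reads them directly, assuming the standard layout of /sys/devices/system/edac/mc/mcN/ with ce_count (corrected errors) and ue_count (uncorrected errors) files; the function name is hypothetical:

```python
from pathlib import Path
from typing import Dict

EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counters(root: Path = EDAC_MC_ROOT) -> Dict[str, Dict[str, int]]:
    """Return corrected (ce) and uncorrected (ue) error counts per memory controller.

    An empty result only means no EDAC driver registered a controller;
    it does NOT prove that ECC is off.
    """
    counters: Dict[str, Dict[str, int]] = {}
    if not root.is_dir():
        return counters
    for mc in sorted(root.glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            counters[mc.name] = {
                "ce": int(ce_file.read_text().strip()),
                "ue": int(ue_file.read_text().strip()),
            }
    return counters
```

Calling read_edac_counters() with no arguments reads the live sysfs tree; a registered controller whose ce count rises during a stress or overclocking test is the kind of evidence described above.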
I have found dmidecode results to be hit or miss, with dmidecode often listing ECC among the board's "capabilities" even when non-ECC memory is installed. Similarly, edac-utils often shows ambiguous results with "no DIMM info". However, the output of the lshw utility always seems to indicate whether ECC is configured and working correctly, even on fringe platforms like the LGA1155 i3-2100 (one of the few desktop Intel CPUs that does support ECC if all requirements are met).

For non-server motherboards and chipsets, only specific AMD motherboards (like ASRock's) with AMD chipsets offer ECC. Intel only makes ECC available on its server (Xeon) chipsets and disables ECC on its desktop chipsets.
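The lshw check recommended earlier in this thread can be scripted as well. This Python sketch assumes that the text output of lshw -class memory contains a "configuration:" line with an errordetection= key for the System Memory node (for example errordetection=multi-bit-ecc), which is how lshw commonly reports it; verify against your own lshw output before relying on it. The helper names are hypothetical.

```python
import re
from typing import Optional

# Assumed format: "configuration: errordetection=<value>" on the System Memory node.
_ERRDET_RE = re.compile(r"errordetection=([\w-]+)")


def lshw_error_detection(lshw_text: str) -> Optional[str]:
    """Return the first errordetection value found in lshw output, if any."""
    match = _ERRDET_RE.search(lshw_text)
    return match.group(1) if match else None


def lshw_reports_ecc(lshw_text: str) -> bool:
    """True when the reported error-detection mode mentions ECC."""
    value = lshw_error_detection(lshw_text)
    return value is not None and "ecc" in value
```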