Are ECC memory modules important to have on a non-critical server?
I was thinking about getting myself a toy dedicated server for lots of random, non-critical stuff. Sporadic reboots are no big deal. I'm looking at one provider whose prices are insanely cheap. Their hardware sounds like a joke for any serious server box: desktop processors, non-ECC RAM, no-name chassis, no hot-swap SATA HDDs, etc. (well, the price justifies it, I guess).
I take ECC memory for granted on any "serious" server, so I'm wondering if it's a big deal or not for "toy" appliances.
Data published by CERN IT staff (Data Integrity) would suggest that the number of errors that come from RAM is quite low. You still have to weigh the value of your data against the cost of the hardware.
You can read a bit more about this at StorageMojo.
ECC RAM basically helps to prevent errors that occur when reading from and writing to RAM. The chance of there actually being an error is quite small, but non-zero. I would say that if you aren't doing mission-critical stuff you could get away without ECC RAM. Like I said, the chances of encountering an error that ECC would prevent are really, really small.
What is a non-critical server? One that can fail?
ECC RAM is fundamental when memory reliability is fundamental.
Two things grow with the growth of memory sizes:
This Intel presentation on ECC reports these facts:
Another recent study by WISC shows ECC to be essential for ZFS systems:
It is important to note that other filesystems are just as sensitive to this form of data corruption as ZFS is.
ECC is what saves you from running into these problems when the error can be corrected and, in the disastrous cases where it can't, what warns you that this is happening before it's too late.
It's simply not that important. If you needed 99.999% uptime you'd worry about it. Other than that you'll reboot more often than you'll get memory errors.
This 2009 study by Google found an error rate of between 25,000 and 70,000 errors per billion device hours per megabit. For 8 GiB of (used) RAM, that works out to roughly 1.7 to 4.8 errors per hour.
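For anyone who wants to check that arithmetic, here is a rough sketch of the conversion. It assumes a megabit means 10^6 bits and that the rate applies evenly to all 8 GiB; the function name is just made up for illustration and doesn't come from the study.

```python
def errors_per_hour(ram_bytes, rate_per_billion_hours_per_mbit):
    """Expected memory errors per hour for a given amount of RAM.

    Assumption: 1 megabit = 10**6 bits, and the error rate is spread
    evenly over all of the memory.
    """
    megabits = ram_bytes * 8 / 1e6
    # The rate is given per billion device hours, so divide by 1e9
    # to get errors per single hour.
    return megabits * rate_per_billion_hours_per_mbit / 1e9


ram = 8 * 2**30  # 8 GiB expressed in bytes

low = errors_per_hour(ram, 25_000)
high = errors_per_hour(ram, 70_000)

print(f"{low:.1f} to {high:.1f} errors per hour")
```

Plug in your own RAM size to see how the figure scales; it is linear in the amount of memory, so 16 GiB simply doubles both ends of the range.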
Bit flips are real and shouldn't be ignored once data integrity matters.
In your case (random, non-critical stuff) it would probably be overkill.