There are plenty of resources available online that discuss using SSD drives in RAID configurations - however most of them date back a few years, and the SSD ecosystem moves very fast - right as we're expecting Intel's "Optane" product release later this year, which will change everything... again.
I'll preface my question by affirming there is a qualitative difference between consumer-grade SSDs (e.g. Intel 535) and datacenter-grade SSDs (e.g. Intel DC S3700).
My primary concern relates to TRIM support in RAID scenarios. To my understanding, despite it being over 6 years since SSDs were introduced in consumer-grade computers and 4 years since NVMe became commercially available, modern RAID controllers still do not support issuing TRIM commands to attached SSDs - with the exception of Intel's RAID controllers in RAID-0 mode.
I'm surprised that TRIM support is not present in RAID-1 mode: given the way the drives mirror each other, simply passing the command through to both drives seems straightforward. But I digress.
I note that if you want fault-tolerance with disks (both HDD and SSD), you would use them in a RAID configuration - but as the SSDs would be without TRIM, they would suffer increased write amplification, which results in extra wear, which in turn would cause the SSDs to fail prematurely. This is an unfortunate irony: a system designed to protect against drive failure might end up directly contributing to it.
So:
- Is TRIM support necessary for modern (2015-2016 era) SSDs?
- Is there any difference in the need for TRIM support between SATA, SATA-Express, and NVMe-based SSDs?
- Often drives are advertised as having improved built-in garbage-collection; does that obviate the need for TRIM? How does their GC process work in RAID environments?
- For example, see this QA from 2010, which describes pretty bad performance degradation due to not TRIMming - and this article from 2015 makes the case that using TRIM is strongly recommended. What is your response to these strong arguments for the necessity of TRIM?
- A lot of articles and discussion from earlier years concern SLC vs MLC flash, holding that SLC is preferable due to its much longer lifespan - however it seems nearly all SSDs today (regardless of where they sit on the consumer-to-enterprise spectrum) are MLC these days - is this distinction of any relevance anymore?
- And what about TLC flash?
- Enterprise SSDs tend to have much higher endurance / write limits (often expressed as how many times you can completely overwrite the drive each day, throughout the drive's expected 5-year lifespan) - if their write-cycle limit is very high (e.g. 100 complete writes per day), does this mean that they don't need TRIM at all because those limits are so high, or - the opposite - are those limits only attainable by using TRIM? (A rough sketch of the arithmetic behind such ratings follows this list.)
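To make that last bullet concrete, here is a back-of-the-envelope sketch of what a drive-writes-per-day (DWPD) rating implies in total bytes written; the capacity and rating below are hypothetical example values, not taken from any particular datasheet:

```python
# Rough endurance math implied by a DWPD (drive writes per day) rating.
# Example values only - check the actual drive's datasheet.
capacity_tb = 0.8      # hypothetical 800 GB enterprise SSD
dwpd = 10              # hypothetical drive-writes-per-day rating
warranty_years = 5     # the usual enterprise warranty period

total_tb_written = capacity_tb * dwpd * 365 * warranty_years
print(f"Implied endurance: {total_tb_written:,.0f} TB written (TBW)")
# Implied endurance: 14,600 TB written (TBW)
```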
Let's try to reply to one question at a time:
Short answer: in most cases, no. Long answer: if you reserve sufficient spare space (~20%), even consumer-grade drives usually have quite good performance-consistency values (but you need to avoid the drives which, instead, choke on sustained writes). Enterprise-grade drives are even better, both because they have more spare space by default and because their controller/firmware combo is optimized for continuous use of the drive. For example, take a look at the S3700 drive you referenced: even without trimming, it has very good write consistency.
The drive's garbage collector does its magic inside the drive sandbox - it does not know anything about the outside environment. This means that it is (mostly) unaffected by the RAID level of the array. That said, some RAID levels (basically, the parity-based ones) can sometimes (and in some specific implementations) increase the write amplification factor, which in turn means more work for the GC routines.
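As a rough illustration of how a RAID-level write multiplier compounds with the drive's own internal write amplification (which the GC has to absorb), here is a small sketch; the internal WAF of 1.5 and the per-level factors are illustrative assumptions (the RAID 5 factor of 2 assumes small read-modify-write updates, i.e. data block plus parity block rewritten):

```python
# How RAID-level write multiplication compounds with the drive's internal
# write amplification (GC overhead). All numbers are illustrative assumptions.
def flash_writes(host_writes_gb: float, raid_factor: float, drive_waf: float) -> float:
    """Total GB physically written to flash across the array."""
    return host_writes_gb * raid_factor * drive_waf

host = 1000  # 1 TB of host writes
for name, raid_factor in [("RAID 0", 1.0),
                          ("RAID 10", 2.0),
                          ("RAID 5 (small writes)", 2.0)]:
    total = flash_writes(host, raid_factor, drive_waf=1.5)
    print(f"{name}: {total:,.0f} GB written to flash")
# RAID 0: 1,500 GB written to flash
# RAID 10: 3,000 GB written to flash
# RAID 5 (small writes): 3,000 GB written to flash
```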
SLC drives have basically disappeared from the enterprise, being relegated mainly to military and some industrial tasks. The enterprise market is now divided into three grades:

- write-intensive drives (highest endurance, often using eMLC flash)
- mixed-use drives (typically MLC flash)
- read-intensive drives (lowest endurance, often using TLC flash)
In reality, any of the above flash types should provide you with plenty of total write capacity and, in fact, you can find enterprise drives with all of the above flash types.
The real differentiation between enterprise and consumer drives lies elsewhere: enterprise-grade drives are better mostly due to their controllers and power-loss-protection capacitors, rather than due to better flash.
As stated above, enterprise-grade drives have much higher default spare space (~20%), which in turn drastically lowers the need for regular TRIMs.
Anyway, as a side note, please consider that some software RAID implementations do support TRIM (Linux MDRAID, for example).
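If you do evaluate software RAID on Linux, one quick way to verify whether an array (and its member devices) actually advertises discard/TRIM support is to look at lsblk's discard columns; here is a minimal Python sketch wrapping the real `lsblk` utility (a non-zero DISC-GRAN value means discards are accepted):

```python
# Minimal sketch: list Linux block devices and whether they advertise
# discard (TRIM) support, via lsblk's DISC-GRAN / DISC-MAX columns.
import subprocess

output = subprocess.run(
    ["lsblk", "--list", "--noheadings", "--output", "NAME,DISC-GRAN,DISC-MAX"],
    capture_output=True, text=True, check=True,
).stdout

for line in output.splitlines():
    name, disc_gran, disc_max = line.split()
    status = "no discard support" if disc_gran == "0B" \
        else f"discard OK (granularity {disc_gran})"
    print(f"{name}: {status}")
```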
TRIM isn't something I ever worry about when using SSDs on modern RAID controllers. The SSDs have improved, hardware RAID controller features have been optimized for these workloads, and endurance reporting is usually in place.
TRIM is for lower-end SATA drives. For SAS SSDs, we have SCSI UNMAP, and perhaps that's the reason I don't encounter TRIM needs...
But the other commenter is correct. Software-Defined Storage (SDS) is changing how we use SSDs. In SDS solutions, RAID controllers are irrelevant. And things like TRIM tend to be less important because SSDs are filling specified roles. I think of Nimble storage read cache or the ZFS L2ARC and ZIL... They all meet specific needs and the software is leveraging the resources more intelligently.
RAID levels with SSD

An answer above suggests that RAID levels with parity, like RAID 5, increase write amplification. There are really two ways to interpret that: the impact on one drive, and the impact on the set of drives.
Compared to no redundancy, RAID 5 does add writes to the set, since it adds parity data. Compared to a RAID 0 array of (n-1) drives, the per-drive impact of a RAID 5 array with n drives is nothing: each of the n drives receives just as many writes. RAID 5 adds 1/(n-1) extra writes to the set. RAID 1 and RAID 10, however, add 100% extra writes to the set, because everything written to one SSD is also written to its mirror.
So, in terms of writes to a RAID 5 set vs a RAID 10 set with the same number of drives, the SSDs in the RAID 5 set will receive fewer writes. And that stays true even if you increase the number of SSDs in the RAID 10 set to equalize usable capacity.
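A quick sanity check of that arithmetic, assuming full-stripe (sequential) writes and ignoring read-modify-write effects on small writes:

```python
# Extra writes to the whole set per unit of host data, assuming full-stripe
# writes: RAID 5 adds one parity block per (n-1) data blocks, while RAID 1/10
# duplicate every block.
def extra_set_writes(level: str, n_drives: int) -> float:
    if level == "raid5":
        return 1 / (n_drives - 1)
    if level == "raid10":
        return 1.0
    raise ValueError(f"unhandled level: {level}")

for n in (4, 6, 8):
    print(f"{n} drives: RAID 5 +{extra_set_writes('raid5', n):.0%}, "
          f"RAID 10 +{extra_set_writes('raid10', n):.0%}")
# 4 drives: RAID 5 +33%, RAID 10 +100%
# 6 drives: RAID 5 +20%, RAID 10 +100%
# 8 drives: RAID 5 +14%, RAID 10 +100%
```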
shodanshok touched on the real answer here. If you reserve extra space ("over-provisioning"), your SSD's endurance and write-performance consistency will both improve over time, and the lack of TRIM support becomes mostly irrelevant. Reserving that extra space can be as simple as partitioning less than the full capacity of a brand-new SSD. Most in-drive controllers treat never-written space the same as reserved space, and thereby significantly reduce write amplification. For a boot/OS drive, 10% reserved space is probably enough; for drives that are rewritten often, increase that space.
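As a trivial illustration of the sizing involved (the 480 GB capacity is a made-up example; the 10%/20% figures are the ones from the text):

```python
# Size a partition on a fresh SSD so that some capacity is never written,
# leaving it as extra over-provisioning. Values are illustrative.
def partition_size_gb(drive_gb: float, reserve_fraction: float) -> float:
    """GB to actually partition, leaving the remainder untouched."""
    return drive_gb * (1 - reserve_fraction)

print(partition_size_gb(480, 0.10))  # boot/OS drive: 432.0 GB
print(partition_size_gb(480, 0.20))  # frequently rewritten drive: 384.0 GB
```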