What are the pros and cons of consumer SSDs vs. fast 10-15k spinning drives in a server environment? We cannot use enterprise SSDs in our case as they are prohibitively expensive. Here are some notes about our particular use case:
- Hypervisor with 5-10 VMs max. No individual VM will be crazy I/O intensive.
- Internal RAID 10, no SAN/NAS...
I know that enterprise SSDs:
- are rated for longer lifespans
- and perform more consistently over long periods
than consumer SSDs... but does that mean consumer SSDs are completely unsuitable for a server environment, or will they still perform better than fast spinning drives?
Since we're protected via RAID/backup, I'm more concerned about performance over lifespan (as long as lifespan isn't expected to be crazy low).
Note: This answer is specific to the server components described in the OP's comment.
Also see: Are SSD drives as reliable as mechanical drives (2013)?
Yes, the SSDs will be way faster than the SAS drives. For sequential throughput, a good RAID of SAS drives might do pretty well, but for random access, the SSDs will blow them out of the water which can result in a very noticeable performance difference.
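If you want to quantify that gap on your own hardware before committing, a quick fio run comparing sequential and random reads makes the difference obvious. A rough sketch below, assuming fio is installed and /dev/sdX is a placeholder for a device (or test file) you can safely run read-only benchmarks against:

```python
# Rough benchmark sketch: compare sequential vs. random read performance with
# fio. Assumes fio is installed; /dev/sdX is a placeholder for a device or
# test file that is safe to run read-only jobs against.
import json
import subprocess

TEST_DEV = "/dev/sdX"  # placeholder - change to the device or file to test

def run_fio(rw: str) -> dict:
    """Run a short read-only fio job and return its parsed JSON result."""
    cmd = [
        "fio",
        f"--name=bench-{rw}",
        f"--filename={TEST_DEV}",
        f"--rw={rw}",
        f"--bs={'1m' if rw == 'read' else '4k'}",  # large blocks for sequential, 4k for random
        "--ioengine=libaio",
        "--direct=1",
        "--iodepth=32",
        "--runtime=30",
        "--time_based",
        "--readonly",
        "--output-format=json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)

for rw in ("read", "randread"):
    stats = run_fio(rw)["jobs"][0]["read"]
    print(f"{rw}: {stats['iops']:.0f} IOPS, {stats['bw'] / 1024:.1f} MiB/s")
```

On a decent SAS RAID the sequential numbers can look respectable, but the randread IOPS figure is where even a cheap SSD should pull far ahead.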
Depending on the particular SAS drives and the particular SSD drives, the SSDs may have a better unrecoverable read error rate by up to a factor of 10.
Some tips if you do use consumer SSDs:
Consumer-grade SSDs will work fine for many server use cases.
They are way, way faster than SAS disks. I'd suggest the reason to get enterprise disks over consumer disks is not speed, it's the rated write endurance and better engineering - for example, supercaps are present in some enterprise SSDs where the consumer-grade version does not have them - if you lose power to the server, your data is less likely to be killed.
You need to be aware that RAID is not backup - if you are going to RAID a couple of SSDs that's fine, but get different brands of SSDs, or at least different models, so they have different performance characteristics. When SSDs die, they are far more likely to do so without warning and with no ability to pull data off - on the flip side, they are 10x as reliable as regular hard disks.
Look into the Samsung 850 series disks - at least for half the array - they are/were prosumer and offer good bang for buck, and are touted as being more reliable than 2D NAND (they use 3D NAND).
Also, as someone else mentioned, don't do RAID 5. Drives hold too much for it to work reliably - and back up your data.
Even consumer-grade SSDs are much faster than the fastest 15k HDDs, so from a performance standpoint they will be fine (if you use the right disks and overprovision them), but you have to pick them carefully, especially due to how they interact with hardware RAID controllers...
If you are using them for writes, to avoid data corruption in the event of power failure you need to make sure that you only consider models with a supercap, e.g. the Intel S3500 or Samsung 845DC Pro.
Otherwise consumer SSDs are more suited to caching.
The reason to go with enterprise-grade gear is reliability more than speed. Most consumer SSDs are MLC, with the lower-end stuff being TLC (MLC stores 2 bits per cell, TLC stores 3, and both are less performant and less reliable than SLC). At some point, they may also drop the onboard RAM cache to save costs as NAND cells get cheaper. An enterprise SSD also has greater redundancy built in, with more spare NAND chips.
TLC is newer, slower, theoretically less reliable, and has a lower MTBF. You'd want to go for MLC drives.
As for reliability, it's a mixed bag. You have resistance to physical head crashes, sure, but controllers can die. Drive endurance has improved significantly.
Consider a few things - all drives die. If it's important, it absolutely needs to be backed up. Consider this to be nearline storage, and factor in unreliability.
If you're looking at endurance, a modern, high-end consumer SSD (like the Samsung 850 Pro) has pretty decent endurance. The 850 Pro is rated for 150-300 TB of writes (compared to 73 TB for the older model, and 7300 to 14600 TB for the newer models). You might be able to trade off space for NAND endurance by playing with spare space. Enterprise SSDs come with more spare space, so if an SSD cell or chip wears out the drive can adjust.
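If you want to sanity-check endurance against your own workload, the arithmetic is simple. A rough sketch, with placeholder numbers you would swap for your drive's rated TBW and your measured daily writes:

```python
# Back-of-the-envelope endurance estimate: how long a rated TBW lasts at a
# given daily write volume. All numbers are placeholders - use the rating from
# your drive's datasheet and your own measured workload.
RATED_TBW = 300            # rated terabytes written (e.g. a high-end consumer SSD)
DAILY_WRITES_GB = 50       # estimated host writes per day, in GB
WRITE_AMPLIFICATION = 2.0  # rough allowance for NAND-level write amplification

days = RATED_TBW * 1000 / (DAILY_WRITES_GB * WRITE_AMPLIFICATION)
print(f"Estimated endurance: {days:.0f} days (~{days / 365:.1f} years)")
```

With those placeholder figures the drive would outlast the server, which is why, for light VM workloads, write endurance is often not the deciding factor.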
Many consumer drives won't let you read data once their write endurance has been exhausted. One big brand does, but I can't remember which.
Edit: Recently, a 'Linux kernel bug' with Samsung SSDs was reported. In general, enterprise-grade hard drives are boring, reliable old tech; consumer hard drives, I guess, slightly less so. Some of the bugs are being shaken out - and there are changes going on, like NVMe becoming more common. Be prepared to test your SSDs before committing anything critical to them. This seems to be a unique edge case, but it could be you!
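For the "test your SSDs" part, one easy thing to keep an eye on is SMART wear data. A rough sketch using smartctl's JSON output (smartmontools 7+); the attribute names are vendor-specific examples, not a definitive mapping, so adapt them to whatever your drives actually report:

```python
# Rough wear check via smartctl's JSON output (smartmontools 7+). The wear
# attribute names below are vendor-specific examples (Samsung, Intel, Crucial),
# not a definitive list - adapt them to what your drives report.
import json
import subprocess

WEAR_ATTRIBUTES = {"Wear_Leveling_Count", "Media_Wearout_Indicator",
                   "Percent_Lifetime_Remain"}

def wear_report(device: str) -> None:
    # smartctl uses bit-mask exit codes, so a non-zero return is not treated as fatal here
    out = subprocess.run(["smartctl", "-A", "--json", device],
                         capture_output=True, text=True).stdout
    data = json.loads(out)
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        if attr["name"] in WEAR_ATTRIBUTES:
            print(f"{device}: {attr['name']} normalized value = {attr['value']}")

wear_report("/dev/sda")  # placeholder device
```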
The performance inconsistency of consumer SSDs can cause problems with some RAID controllers; the spikes in I/O latency are exacerbated when using a RAID controller, as it often will not pass TRIM through (I don't know of any controller that does). Enterprise drives are designed for consistent performance even without TRIM, so they typically play well with RAID controllers.
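A quick way to confirm this on Linux is to check whether the kernel can actually issue discards to the device the controller presents. A rough sketch, with placeholder device names:

```python
# Quick Linux sanity check: can the kernel issue discard/TRIM to the block
# device your RAID controller presents? A discard_max_bytes of 0 means it
# cannot, which is the usual situation behind hardware RAID.
from pathlib import Path

def supports_discard(dev: str) -> bool:
    path = Path(f"/sys/block/{dev}/queue/discard_max_bytes")
    return path.exists() and int(path.read_text().strip()) > 0

for dev in ("sda", "sdb"):  # placeholder device names
    print(f"{dev}: {'discard supported' if supports_discard(dev) else 'no discard support'}")
```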
If you do not need the high endurance, there are lower-end enterprise SSDs designed around high-read, low-write workloads. The Intel S3500 and Samsung 845DC are both cheap but RAID-controller-compatible SSDs.
However, if you are using Dell/HP RAID controllers you have to be careful about compatibility; in my experience HP is the worst when it comes to non-HP drives on their controllers, and will sometimes not present any monitoring info about the drives.
Just a bit of info, adding to the confusion:
If you are planning to deploy S2D (Storage Spaces Direct) - now or some time in the future - you can NOT, I repeat NOT, use consumer SSDs!
MS decided (wisely enough) that S2D should always ensure that every bit is written correctly AND SECURELY before the next is sent. So S2D will only use any on-disk cache if it is fully protected against power loss (read: full PLP). If so, the disk - regardless of type - is as fast as its cache, at least until the cache is exhausted.
BUT, if you are using consumer SSDs (no PLP), S2D will by design write through the cache and wait for the data to be confirmed as written directly to the individual NAND circuit. Which, by design, results in write latency being measured in seconds as opposed to microseconds, even at relatively low loads!
I have seen a lot of discussions on the subject, but never seen anyone actually find a workaround for this. One could argue that dual PSUs and a UPS would provide sufficient protection, at least for non-critical workloads, especially if they are replicated. So in specific use cases it would be relevant to be able to "cheat" S2D into using an on-disk cache that is not PLP. But that decision to overrule basic data integrity is NOT up for discussion - it is PLP or no S2D, period!
I learned this the hard way in a really overdimensioned 4-node cluster (256 cores, 1.5 TB RAM, 16x 4 TB Samsung 860 QVO, 20 relatively small Hyper-V VMs), where performance started out acceptable. When replication was set up, performance went from poor to really bad. The VMs went from somewhat slow to completely nonresponsive, eventually ending with the whole pool crashing beyond repair. Studying the logs revealed a bunch of errors - all related to write latency, sometimes with values beyond 15 SECONDS...!
We suspected network errors or just bottlenecks (2x 10 Gbit without RDMA), but no matter what we did to tweak performance (we even tried 4x 10 Gbit with RDMA), we ended up with the same result. So I studied more and stumbled upon an article explaining why you should NOT use consumer SSDs with S2D. Being cheap (and having bought two sets of 16x 4 TB consumer disks!), I studied some more, trying to bypass this by-design obstacle. I tried a lot of different solutions. With no luck...
So I ended up buying 16x 1 TB real datacenter SSDs (Kingston DC500M, the cheapest PLP disks I could find) for testing. And sure enough, all problems disappeared and HCI is suddenly as fast, robust and versatile as claimed. Damn!
Now the same setup is running twice the load with the original network configuration, half as many cores and half as much RAM, but write latency rarely exceeds 200 microseconds. Furthermore, all VMs are responsive as h..., users are reporting a sublime experience, and we have no more errors in backup or synchronisation or anywhere else, for that matter.
The only difference is that the disks are now 16x 4 TB Kingston DC500M.
So take this hard-learned lesson as advice: do NOT use disks without PLP in HCI...!
If it matters, RAID 1. I would rather have two cheap consumer SSDs in RAID 1 than the best single enterprise SSD. The pair should wear at approximately the same rate, but other than wear, they are extremely unlikely to fail at the same time. You should have enough RAM to drastically limit paging, so that you can put your system and programs on a hard drive and then put your database(s) on the SSD pair. Since hard drives are cheap, you can afford to RAID 1 those, too. Outside of a fire, that setup will protect your data and provide excellent performance. Then you can back up to the cloud and call it a day.
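If you go the software RAID route on Linux (rather than a hardware controller), the mirror itself is a one-liner with mdadm. Illustrative sketch only, with placeholder device names, and note it destroys whatever is on those disks:

```python
# Illustrative only: building the suggested SSD mirror with Linux software RAID
# (mdadm). Assumes /dev/sdb and /dev/sdc are the two consumer SSDs, there is no
# hardware RAID controller in the way, and it is OK to wipe whatever is on them.
import subprocess

SSD_PAIR = ["/dev/sdb", "/dev/sdc"]  # placeholder device names

subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=1",
     "--raid-devices=2", *SSD_PAIR],
    check=True,
)
# A pair of cheap spinning disks for OS/programs could be mirrored the same
# way as /dev/md1, with backups handled separately - RAID is still not backup.
```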