Issue
I have read many discussions about storage and whether SSDs or classic HDDs are better, and I am quite confused. HDDs still seem to be preferred, but why?
Which is better for active storage? For example, for databases, where the disk is active all the time?
About SSDs
Pros:
- They are quiet.
- No mechanical parts.
- Faster.
Cons:
- More expensive.
Questions:
- When one cell of an SSD reaches the end of its write-cycle life, what happens? Is the drive's capacity reduced by just that cell, and does it otherwise keep working normally?
- What is the best filesystem to use with an SSD? Is ext4 good because it writes to cells consecutively?
About HDDs
Pros:
- Cheaper.
Cons:
- In case of a mechanical fault, I believe there is usually no way to repair it. (Please confirm.)
- Slower, although I think HDD speed is usually sufficient for servers.
Is it just about price? Why are HDDs preferred? And are SSDs really useful for servers?
One aspect of my job is designing and building large-scale storage systems (often known as "SANs", or "Storage Area Networks"). Typically, we use a tiered approach with SSDs and HDDs combined.
That said, each one has specific benefits.
SSDs almost always have a higher cost per byte. I can get 10k SAS 4Kn HDDs at roughly $0.068/GB USD, which means a 4 TB drive for roughly $280. SSDs, on the other hand, typically run in the tens of cents per gigabyte, sometimes even dollars per gigabyte.
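A quick back-of-the-envelope check of those figures (the prices are just the examples quoted above, not current market data):

```python
# Rough cost-per-gigabyte arithmetic using the example prices above.
# These are illustrative figures only, not current market prices.

def cost_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Price divided by capacity in GB (1 TB = 1000 GB here)."""
    return price_usd / (capacity_tb * 1000)

hdd = cost_per_gb(280, 4)       # ~0.07 $/GB for the 4 TB 10k SAS example
ssd_low, ssd_high = 0.10, 0.25  # assumed SSD range: tens of cents per GB

print(f"HDD: ${hdd:.3f}/GB")
print(f"12 TB usable on HDDs: ${hdd * 12_000:,.0f}")
print(f"12 TB usable on SSDs: ${ssd_low * 12_000:,.0f} - ${ssd_high * 12_000:,.0f}")
```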
When dealing with RAID, speed becomes less important; size and reliability matter much more. I can build a 12 TB N+2 RAID system with HDDs far more cheaply than with SSDs, mostly because of the cost-per-byte difference above.
When dealt with properly, HDDs are extremely cheap to replace and maintain. Because the cost per byte is lower, replacing a failed HDD is cheaper. And because HDD failures correlate with age rather than data written, a rebuild of the RAID array doesn't eat into a TBW budget the way it does on an SSD. (Granted, the TBW percentage used by a rebuild is tiny overall, but the point stands.)
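To put a number on "tiny": a rebuild writes roughly the capacity of the replaced member once. Assuming a hypothetical 4 TB SSD rated for 2,400 TBW (a made-up but plausible endurance figure), one rebuild would consume well under 1% of the rated endurance:

```python
# Illustrative only: how much of an SSD's rated endurance one RAID rebuild uses.
# The 2,400 TBW rating below is an assumption, not a spec for any particular drive.

capacity_tb = 4          # size of the replaced member
rated_tbw = 2_400        # assumed endurance rating, in TB written
rebuild_share = capacity_tb / rated_tbw

print(f"One full rebuild uses about {rebuild_share:.2%} of the rated TBW")
# -> roughly 0.17%
```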
The SSD market is relatively complex. There are four (at the time of this writing) major types of SSDs, ranked from the highest number of total writes supported to the lowest: SLC, MLC, TLC, QLC. SLC typically supports the largest number of total writes (the major limiting factor on SSD lifetime), whereas QLC typically supports the fewest.
That said, the most successful storage systems I've seen are tiered, with both drive types in use. Personally, the storage systems I recommend to clients generally follow a tiered layout: read/write performance drops as you go down the tiers, and data propagates down to the tier where most of the data shares the same access/modification frequency. (That is, the more frequently data is read or written, the higher the tier it resides on.)
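As a rough illustration of that placement rule (the tier names and the accesses-per-day thresholds below are made up for the example, not taken from any particular product):

```python
# Minimal sketch of a frequency-based tiering policy.
# Tier names and thresholds are hypothetical.

TIERS = [
    ("tier-0: NVMe/SLC SSD", 100),   # accessed 100+ times/day
    ("tier-1: SATA SSD",      10),   # 10-99 times/day
    ("tier-2: 10k SAS HDD",    1),   # 1-9 times/day
    ("tier-3: 7.2k HDD",       0),   # colder than that
]

def place(accesses_per_day: float) -> str:
    """Return the tier whose threshold the access frequency meets."""
    for name, threshold in TIERS:
        if accesses_per_day >= threshold:
            return name
    return TIERS[-1][0]

for freq in (500, 25, 3, 0.1):
    print(f"{freq:>6} accesses/day -> {place(freq)}")
```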
Sprinkle some well-designed fibre-channel in there, and you can actually build a SAN that has a higher throughput than on-board drives would.
Now, to some specific items you mention:
Your SSD Questions
Your HDD Questions
Is it just about price? I'm not sure it is, to be honest.
HDDs come in large sizes for a decent price right now, that's undeniable, and I think people trust them for longer data retention than SSDs too. Also, when SSDs die they tend to die completely, all at once, whereas HDDs tend to fail more predictably, which may give you more time to get the data off first if needed.
But otherwise SSDs are the way forward for most uses. For a boot pair, a couple of 500 GB SATA SSDs in RAID 1 won't cost the earth; for DB use you can't really beat SSDs (as long as your logs are on high-endurance models). For backups, yes, you might use big 7.2k HDDs, and the same goes for very large datasets (in fact I bought over 4,000 10 TB HDDs early last year for just that requirement), but otherwise SSDs are the way forward.
Solid state for everything hot: interactive use, databases, anything online. Spindles as cheap warm storage: not-quite-cold archives or infrequently accessed data, and in particular as a staging area before backups are archived to tape.
Different media types for hot versus cold also help with diversity. A data-loss flaw in a brand of SSD controller would be much worse if it took out both online and backup data. Unlikely, but spindles and tape are cheap anyway, so why take the risk?
The failure mode of any particular device is not important, as long as the arrays stay redundant and backed up. Usually the procedure is to replace a drive with any symptoms of failure. Experiment with repairing them in your test systems, where any catastrophic failure does not impact production services.
Filesystem choice is largely a matter of personal preference. While there are SSD-optimized filesystems, something you know and can repair may be more important.
The big advantages of an SSD are speed and reliability; however, one of the dirty little secrets is the limited number of write cycles an SSD has. If you are building a server with a lot of write activity, like a database or email server, you will need a more expensive SSD with higher endurance.
NAND flash comes in three main types: SLC, MLC, and TLC.
TLC is mainly designed for web servers or archive servers with few write cycles. MLC is for servers with a mix of read and write cycles, like a low-volume database server. SLC is designed for servers with a lot of read/write cycles, like a high-volume database server.
The main deciding factors between SSD and HDD are application and budget. In a perfect world, SLC SSDs would make the standard HDD obsolete, but we are just not there yet.
That depends on who you talk to, their background (management, IT, sales, etc.), and what type of server the discussion is about. HDDs are generally an order of magnitude less expensive per byte, but they use more power and are almost always slower, depending on the workload.
Almost always it comes down to cost and how much storage can fit into a given number of servers. If you can get the performance of a 5-disk RAID array from a single SSD, the SSD is probably a lot less expensive and uses a fraction of the power, but you will also get maybe 1/10 the storage.
This is where it gets complicated, and why many people will skip the complication and just go with the HDDs they know.
SSDs come in different grades with limits on how much data can be written to the cells, which is NOT the same as the amount of data written by the host. Writing small amounts of host data can end up writing much larger amounts to the cells; this is called write amplification, and it can quickly kill drives with low endurance ratings.
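A rough sketch of the idea (the 4 KiB host write, 1 MiB erase-block size, and 300 TB cell-write budget below are assumed round numbers for illustration, not specs of any drive):

```python
# Illustrative write-amplification arithmetic; sizes and budgets are assumptions.

host_write_kib = 4            # a single small host write (e.g. a 4 KiB DB page)
flash_block_kib = 1024        # assumed NAND erase-block size (1 MiB)

# Worst case: the whole block must be rewritten to update one 4 KiB page.
worst_case_waf = flash_block_kib / host_write_kib
print(f"Worst-case write amplification factor: {worst_case_waf:.0f}x")

# What that does to endurance: the cells can only absorb so many writes,
# so a higher WAF means fewer host-written terabytes before wear-out.
nand_budget_tb = 300          # assumed total writes the cells can absorb
for waf in (1, 3, 10):
    print(f"WAF {waf:>2}: ~{nand_budget_tb / waf:.0f} TB of host writes before wear-out")
```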
SSD cells are named for the number of bits they can store; to store n bits, a cell needs 2^n voltage levels. A TLC (triple-bit) cell needs 8 voltage levels to address those bits. Generally, each time you add a bit per cell, you get a 3-10x drop in cell durability. For example, an SLC drive may write each cell 100,000 times before the cells die, enterprise eMLC 30,000 times, MLC 10,000, TLC 5,000, QLC 1,000.
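In table form (the endurance figures are the same rough, illustrative numbers quoted above, not vendor specs):

```python
# Bits per cell vs. voltage levels (2**n) and rough, illustrative P/E cycle counts.
CELL_TYPES = [
    ("SLC", 1, 100_000),
    ("MLC", 2,  10_000),
    ("TLC", 3,   5_000),
    ("QLC", 4,   1_000),
]

print(f"{'type':<5}{'bits':>5}{'levels':>8}{'~P/E cycles':>13}")
for name, bits, pe_cycles in CELL_TYPES:
    print(f"{name:<5}{bits:>5}{2**bits:>8}{pe_cycles:>13,}")
```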
There are also generational improvements in SSD cell technology: better lithography and 3D NAND improve density and performance over older 2D NAND. As analyst Jim Handy put it, "Today's MLC is better than yesterday's SLC."
SSDs do not actually write directly to addressed cells; they write to blocks of cells. This way each block accumulates a more consistent number of cell writes, and when cells drop out of tolerance the entire block is marked bad and the data is moved to a spare block. SSD endurance therefore depends on the cell type, how many spare blocks are available, how much overhead there is for error correction, and how the drive uses caching and algorithms to reduce write amplification. The tolerance the manufacturer selects for marking blocks bad also comes into play: an enterprise drive will mark blocks bad earlier than a consumer drive, even though either one is still fully functional.
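A very simplified model of that block-remapping behaviour (block count and wear limit are arbitrary illustration values; real firmware is far more sophisticated):

```python
# Toy model of wear levelling and bad-block retirement; numbers are arbitrary.

class ToySSD:
    """Wear-levelled writes plus bad-block retirement, heavily simplified."""
    def __init__(self, blocks=10, wear_limit=3):
        self.wear = {i: 0 for i in range(blocks)}   # writes absorbed per block
        self.bad = set()
        self.wear_limit = wear_limit

    def write(self):
        healthy = [b for b in self.wear if b not in self.bad]
        if not healthy:
            raise RuntimeError("no healthy blocks left: drive is worn out")
        block = min(healthy, key=lambda b: self.wear[b])  # least-worn block
        self.wear[block] += 1
        if self.wear[block] >= self.wear_limit:
            self.bad.add(block)   # retire it; a real drive would move the data

ssd = ToySSD()
for _ in range(25):
    ssd.write()
print(f"retired blocks: {sorted(ssd.bad)}")
print(f"healthy blocks remaining: {len(ssd.wear) - len(ssd.bad)}")
```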
Enterprise-grade "high-write" SSDs are based on SLC or eMLC cells, have large amounts of spare blocks, and usually have a large cache with capacitors to make sure the cache can be flushed to flash when power is lost.
There are also drives with much lower endurance for "high-read" applications, like file servers that need fast access times. They cost less per byte at the price of reduced endurance: different cell types, less spare area, and so on. They may have only 5% of the endurance of a "high-write" drive, but they also do not need it when used correctly.
My database is small, intermittent reads are 95% of its access, and most of it is cached in RAM, so it is almost as fast on an HDD as on an SSD. If it were larger, there would not be enough RAM in the system, and the SSD would start to make a huge difference in access times.
SSDs also make backups and recovery orders of magnitude faster. My DB restores from backup in about 10 minutes to a slow SSD, or about 11 seconds to a really fast one; restoring to an HDD would take about 25 minutes. That is at least two orders of magnitude, and it can make a huge difference depending on workload. It can literally pay for itself on day one.
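Working out the ratios from those times:

```python
# Restore-time ratios from the figures quoted above.
hdd_s, slow_ssd_s, fast_ssd_s = 25 * 60, 10 * 60, 11

print(f"slow SSD vs HDD: {hdd_s / slow_ssd_s:.1f}x faster")
print(f"fast SSD vs HDD: {hdd_s / fast_ssd_s:.0f}x faster (~2 orders of magnitude)")
```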
Databases with huge amounts of small writes can murder a consumer grade TLC drive in a matter of hours.
Absolutely, if the correct drive type and grade are selected for the application; if you get it wrong, it can be a disaster.
My server runs several databases, plus high-read network storage, plus high-write security-footage storage, plus mixed read/write file storage and client backup. The server has a RAID-6 array of HDDs for the bulk network storage and NVR, a single high-performance MLC SSD for MySQL, and three consumer TLC drives in RAID-5 for client and database backups and fast-access network storage.
Write speed on the SSD RAID is about the same as on the HDD RAID, but random-access read speed is more than 10x faster on the SSD RAID. Again, these are consumer TLC SSDs, but since their sequential write speed is about 3x faster than the gigabit LAN, the array is never overloaded, and there is plenty of headroom if the system runs local backups while it is being accessed remotely.
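For context on the "3x faster than the gigabit LAN" remark (the SSD throughput figure below is an assumption consistent with that statement, not a measurement):

```python
# Gigabit Ethernet tops out around 125 MB/s of raw payload (1 Gbit/s divided by 8).
lan_mb_s = 1000 / 8

ssd_seq_write_mb_s = 3 * lan_mb_s   # "about 3x the gigabit LAN", as stated above
print(f"LAN ceiling:        ~{lan_mb_s:.0f} MB/s")
print(f"SSD sequential:     ~{ssd_seq_write_mb_s:.0f} MB/s")
print(f"Headroom remaining: ~{ssd_seq_write_mb_s - lan_mb_s:.0f} MB/s for local jobs")
```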
Most SSDs also offer instant secure erase (ISE), which can wipe the data in a few seconds, versus many hours or days for HDDs without that feature. Only a few enterprise-grade HDDs offer ISE, though it is becoming more common. This is very useful if you are retiring or repurposing a drive.
Depends on the type of data and the filesystem features you want. I am only using ext4 and Btrfs (I need snapshots and checksums). Filesystem overhead decreases usable space and can slightly reduce the life of SSDs; Btrfs has high overhead for checksums and other features, and snapshots will use a lot of space.
Regardless of drive type, have you ever had to have data recovery done on a dead drive? It can be very expensive. You are better off with a tiered backup: RAID on the main storage, versioned backups locally on a different device or machine, then a sync to an offsite or cloud copy. 1 TB of cloud storage is $5 per month; data recovery on an HDD can cost two grand, and a dead SSD may be impossible to recover. Just do the backups and forget about repair.
BOTH.
I have yet to see an SSD die because of write load (they are supposed to become read-only in that case). Not that they don't die for other reasons, including, but not limited to, overheating and firmware bugs.
And I have seen dead HDDs; a lot more of them, actually.
So much for reliability.
In some cases it makes sense to build a mixed RAID 1 (HDD + SSD). This way you hedge against the failure modes of both and still get SSD read performance.
In other cases it makes sense to use an SSD for the filesystem's journal only: you get roughly 2x the HDD's write performance (because you save half of the writes and half of the seeks) and generally no risk even if your abused SSD dies, since ext4 loses its journal pretty gracefully.
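For concreteness, a dry-run sketch of the two setups just mentioned, expressed as the shell commands involved (device names are placeholders; mdadm's `--write-mostly` flag steers reads to the SSD in a mixed mirror, and `mke2fs -O journal_dev` / `-J device=` set up an external ext4 journal). It only prints the commands rather than running them:

```python
# Prints (does not execute) example commands for the two hybrid setups above.
# Device names (/dev/ssd1, /dev/ssd2, /dev/hdd1, /dev/hdd2) are placeholders.

import shlex

examples = {
    "mixed RAID1, reads served from the SSD": [
        ["mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
         "/dev/ssd1", "--write-mostly", "/dev/hdd1"],
    ],
    "ext4 on an HDD with its journal on an SSD partition": [
        ["mke2fs", "-O", "journal_dev", "/dev/ssd2"],
        ["mke2fs", "-t", "ext4", "-J", "device=/dev/ssd2", "/dev/hdd2"],
    ],
}

for title, commands in examples.items():
    print(f"# {title}")
    for cmd in commands:
        print(shlex.join(cmd))
```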
The two main factors to consider are:
SSDs blow HDDs out of the water in terms of performance. If you need high throughput and low access times, nothing beats SSDs.
But the cost per gigabyte of SSDs is much higher than that of HDDs. If you need a lot of storage and throughput or access times are less important, nothing beats HDDs.
Throughput (bandwidth) figures may be helped by the appropriate RAID level (not so much access times, though, unless your drives are backlogged enough that queuing is an issue).
Read access time figures for small datasets may be helped by appropriate caching (i.e. put more RAM in your server). Won't help for writes, though (with the exception of battery-backed RAM caches in controllers or disks).
So it all really depends on your use case. A backup/archive server which needs a lot of capacity but doesn't care much about access times or bandwidth will be better off using HDDs. A high-traffic database server will prefer SSDs. In between... depends.
Whatever the situation:
You need backups. It's not a matter of if a drive (SSD or HDD) will fail, it's a matter of when.
If the server has any kind of importance, you want some kind of RAID to maintain uptime and protect data; RAID will also usually help with performance. Which RAID level depends a lot on your requirements (again, a performance/cost compromise).
As already mentioned, the big difference is price per GB vs random IO performance.
Take, for example, a Seagate Exos 16 TB: at ~$550, it comes to about $0.034/GB. Now compare it with an entry-level (speed-wise) Micron 5200 ECO 7.68 TB priced at ~$1300, or roughly $0.17/GB: the HDD is about 5x cheaper per gigabyte while also being twice as big. On the other side, SSD random IO performance is vastly better, with a catch: consumer SSDs, lacking a power-loss-protected writeback cache, are quite slow (sometimes as slow as HDDs) for synchronized, random-IO-rich workloads (e.g. databases, virtual machines). This is a very important point, rarely analyzed by online reviews. Enterprise SSDs, which almost universally use capacitors for power-loss protection, do not suffer from this weakness and deliver very high random read and write IO.
From the above, you can understand why SSDs have killed the high-end 15K and 10K SAS disks: they provide much better performance at a comparable cost (15K disks were especially expensive). On the other hand, 7.2K HDDs retain a very strong foothold in high-capacity storage systems.
Intel Optane (which is based on 3D XPoint rather than NAND) is in a class of its own in both speed and durability, and commands a very high price per GB: a 100 GB Optane P4801X costs over $260, i.e. more than $2.6/GB, roughly 80x the HDD above. For this reason, it is often used as an "application accelerator" or as a log/journal device.
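Recomputing those per-GB figures from the prices quoted in this answer (example prices only, not current ones):

```python
# $/GB from the example prices quoted above (not current market prices).
devices = [
    ("Seagate Exos 16 TB HDD",        550, 16_000),
    ("Micron 5200 ECO 7.68 TB SSD",  1300,  7_680),
    ("Intel Optane P4801X 100 GB",    260,    100),
]

baseline = devices[0][1] / devices[0][2]   # HDD $/GB as the reference
for name, price, gb in devices:
    per_gb = price / gb
    print(f"{name:<30} ${per_gb:.3f}/GB  ({per_gb / baseline:.0f}x the HDD)")
```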
For these reasons, modern SANs and servers often use a tiered or cached storage subsystem:
tiered systems put hot data in the fast tier (SSDs) and cold data in the slow tier (HDDs). In such systems, the total storage space is the sum of the fast and slow tiers; however, they are statically partitioned: if cold data suddenly becomes hot, you need to wait for it to be moved to the fast tier. Moreover, the fast tier must be at least as durable as the slow one;
cache-based systems keep all data on slow HDDs, augmented by a dynamic cache on SSDs to which hot data is copied (rather than moved); such systems have a total storage space equal to what the slow tier offers, but with the added flexibility of a dynamic cache. With cache-based systems, the fast tier can be made of relatively cheap SSDs (see the capacity sketch below).
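A minimal illustration of the capacity difference between the two designs (the 20 TB / 2 TB sizes are arbitrary example values):

```python
# Usable capacity: tiered vs. cache-based layouts. Sizes are arbitrary examples.
slow_tier_tb = 20   # HDD pool
fast_tier_tb = 2    # SSD pool

tiered_total = slow_tier_tb + fast_tier_tb   # fast tier holds *exclusive* data
cached_total = slow_tier_tb                  # SSD only holds *copies* of hot data

print(f"tiered layout:      {tiered_total} TB usable")
print(f"cache-based layout: {cached_total} TB usable (+{fast_tier_tb} TB of cache)")
```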
What is the best filesystem for a flash-based SSD? A naive answer would be "the one that writes the least", but the reality is that most advanced filesystems are based on a CoW approach which, depending on the specific implementation, can lead to quite substantial write amplification (i.e. ZFS and WAFL are going to write more than, say, ext4 or XFS). From a pure "write less" standpoint, I think it is difficult to beat ext4 and XFS (especially when backed by lvmthin, which enables fast snapshots even on these classical filesystems); however, I really like the added data-protection guarantees and lz4 compression offered by ZFS.
So, do you really need SSD storage for your server duties? It depends:
if you need to cheaply store multiple TBs of data, HDDs (or at most cheap consumer SSDs) are the way to go;
if you have a mostly sequential workload (e.g. a file server), you don't need SSDs;
if your workload is random IO rich, you will greatly benefit from SSDs;
if you have an fsync-heavy write pattern, enterprise SSDs (or a beefy RAID controller with a power-loss-protected writeback cache) are your best bet, with the downside of higher cost.
Simple answer here: use SSDs for data that needs fast performance, e.g. when building a server to do large and quick data operations (like video editing).
Use HDDs for slow archival storage.
Generally, HDDs are less reliable than SSDs, although they do have a lower cost per gigabyte.
If sensitive data is being stored, consider using an SSD and also an HDD for backup.
Quiet isn't always good, much like electric cars on the road being too quiet. HDD access noise can provide security: that is how I once detected a break-in on a work Perforce server while watching a movie. (Addition: with a line-feed printer logging /var/log/messages, it is also harder to erase a single entry.)