This is a Canonical Question about the Cost of Enterprise Storage.
Regarding general questions like:
- Why do I have to pay 50 bucks a month per extra gigabyte of storage?
- Our file server is always running out of space, why doesn't our sysadmin just throw an extra 1TB drive in there?
- Why is SAN equipment so expensive?
The Answers here will attempt to provide a better understanding of how enterprise-level storage works and what influences the price. If you can expand on the Question or provide insight as to the Answer, please post.
Server hard-drive capacities are minuscule compared to desktop hard-drive capacities. 450 GB and 600 GB are not uncommon sizes to see in brand-new servers, and you could buy several 4 TB SATA desktop drives for the price of one 600 GB SAS (server) hard drive.
The SATA hard drive in your desktop PC at home is like a muscle car from Ford, GM, Mercedes, or any other manufacturer of cars for everyday people (large-capacity V8 or V12, 5 or 6 litres). Because they need to be driven by people who don't have a racing license or understand how an internal combustion engine works, they have very large tolerances. They have rev limiters, they're designed to run on any oil of a certain rating, and they have service intervals, say, 10,000 km apart, but if you miss a service interval by a few weeks it won't explode in your face. They don't catch fire when you drive long distances.
The SAS drive in a server is more akin to a Formula 1 engine. They're really small (2.4 litres) but have immense power outputs because of their tiny tolerances. They rev higher, and often have no rev limiter (which means they suffer serious damage if driven incorrectly), and if you miss a service interval (which is every few hours) they explode.
You're basically comparing chalk and cheese. Numbers and a full breakdown are discussed in the Intel whitepaper *Enterprise-class versus Desktop-class Hard Drives*.
Let's talk some hard numbers here. Let's say you request 1MB of additional data (a nice round number). How much data is that really? Well, your 1MB of data is going to go into a RAID array. Let's say they're being safe and making that into RAID1. Your 1MB of data is mirrored, so it's actually 2MB of data.
Let's say your data is inside a SAN. In case of a SAN node failure, your data is synchronized at a byte-level to a 2nd SAN node. So it's duplicated, and your 2MB of data is now 4MB.
You expect your provider to keep on-site backups, so your data can be restored in the case of a non-disaster emergency? Any decent provider is going to provide you with at least 1 on-site backup, perhaps more. Let's say they take snapshots once a week for three weeks on-site. That's an extra 3MB of data, so you're now up to 7MB.
If there is a critical disaster, your provider had better have a copy kept off-site somewhere. Even if it's a month old, it should exist. So now you're up to 8MB.
If it's a really high-level provider, they may even have a disaster recovery site that's synchronized live. These disks will be RAIDed as well, so that's an extra 2MB, and thus you're up to 10MB of data.
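To make the bookkeeping explicit, here is a minimal sketch of the walkthrough above; the copy counts are the illustrative assumptions from this answer, not any provider's actual policy.

```python
requested = 1.0  # MB of user data

stored = requested * 2   # RAID1 mirror on the primary SAN node      -> 2 MB
stored *= 2              # byte-level sync to a 2nd SAN node         -> 4 MB
stored += requested * 3  # three weekly on-site snapshot copies      -> 7 MB
stored += requested * 1  # one off-site backup copy                  -> 8 MB
stored += requested * 2  # live DR site, its disks RAIDed as well    -> 10 MB

print(f"{requested:.0f} MB requested -> {stored:.0f} MB actually stored")
```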
You're going to have to transfer that data eventually. What? Transfer it? Yes, data transfer costs money. It costs money when you download it or access it over the internet; it even costs money to back it up (someone has to take those tapes out of the office, and your 1MB of data could mean they have to purchase an extra set of tapes and transport them somewhere).
When your SATA home drive fails you get to call tech support and convince them your drive is dead. Then send your drive in to the manufacturer (on your own dime most times). Wait a week. Get a replacement drive back and have to reinstall it (it almost certainly isn't hot swappable or in a drive sled already).
When that SAS drive fails you call tech support. They almost never question your opinion that the drive needs immediate replacement, and they drop-ship a new drive; usually it is delivered later that same day, or failing that the next day. Commonly the manufacturer will send a representative out to actually install the drive if you don't know how (very handy if you ever plan on taking a vacation and need things to keep working while you are away).
Enterprise drives have tight tolerances (see the tolerances discussion above) and tend to last about 10 times longer than consumer-grade drives (by MTBF). Enterprise drives almost always support advanced error and failure detection, which a Google report found works only about 40% of the time, but even that is something anyone would prefer to a drive suddenly dying.
When you have a single drive in your home computer, its statistical chance of failure is simply that of the drive. Drives used to be rated by MTBF (where SAS drives still enjoy ~50% higher ratings or more); now it's more common to see error rates. A typical SAS drive is 10 to 1,000 times less likely to have an unrecoverable error, with 100x being the most common figure I found recently. (Error rates are according to manufacturer documentation supplied by Seagate, Western Digital, and Hitachi; no bias intended; I expressly disclaim indemnification.)
Error rates are particularly important not when you run across an unrecoverable error on a single drive, but when another drive in the same array fails and you are now relying on all the remaining drives in the array being readable in order to rebuild the failed disk.
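As a rough sketch of why that matters, the chance of hitting at least one unrecoverable read error (URE) while reading the surviving drives during a rebuild can be estimated as follows; the drive size, array shape, and error rates are illustrative datasheet-style assumptions, not measurements of any specific product.

```python
import math

def p_rebuild_failure(bits_to_read, ure_rate):
    """Chance of hitting at least one URE while reading bits_to_read bits."""
    # 1 - (1 - rate)^bits, computed stably for very small rates
    return -math.expm1(bits_to_read * math.log1p(-ure_rate))

drive_tb = 4
surviving_drives = 3  # e.g. a 4-drive RAID5 rebuilding after one failure
bits = surviving_drives * drive_tb * 10**12 * 8

for label, rate in [("desktop SATA (1 in 10^14)", 1e-14),
                    ("enterprise SAS (1 in 10^16)", 1e-16)]:
    print(f"{label}: {p_rebuild_failure(bits, rate):.1%} chance the rebuild hits a URE")
```

With these assumptions the desktop-class rebuild hits a URE roughly 60% of the time, the enterprise-class rebuild about 1% of the time.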
SAS is a derivative of SCSI, which is a storage protocol. SATA is based on ATA, which is itself based on the ISA bus (that 8/16-bit bus in computers from the dinosaur age). The SCSI storage protocol has more extensive commands for optimizing the manner in which data is transferred between drives and controllers. This efficiency makes an otherwise-equal SAS drive inherently faster than a SATA drive, especially under extreme workloads; it also increases the cost.
Fewer SAS drives are produced, so economies of scale dictate that they will be more expensive, all else being equal.
SAS drives typically come in 10k or 15k RPM rotational speeds, while SATA drives typically come in 5.4k or 7.2k RPM. SAS drives, particularly the increasingly popular 2.5" size, also have faster seek times. The two combined dramatically increase the IOps a drive can perform; typically a SAS drive is ~3x faster. When multiple users are demanding disparate data, the IOps capacity of the drive/array becomes a critical performance indicator.
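A back-of-the-envelope way to see where that ~3x comes from is the standard single-spindle approximation IOPS ≈ 1 / (average seek time + average rotational latency); the seek times below are typical assumed values, not quotes from any datasheet.

```python
def random_iops(rpm, avg_seek_ms):
    """Approximate random IOPS for one spindle."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, on average
    return 1000 / (avg_seek_ms + rotational_latency_ms)

print(f"7.2k SATA: ~{random_iops(7200, 8.5):.0f} IOPS")   # ~79
print(f"15k SAS:   ~{random_iops(15000, 3.0):.0f} IOPS")  # ~200, roughly 2.5-3x
```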
The drives in a data center are typically powered up all the time. Studies have found that drive failure is influenced by the number of heating/cooling cycles a drive goes through (from being powered on and off), so keeping them running all the time typically increases the drive's life. The consequence is that the drives consume electricity around the clock. That electricity has to be supplied by something (in a large DC the drives alone might draw more power than a small neighborhood of houses), and the drives also need to dissipate their heat somewhere, requiring cooling systems (which themselves take more power to operate).
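To put a rough number on that always-on cost, here is a minimal sketch; the drive count, wattage, cooling overhead, and electricity rate are all illustrative assumptions, not figures from any particular data center.

```python
drives = 480                 # e.g. a few fully populated disk shelves (assumed)
watts_per_drive = 10         # typical spinning-drive draw under load (assumed)
cooling_overhead = 1.5       # extra power to remove the heat again (assumed PUE-style factor)
usd_per_kwh = 0.12           # assumed electricity rate

kwh_per_year = drives * watts_per_drive * cooling_overhead * 24 * 365 / 1000
print(f"~{kwh_per_year:,.0f} kWh/year, ~${kwh_per_year * usd_per_kwh:,.0f}/year")
```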
Infrastructure and staffing costs. Those drives are in high-end NAS or SAN units. Those units are expensive, even without the expensive drives in them. They require expensive staff to deploy and maintain them. The buildings that those NAS and SAN units are in are expensive to operate (see the point about cooling, above, but there's a lot more going on there.) The backup software is typically not free (nor are the licenses for things like mirroring), and the staff to deploy and maintain backups are usually pricey too. The cost of renting off-site tape delivery and storage is just one more of the many things that start to pile up when you need more storage.
Keeping in mind that an enterprise drive may well hold a tenth as much as a desktop drive at five times the price, that your 1MB of data is actually 10MB, and all the other differences above, there's no way you can draw any meaningful conclusion by comparing the price of your desktop storage to the price of enterprise-level storage.
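As a hedged, order-of-magnitude illustration only, multiplying the rough factors quoted in this answer gives a feel for the gap (all three numbers are the approximations stated above, not measurements):

```python
capacity_ratio = 10  # a desktop drive holds roughly 10x more per spindle (assumed above)
price_ratio = 5      # an enterprise drive costs roughly 5x more per spindle (assumed above)
copies = 10          # each stored MB becomes ~10 MB of provisioned data (see walkthrough)

multiplier = capacity_ratio * price_ratio * copies
print(f"Enterprise cost per user-visible GB: roughly {multiplier}x a bare desktop drive")
# -> roughly 500x, before staff, power, backup software, and transfer costs
```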
I'm not adding this to the top CW answer primarily because it's a difference of opinion. Feel free to merge/edit this if you wish.
Frequently, the reason "enterprise-level" storage is so expensive could be that the asker does not understand the requirement, but sometimes it is also that the sysadmin does not understand the requirement, cannot communicate it to someone with purchasing authority, or is simply being ignored by said authority.
High performance, highly-available, low maintenance off-the-shelf storage arrays are expensive. Part of the job of a system designer is to know where these are appropriate, and where a different design is appropriate.
I don't think the relative costs of different types of disk drives are actually relevant to either of the example questions.
The first example question ("Why do I have to pay 50 bucks a month per extra gigabyte?") is clearly addressed to a service provider of some sort. The two possible answers, to my mind, are:
You have 5 nines uptime, 24/7 support, in Manhattan/London/Hong Kong. The rotating platters are just a small part of the stack you're paying for.
You're paying too much. Negotiate, switch provider, or bring it in house.
The second example question (the file server that keeps running out of space) almost certainly reflects a bad design (and probably for political reasons). The data on that file server falls somewhere on this spectrum:
Data is worth storing on a high performance, highly available, high cost setup. Down time caused by running out of space affects your high availability and is a design or planning failure.
Data is either unimportant or slow performance or longer downtime are acceptable. Cheap disks and cheap backup solutions are acceptable. Regular downtime due to lack of disk space still seems like an odd trade-off, since the majority of your cost in this case is probably going to be your sysadmin time, and in the long run, they'll spend more time troubleshooting low disk space.
Note that I said this is a spectrum, and most requirements come somewhere between the two.
I agree with the other posts about the quality of what a hosting company is offering. But we recently re-did our hosting contract and shopped around, and no one was competitive on storage space, nor were the prices lower than under our three-year-old previous contract. SAS drives have been dropping in price; disk shelves, arrays, SANs, FC, and switches have been dropping; everything has been going down in price. But not disk storage?
A colleague with far more experience pointed out the tactics. The CPU, memory, bandwidth pricing was dazzling! Sign up here! Sign up now and ignore that disk space issue! You won't need that much disk space. Look at the CPU and memory!
Once you commit to their contract they have truly got you, and they make up their revenue on the disk space. Yes, it is RAID 5 and high performance, etc., but backups cost extra and offsite replication costs more again.
For hosting companies it is a business model. Most businesses do something similar with their prices - reduce this price here but increase that one over there to make up their revenue somewhere else. They have to pay their rent and salaries too.
For internal servers you have different problems. You can't just walk into a server room with the FedEx box that has your new 3 TB hard disk. If you have planned for expansion it is easier, but servers, racks, and arrays may already be at capacity in terms of slots, I/O, controller cards, or power.
It's like looking under a rock: you'll be surprised by what you find.
It's also important to note that 'local' storage might well cost more than you think anyway.
As part of an exercise in looking to move some of our "archived" data to the cloud, I recently completed a pricing exercise comparing the cost of available (i.e. formatted rather than raw) disk space on our most recent SAN against the cost of storage in Amazon's cloud storage service.
Just considering the price paid for the SAN itself including disks, assuming a 5-year lifetime for the SAN hardware, and ignoring the "overhead" costs of running our server room, our price for 150 GB of local storage is $31.88 per month vs Amazon's $28.41 (assuming monthly traffic of 20% of the stored data, both up and down).
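The answer above doesn't state the SAN's purchase price, so the inputs below are hypothetical placeholders chosen to reproduce the $31.88 figure; this is a sketch of the method, not the author's actual numbers.

```python
san_price_usd = 127_500      # purchase price including disks (assumed placeholder)
formatted_gb = 10_000        # usable, formatted capacity (assumed placeholder)
lifetime_months = 5 * 12     # straight-line amortization over 5 years

usd_per_gb_month = san_price_usd / formatted_gb / lifetime_months
print(f"${usd_per_gb_month:.4f}/GB-month -> ${usd_per_gb_month * 150:.2f}/month for 150 GB")
# With these inputs: ~$0.2125/GB-month, i.e. ~$31.88/month for 150 GB
```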
Now I'm not about to rush out and move all our storage to the cloud because there are other benefits in having local storage, but I think that this sort of pricing exercise is useful: If you think that cloud storage is expensive then how much are you really paying for your local storage?
The cost of producing any item is connected, in a feedback loop, with the volume of units it will sell.
In the case of a conventional hard disk, with spinning rusty glass and electronics, there's potentially a huge variation in the cost of the mechanical and electronic components; however, two clear price/quality bands have emerged: enterprise and commodity.
However, the reduced sales volume of an enterprise drive takes a heavy toll on what you get for your money: something which costs seven times as much won't be seven times better.
The enterprise units (for a given capacity) are slightly faster than the commodity units; compare, for example, Seagate's Barracuda SATA (commodity) drives with its Cheetah SAS (enterprise) drives.
But in an enterprise context, no sane system administrator would ever store important data on a single drive: using multiple drives provides greater reliability and bandwidth, and effectively reduces latency. Four of the Barracuda drives configured as RAID10 will be a lot faster than the single Cheetah drive, with much less risk of data loss, at around 60% of the price.
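A hedged sketch of that trade-off, with prices and per-drive IOPS as illustrative assumptions chosen to match the rough 60% ratio above:

```python
commodity_price, commodity_iops = 150, 80      # assumed 7.2k SATA Barracuda-class drive
enterprise_price, enterprise_iops = 1000, 180  # assumed 15k SAS Cheetah-class drive

n = 4  # four commodity drives as RAID10: two mirrored pairs, striped
raid10_price = n * commodity_price             # $600, ~60% of the enterprise drive
raid10_read_iops = n * commodity_iops          # random reads can hit every spindle
raid10_write_iops = (n // 2) * commodity_iops  # each write lands on both drives of a mirror

print(f"RAID10: ~{raid10_read_iops} read IOPS for ${raid10_price}")
print(f"Single enterprise drive: ~{enterprise_iops} IOPS for ${enterprise_price}")
```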
Certainly you'll get a better warranty with the Enterprise drive, and the vendor will usually be able to get one to you the same day - but you'll probably be able to source a commodity drive from a local supplier faster than your vendor can courier out the replacement disk. On the other hand, the enterprise disk is more likely to be an exact replacement for the failed drive.
So maybe you get a lot more reliability from the enterprise drives? The people making and selling the drives often say this is the case (Seagate are rather coy about it on their website, but even the obfuscated numbers they publish show a difference of less than a factor of 2), yet independent studies suggest that there's no significant difference.
The SCSI command set does have some technical advantages over the ATA command set, particularly in terms of allowing the OS to know exactly what has been committed to the disk. However, again, this only makes an effective difference when looking at the performance and reliability of a filesystem implemented on a single disk.
If your service provider operates a Fibre Channel SAN, then the cost per Gigabyte of storage will be at least 8 times higher than buying a disk off the shelf at your nearest hardware store. But there are other approaches which can bring the cost down significantly.
Note that this will still never be cheaper than buying an off-the-shelf disk, as you are also paying for redundancy, power, air-conditioning and support, but these costs should be small in relation to the cost of the storage provision.
My take on this question comes down to simple I/O. A file that sits on a single commodity hard drive, with no RAID and probably no hot-swapping, is normally accessed by one person and probably never backed up. This is a cheap and easy form of I/O.
In our business, I've used one of the more expensive forms of RAID (RAID 10), which requires a minimum of 4 drives; we use 6. This gives us high I/O rates and fault tolerance.
This configuration has saved my a$$ in a big way, and the result has been higher performance and less downtime for end users. With simple I/O there is only one person to disappoint, and likely little financial value tied to the downtime.
We also have a dedicated iSCSI server used for Xen virtualization, likewise configured for RAID 10.
The more I/O that needs to be served and backed up, the more expensive it is to implement. If your enterprise requirement will accept loss of data, very slow speeds, and no redundancy, then business-class storage can be done on the cheap! Just be prepared to get fired...