In our shop we faithfully use RAID in all our workstations, probably just because that seems to be the way it ought to be done. I'm talking about workstations for scientific simulations, using the onboard RAID chips.
But I've heard a lot of RAID horror stories. Stack Overflow itself has had an outage caused indirectly by a RAID controller.
RAID protects you against a very narrow type of failure - physical disk failure - but at the same time it introduces extra points of failure. There can be problems with the RAID controller, and there often are. In our shop at least, it seems that RAID controllers fail at least as often as the disks themselves. It's also easy to mess up the process of swapping a faulty drive.
When is RAID worth the trouble? Don't you get a better return on investment by adding more redundancy to your backup solutions? Which type of RAID is better or worse in this regard?
Edit: I've changed the title from the original "Is RAID worth the trouble?", so it sounds less negative
Don't worry, RAID isn't used throughout the business world because of groupthink! The chance of decent RAID controllers failing is far, far lower than the chance of a disk failure. I don't recall ever seeing a RAID controller fail in real life, while I've seen many a disk die, both in the office and datacenter.
PS: I see your tags. RAID is not backup! :)
ZFS by Sun (also part of OpenSolaris; Apple's OS X currently has read-only support) not only does RAID at various levels but always checks that the data written to disk is actually there. Consistency is key! RAID is useless if you can't rely on its integrity. Pick a decent RAID controller (I prefer HP's) and scrub your RAID periodically to find errors.
Software RAID (such as ZFS), on the other hand, makes you more hardware-independent if the RAID controller dies and you can't get an exact replacement.
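The integrity checking described above can be illustrated with a toy sketch. This is not ZFS itself, just a minimal model of the same idea: store a checksum next to each block and verify it on every read, so silent corruption raises an error instead of being returned as good data. All names here are invented for illustration.

```python
import hashlib

class ChecksummedStore:
    """Toy block store that verifies a per-block checksum on every read."""

    def __init__(self):
        self.blocks = {}  # block_id -> (mutable data, checksum at write time)

    def write(self, block_id, data: bytes):
        self.blocks[block_id] = (bytearray(data), hashlib.sha256(data).digest())

    def read(self, block_id) -> bytes:
        data, checksum = self.blocks[block_id]
        # Recompute and compare: a mismatch means the data rotted on "disk".
        if hashlib.sha256(bytes(data)).digest() != checksum:
            raise IOError(f"checksum mismatch on block {block_id}")
        return bytes(data)

store = ChecksummedStore()
store.write(0, b"important data")
assert store.read(0) == b"important data"

# Simulate silent bit rot: flip one bit behind the store's back.
store.blocks[0][0][3] ^= 0x01
try:
    store.read(0)
except IOError as e:
    print("detected:", e)
```

Plain RAID without checksums would happily return the corrupted block; this is exactly the gap that end-to-end checksumming closes.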
For those of you saying you won't use hardware RAID because if the controller fails and you can't get an identical replacement you're screwed: you're going about it the wrong way.
If uptime is that critical to you, you should NOT be buying cheap hardware. As was said before, use a good RAID controller: HP, LSI, Dell, etc.
If the controller was purchased from the computer manufacturer (e.g. a Dell server with a Dell RAID controller), Dell will tell you how long they will be stocking those parts, usually 4+ years from the EOL of that server.
If getting running again quickly matters too much to wait for a delivery, then you should purchase a second spare controller yourself, regardless of who made it.
If you set up RAID 1, you can sometimes take one of those drives and drop it onto a normal controller to recover the data. If that is important to you, confirm/test this with your controller before you are in a critical situation.
Hardware RAID saved my butt twice. Once, in an email server, one of the drives failed. I got the email alert from the RAID monitoring software on that machine, called up Dell, and had a new drive the next day; popped it in and it rebuilt all on its own. ZERO downtime on that one.
Second one: a drive failed in an old file server that was scheduled for replacement in 6 months. The controller kept it running and we moved the replacement of the server up to that week. Saved buying a new drive (since it was out of warranty) and again ZERO downtime.
I've used software RAID before, and it just doesn't recover as nicely as hardware-based RAID does. Whether software or hardware, you have to test your setup to be sure it works, and know what to do when the brown stuff hits the fan.
Always. Disks are cheap, your information is not. But use software RAID, so you have the flexibility to move forward or change hardware later on (trust me, you will need it). And also use a checksumming filesystem like ZFS, to protect against silent data corruption (which is very likely with large disks nowadays).
Hard drive failures are much more likely to happen in a server than in a desktop workstation...
You can't just say "adding more points of failure" without taking into account the likelihood of each failure. Especially since these less likely points of failure are specifically in place to mitigate the more likely hard disk drive crash. As you've put it, you've basically created a Pascal's Wager-like fallacy.
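The likelihood argument above can be made concrete with back-of-the-envelope numbers. The failure rates here are invented for illustration (real AFR figures vary by drive and controller model); the point is only that if the controller fails far less often than a disk, the mirror still comes out ahead even after counting the controller as a new point of failure.

```python
# Illustrative annualized failure rates -- assumed, not vendor data.
disk_afr = 0.05        # chance a given disk dies in a year
controller_afr = 0.01  # chance the RAID controller dies in a year

# Single disk, no RAID: data is at risk whenever the disk fails.
p_single = disk_afr

# RAID 1 pair (crude independence assumption): the array is lost only
# if BOTH disks fail, plus the controller as an added point of failure.
p_raid1 = disk_afr ** 2 + controller_afr

print(f"single disk: {p_single:.4f}")   # 0.0500
print(f"RAID 1 pair: {p_raid1:.4f}")    # 0.0125
```

Even with the extra point of failure counted pessimistically (a controller failure rarely destroys the data outright), the mirrored setup is roughly four times less likely to lose availability in this toy model.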
Most RAID systems on desktop motherboards are cheapo software/hardware hybrids, with most of the work done in the software driver. IMHO they are pieces of crap used to sell to power users.
On the other hand, a good actual hardware RAID is quite reliable, and it has the hardware to do its thing without (despite?) the operating system. But those get expensive, because real hardware usually has battery backups, and a complete XOR'ing array to calculate checksums, etc. Even more expensive if it's done using SCSI.
Summary: If you are running the motherboard based RAID systems, then no, it isn't worth the trouble.
Although backups and RAID are solutions to different problems, most "RAID problems" are very similar to the most common backup problem (i.e. nobody tests a restore): nobody tests system recovery. Other RAID problems are often a direct result of people not understanding what it does and doesn't do. For example, many people think that RAID guarantees the integrity of their data -- it does not.
For workstations, if you're using RAID-0 to improve performance of IO-bound applications, or RAID-1/5/6 to keep a $100/hour scientist working when her $80 hard disk fails, you're using RAID appropriately. Just don't confuse disk redundancy with backup, and have tested procedures in place to ensure that your IT guys handle recovery.
RAID is great for uptime, but it's not a substitute for backup. As a colleague once commented, "You know that 'Oh, sh!t' moment when you deleted something accidentally? RAID just means you get to 'Oh, sh!t' more than one drive at the same time."
That said, that day when you pop your head into your boss's office and tell her, "By the way, the database server had a hard drive crash last night-- we never went down, it finished rebuilding onto the spare at 5 AM and I've sent the bad drive off under warranty" -- that's when RAID is priceless.
There are two types of RAID: hardware and software.
Some operating systems have good software RAID solutions (which have nothing to do with the crappy onboard cards mentioned above). Linux software RAID is especially good; its performance is really good.
RAID can only improve reliability; it is not a backup solution. Files can be deleted accidentally, and a faulty disk can return (and propagate) bad data to the other disks in a RAID array, so a real backup solution is still needed.
It seems that a lot of the above posts are forgetting the original question and are just debating RAID 1. The question was "When is RAID worth the trouble?" Well, it depends... If your developers do a lot of data reads and writes with their workstations, then a RAID 0 configuration would be worth it. Adding more drives to this RAID 0 is of course going to boost speed and performance, BUT it will increase the likelihood of a failure (disk or controller).
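The trade-off in the paragraph above is easy to quantify: in RAID 0 every disk holds part of every stripe, so the array dies when any single disk dies. A minimal sketch, using an assumed (illustrative) 5% annual failure rate per disk and assuming failures are independent:

```python
def raid0_failure_probability(n_disks: int, p_disk: float = 0.05) -> float:
    """Probability that at least one of n independent disks fails,
    which in RAID 0 means losing the entire array."""
    return 1 - (1 - p_disk) ** n_disks

# More stripes = more speed, but a steadily worse chance of total loss.
for n in (1, 2, 4, 8):
    print(f"{n} disks: {raid0_failure_probability(n):.4f}")
```

With these numbers, an 8-disk stripe is roughly seven times as likely to lose everything in a year as a single disk, which is why RAID 0 only makes sense when the data is backed up elsewhere.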
I work for a nursing school with about 500 Dell machines deployed, and almost none of them utilize any sort of RAID. It seems to me that my type of users won't see enough of a benefit to justify adding the complexity of a RAID system to each machine. I worry more about data recovery and disk imaging than the speed of RAID 0 or the redundancy of RAID 1. Of course, I'm not talking about our production servers; that's another story. Data recovery being crucial, we rely on other backup methods to account for more than just disk redundancy. Any sort of RAID won't help you if a user accidentally deletes a file.
So to answer your question, IMHO... RAID 0 on a workstation is worth it when the user needs the performance. (Just make sure that all important data is backed up.) I'm sure you can check the data throughput on the existing setup to see if it's adequate. RAID 1 should be used in the server environment, where higher-class RAID controllers are available. It's not worth the hassle on a workstation because it complicates deployment, disk imaging, and repairs. Many of these workstations come with RAID controllers built onto the motherboard. It's a good feeling to know that if a motherboard goes out on a machine, I can always put the drive in another system to get the data.
Linux software RAID is excellent, and it actually beats low-end hardware RAID hands down. It also has a few optimizations that can be useful for a workstation. For example, with mirrored data it can read different things from each disk at the same time, effectively doubling random-read throughput -- a common use case, unlike the transfer-rate-bound operations optimized by RAID 0.
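The mirrored-read optimization mentioned above can be sketched with a toy model (the timings are invented for illustration, not measured): since every mirror holds a full copy, a queue of independent random reads can be split across the disks, and the queue drains roughly n times faster with n mirrors.

```python
import math

SEEK_MS = 8.0  # assumed average seek + rotational latency per random read

def random_read_time_ms(n_reads: int, n_mirrors: int) -> float:
    """Time to drain a queue of independent random reads when each
    mirror serves an equal share of the queue in parallel."""
    return math.ceil(n_reads / n_mirrors) * SEEK_MS

print(random_read_time_ms(100, 1))  # single disk: 800.0 ms
print(random_read_time_ms(100, 2))  # RAID 1 pair: 400.0 ms
```

Sequential transfers don't benefit this way (every mirror would read the same blocks), which is why this win is specific to random-read workloads.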
As for reliability, it's a very well maintained part of the Linux kernel, used by millions, and it handles hardware failures very well, so it's clearly a win as far as availability is concerned. I have used it on my personal workstations as well as a few dozen low-end servers for years, some pretty loaded, and could never attribute a single fault to it. I've been through a good dozen broken disks in the meantime, however.
(Higher-end hardware RAID cards have other features, though, such as a battery-backed write cache. It basically multiplies random synchronous disk write speed by ten. It is absolutely necessary for databases, but probably pretty useless for workstations.)