I recently picked up a bunch of hardware to build a new home media server. When fully configured, it'll have twenty 1TB hot-pluggable SATA-II drives running under a Linux system. To date, I've used RAID5 and RAID6 (not in conjunction) in smaller servers spanning up to eight drives, but I'm wondering if that's still a good idea at this scale.
More specifically, I'll have six SATA cards in the system - four PCI cards with four SATA ports each, and two PCI-Express x1 cards with two SATA ports each. (This hardware isn't entirely settled yet - as an aside, let me know if I can improve on it. Those slots are the only ones available on the current motherboard.)
I'm primarily looking for suggestions as to what low-level software system (RAID, LVM, a combination, something else entirely) I should use to implement this system. Requirements:
- It has to scale up to 20 drives - I'm starting with four and working my way up (most likely one or two drives at a time) to full capacity.
- It has to run on Gentoo Linux - I'm very flexible as to the software I use, but not to the point of switching the entire operating system for it.
- It has to be reliable enough to suffer drive losses - at least two of the twenty at any given time. The server will be some hundreds of miles away from me most of the time, and I can't get anyone there to swap out drives as soon as they go bad, so it needs to be able to live with a drive or two down for a little while. Bonus if an entire controller card can fail and the array stays up.
- It has to have reasonable capacity - I'd like at least 15TB, out of the 20, actually available for data storage (as opposed to RAID parity or similar). More capacity, as long as it's not sacrificing too much integrity, is better.
- It has to present a single unified filesystem to the OS - 20 separate 1TB drives, all with separate filesystems and mounted separately, won't be manageable (even ignoring the fact that a drive failure in this kind of setup would destroy a terabyte of data).
Keep in mind when making suggestions that I don't mind putting a fair amount of work into this - there's no requirement for easy or instant setup, as long as it works and is reliable in the future. Suggestions as to the filesystem to layer on top of it would also be welcome. I'm currently using JFS, because it seems to perform well and is growable while mounted read-write, but if there's an improvement I can make I'm open to it.
I'd go with (ultimately) two 9-disk RAID6 arrays plus two hot spares on Linux software RAID, with LVM on top. Given that you'll be quite a distance from the hardware, the hot spares minimise the window of opportunity for more disks to fail while an array is degraded. Note that two 9-disk RAID6 sets give you 2 × 7 = 14TB usable, just shy of your 15TB target. LVM allows you to easily grow the storage, unify multiple RAID arrays into a single volume group, and gives flexibility in allocating storage (if you've got a suitable filesystem on top that allows online resizing).
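As a rough sketch of the starting point (all device and volume names here are illustrative, not anything your setup dictates), you'd create the first array from your initial four drives and put LVM on top of it:

```
# Create the initial 4-drive RAID6 array (device names are examples only)
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Layer LVM on top: the array becomes a physical volume in a volume group
pvcreate /dev/md0
vgcreate vg_media /dev/md0
lvcreate -l 100%FREE -n media vg_media
```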
Linux software RAID allows you to add more disks to an existing array easily, which satisfies your need to slowly add disks over time.
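For example (same illustrative names as above, and assuming a kernel and mdadm recent enough to reshape RAID6), adding a drive looks something like this:

```
# Add the new disk as a spare, then reshape the array to use it
mdadm --add /dev/md0 /dev/sdf
mdadm --grow /dev/md0 --raid-devices=5

# Once the reshape completes, tell LVM that the physical volume grew
pvresize /dev/md0
```

The reshape runs in the background with the array online, though expect it to take a long time on terabyte drives.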
At some point, the reliability of the rest of the machine will be lower than that of the disks.
As you call this a "home media server", I'm assuming you'll be using consumer-grade parts. The disks may fail, but at some point the rest of the machine may fail too.
If you want it to be reliable, use redundant power supplies at the very least (20 disks are going to need quite a bit of power anyway). I don't know how reliable all these SATA cards are; I'd guess they won't fail often.
Also, you'll want ECC RAM, otherwise the amount of data you're pushing around means data errors are almost guaranteed sooner or later.
In my experience motherboards fail occasionally, but power supplies fairly often.
If you're set on Linux, then look at ZFS-FUSE; but if you can be more flexible, consider NexentaStor - it's the OpenSolaris kernel with a more Linux-style (Debian-like) userland, packaged up as a file-storage appliance.
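If you do go the ZFS route, a pool of raidz2 vdevs maps naturally onto this hardware. A rough sketch with illustrative device names - with the caveat that you can't add single disks to an existing raidz2 vdev, so the pool grows by whole vdevs, which doesn't fit your one-or-two-drives-at-a-time plan very well:

```
# First 9-disk raidz2 vdev (double parity, like RAID6)
zpool create tank raidz2 sdb sdc sdd sde sdf sdg sdh sdi sdj

# Later, grow the pool by adding a second raidz2 vdev...
zpool add tank raidz2 sdk sdl sdm sdn sdo sdp sdq sdr sds

# ...and attach hot spares shared by the whole pool
zpool add tank spare sdt sdu
```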
In a year or two BTRFS might be interesting, but not yet.
If you still want a unified filesystem, your other option would be to have LVM combine two RAID6 sets (each one a physical volume) into a single volume group, and run XFS on a logical volume on top of that.
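A hedged sketch (device and volume names are illustrative), noting that XFS can grow, but not shrink, and only while mounted:

```
# Create and mount the filesystem on the logical volume
mkfs.xfs /dev/vg_media/media
mount /dev/vg_media/media /mnt/media

# Later, after the underlying arrays/PVs have grown:
lvextend -L +1T /dev/vg_media/media
xfs_growfs /mnt/media    # online grow; XFS must be mounted to resize
```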
Also, at the moment, 1.5TB drives seem to be the best point on the price/performance curve (here in Australia, at least).
You're basically describing Tahoe with a FUSE wrapper (though Tahoe is also distributed and secure in ways that would be overkill for your needs). Tahoe is a bit of a pain to set up and not the fastest thing out there, but it's basically the direction you should be looking in. You can configure Tahoe (or a similar system like XtreemFS, which I have no experience with) for whatever level of redundancy you're comfortable with. I would set it up with one Tahoe node per drive on the machine, then configure it so a file placed onto the Tahoe filesystem is split into 20 shares such that it can be recovered from any 15 of them. That would give you a little less than the 15TB out of 20 that you want, but it could survive five simultaneous drive failures. If you're less conservative, you can trade a little risk for more capacity.
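For what it's worth, the erasure-coding parameters live in each node's tahoe.cfg. Something along these lines should express the 15-of-20 scheme (the shares.happy value below is my own guess at a sensible setting, not something the scheme dictates):

```
[client]
# any 15 of the 20 shares are enough to reconstruct a file
shares.needed = 15
shares.total = 20
# refuse uploads unless shares land on at least this many distinct nodes
shares.happy = 18
```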
IMO, the other big advantage of Tahoe is that it sets you up to expand to a truly distributed setup. As others have pointed out, you might have 20 drives, but there are still a lot of single points of failure in a setup with only one chassis. With Tahoe, you can expand it securely to drives all over creation.