Are there any guidelines for which storage scheme(s) makes most sense for a multiple-disk home server?
I am assuming a separate boot/OS disk (so bootability is not a concern; this is for data storage only) and 4-6 storage disks of 1-2 TB each, for a total storage capacity in the range of 4-12 TB.
The file system is ext4; I expect there will be only one big volume spanning all disks.
As far as I can tell, the alternatives are:
individual disks
- pros: works with any combination of disk sizes; losing a disk loses only the data on that disk; no need for volume management.
- cons: data management is clumsy when logical units (like a "movies" folder) are larger than the capacity of any single drive.
JBOD span
- pros: can merge disks of any size.
- cons: losing a disk loses all data on all disks.
LVM
- pros: can merge disks of any size; relatively simple to add and remove disks.
- cons: losing a disk loses all data on all disks.
RAID 0
- pros: speed.
- cons: losing one drive loses all data; disks must be the same size.
RAID 5
- pros: data survives losing one disk.
- cons: gives up one disk's worth of capacity; disks must be the same size.
RAID 6
- pros: data survives losing two disks.
- cons: gives up two disks' worth of capacity; disks must be the same size.
I'm primarily considering either LVM or a JBOD span, simply because either will let me reuse older, smaller-capacity disks when I upgrade the system. The runner-up is RAID 0, for speed.
I'm planning on having full backups to a separate system, so I expect the extra redundancy from RAID levels 5 or 6 won't be important.
Is this a fair representation of the alternatives? Are there other considerations or alternatives I have missed? And what would you recommend?
Like you, I'm going through a rationalisation process with the disks in my home server. I too have a mix of disk sizes resulting from the organic growth of my JBOD setup.
I am taking the LVM route for the following reasons.
For me the clinching factors are #3 & #4.
I'm using Greyhole and it fits my use case almost perfectly:
Limitations:
Well, on RAID systems it is not the disks that must be the same size; only the partitions you want to add to the RAID need to be the same size in order to create it.
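For example (just a sketch; the device names and the 1 TB partition size are purely illustrative), you could carve an equal-sized partition out of each of two different-sized disks and mirror only those partitions:

    # carve an equal-sized partition out of each disk
    parted /dev/sdb --script mklabel gpt mkpart data ext4 1MiB 1000GiB
    parted /dev/sdc --script mklabel gpt mkpart data ext4 1MiB 1000GiB
    # the matching partitions, not the whole disks, form the RAID 1 pair
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1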
The strengths of LVM are that you can easily grow your virtual disk by adding more partitions (physical volumes) to the volume group, and that you get a snapshotting feature!
You can also combine LVM with RAID, so that you get data security plus the flexibility of LVM :)
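A rough sketch of what growing and snapshotting could look like (the device names, the vg_data/storage names, and all sizes are just examples):

    # pool two disks into one volume group
    pvcreate /dev/sdb /dev/sdc
    vgcreate vg_data /dev/sdb /dev/sdc
    # leave some free space in the VG if you want room for snapshots
    lvcreate -L 2T -n storage vg_data
    mkfs.ext4 /dev/vg_data/storage
    # later, grow the pool with another disk
    pvcreate /dev/sdd
    vgextend vg_data /dev/sdd
    lvextend -L +1T -r /dev/vg_data/storage    # -r grows the filesystem too
    # take a snapshot, allocated from the free space left in the VG
    lvcreate -s -L 10G -n storage-snap /dev/vg_data/storage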
You can stack block devices in Linux and mix in the value of both software RAID (MD) and LVM, which should address all your needs. This can all be accomplished from the non-GUI installer.
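Something along these lines, for illustration (only a sketch: the RAID level, device names, and VG/LV names are assumptions, not a prescription):

    # software RAID underneath, LVM on top
    mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
    pvcreate /dev/md0
    vgcreate vg_raid /dev/md0
    lvcreate -l 100%FREE -n storage vg_raid
    mkfs.ext4 /dev/vg_raid/storage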
[1] I once encountered a very nasty fault on SATA disks that had lots of bad blocks. After I used the vendor tool to reconstitute the disk, my once-identical set of disks was no longer identical: the bad drive now had a few blocks fewer than before the low-level format began, which of course ruined my partition table and prevented the drive from rejoining the MD RAID set.
Hard drives usually have a "free list" of spare blocks reserved for just such an occasion. My theory is that that list must have been exhausted, and since this wasn't an enterprise disk, instead of failing safe and giving me the opportunity to send it off for data recovery, it decided to truncate my data.
[2] Never deploy LVM without a fault-tolerant backing store. LVM doesn't excel at disaster recovery; you're just asking for heartache and, if you get it wrong, data loss. The only time it makes sense is when the VG is confined to a single disk, like an external USB disk or perhaps an external eSATA RAID. The point is to try to build your VG on backing stores that can be hot-plugged as a single unit, or as a virtual single unit, as demonstrated in the MD example above.
What about http://zfsonlinux.org/?
It has the notion of disk pools to which you can attach and detach drives. I don't know if it's production-ready, but it's still worth checking out.
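A minimal sketch of the idea (the pool name "tank" and the device names are examples; check the ZFS on Linux docs for the details):

    # create a pool of mirrored pairs
    zpool create tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
    # grow it later by adding another pair
    zpool add tank mirror /dev/sdf /dev/sdg
    zpool status tank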
How about mhddfs? It's already available in most distros and works like a JBOD span, except that if a drive dies you only lose the data on that drive, not all of it. The disks get seen as one logical drive pool, so, for example, you can copy the whole pool to another, larger-capacity disk when you upgrade down the track. Minimal downtime, minimal hassle, and it looks easy to implement. Check out how to use it here: http://zornsoftware.codenature.info/blog/why-i-ditched-raid-and-greyhole-for-mhddfs.html
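For reference, mounting a pool with mhddfs looks roughly like this (the mount points are examples; the fstab line uses the usual FUSE convention):

    # pool three already-mounted filesystems into one mount point
    mkdir -p /mnt/pool
    mhddfs /mnt/disk1,/mnt/disk2,/mnt/disk3 /mnt/pool -o allow_other
    # or the equivalent /etc/fstab entry:
    # mhddfs#/mnt/disk1,/mnt/disk2,/mnt/disk3 /mnt/pool fuse defaults,allow_other 0 0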