I run several Linux machines at home and at work. Most of my computing is I/O-limited, e.g., large compiler regression suites and the like. At home I also have one machine that captures high-definition video from a terrestrial antenna. One sporting event goes into a single file of about 30GB; I have over 4TB of storage devoted to HD video.
My question is: under what circumstances should I stripe? My current setup for my home directory and root filesystem stripes XFS over two drives using LVM. Sample output from lvdisplay -m:
Logical extent 0 to 117247:
Type striped
Stripes 2
Stripe size 64 KB
Stripe 0:
Physical volume /dev/sdb2
Physical extents 0 to 58623
Stripe 1:
Physical volume /dev/sda2
Physical extents 0 to 58623
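For reference, a striped LV like this is typically created along the following lines (the volume group name, device names, and size here are illustrative, not my exact commands):

pvcreate /dev/sda2 /dev/sdb2
vgcreate vg0 /dev/sda2 /dev/sdb2
# -i 2 = two stripes, -I 64 = 64KB stripe size (matches the lvdisplay output above)
lvcreate -i 2 -I 64 -L 400G -n home vg0
mkfs.xfs /dev/vg0/home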
However, I've just read articles with titles like RAID: Not Such a Clever Idea for Your Home PC, Are two drives better than one?, and a piece from storagereview.com, all of which say that striping with RAID-0 is a waste of money and effort for a single-user desktop machine. But:
These studies use Windows, not Linux
They use hardware RAID, not LVM
They're not dealing with 30GB video files
I'd like to hear other people's experience on the question: for what sorts of configurations, if any, does LVM striping across two disks improve the single-user Linux desktop experience?
If you're talking about a "mirror" stripe, you would use it to migrate data from one drive to another. You would not use it as a form of RAID-1, as it is no where as efficient as the regular
md
driver.If you're talking about a stripe that extends drive space, well, that's what it's used for: when you run out of space.
Otherwise, I wouldn't bother with the LVM stripe options.
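For the migration case, a rough sketch of what I mean (hypothetical device names):

# add a new drive to the volume group, move all extents off the old one, then drop it
pvcreate /dev/sdc1
vgextend vg0 /dev/sdc1
pvmove /dev/sdb2
vgreduce vg0 /dev/sdb2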
Well, I've turned on LVM mirroring for a busy server, with two drives in an eSATA enclosure. The behavior I observed was that it would "burst" the mirroring writes to the other drive, often lagging a second or two behind the primary. This made me very nervous, as there could be consistency issues during a primary drive failure. Performance was also abysmal under heavy write loads, and read performance was not improved because it did not use the other drive for reading. I scrapped that setup and switched to a RAID-1 via the md driver; the speed increase was immediately visible, with no "bursty" behavior (a minimal mdadm sketch follows the recommendations below). Some recommendations:
RAID-0 striping exists for a reason: you're out of storage capacity and you temporarily need to increase it. The key word here is "temporarily", the idea being that you'll soon remedy the problem with a permanent fix.
If you're talking about some gaming box that you can wipe and reinstall, then yeah, it's a waste of time.
If you're talking about a box that handles your livelihood, and the only way to make things work is to extend the volume onto another drive, then that's a different story. You need to do what you need to do.
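For reference, the md RAID-1 setup I switched to is roughly this (the partition names are made up for the example):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
cat /proc/mdstat    # watch the initial sync
mkfs.xfs /dev/md0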
Striping will probably not help your everyday desktop activities. Striping is best used when you have large and (relatively) long IO operations. On a regular desktop, you usually have a lot of small, random IO operations, which will not benefit from striping.
IMHO, striping is best used for large, sustained, sequential transfers.
Consider that HD broadcast uses a maximum bit rate of 19.4 Mbit/s; that's less than 2.5 MB/s. You could record 4-5 HD broadcasts at once and a single hard drive would still keep up. So I would say that the risks outweigh any benefits in this case.
The only real situation where you might notice a difference is if you're copying these 30GB files over gigabit Ethernet to another system that also has a RAID of some sort, or possibly when editing the videos. My guess is that we're talking about a MythTV box here, which is probably flagging commercials as well. The commercial flagging might be slightly faster, but MythTV does this as a background process once the stream has been captured, so I don't see any improvement there from an end-user point of view.
I say: always use LVM. You never know when you might want to extend the LV you're using, so why not have the option to do that just by adding another drive, or by dd-ing the entire thing to a larger drive and adding a PV on the empty space?
Not to mention the possibility of snapshots, and the rest of the features LVM provides.
And as for those articles: RAID and LVM are different things and exist for different purposes. RAID is mainly there to increase the robustness of the storage, or to improve its speed. LVM is about flexible space management, plus some extra goodies like snapshots, which are otherwise only possible on high-end storage systems.
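To give one example of the snapshot goodies (the names are invented):

# 5GB copy-on-write snapshot of an LV, e.g. before a risky upgrade
lvcreate -s -L 5G -n home_snap /dev/vg0/home
# later: either discard it...
lvremove /dev/vg0/home_snap
# ...or merge it back to roll the origin back
lvconvert --merge /dev/vg0/home_snap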
Striping is only useful where you can risk losing the data: your block device will be roughly half as reliable as a single drive, since losing either drive loses the whole volume. If you can live with that in return for extra performance, then striping might work out well for you.
Whether you get the extra performance will depend on the rest of your system: are there bottlenecks between the SATA chips and the rest of the system? Are you really I/O bound? And so on.
The only easy way to see if you will gain anything is by doing the experiment with the job load you want to use the system with.
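A crude version of that experiment for the sequential part of your workload might look like this (the path is a placeholder; point it at a scratch file on the filesystem you're testing):

# sequential write then read of a 4GB test file, bypassing the page cache
dd if=/dev/zero of=/mnt/test/bigfile bs=1M count=4096 oflag=direct
dd if=/mnt/test/bigfile of=/dev/null bs=1M iflag=direct
rm /mnt/test/bigfile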
As others have said, always use LVM because of the flexibility it gives you. I would not use un-mirrored stripes at home unless it was for tmp space.
In general, the "hardware RAID" adapters on workstation-class devices are junk. There is a wealth of StackOverflow questions on that topic, so I'll leave it at that.
Whatever route you take, striping (aka RAID-0) is a good technique for improving the performance of IO-bound applications, which is what you have. The price of that performance is a higher, hard-to-quantify risk of catastrophic failure.
Is that a big deal? For me, it isn't -- I'm using a Mac Pro with 4 striped drives, all backed up with Time Machine. I can live with being out of commission for a few hours while I restore from backup. For you, I would recommend employing a backup/restore strategy that you can live with before proceeding with striping.
IMO, you should use striping by itself only when you don't care at all about your data.
When you stripe data across multiple drives, you increase the risk of losing everything on all drives, because the failure of any one drive means complete loss of the entire volume. And the more drives you stripe over, the greater the risk.
Accordingly, IMO, the answer to "when should I stripe?" is either "never" or "when you're striping mirrored volumes, as in RAID-10".
If both the data itself and IO performance are important to you, then get a good hardware SAS RAID card (e.g. an Adaptec 3805 or 5805 or similar) with a large battery-backed write cache, and make a RAID-6 volume. To get 4TB of usable space with RAID-6, you'll need 6 x 1TB drives, plus one more as a hot or cold spare.
SAS controllers support both SAS drives and SATA drives. The models mentioned above support up to 8 drives directly but can support more through the use of SAS expanders (at the cost of performance: more drives mean less IO bandwidth per drive, but you could probably expand up to 16 drives or so without noticing any real performance hit; 3Gbps per SATA channel gives you maybe 250MB/s of IO, and current good non-SSD drives can manage about 100-120MB/s each).
Alternatively, use software RAID-10 (a stripe of mirrored volumes). A 4TB array would require 8 x 1TB drives, e.g. 4 x RAID-1 arrays striped together with RAID-0 (or LVM) into a single 4TB volume.
You can use LVM on top of these RAID arrays to manage the space. If you're going the RAID-10 route, then the striping can be done with LVM rather than RAID-0.
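A sketch of that RAID-10-via-LVM route, with invented device names and only two mirrored pairs shown:

# two RAID-1 pairs...
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
# ...striped together with LVM instead of RAID-0
pvcreate /dev/md0 /dev/md1
vgcreate video /dev/md0 /dev/md1
lvcreate -i 2 -I 256 -l 100%FREE -n store video
mkfs.xfs /dev/video/store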
One other thing to consider is separating the IO-consuming applications so that they don't compete for IO: e.g. keep your OS on one smallish drive, say 80GB (or a RAID-1 mirrored pair), your source code for compiler regressions on another drive or RAID-1 pair, and your video data on either software RAID-10 or hardware RAID-6.
And install as much memory as you possibly can in the machine, as Linux will use it all for disk buffering. Most common motherboards support up to 4 DDR2 or 6 DDR3 memory sticks, so with 2GB sticks being far cheaper than 4GB sticks you can install a maximum of 8GB or 12GB at a reasonable price. If you need more than that, it's more cost-effective to replace the motherboard with a server board (from Tyan or SuperMicro etc.) with more RAM sockets than it is to use 4GB sticks.
Oh, and hot-swap bays are a good idea: when (not if, when) a drive fails you need to be able to replace it as quickly as possible. RAID-6 can cope with any two drives failing simultaneously, so even after one drive fails you can still survive a second failure, but you're then only one more failure away from losing everything, so you want to restore full redundancy quickly. RAID-10 can cope with more drives failing (up to half of the drives, as long as one of each mirrored pair survives).
And, finally, backup. RAID, as has been said many times by many people, is NOT a substitute for backup. The only tape medium currently capable of backing up the quantity of data you have in a reasonable time without spending days swapping cartridges is LTO-4. The drives for this are expensive, and the tape cartridges seem expensive (but are actually cheaper than hard drives when you calculate the cost per gigabyte). If your budget doesn't stretch to that, then you could use multiple extra drives (connected via eSATA, FireWire, a spare hot-swap bay, or even USB) to back up to: insert drive, run backup, remove drive, store on a shelf or off-site. Current drive capacities go up to 2TB, and will get larger and cheaper over time. BTW, at current prices (approx. $1500-$2000 for a bare LTO drive vs. approx. $100 for a 1TB hard disk, in current Australian dollars), the cost of an LTO drive would buy you 15 to 20 hard drives for backup, and you could buy them as you need them rather than all at once, with prices dropping noticeably each time.
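The "insert drive, run backup, remove drive" routine can be as simple as something like this (the mount point and paths are only examples):

mount /dev/disk/by-label/backup01 /mnt/backup
rsync -a --delete /video/ /mnt/backup/video/
umount /mnt/backup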
It might be because the drive all my games were installed on was a bit older than the drive my main system/work partitions are on, but I noticed a definite improvement in loading and inter-level times for some games when I added a second drive (of the same lineage) and set them up as a RAID0 array. Not close to the theoretical twice-the-performance, but enough to be quite noticeable. This is of course quite subjective: I have not performed any statistically sound benchmarks.
Those articles are fairly old, and the storagereview one specifically talks about the limitations of the PCI bus (meaning the bus, not modern drives, would be the bottleneck limiting the bulk throughput of RAID0). If your controller is PCI-E based, then you have more throughput to play with.
Having said that, I only set up RAID0 on my desktop box as an experiment to see what difference it would actually make. Were I to put together a brand new desktop PC at home with nice new shiny fast drives, I'm not sure whether I would do the same again. For desktop use generally, I would recommend using a pair of drives as a RAID1 array to get the redundancy, rather than using them in a RAID0 configuration for performance.
LVM striping is handy for mixing striped and non-striped volumes in a single volume group, making it easy to move space around (shrink one volume, extend another). This is great if you don't know in advance what you'll need.
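For example, moving space between LVs in the same VG looks roughly like this (names invented; note that XFS can be grown but not shrunk, so the LV being reduced would need a shrinkable filesystem such as ext4):

# shrink one LV (-r resizes the filesystem as well), then grow another into the freed space
lvreduce -r -L -20G /dev/vg0/scratch
lvextend -r -L +20G /dev/vg0/video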
Another use case I had in the past was making it less obvious to guests using the machine that striping was in use. With md RAID, one sees /dev/md0 in df output and finds a lot of detail in /proc/mdstat.
For video editing, taking the sequential read/write performance from https://www.linuxtoday.com/blog/raid-vs-lvm/ as a guide, md RAID-0 may give you slightly better performance than LVM striping. I'd think, though, that the choice of filesystem (ext4/xfs/btrfs etc.) is more important.