This is hopefully a simple question. We are currently deploying servers that will serve as data warehouses. I know that with RAID 5 the best practice is six disks per RAID 5. However, our plan is to use RAID 10, both for performance and safety. We have a total of 14 disks (16 actually, but two are being used for the OS). Keeping in mind that performance is very much an issue, which is better: several RAID 1 pairs, or one large RAID 10? One large RAID 10 was our original plan, but I want to see if anyone has opinions I haven't thought of.
Please note: This system was designed for RAID 1+0, so losing half of the raw storage capacity is not an issue. Sorry I hadn't mentioned that initially. The concern is more whether we want one large RAID 1+0 containing all 14 disks, or several smaller RAID 1+0 arrays striped together with LVM. I know the best practice for higher RAID levels is to never use more than 6 disks in an array.
Take a look at this discussion detailing the disk layout for a RAID 1+0 setup on an HP ProLiant server:
6 Disk Raid 1+0
A Smart Array controller configured in RAID 1+0 is a stripe across mirrored pairs. Depending on how you've arranged your drive cages and which controller you're using, the disks will likely be paired across controller channels.
E.g. in a 4-disk setup:
physicaldrive 1I:1:1 pairs to physicaldrive 1I:1:3
physicaldrive 1I:1:2 pairs to physicaldrive 1I:1:4
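If you want to confirm how the controller has actually paired the drives, the Smart Array CLI will show it (the controller slot number here is an assumption; adjust for your system):

```shell
# Show the full controller configuration, including logical drives
# and which physical drives back them (slot=0 is an assumption).
hpacucli ctrl slot=0 show config detail

# Or just the logical drive of interest:
hpacucli ctrl slot=0 ld 1 show
```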
With that number of disks, there's no downside to leaving them in a single logical drive. You'll get the benefit of more spindles for sequential workloads and better random I/O capability. I'd recommend tuning the controller cache to bias towards writes (lower latency) and possibly making some choices at the OS level regarding filesystem choice (XFS!), I/O elevator (deadline) and block-device tuning.
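As a rough sketch of that OS-level tuning (the device name /dev/sdb and the mount point are assumptions, not from the original):

```shell
# Create an XFS filesystem on the RAID 10 logical drive (assumed /dev/sdb)
mkfs.xfs /dev/sdb

# Switch this device to the deadline I/O elevator
echo deadline > /sys/block/sdb/queue/scheduler

# Raise read-ahead to 4096 sectors (2 MB) to help sequential scans
blockdev --setra 4096 /dev/sdb

# Mount with noatime to avoid metadata writes on every read
mount -o noatime /dev/sdb /data
```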
Which operating system distribution will this be running on?
Matthew - I'm a BIG Splunk customer, and we use R10 exclusively, whether it's SAS disks for our low-end boxes, enterprise SSDs for medium-sized systems, or FusionIO cards for our busiest machines. You've been smart to size for R10; trust your instincts, you're on the right path.
We just create one big PV/VG/LV for all of /splunkdata, leaving /opt/splunk on the boot disks by the way.
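A minimal sketch of that single-PV layout (the device name, VG/LV names, and filesystem are assumptions; adjust to taste):

```shell
pvcreate /dev/sdb                            # the RAID 10 logical drive
vgcreate splunkvg /dev/sdb
lvcreate -l 100%FREE -n splunkdata splunkvg  # one LV spanning the whole VG
mkfs.xfs /dev/splunkvg/splunkdata
mount /dev/splunkvg/splunkdata /splunkdata
```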
400GB/day works out to roughly 280 MB/min of sustained ingest.
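That figure is just the daily volume spread over the minutes in a day:

```shell
# 400 GB/day in MB/min, decimal units (1 GB = 1000 MB), over 1440 minutes
echo $(( 400 * 1000 / 1440 ))  # prints 277, i.e. roughly 280 MB/min
```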
RAID 5 might work. RAID 50 is probably the best balance of storage efficiency and write performance. RAID 10 will give you the best write performance, but at the cost of 50% storage efficiency. I worry about heavy writes combined with random reads; that is going to cut into overall performance.
The type of disk you use will be critical. 10k or 15k drives will greatly increase performance, but of course they are more expensive and lower capacity than enterprise 7.2k SATA/SAS drives, which currently go up to 3+TB.
Ultimately, no one can tell you what is best for your application, so you need to test it yourself. My recommendation is to get a RAID card with a large write cache (512MB or more) and a decent-sized read cache as well. Then test various RAID combinations (I would suggest 6-disk and 10-disk RAID 5 sets, as well as RAID 50 and RAID 10) and see what performs best. Tweak the RAID card settings, and you will find the optimal configuration.
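For the actual testing, something like fio gives you repeatable numbers across configurations (the file path, block size, and read/write mix here are assumptions; match them to your real workload):

```shell
# 70/30 random read/write mix against a test file, bypassing the page cache
fio --name=dw-mix --filename=/data/fio.test --size=10G \
    --direct=1 --rw=randrw --rwmixread=70 --bs=64k \
    --iodepth=32 --numjobs=4 --runtime=300 --time_based \
    --group_reporting
```

Run the same job file against each candidate RAID layout and compare the aggregate bandwidth and latency figures.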
"The concern is more whether or not we want to use one large Raid 1+0 containing all 14 disks, or several smaller raid 1+0's and then stripe across them using LVM."
Well, if you think about it, you're basically asking whether you should let your RAID controller do the RAID 0 part of RAID 10, or whether you should let LVM do it.
I suppose if you had the worst RAID controller in the world, LVM could probably outperform it; otherwise, I think you're safe letting the RAID controller do all the work.
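For reference, the LVM-striping variant being asked about would look roughly like this (two RAID 1+0 logical drives assumed to appear as /dev/sdb and /dev/sdc; the stripe size is also an assumption):

```shell
pvcreate /dev/sdb /dev/sdc
vgcreate datavg /dev/sdb /dev/sdc
# -i 2: stripe the LV across both PVs; -I 256: 256 KB stripe size
lvcreate -i 2 -I 256 -l 100%FREE -n datalv datavg
```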
RAID 10 will cut your available space in half. I'd recommend RAID 50 instead, which requires at least six drives; it will give you great fault tolerance and performance.
You'll want to check out and benchmark several RAID cards, as they don't all perform the same. In case you don't know this already, make sure to use enterprise SATA drives, not desktop ones, and not "green" drives.