First, my apologies if this question has been asked before. I've googled… and googled some more, and can't seem to find what I'm looking for.
Does anyone know of a software or web-based calculator that will let you plug in a RAID configuration (examples below) and output expected read/write speeds, hopefully in MB/s?
Number of disks, size, spin speed, interface type, RAID type
EX. (8, 73GB, 15k, SAS, RAID 1/0)
Or
EX. (6, 146GB, 10k, FC, RAID 5)
I found several that calculate available space, and some that give speed info, but those can't be realistic because they don't take spin speed or drive type into consideration.
There are quite a few variables that can affect speed, but here are some basic ideas to get a feel for what a given RAID set should be capable of.
Raw disk throughput
Assuming that a random seek completes an average of 1/2 of a rotation (180 degrees) away from the sector you want, the average random access time is one average seek plus the time the disk takes to rotate 180 degrees.
On a 10K RPM disk 1/2 of a rotation takes approximately 3ms.
On a 15K RPM disk 1/2 of a rotation takes approximately 2ms.
Average seek time for a Seagate Cheetah 15K.6 is quoted at 3.5ms for reads and 3.9ms for writes (I presume the write figure includes a period to align the head on the servo tracks). A 10K disk's average seek is slightly longer.
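To make that arithmetic concrete, here's a minimal sketch (the 15K seek times are the Cheetah figures quoted above; the 4.5ms 10K seek is an assumed ballpark, not a quoted spec):

```python
# Average random access time = average seek + time to rotate 180 degrees.
def avg_access_ms(rpm, avg_seek_ms):
    half_rotation_ms = 0.5 * 60_000 / rpm   # 2ms at 15K RPM, 3ms at 10K RPM
    return avg_seek_ms + half_rotation_ms

def random_iops(rpm, avg_seek_ms):
    return 1000 / avg_access_ms(rpm, avg_seek_ms)

print(random_iops(15000, 3.5))   # ~182 random reads/sec  (5.5ms each)
print(random_iops(15000, 3.9))   # ~169 random writes/sec (5.9ms each)
print(random_iops(10000, 4.5))   # ~133 random reads/sec  (7.5ms each, assumed seek)
```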
RAID-1
On a non-striped RAID-1, reads can be split between the two disks, but writes must go to both drives. Random operations will give you twice the throughput of a single disk for reads and approximately the throughput of a single disk for writes. Sequential I/O tends to peak at the maximum throughput of a single disk. Interface cables may or may not present a bottleneck.
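As a quick sketch of that rule (single_disk_iops being whatever the access-time estimate above gives you):

```python
# RAID-1 random I/O: reads split across both mirrors, writes hit both.
def raid1_random_iops(single_disk_iops):
    reads = 2 * single_disk_iops    # each member can serve different reads
    writes = single_disk_iops       # every write must land on both disks
    return reads, writes

print(raid1_random_iops(180))      # ~(360, 180) for a 15K mirror pair
```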
Striped RAID sets
RAID-5, RAID-10 or RAID-50 disks have the data split up into chunks spread in a round-robin fashion amongst the members of the RAID set. Assuming no read-ahead optimisation, a disk can read at most one stripe per revolution. A 10K disk revolves about 170 times per second and a 15K disk revolves about 250 times per second.
Thus, an array with 14 15K disks using 64K stripes would have a theoretical streaming throughput of around 210MB/sec, assuming no other constraints. If the controller is not fast enough the practical rate may be lower (for example, I could never get a Dell PV660 (Mylex DAC-FFX) to do more than one read per two revolutions of the disks). A heavily random workload would also be somewhat slower, because the disk accesses will average less than one per revolution. Some reads will also land on parity data, so the actual application data throughput will be a bit lower.
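Here's that estimate as a sketch, under the same assumptions (one stripe per revolution per member, no read-ahead, no controller bottleneck):

```python
# Streaming throughput = revolutions/sec * stripe size * number of members.
def streaming_mb_per_sec(rpm, stripe_kb, disks):
    revs_per_sec = rpm / 60                      # 250/s at 15K, ~167/s at 10K
    per_disk = revs_per_sec * stripe_kb / 1024   # MB/s at one stripe per rev
    return per_disk * disks

raw = streaming_mb_per_sec(15000, 64, 14)   # ~219 MB/s across all 14 members
data = raw * 13 / 14                        # ~203 MB/s of application data once
print(raw, data)                            # RAID-5 parity reads are excluded
```

Both figures land in the same ballpark as the ~210MB/sec quoted above.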
Write bottlenecks
The fastest possible small (partial-stripe) write on a RAID-5 involves two reads and two writes. The controller has to read the old block and the corresponding parity block, XOR the old and new data with the parity block to recalculate the parity, and write out the new block and parity. Caching can reduce the amount of disk activity if the old block and parity block are already in cache. The same applies to a RAID-50.
A RAID-10 needs two disk accesses per write - one to the primary and one to the mirror. Read performance is roughly equivalent to a RAID-5.
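A sketch of the resulting random-write arithmetic, assuming the 4-operation RAID-5 penalty and 2-operation RAID-10 penalty described above (cache hits on the old data and parity would improve the RAID-5 case):

```python
# Random write IOPS for the array = per-disk IOPS * disks / write penalty.
def random_write_iops(single_disk_iops, disks, penalty):
    return single_disk_iops * disks / penalty

disk_iops = 170                              # ~15K disk, from the sketch above
print(random_write_iops(disk_iops, 8, 4))    # RAID-5,  8 disks: ~340 writes/sec
print(random_write_iops(disk_iops, 8, 2))    # RAID-10, 8 disks: ~680 writes/sec
```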
Controller bottlenecks
In some cases (fibre channel is prone to this) the connections to the physical disk subsystem are of somewhat lower bandwidth than the disks are theoretically capable of delivering. Also, disk controllers can perform poorly. In many cases this is a more significant limitation than the disks themselves. High-end SAN hardware often has large multiprocessor machines as controllers - they may also have custom hardware for fast parity calculations. The controller for an EMC DMX takes up half a rack by itself - before you put any disks on it.
Tuning the disk itself
Caching and read-ahead parameters on the disks themselves may also affect performance for certain workloads. For example, disks using Seagate's 'V' firmware might be set up with fewer, larger cache segments and aggressive read-ahead to optimise for streaming throughput of media data. The same physical disk configured for use in a Clariion would be configured with more, smaller cache segments in order to support a larger number of smaller writes from many clients on a SAN.
I don't know of a calculator that can tell you that, in part because there are so many factors beyond just the disk and connection type. The RAID controllers make a huge difference, as can the firmware on those controllers and the type of data, as does the ability of the motherboard to push data. Your best bet is benchmarking on your own; I can't even think of a way to write a calculator to do that sort of thing. Also, I believe that for most operations the network will probably bottleneck before the RAID does.
The sort of speed that you will see will vary a lot depending on the drives, the controller, and your workload, so you are not going to find a nice easy calculator that gives good, accurate and precise results.
You may already realise this, but..
Besides all of the drive characteristics, the speed is going to be largely governed by the performance of the RAID card itself, which will depend not only on obvious things like its interface (e.g. PCI-X), but more dramatically on the quality and performance of its chipset routines.
As others have said, I don't think this can be done in the terms you've stated. I think the best you could do is work out the relative performance of different RAID options, i.e. treat the hardware as a constant. It would still be inaccurate, but it may give some guidance.
But I think you also need to consider why there are different RAID configurations. One usually chooses by judging the trade-offs between performance, capacity, data protection and cost.
If you're not familiar with the trade-offs, take a look at a comparison chart to see the relative merits.
It sounds like performance is your main criterion here, so you probably know what RAID level you want; you just need to find the best-performing hardware.
Here is an example: I ran benchmarks with the same drives (7x 750GB Seagate Barracuda ES.2), same RAID configuration (stripe size, etc.), same motherboard (Supermicro H8DMe), same CPUs (dual Opteron 2214), same RAM (8GB ECC), same operating system (Linux), same filesystem (XFS, nobarrier option), and different RAID controllers. Compare the results:
Of course, these are the optimal results after fine-tuning all software parameters for each controller (read-ahead, caching options, request queue length, request size...) through long, repeated benchmarks while adjusting the various knobs.
One of the funny things I discovered through careful benchmarking is that the optimal settings are entirely different for Barracuda ES.2 (32MB cache) and Barracuda ES (16MB cache) drives, though the top performance is about the same.
Unfortunately, storage and RAID are hard. That's why you won't find a simple, ready-made performance calculator.
I found a calculator that will give you multipliers of speed. It boils down to a read multiplier and a write multiplier per RAID level, applied to the throughput of a single disk.
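As a hedged illustration only (these are the commonly quoted rule-of-thumb factors, not necessarily the linked calculator's exact table):

```python
# Rule-of-thumb read/write multipliers per RAID level, for n disks,
# relative to one disk's random throughput. Assumed ballpark values.
RULES_OF_THUMB = {
    "RAID0":  (lambda n: n,     lambda n: n),
    "RAID1":  (lambda n: n,     lambda n: 1),      # n is usually 2
    "RAID5":  (lambda n: n - 1, lambda n: n / 4),  # read-modify-write penalty
    "RAID10": (lambda n: n,     lambda n: n / 2),  # mirrored pairs
}

def estimate(level, disks, single_disk_iops):
    read_x, write_x = RULES_OF_THUMB[level]
    return read_x(disks) * single_disk_iops, write_x(disks) * single_disk_iops

print(estimate("RAID10", 8, 170))   # ~(1360.0, 680.0) reads/sec, writes/sec
```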
If those calculators exist, they will be on the vendor websites. So many things can affect throughput that a simple calculator would be pretty worthless, especially for any RAID level that includes parity, as those tend to bottleneck more on the RAID controller's CPU than anything else. The best you'll find are "rule of thumb, your mileage may vary" type estimators.
There's a lot more involved in the speed than the underlying RAID layout, so I doubt you'll find such a calculator.
Things that can make a difference:
RAID type
What type of bus the controller is in, and what it is sharing that bus with. Most desktop-class motherboards share PCI buses across multiple slots.
Filesystem type, block size, and its alignment with the chunk size of the underlying RAID also come into play (see the sketch after this list).
Drive type, rotational speed, cache size
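For the alignment point, here's a minimal sketch (the offsets and chunk size are illustrative, not taken from any particular setup):

```python
# A write that straddles a RAID chunk boundary touches two members and,
# on parity RAID, can double the read-modify-write cost. A partition
# whose start offset is a multiple of the chunk size avoids that.
def is_aligned(partition_offset_bytes, chunk_bytes):
    return partition_offset_bytes % chunk_bytes == 0

chunk = 64 * 1024                     # 64K chunk on the underlying RAID
print(is_aligned(63 * 512, chunk))    # False: classic 63-sector DOS offset
print(is_aligned(2048 * 512, chunk))  # True: 1MiB-aligned partition start
```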
Finally, the workload will interact with all of these things, so the more important question is actually which disk and RAID layout are a good match for your workload and data availability goals.