I've been trying to figure out what the IOPS on my DB server array really are, and whether the load is simply too much for it. The array is four 72.6 GB 15k RPM drives in RAID 5. To calculate IOPS for RAID 5, the following formula is used: (Reads + (4 * Writes)) / Number of disks = total IOPS. The formula is from MSDN. I also want to calculate the Avg. Disk Queue Length; I'm not sure where they get that formula from, but I think the page gives it as: Avg. Queue Length / Number of disks = actual queue per disk.
To populate those formulas I used Perfmon to gather the needed counters. Under normal production load I came up with: (873.982 + (4 * 28.999)) / 4 = 247.495. The disk queue length works out to 14.454 / 4 = 3.614.
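For anyone who wants to sanity-check the arithmetic, here's a minimal sketch of those two formulas in Python; the counter values are the ones I captured above, and the function names are just my own labels.

```python
# Minimal sketch of the two formulas above; assumes the standard RAID 5
# write penalty of 4. Counter values are the Perfmon samples quoted above.

def raid5_backend_iops(reads_per_sec, writes_per_sec, num_disks):
    """(Reads + 4 * Writes) / Number of disks, per the MSDN formula."""
    return (reads_per_sec + 4 * writes_per_sec) / num_disks

def queue_per_disk(avg_disk_queue_length, num_disks):
    """Spread Avg. Disk Queue Length across the spindles in the array."""
    return avg_disk_queue_length / num_disks

print(raid5_backend_iops(873.982, 28.999, 4))  # ~247.5 (matches the figure above)
print(queue_per_disk(14.454, 4))               # ~3.61 outstanding I/Os per disk
```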
So, to the question: am I wrong in thinking this array is seeing very high disk I/O?
Edit
I got the chance to review it again this morning under normal/high load, this time with even bigger numbers: IOPS in excess of 600 for about 5 minutes before it died down again. I also took a look at Avg. Disk sec/Transfer, %Disk Time, and %Idle Time. These numbers were taken when the reads/writes per sec were only 332.997/17.999 respectively.
%Disk Time: 219.436
%Idle Time: 0.300
Avg Disk Queue Length: 2.194
Avg Disk sec/Transfer: 0.006
Pages/sec: 2927.802
% Processor Time: 21.877
Edit (again)
Looks like I have that issue solved. Thanks for the help. Also, I found a pretty slick parser for this: http://pal.codeplex.com/. It works pretty well for breaking the Perfmon data down into something usable.
I've been doing a bit of research in this area lately myself. There's a great calculator here that people on this site have pointed me at. Throwing some basic numbers into it with a 97:3 read-to-write ratio, and not factoring in cache hits, it looks like your array should be able to mechanically deliver about 675 IOPS at 4k. You're doing roughly 902 transfers per second, which would be high, as your queue length per disk also indicates.
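To give a rough idea of where a number like 675 comes from, here's a back-of-the-envelope sketch (not the calculator's exact math); the ~184 IOPS-per-spindle figure is an assumption for a 15k drive, and real drives vary.

```python
# Rough frontend IOPS estimate for a 4-disk RAID 5 array with a 97:3 read:write mix.
# ASSUMPTION: ~184 IOPS per 15k RPM spindle (typical figures run roughly 175-210).
disks = 4
iops_per_disk = 184               # assumed, not measured
read_ratio, write_ratio = 0.97, 0.03
write_penalty = 4                 # RAID 5: each frontend write costs ~4 backend I/Os

raw_backend_iops = disks * iops_per_disk
frontend_iops = raw_backend_iops / (read_ratio + write_penalty * write_ratio)
print(round(frontend_iops))       # ~675
```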
You might also measure Avg. Disk sec/Transfer and %Disk Time. I've found %Disk Time to be rather odd with RAID arrays, and it's been more accurate to measure %Idle Time and use 100 - %Idle Time as the busy time. I bet you'll find that your disks are very busy and that you'll see a lot of 20ms+ transfer times. IMO these numbers are more clear-cut if you want to conclude that you've got a disk problem.
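If it helps, here's a tiny sketch of that workaround applied to the counters you posted; the 20 ms latency threshold is just a common rule of thumb, not a hard limit.

```python
# Sketch of the busy-time workaround described above.
# The 20 ms latency threshold is a rough rule of thumb, not a hard limit.

def disk_busy_pct(idle_time_pct):
    """%Disk Time is unreliable on RAID arrays; derive busy time from %Idle Time."""
    return 100 - idle_time_pct

def transfer_looks_slow(avg_sec_per_transfer, threshold_s=0.020):
    """Flag average transfer times above ~20 ms as suspect."""
    return avg_sec_per_transfer > threshold_s

# With the counters posted above (0.300% idle, 6 ms per transfer):
print(disk_busy_pct(0.300))        # 99.7 -> the array is essentially always busy
print(transfer_looks_slow(0.006))  # False -> per-transfer latency itself looks OK so far
```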
Here's a great article that gets referenced quite a bit. It's dated but relevant, and is especially useful for explaining why %Disk time and Avg. Queue Length can be difficult to interpret.