I've been running a series of load tests against a dedicated database SAN from a pre-production cluster (a Dell R710 connecting to a dedicated RAID 10 SAN over two gigabit Ethernet connections), and I'm not sure I'm interpreting the data correctly.
For reference, here's the raw data.
Test 1
sqlio v1.5.SG
using system counter for latency timings, 2727587 counts per second
parameter file used: paramD100.txt
file d:\tmp\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads reading for 120 secs from file d:\tmp\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 2 outstanding
buffering set to use hardware disk cache (but not file cache)
using specified size: 20480 MB for file: d:\tmp\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 372.12
MBs/sec: 23.25
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 10
Max_Latency(ms): 159
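Here's my back-of-envelope sanity check on the Test 1 numbers (plain Python; the figures are copied from the output above, and the Little's Law relation is generic, not SQLIO-specific):

```python
# Consistency check: MB/s should equal IOPS * IO size.
io_size_mb = 64 / 1024           # 64 KB IOs expressed in MB
iops = 372.12                    # measured IOs/sec from Test 1
print(f"implied MB/s: {iops * io_size_mb:.2f}")   # 23.26, matching the reported 23.25

# Little's Law: in-flight IOs = IOPS * latency, so the IOPS ceiling at a
# fixed queue depth is (outstanding IOs) / (average latency).
outstanding = 2 * 2              # 2 threads * 2 outstanding IOs each
avg_latency_s = 0.010            # ~10 ms average latency from the output
print(f"queue-depth-limited IOPS: {outstanding / avg_latency_s:.0f}")  # 400
```

In other words, with only 4 IOs in flight and ~10 ms average latency, ~400 IOPS (~25 MB/s at 64 KB) is close to the most this test configuration can show, regardless of what the array could do at higher queue depths.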
Test 2
sqlio v1.5.SG
using system counter for latency timings, 2727587 counts per second
parameter file used: paramD100.txt
file d:\tmp\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads reading for 120 secs from file d:\tmp\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 2 outstanding
buffering set to use hardware disk cache (but not file cache)
using specified size: 20480 MB for file: d:\tmp\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 358.26
MBs/sec: 22.39
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 10
Max_Latency(ms): 169
To keep conditions comparable between runs, both tests were run at 11:30am on two consecutive days.
Given this load pattern, is the low MB/s throughput I'm seeing expected, or am I right in reading these results as pointing to a problem with the network, the SAN, or both?
Thanks.
Update #1
To give specifics, the setup is as follows.
Production DB cluster
Dell R710, with 2 x Broadcom 5709s (iSCSI- and TOE-offload capable, using Dell's multipathing IO software). And yes, I've seen the 'Broadcom - die mutha' post :S
Switch
2 Juniper EX4200-48T's acting as a single virtual switch
One connection from each Broadcom NIC on each cluster node goes to one switch, and there are two gigabit connections from each switch to the SAN.
SAN
Dell EqualLogic PS6000E iSCSI SAN, packed out with 16 x 2 TB 7,200 rpm drives (14 active + 2 hot spares)
As far as I understand how this should work, we should theoretically be getting around 200 MB/s, which, as you can see, we're not.
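That theoretical figure can be sketched quickly (the 93% efficiency factor is my assumption for TCP/iSCSI framing overhead, not a measured number):

```python
# Rough ceiling for two 1-gigabit iSCSI paths.
link_bits_per_s = 1_000_000_000                   # one gigabit link
raw_mb_per_s = link_bits_per_s / 8 / 1_000_000    # 125 MB/s raw per link
efficiency = 0.93                                 # assumed protocol overhead (~7%)
per_link = raw_mb_per_s * efficiency              # roughly 116 MB/s usable per link
print(f"per link: ~{per_link:.0f} MB/s, both links: ~{2 * per_link:.0f} MB/s")
```

So two aggregated gigabit paths should top out somewhere in the 200-230 MB/s range, well above what I'm measuring.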
Update #2
To give a bit more context, here's a graph showing the average MB/s for 4 separate runs.
For reference, the Y axis is MB/s, and the X axis is the IO type (random or sequential), pending IOs, and the operation (read vs. write).
Images are disabled for me, so here's a link instead - Graph showing average results for 4 SQLIO runs
There are 2 things concerning me here -
- Firstly, the random-read throughput is lower than I'd have expected
- And secondly, random-write IOs plateau out at 110 MB/s, whereas this suggests the array should be capable of more than that.
Is this roughly the expected pattern for this type of setup? And does anything else look out of place or wrong here?
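For what it's worth, here are two rule-of-thumb estimates I've been using to reason about the graph (the seek and latency figures are typical 7,200 rpm SATA assumptions, not EqualLogic specifics):

```python
# 1) Random-read IOPS ceiling for the spindles: a common estimate for one
#    drive is 1000 / (avg seek ms + rotational latency ms).
avg_seek_ms = 8.5                      # assumed typical for 7200 rpm SATA
rotational_ms = 60_000 / 7200 / 2      # half a revolution, ~4.17 ms
per_spindle = 1000 / (avg_seek_ms + rotational_ms)   # ~79 IOPS per drive
spindles = 14                          # 16 drives minus 2 hot spares
print(f"array random-read IOPS: ~{per_spindle * spindles:.0f}")   # ~1105

# 2) The ~110 MB/s write plateau is suspiciously close to one saturated
#    gigabit link (125 MB/s raw, ~110-118 MB/s usable), which could mean
#    the traffic isn't actually being spread across both paths.
print(f"one-link ceiling: ~{1000 / 8:.0f} MB/s raw")
```

If the write plateau really does sit at one link's worth of bandwidth, the MPIO configuration is the first thing I plan to double-check.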