I've been running a series of load tests against a dedicated database SAN from a pre-production cluster (a Dell R710 connecting to a dedicated RAID 10 SAN over two gigabit Ethernet connections), and I'm not sure I'm interpreting the data correctly.
For reference, here's the raw data.
Test 1
sqlio v1.5.SG
using system counter for latency timings, 2727587 counts per second
parameter file used: paramD100.txt
file d:\tmp\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads reading for 120 secs from file d:\tmp\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 2 outstanding
buffering set to use hardware disk cache (but not file cache)
using specified size: 20480 MB for file: d:\tmp\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 372.12
MBs/sec: 23.25
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 10
Max_Latency(ms): 159
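Here's my back-of-envelope sanity check on the Test 1 numbers (plain Python; the figures are copied from the output above, and the Little's Law relation is generic, not SQLIO-specific):

```python
# Consistency check: MB/s should equal IOPS * IO size.
io_size_mb = 64 / 1024           # 64 KB IOs expressed in MB
iops = 372.12                    # measured IOs/sec from Test 1
print(f"implied MB/s: {iops * io_size_mb:.2f}")   # 23.26, matching the reported 23.25

# Little's Law: in-flight IOs = IOPS * latency, so the IOPS ceiling at a
# fixed queue depth is (outstanding IOs) / (average latency).
outstanding = 2 * 2              # 2 threads * 2 outstanding IOs each
avg_latency_s = 0.010            # ~10 ms average latency from the output
print(f"queue-depth-limited IOPS: {outstanding / avg_latency_s:.0f}")  # 400
```

In other words, with only 4 IOs in flight and ~10 ms average latency, ~400 IOPS (~25 MB/s at 64 KB) is close to the most this test configuration can show, regardless of what the array could do at higher queue depths.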
Test 2
sqlio v1.5.SG
using system counter for latency timings, 2727587 counts per second
parameter file used: paramD100.txt
file d:\tmp\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads reading for 120 secs from file d:\tmp\testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 2 outstanding
buffering set to use hardware disk cache (but not file cache)
using specified size: 20480 MB for file: d:\tmp\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 358.26
MBs/sec: 22.39
latency metrics:
Min_Latency(ms): 1
Avg_Latency(ms): 10
Max_Latency(ms): 169
To keep conditions comparable between runs, both tests were run at 11:30am on two consecutive days.
Given this load pattern, is the low MB/s throughput I'm seeing expected, or am I right in reading these results as pointing to a problem with the network, the SAN, or both?
Thanks.
Update #1
To give specifics, the setup is as follows.
Production DB cluster
Dell R710, with 2 x Broadcom 5709s (iSCSI- and TOE-offload capable, using Dell's multipathing IO software). And yes, I've seen the 'Broadcom - die mutha' post :S
Switch
2 Juniper EX4200-48T's acting as a single virtual switch
One connection from each Broadcom NIC on each cluster node goes to one switch, and there are two gigabit connections from each switch to the SAN.
SAN
Dell EqualLogic PS6000E iSCSI SAN, packed out with 16 x 2 TB 7,200 rpm drives (14 active + 2 hot spares)
As far as I understand how this should work, we should theoretically be getting around 200 MB/s, which, as you can see, we're not.
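That theoretical figure can be sketched quickly (the 93% efficiency factor is my assumption for TCP/iSCSI framing overhead, not a measured number):

```python
# Rough ceiling for two 1-gigabit iSCSI paths.
link_bits_per_s = 1_000_000_000                   # one gigabit link
raw_mb_per_s = link_bits_per_s / 8 / 1_000_000    # 125 MB/s raw per link
efficiency = 0.93                                 # assumed protocol overhead (~7%)
per_link = raw_mb_per_s * efficiency              # roughly 116 MB/s usable per link
print(f"per link: ~{per_link:.0f} MB/s, both links: ~{2 * per_link:.0f} MB/s")
```

So two aggregated gigabit paths should top out somewhere in the 200-230 MB/s range, well above what I'm measuring.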
Update #2
To give a bit more context, here's a graph showing the average MB/s for 4 separate runs.
For reference, the Y axis is MB/s, and the X axis is the IO type (random or sequential), pending IOs, and the operation (read vs. write).
Images are disabled for me, so here's a link instead - Graph showing average results for 4 SQLIO runs
There are 2 things concerning me here -
- Firstly, the random-read throughput is lower than I'd have expected
- And secondly, random-write IOs plateau out at 110 MB/s, whereas this suggests the array should be capable of more than that.
Is this roughly the expected pattern for this type of setup? And does anything else look out of place or wrong here?
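For what it's worth, here are two rule-of-thumb estimates I've been using to reason about the graph (the seek and latency figures are typical 7,200 rpm SATA assumptions, not EqualLogic specifics):

```python
# 1) Random-read IOPS ceiling for the spindles: a common estimate for one
#    drive is 1000 / (avg seek ms + rotational latency ms).
avg_seek_ms = 8.5                      # assumed typical for 7200 rpm SATA
rotational_ms = 60_000 / 7200 / 2      # half a revolution, ~4.17 ms
per_spindle = 1000 / (avg_seek_ms + rotational_ms)   # ~79 IOPS per drive
spindles = 14                          # 16 drives minus 2 hot spares
print(f"array random-read IOPS: ~{per_spindle * spindles:.0f}")   # ~1105

# 2) The ~110 MB/s write plateau is suspiciously close to one saturated
#    gigabit link (125 MB/s raw, ~110-118 MB/s usable), which could mean
#    the traffic isn't actually being spread across both paths.
print(f"one-link ceiling: ~{1000 / 8:.0f} MB/s raw")
```

If the write plateau really does sit at one link's worth of bandwidth, the MPIO configuration is the first thing I plan to double-check.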