I have run 2 tests with different setup as explained below. I'm using Azure VM Standard D64s v3.
For the first setup I have mounted 4 Premium SSD disks of 4 TB each, and am performing IO operations with 1 thread writing to each disk. the FIO job is
[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k
numjobs=1
[writer1]
rw=randwrite
directory=/datadrive
[writer2]
rw=randwrite
directory=/datadrive2
[writer4]
rw=randwrite
directory=/datadrive4
[writer5]
rw=randwrite
directory=/datadrive5
The output is
writer1: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer2: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer4: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer5: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.1
Starting 4 processes
writer1: Laying out IO file (1 file / 30720MiB)
writer2: Laying out IO file (1 file / 30720MiB)
writer4: Laying out IO file (1 file / 30720MiB)
writer5: Laying out IO file (1 file / 30720MiB)
Jobs: 1 (f=1): [w(1),_(3)][26.1%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 02m:58s]
writer1: (groupid=0, jobs=1): err= 0: pid=58194: Tue Feb 2 13:12:50 2021
write: IOPS=6041, BW=47.2MiB/s (49.5MB/s)(2978MiB/63096msec)
slat (usec): min=3, max=118643, avg=26.05, stdev=350.12
clat (usec): min=3, max=32369k, avg=42333.21, stdev=659122.27
lat (usec): min=46, max=32369k, avg=42359.46, stdev=659122.00
clat percentiles (usec):
| 1.00th=[ 76], 5.00th=[ 108], 10.00th=[ 174],
| 20.00th=[ 1254], 30.00th=[ 3392], 40.00th=[ 5014],
| 50.00th=[ 6521], 60.00th=[ 8848], 70.00th=[ 13042],
| 80.00th=[ 20055], 90.00th=[ 30540], 95.00th=[ 39584],
| 99.00th=[ 120062], 99.50th=[ 851444], 99.90th=[ 7012877],
| 99.95th=[16575890], 99.99th=[17112761]
bw ( KiB/s): min= 10, max=153154, per=25.59%, avg=50460.17, stdev=63267.92, samples=121
iops : min= 1, max=19144, avg=6307.31, stdev=7908.42, samples=121
lat (usec) : 4=0.01%, 20=0.01%, 50=0.02%, 100=4.10%, 250=8.44%
lat (usec) : 500=3.32%, 750=1.71%, 1000=1.33%
lat (msec) : 2=4.04%, 4=10.16%, 10=30.03%, 20=16.68%, 50=17.47%
lat (msec) : 100=1.60%, 250=0.44%, 500=0.07%, 750=0.07%, 1000=0.06%
lat (msec) : 2000=0.11%, >=2000=0.35%
cpu : usr=2.82%, sys=11.92%, ctx=32608, majf=0, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,381220,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer2: (groupid=0, jobs=1): err= 0: pid=58195: Tue Feb 2 13:12:50 2021
write: IOPS=6476, BW=50.6MiB/s (53.1MB/s)(3080MiB/60882msec)
slat (usec): min=3, max=160087, avg=29.46, stdev=435.82
clat (usec): min=15, max=36835k, avg=39494.66, stdev=876736.78
lat (usec): min=57, max=36835k, avg=39524.33, stdev=876736.58
clat percentiles (usec):
| 1.00th=[ 176], 5.00th=[ 1647], 10.00th=[ 3097],
| 20.00th=[ 4686], 30.00th=[ 5669], 40.00th=[ 6849],
| 50.00th=[ 8586], 60.00th=[ 11338], 70.00th=[ 16057],
| 80.00th=[ 23987], 90.00th=[ 34341], 95.00th=[ 43779],
| 99.00th=[ 88605], 99.50th=[ 130548], 99.90th=[ 1400898],
| 99.95th=[17112761], 99.99th=[17112761]
bw ( KiB/s): min= 10, max=155414, per=30.24%, avg=59631.85, stdev=67499.68, samples=106
iops : min= 1, max=19426, avg=7453.75, stdev=8437.28, samples=106
lat (usec) : 20=0.01%, 50=0.01%, 100=0.39%, 250=1.15%, 500=1.21%
lat (usec) : 750=0.55%, 1000=0.54%
lat (msec) : 2=1.70%, 4=9.61%, 10=40.86%, 20=19.14%, 50=21.81%
lat (msec) : 100=2.09%, 250=0.78%, 500=0.01%, 750=0.01%, 1000=0.03%
lat (msec) : 2000=0.04%, >=2000=0.09%
cpu : usr=2.84%, sys=12.67%, ctx=8384, majf=0, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,394287,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer4: (groupid=0, jobs=1): err= 0: pid=58196: Tue Feb 2 13:12:50 2021
write: IOPS=6522, BW=50.0MiB/s (53.4MB/s)(3110MiB/61022msec)
slat (usec): min=3, max=291305, avg=20.22, stdev=486.94
clat (nsec): min=1500, max=11376M, avg=39221241.22, stdev=371308305.44
lat (usec): min=39, max=11376k, avg=39241.67, stdev=371308.58
clat percentiles (usec):
| 1.00th=[ 53], 5.00th=[ 67], 10.00th=[ 77],
| 20.00th=[ 97], 30.00th=[ 135], 40.00th=[ 343],
| 50.00th=[ 1172], 60.00th=[ 2442], 70.00th=[ 4424],
| 80.00th=[ 7439], 90.00th=[ 24249], 95.00th=[ 36963],
| 99.00th=[ 943719], 99.50th=[ 2466251], 99.90th=[ 5939135],
| 99.95th=[ 7079986], 99.99th=[11207181]
bw ( KiB/s): min= 10, max=153250, per=26.70%, avg=52653.34, stdev=55397.61, samples=121
iops : min= 1, max=19156, avg=6581.43, stdev=6924.70, samples=121
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.58%
lat (usec) : 100=20.60%, 250=16.69%, 500=4.34%, 750=3.05%, 1000=2.67%
lat (msec) : 2=9.20%, 4=10.93%, 10=15.50%, 20=4.87%, 50=9.05%
lat (msec) : 100=0.87%, 250=0.21%, 500=0.22%, 750=0.13%, 1000=0.11%
lat (msec) : 2000=0.36%, >=2000=0.61%
cpu : usr=3.07%, sys=12.42%, ctx=88453, majf=0, minf=1392
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,398034,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer5: (groupid=0, jobs=1): err= 0: pid=58197: Tue Feb 2 13:12:50 2021
write: IOPS=6347, BW=49.6MiB/s (51.0MB/s)(2984MiB/60181msec)
slat (usec): min=3, max=623424, avg=25.05, stdev=1061.43
clat (usec): min=38, max=36942k, avg=40305.49, stdev=896283.77
lat (usec): min=54, max=36942k, avg=40330.71, stdev=896284.15
clat percentiles (usec):
| 1.00th=[ 81], 5.00th=[ 293], 10.00th=[ 2057],
| 20.00th=[ 3556], 30.00th=[ 4686], 40.00th=[ 5473],
| 50.00th=[ 6652], 60.00th=[ 9110], 70.00th=[ 12911],
| 80.00th=[ 25822], 90.00th=[ 35390], 95.00th=[ 45351],
| 99.00th=[ 102237], 99.50th=[ 164627], 99.90th=[ 2004878],
| 99.95th=[17112761], 99.99th=[17112761]
bw ( KiB/s): min= 16, max=153843, per=31.67%, avg=62458.55, stdev=67494.99, samples=98
iops : min= 2, max=19230, avg=7807.10, stdev=8436.72, samples=98
lat (usec) : 50=0.01%, 100=2.27%, 250=2.52%, 500=1.31%, 750=0.92%
lat (usec) : 1000=0.57%
lat (msec) : 2=2.22%, 4=13.76%, 10=39.16%, 20=13.02%, 50=20.89%
lat (msec) : 100=2.29%, 250=0.76%, 500=0.04%, 750=0.09%, 1000=0.02%
lat (msec) : 2000=0.05%, >=2000=0.10%
cpu : usr=2.42%, sys=10.44%, ctx=13976, majf=0, minf=704
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,381975,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
In the second run, I created a RAID 0 array of 4 4 TB disks and mounted it. Now I run the following job
[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k
numjobs=1
[writer1]
rw=randwrite
directory=/datadrive9
[writer2]
rw=randwrite
directory=/datadrive9
[writer4]
rw=randwrite
directory=/datadrive9
[writer5]
rw=randwrite
directory=/datadrive9
Output is
writer1: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer2: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer4: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer5: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.1
Starting 4 processes
writer2: Laying out IO file (1 file / 30720MiB)
writer4: Laying out IO file (1 file / 30720MiB)
writer5: Laying out IO file (1 file / 30720MiB)
Jobs: 4 (f=4): [w(4)][10.4%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 09m:43s]
writer1: (groupid=0, jobs=1): err= 0: pid=22209: Tue Feb 2 13:53:40 2021
write: IOPS=6011, BW=46.0MiB/s (49.2MB/s)(3210MiB/68338msec)
slat (usec): min=4, max=19116, avg=10.13, stdev=52.29
clat (usec): min=21, max=13267k, avg=42549.89, stdev=517341.32
lat (usec): min=42, max=13267k, avg=42560.22, stdev=517341.29
clat percentiles (usec):
| 1.00th=[ 69], 5.00th=[ 103], 10.00th=[ 223],
| 20.00th=[ 578], 30.00th=[ 873], 40.00th=[ 1303],
| 50.00th=[ 1680], 60.00th=[ 2147], 70.00th=[ 2737],
| 80.00th=[ 4490], 90.00th=[ 15270], 95.00th=[ 27919],
| 99.00th=[ 333448], 99.50th=[ 2038432], 99.90th=[ 9328133],
| 99.95th=[11744052], 99.99th=[12549358]
bw ( KiB/s): min= 16, max=347020, per=35.88%, avg=85557.71, stdev=105200.55, samples=77
iops : min= 2, max=43377, avg=10694.30, stdev=13150.00, samples=77
lat (usec) : 50=0.07%, 100=4.68%, 250=5.81%, 500=6.68%, 750=8.72%
lat (usec) : 1000=7.22%
lat (msec) : 2=23.63%, 4=21.15%, 10=9.07%, 20=4.43%, 50=6.50%
lat (msec) : 100=0.52%, 250=0.37%, 500=0.33%, 750=0.12%, 1000=0.10%
lat (msec) : 2000=0.10%, >=2000=0.50%
cpu : usr=2.32%, sys=7.07%, ctx=47306, majf=0, minf=699
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,410837,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer2: (groupid=0, jobs=1): err= 0: pid=22210: Tue Feb 2 13:53:40 2021
write: IOPS=6393, BW=49.9MiB/s (52.4MB/s)(3412MiB/68310msec)
slat (usec): min=3, max=325702, avg=31.82, stdev=630.62
clat (usec): min=32, max=13313k, avg=40001.34, stdev=469309.55
lat (usec): min=54, max=13313k, avg=40033.41, stdev=469313.55
clat percentiles (usec):
| 1.00th=[ 100], 5.00th=[ 196], 10.00th=[ 453],
| 20.00th=[ 1045], 30.00th=[ 1549], 40.00th=[ 2180],
| 50.00th=[ 3195], 60.00th=[ 4686], 70.00th=[ 7242],
| 80.00th=[ 11863], 90.00th=[ 24249], 95.00th=[ 35390],
| 99.00th=[ 156238], 99.50th=[ 2105541], 99.90th=[ 9462350],
| 99.95th=[ 9865004], 99.99th=[13086229]
bw ( KiB/s): min= 16, max=243815, per=31.19%, avg=74378.34, stdev=68597.51, samples=94
iops : min= 2, max=30476, avg=9297.04, stdev=8574.66, samples=94
lat (usec) : 50=0.01%, 100=1.02%, 250=5.44%, 500=4.23%, 750=3.57%
lat (usec) : 1000=4.77%
lat (msec) : 2=18.39%, 4=18.58%, 10=20.83%, 20=10.44%, 50=10.46%
lat (msec) : 100=1.01%, 250=0.39%, 500=0.14%, 750=0.05%, 1000=0.05%
lat (msec) : 2000=0.12%, >=2000=0.51%
cpu : usr=3.29%, sys=16.53%, ctx=25760, majf=0, minf=705
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,436710,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer4: (groupid=0, jobs=1): err= 0: pid=22211: Tue Feb 2 13:53:40 2021
write: IOPS=8253, BW=64.5MiB/s (67.6MB/s)(4405MiB/68309msec)
slat (usec): min=3, max=984211, avg=29.89, stdev=1823.60
clat (usec): min=39, max=13365k, avg=30969.61, stdev=406333.66
lat (usec): min=60, max=13365k, avg=30999.67, stdev=406337.80
clat percentiles (usec):
| 1.00th=[ 94], 5.00th=[ 196], 10.00th=[ 457],
| 20.00th=[ 1254], 30.00th=[ 1860], 40.00th=[ 2376],
| 50.00th=[ 2868], 60.00th=[ 3425], 70.00th=[ 4359],
| 80.00th=[ 6718], 90.00th=[ 17957], 95.00th=[ 28967],
| 99.00th=[ 154141], 99.50th=[ 994051], 99.90th=[ 8355054],
| 99.95th=[ 9328133], 99.99th=[13086229]
bw ( KiB/s): min= 16, max=325328, per=39.05%, avg=93104.42, stdev=94687.86, samples=97
iops : min= 2, max=40666, avg=11637.75, stdev=11835.95, samples=97
lat (usec) : 50=0.01%, 100=1.36%, 250=5.02%, 500=4.31%, 750=3.65%
lat (usec) : 1000=2.97%
lat (msec) : 2=15.06%, 4=34.67%, 10=17.73%, 20=6.39%, 50=7.02%
lat (msec) : 100=0.64%, 250=0.33%, 500=0.17%, 750=0.07%, 1000=0.11%
lat (msec) : 2000=0.14%, >=2000=0.34%
cpu : usr=3.06%, sys=17.32%, ctx=27740, majf=0, minf=700
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,563804,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
writer5: (groupid=0, jobs=1): err= 0: pid=22212: Tue Feb 2 13:53:40 2021
write: IOPS=9155, BW=71.5MiB/s (75.0MB/s)(4886MiB/68308msec)
slat (usec): min=3, max=1253.7k, avg=33.07, stdev=2124.81
clat (usec): min=41, max=13312k, avg=27912.10, stdev=371043.54
lat (usec): min=56, max=13312k, avg=27945.40, stdev=371050.01
clat percentiles (usec):
| 1.00th=[ 116], 5.00th=[ 302], 10.00th=[ 619],
| 20.00th=[ 1680], 30.00th=[ 2769], 40.00th=[ 3851],
| 50.00th=[ 4752], 60.00th=[ 5997], 70.00th=[ 7963],
| 80.00th=[ 11076], 90.00th=[ 21627], 95.00th=[ 33817],
| 99.00th=[ 164627], 99.50th=[ 497026], 99.90th=[ 7616857],
| 99.95th=[ 9596568], 99.99th=[13086229]
bw ( KiB/s): min= 16, max=363341, per=36.27%, avg=86484.03, stdev=82997.59, samples=116
iops : min= 2, max=45417, avg=10810.06, stdev=10374.71, samples=116
lat (usec) : 50=0.01%, 100=0.62%, 250=3.40%, 500=4.25%, 750=3.26%
lat (usec) : 1000=2.37%
lat (msec) : 2=8.40%, 4=19.23%, 10=35.58%, 20=11.98%, 50=8.25%
lat (msec) : 100=1.27%, 250=0.64%, 500=0.24%, 750=0.06%, 1000=0.07%
lat (msec) : 2000=0.16%, >=2000=0.21%
cpu : usr=3.77%, sys=21.57%, ctx=26577, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwt: total=0,625402,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=233MiB/s (244MB/s), 46.0MiB/s-71.5MiB/s (49.2MB/s-75.0MB/s), io=15.5GiB (16.7GB), run=68308-68338msec
Disk stats (read/write):
md0: ios=0/2036771, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/509201, aggrmerge=0/0, aggrticks=0/14988260, aggrin_queue=14611762, aggrutil=76.96%
sdm: ios=0/508620, merge=0/0, ticks=0/16830518, in_queue=16437572, util=75.02%
sdl: ios=0/508786, merge=0/0, ticks=0/13055450, in_queue=12693024, util=76.96%
sdk: ios=0/508787, merge=0/0, ticks=0/14998518, in_queue=14628168, util=75.91%
sdc: ios=0/510612, merge=0/0, ticks=0/15068554, in_queue=14688284, util=74.71%
I am noticing difference in latency distribution in these two runs. In non-RAID setup( first run), the latencies 4=10.16%, 10=30.03%, 20=16.68%, 50=17.47%
and in second run - RAID setup
2=15.06%, 4=34.67%, 10=17.73%, 20=6.39%, 50=7.02%
Noticing that distribution is higher with lower latencies with the RAID setup.
Why does this difference exist?
Sorry for the long question. I'm hoping somebody with knowledge how disk writes work can help with this question.
You inject extra layer into storage stack and it brings extra processing latency. TL;DR: Nothing unexpected. If you’ll do queue_depth=1 you’ll get a better understanding of how much you lose.
RAID0 (or as some of my friends like to call it "JBOD" because where's the redundancy?) allows striping of writes - that is I can distribute inflight writes of different areas to multiple disks simultaneously rather than queuing them up against the same disk/all disks etc.
Let's say a given disk can support 32 in flight writes at a given moment and I choose to queue up 64 writes. That means the latency of the writes that have to wait for a slot (33 onwards) will be much higher. However, in the RAID0 case, those overflow writes might be serviceable by another disk that can support another 32 inflight writes before they have to be backed up.
Further, we don't know how latency increases as the amount of simultaneous writes increases on just a single one of your disks... Who's to say that write latency with a depth of 1 is the same as with a depth of 2, 4 or 8? If more I/O queued simultaneously leads to a higher queue depth then a RAID0 being able to spread the I/Os out among the disks could mean each disk's queue is low(er) and thus the latencies are better. This last point would depend massively on the I/O size versus the stripe size - e.g. at what boundary can RAID0 distribute the writes?