I have four NVMe drives in a RAID 0 configuration.
I am attempting to determine how many IOPS the array is handling.
When I run iostat, it appears that one drive is handling more IO than the other three. Is this an error in the way iostat collects data, a known issue with mdadm, or have I misconfigured the array?
Usage Details
# iostat
Device             tps    kB_read/s    kB_wrtn/s      kB_read      kB_wrtn
nvme0n1        1669.12     22706.35     13975.13  63422465065  39034761844
nvme3n1         753.28     13228.56     12185.39  36949483692  34035736524
nvme1n1         635.93     13781.47     14014.10  38493855272  39143630456
nvme2n1         744.35     14704.94     14283.13  41073264648  39895068820
md0            4291.15     72863.78     56468.04 203520212237 157724286024
Software RAID device details
# mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri Feb 19 22:45:06 2021
Raid Level : raid0
Array Size : 8001060864 (7630.41 GiB 8193.09 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Fri Feb 19 22:45:06 2021
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Consistency Policy : none
Name : eth1:0
UUID : 2e672c70:de98a756:160877d2:d8fe2c94
Events : 0
    Number   Major   Minor   RaidDevice State
       0       259       1        0      active sync   /dev/nvme0n1p1
       1       259       5        1      active sync   /dev/nvme1n1p1
       2       259       7        2      active sync   /dev/nvme2n1p1
       3       259       3        3      active sync   /dev/nvme3n1p1
Block Devices
# lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
nvme0n1       259:0    0  1.9T  0 disk
└─nvme0n1p1   259:1    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme3n1       259:2    0  1.9T  0 disk
└─nvme3n1p1   259:3    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme1n1       259:4    0  1.9T  0 disk
└─nvme1n1p1   259:5    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
nvme2n1       259:6    0  1.9T  0 disk
└─nvme2n1p1   259:7    0  1.9T  0 part
  └─md0         9:0    0  7.5T  0 raid0 /mnt/raid0
File System Details
# dumpe2fs -h /dev/md0
dumpe2fs 1.44.5 (15-Dec-2018)
Filesystem volume name: QuadSSD
Last mounted on: /mnt/raid0
Filesystem UUID: 8b33fb9d-1f98-44ff-a012-38ac10ffece3
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 250036224
Block count: 2000265216
Reserved block count: 100013260
Free blocks: 1759673576
Free inodes: 249676044
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 4096
Inode blocks per group: 256
RAID stride: 128
RAID stripe width: 512
Flex block group size: 16
Filesystem created: Tue Mar 2 22:54:32 2021
Last mount time: Sun Mar 14 15:55:16 2021
Last write time: Sun Mar 14 15:55:16 2021
Mount count: 4
Maximum mount count: -1
Last checked: Tue Mar 2 22:54:32 2021
Check interval: 0 (<none>)
Lifetime writes: 14 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: f8a38f43-4d67-4137-972d-db2f8650ffad
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x3f3be24d
Journal features: journal_incompat_revoke journal_64bit journal_checksum_v3
Journal size: 1024M
Journal length: 262144
Journal sequence: 0x06a3502a
Journal start: 154915
Journal checksum type: crc32c
Journal checksum: 0x963b1ac7
Notes:
- I also see similar results running iostat 10 (nvme0n1 consistently has higher usage than the other drives).
- The array/drive was never used as a root partition.
- Some output has been abbreviated; for example, there are other block devices in the system.
The apparently higher usage of the first device is probably an artifact of read alignment.
You have a four-device RAID 0 with a 512K chunk, so a full stripe is 2 MB wide and the first device serves the start of any read aligned to a 2 MB boundary. Both 2 MB and 4 MB are common alignment values for applications (e.g. LVM physical extents are 4 MB by default), so the first drive can appear more stressed than the others.
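To see why, work out which member serves a given offset: in a four-device RAID 0 with a 512K chunk, byte offset N lives on member (N / chunk_size) mod 4, so every 2 MB-aligned offset lands on member 0. A quick back-of-the-envelope sketch (the offsets are just illustrative):
# for off_kib in 0 512 1024 1536 2048 4096; do
      member=$(( off_kib / 512 % 4 ))    # 512 KiB chunk, 4 members
      echo "offset ${off_kib} KiB -> starts on member ${member}"
  done
Offsets 0 through 1536 KiB walk through members 0-3 as expected, but both 2048 KiB (2 MB) and 4096 KiB (4 MB) map back to member 0, which is exactly the pattern that inflates nvme0n1's counters.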
For a more in-depth (and correct) evaluation, observe the drives' behavior during a typical real-world workload, or a reasonable approximation of it generated with fio.
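As a rough sketch, something along these lines drives random 4K reads through the filesystem while you watch iostat in another terminal (the job parameters here are placeholders; adjust them to approximate your actual workload):
# fio --name=randread --directory=/mnt/raid0 --rw=randread --bs=4k \
      --size=4g --ioengine=libaio --iodepth=32 --direct=1 \
      --runtime=60 --time_based --group_reporting
With random, non-aligned I/O of this kind, the four members should pull roughly even in iostat 10 output; if nvme0n1 still dominates, the imbalance is real rather than an alignment artifact.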