I run a new CentOS 7 machine. The OS runs on a 2x SSD setup, and I also have 4x SAS drives set up in software RAID10. The RAID10 array is large: 4x 12TB drives, so 24TB usable.
File system is: ext4
I have just finished copying some files to it, and I'm now running a RAID check (the very first one).
Every 2.0s: cat /proc/mdstat Mon Oct 14 06:28:38 2019
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md127 : active raid10 sdf1[3] sdd1[1] sde1[2] sdc1[0]
23437503488 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
[======>..............] check = 32.6% (7649123136/23437503488) finish=3402.6min speed=77333K/sec
bitmap: 0/175 pages [0KB], 65536KB chunk
md2 : active raid1 sdb2[1] sda2[0]
20478912 blocks [2/2] [UU]
md3 : active raid1 sdb3[1] sda3[0]
447318976 blocks [2/2] [UU]
bitmap: 3/4 pages [12KB], 65536KB chunk
unused devices: <none>
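As a sanity check on the finish estimate above, the remaining time can be worked out from the three numbers on the check line. This is only a sketch: md_eta_minutes is a hypothetical helper name, and it assumes the block counts and the speed are both expressed in 1K units, as /proc/mdstat prints them.

```shell
# Hypothetical helper: estimate remaining check time in minutes.
# Arguments: blocks done, total blocks, current speed in K/sec
# (all three taken from the "check = ..." line of /proc/mdstat).
md_eta_minutes() {
    awk -v d="$1" -v t="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}
```

Calling md_eta_minutes 7649123136 23437503488 77333 gives roughly 3,400 minutes, in line with the finish=3402.6min that md reports, so the estimate is driven purely by the current speed.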
It started at around 250,000K/sec but keeps getting slower, and it's now around 75,000K/sec.
The drives in the RAID10 array are not being used by anything else at the moment.
I already tweaked the speed limit settings.
dev.raid.speed_limit_min = 100000
dev.raid.speed_limit_max = 1000000
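A quick way to see whether these limits are what is holding the check back is to compare the observed speed with them. The sketch below is illustrative only: read_raid_limit and throttle_state are hypothetical helper names, and the default directory assumes the standard procfs location of the two sysctls.

```shell
# Hypothetical helper: print a md speed limit in K/sec.
# $1 is "min" or "max"; $2 optionally overrides the procfs directory.
read_raid_limit() {
    limit_dir="${2:-/proc/sys/dev/raid}"
    if [ -r "$limit_dir/speed_limit_$1" ]; then
        cat "$limit_dir/speed_limit_$1"
    fi
}

# Hypothetical helper: classify an observed speed against the limits.
# Arguments: observed speed, minimum limit, maximum limit (all K/sec).
throttle_state() {
    if [ "$1" -lt "$2" ]; then
        echo "below-min"      # slower than the guaranteed minimum
    elif [ "$1" -ge "$3" ]; then
        echo "at-max"         # capped by speed_limit_max
    else
        echo "between"        # md may be yielding to other I/O
    fi
}
```

With the values from this post, throttle_state 77333 100000 1000000 prints below-min. A check that runs below speed_limit_min is not being throttled by md, which suggests the bottleneck is somewhere in the disks or the path to them.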
CPU usage is at about 2%, I have plenty of free RAM, and the four drives in the RAID array each report about 25% utilization, so they are not being pushed hard by the check.
My question:
What can I do to speed this up?
And what could be causing it to slow down?
Your message file shows exactly what I expected: a disk/enclosure continuously aborting commands and resetting. The affected disk always seems to be sdc, so it is probably the culprit.
The obvious way to solve the problem is to replace it. However, I would first try to:
- swap sdc with another disk (to change the SAS cable/power cord) and check whether the errors follow the drive or remain bound to the very same slot/port;
- run the following to gain additional debug data:
dd if=/dev/sdc of=/dev/null bs=1M iflag=direct
If, for some reason, you can't replace the drive, you can try forcing bad-block reallocation by completely rewriting the device via
dd if=/dev/zero of=/dev/sdc bs=1M oflag=direct
BIG WARNING: this will completely and irreversibly destroy all data on sdc. Try it only if you really can't replace the drive.
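To check whether the errors really concentrate on one disk, and whether they follow the drive after a swap, it can help to count how often each device name appears in the log. This is a rough sketch: count_disk_mentions is a hypothetical helper, it assumes the log refers to whole disks by names such as sdc, and it counts every mention rather than only error lines.

```shell
# Hypothetical helper: count mentions of whole-disk names (sda, sdb, ...)
# in a log file and print "name count", most frequent first.
# $1 is the path to the log file to scan.
count_disk_mentions() {
    grep -ow 'sd[a-z][a-z]*' "$1" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'
}
```

Running it against the system log before and after swapping slots gives a simple comparison: if the same drive stays on top under its new name, the drive is suspect; if the count stays with the slot, look at the cable, port, or enclosure.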