Recently, I had to resize a Linux software RAID array. It was a bit involved, since it took many steps to grow the device size while shrinking the number of members from 14 to 6, and it took nearly a week. However, everything went fine and the LVM inside was not harmed. The array now seems to be fine, but it shows only 4/6 devices as active:
[root@kvm24 ~]# cat /proc/mdstat
Personalities : [raid10] [raid0]
md3 : active raid10 sdh3[7] sdn3[6] sdl3[10] sda3[17] sdf3[19] sdc3[18]
5559542784 blocks super 1.2 128K chunks 2 near-copies [6/4] [UUUUUU]
In the detailed output, I can't see a problem:
[root@kvm24 ~]# mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Wed Nov 1 23:53:09 2017
Raid Level : raid10
Array Size : 5559542784 (5301.99 GiB 5692.97 GB)
Used Dev Size : 1853180928 (1767.33 GiB 1897.66 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Update Time : Tue May 7 13:28:06 2019
State : active
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 128K
Consistency Policy : unknown
Name : kvm24:3 (local to host kvm24)
UUID : 35833398:1c8291c5:8e817efc:6f99d541
Events : 582653
Number Major Minor RaidDevice State
17 8 3 0 active sync set-A /dev/sda3
10 8 179 1 active sync set-B /dev/sdl3
18 8 35 2 active sync set-A /dev/sdc3
7 8 115 3 active sync set-B /dev/sdh3
19 8 83 4 active sync set-A /dev/sdf3
6 8 211 5 active sync set-B /dev/sdn3
To be honest, I didn't even realize that until Check_MK told me: CRIT - disk state is [6/4] [UUUUUU] (expected 4 disks to be up).
What might be the problem with the array?
Everything is alright with your RAID, at least according to this output. Check dmesg for more details on what happened with your array earlier.
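For example, something along these lines will show the md-related kernel messages (md3 is taken from your output; adjust the pattern to your setup):
dmesg | grep -iE 'md3|raid10'
journalctl -k | grep -iE 'md3|raid10'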
I don't know how your Check_MK check is configured, but an md array will re-sync from time to time, and that is normal. If you are referring to the RAID state, it can be "clean", but also "active" or "active sync". In [UUUUUU], each U means that the corresponding drive is up and synchronized; you should only be worried if one of them is replaced with an underscore (_), which means that drive is inactive or failed. You can read about it at https://www.kernel.org/doc/html/v4.16/admin-guide/md.html
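If you want to cross-check what the kernel itself reports about the array, here is a minimal sketch using the md sysfs attributes described in that document (assuming they are exposed on your kernel):
cat /sys/block/md3/md/array_state   # e.g. clean, active, write-pending
cat /sys/block/md3/md/degraded      # number of missing devices; 0 means the array is not degraded
cat /sys/block/md3/md/sync_action   # idle unless a resync/recovery/check is running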