Start of the problem
I have a dedicated server with a hosting provider, and recently node_exporter detected high disk I/O saturation on my RAID 1 array /dev/md3. I checked smartctl for the hard drives, and both drives in the array were showing a high number of read errors:
[root@ovh-ds03 ~]# smartctl /dev/sda -a | grep Err
Error logging capability: (0x01) Error logging supported.
SCT Error Recovery Control supported.
1 Raw_Read_Error_Rate 0x000b 099 099 016 Pre-fail Always - 65538
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
[root@ovh-ds03 ~]# smartctl /dev/sdb -a | grep Err
Error logging capability: (0x01) Error logging supported.
SCT Error Recovery Control supported.
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 65536
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
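For reference, a way to double-check the drives beyond the attribute table is to run SMART self-tests and then read the self-test log, roughly like this (standard smartctl usage, the long test takes a few hours per drive):
smartctl -t long /dev/sda        # start an extended self-test on the first drive
smartctl -l selftest /dev/sda    # check the self-test log once it finishes
smartctl -t long /dev/sdb
smartctl -l selftest /dev/sdb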
I asked through a support ticket to replace the 2 disks, but instead of replacing them, 2 more disks were added and the array was rebuilt onto those 2 new disks. Everything worked fine at first, but now the array is in a degraded state and I got an alert about it named NodeRAIDDegraded. Checking on the server confirms it is degraded:
[root@ovh-ds03 ~]# mdadm --detail /dev/md3
/dev/md3:
Version : 1.2
Creation Time : Sat Mar 30 18:18:26 2024
Raid Level : raid1
Array Size : 1951283200 (1860.89 GiB 1998.11 GB)
Used Dev Size : 1951283200 (1860.89 GiB 1998.11 GB)
Raid Devices : 4
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sat Sep 14 19:30:44 2024
State : active, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : md3
UUID : 939ad077:07c22e9e:ae62fbf9:4df58cf9
Events : 55337
Number Major Minor RaidDevice State
- 0 0 0 removed
- 0 0 1 removed
2 8 35 2 active sync /dev/sdc3
3 8 51 3 active sync /dev/sdd3
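For context, the NodeRAIDDegraded alert mentioned above comes from the standard node-exporter Prometheus mixin; as far as I can tell it fires on an expression roughly like the following (simplified, label selectors omitted), which matches the output above: 4 required members minus 2 active members is greater than 0.
node_md_disks_required - ignoring (state) (node_md_disks{state="active"}) > 0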
As far as I can tell, the array now expects 4 member devices (Raid Devices : 4) while only the 2 new disks (/dev/sdc3 and /dev/sdd3) are present, which is why it stays degraded. How do I fix it?
I have tried a few things to get the array re-assembled, for example:
mdadm --assemble --scan
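One thing I am considering but have not run yet (I am not sure whether it is safe on a live array) is shrinking the expected member count back to 2 so it matches the disks that actually exist, assuming /dev/sdc3 and /dev/sdd3 are meant to stay the only members:
# shrink the array from 4 expected members back to 2 (untested on my side)
mdadm --grow /dev/md3 --raid-devices=2
# then verify the state
cat /proc/mdstat
mdadm --detail /dev/md3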