I have a software RAID 1 setup with three partitions, and one of them seems unable to re-sync after a HDD failure/replacement.
Here's some info:
# more /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1] sda3[2]
2862630207 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[2]
524276 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[2](F) sdb1[1]
67107768 blocks super 1.2 [2/1] [_U]
unused devices: <none>
I tried to mark /dev/sda1 as faulty and then remove it, but I get an error, please see below:
# mdadm --manage --set-faulty /dev/md0 /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md0
# mdadm --manage --remove /dev/md0 /dev/sda1
mdadm: hot remove failed for /dev/sda1: Device or resource busy
Do you have any suggestions on what else I may try?
Why are you trying to remove a failed drive? It has already failed. Shut the system down first if possible (this avoids an unplanned shutdown in case you pull the wrong drive) and pull the failed drive.
Then add the new drive as a hot spare and the array will rebuild.
I know this takes the other two arrays down as well, but if partition 1 of a drive has failed, it's only a matter of time before the whole drive goes. You need to rebuild all three arrays.
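As a rough sketch, the procedure above might look like this. The device names /dev/sda and /dev/sdb are taken from the mdstat output in the question; double-check which physical disk actually failed before running anything, since these commands are destructive:

```shell
# Mark the failing disk's partitions faulty and remove them from all
# three arrays before shutting down (skip any step that errors out):
mdadm --manage /dev/md0 --fail /dev/sda1 --remove /dev/sda1
mdadm --manage /dev/md1 --fail /dev/sda2 --remove /dev/sda2
mdadm --manage /dev/md2 --fail /dev/sda3 --remove /dev/sda3

# Power off, replace the physical disk, boot again, then copy the
# partition layout from the surviving disk to the new one:
sfdisk -d /dev/sdb | sfdisk /dev/sda

# Re-add the partitions; mdadm treats them as spares and starts the
# rebuild automatically:
mdadm --manage /dev/md0 --add /dev/sda1
mdadm --manage /dev/md1 --add /dev/sda2
mdadm --manage /dev/md2 --add /dev/sda3

# Watch the resync progress:
cat /proc/mdstat
```

Note that `sfdisk -d` only covers MBR-style partition tables; if the disks use GPT, `sgdisk -R /dev/sda /dev/sdb` followed by `sgdisk -G /dev/sda` (to randomize the GUIDs) is the usual equivalent.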
I only solved this problem by reinstalling the system.