I have a Linux software RAID 10 setup consisting of five RAID 1 arrays (two drives per mirror) with a RAID 0 striped across all five RAID 1 pairs. To check that none of the drives would fail quickly under load, I ran badblocks in destructive read-write mode across the RAID 0.
Badblocks command: badblocks -b 4096 -c 98304 -p 0 -w -s /dev/md13
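For reference, here is what those parameters mean. -b 4096 sets a 4096-byte block size, -c 98304 tests 98304 blocks per batch, -w writes test patterns and reads them back (by default four patterns: 0xaa, 0x55, 0xff, 0x00), -p 0 stops after a single pass, and -s shows progress. The minimal POSIX shell sketch below only does the arithmetic. It takes the 2441919360 KiB array size from the mdadm --detail output for md13 in this post; the variable names are illustrative.

```shell
#!/bin/sh
# Arithmetic behind: badblocks -b 4096 -c 98304 -p 0 -w -s /dev/md13
block_size=4096        # -b: bytes per block
blocks_at_once=98304   # -c: number of blocks tested per batch
array_kib=2441919360   # md13 "Array Size" as reported by mdadm --detail (KiB)

# Bytes covered by one batch of blocks.
batch_bytes=$((block_size * blocks_at_once))
echo "bytes per batch: $batch_bytes ($((batch_bytes / 1024 / 1024)) MiB)"

# Number of 4096-byte blocks on md13, and batches needed per test pattern
# (rounded up, since the last batch is partial).
total_blocks=$((array_kib * 1024 / block_size))
batches=$(( (total_blocks + blocks_at_once - 1) / blocks_at_once ))
echo "blocks on md13: $total_blocks"
echo "batches per pattern: $batches"
```

Run with sh, it reports 402653184 bytes (384 MiB) per batch, 610479840 blocks on md13, and 6211 batches for each pattern that -w writes and reads back.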
One of the devices failed, and instead of carrying on, badblocks hung. Running sync also hangs. My first assumption is that this isn't standard behavior for a RAID 1 device: if one of the drives fails, it should still be possible to write to the virtual device that the two drives make up.
So I force-failed the drive and tried to remove it. I can mark the drive as faulty without any problem (however, the I/O operations are still hung), but I cannot remove the device from the array entirely; it reports that the device is busy. My assumption is that if I can kick it out of the array entirely, the I/O will continue, but that is just an assumption, and I do think I am dealing with a bug of some sort.
What is going on here exactly? Am I in an unrecoverable spot due to a bug?
The system is running kernel 2.6.18, so it isn't exactly new, but given how long software RAID has been around, I would have expected issues like this not to happen.
Any insight is greatly appreciated.
mdadm --detail /dev/md13
/dev/md13:
        Version : 00.90.03
  Creation Time : Thu Jan 21 14:21:57 2010
     Raid Level : raid0
     Array Size : 2441919360 (2328.80 GiB 2500.53 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 13
    Persistence : Superblock is persistent

    Update Time : Thu Jan 21 14:21:57 2010
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           UUID : cfabfaee:06cf0cb2:22929c7b:7b037984
         Events : 0.3

    Number   Major   Minor   RaidDevice State
       0       9        7        0      active sync   /dev/md7
       1       9        8        1      active sync   /dev/md8
       2       9        9        2      active sync   /dev/md9
       3       9       10        3      active sync   /dev/md10
       4       9       11        4      active sync   /dev/md11
Output for the failing RAID 1 (/dev/md8):
/dev/md8:
        Version : 00.90.03
  Creation Time : Thu Jan 21 14:20:47 2010
     Raid Level : raid1
     Array Size : 488383936 (465.76 GiB 500.11 GB)
    Device Size : 488383936 (465.76 GiB 500.11 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 8
    Persistence : Superblock is persistent

    Update Time : Mon Jan 25 04:52:25 2010
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : 2865aefa:ab6358d8:8f82caf4:1663e806
         Events : 0.11

    Number   Major   Minor   RaidDevice State
       0      65       17        0      active sync   /dev/sdr1
       1       8      209        1      faulty   /dev/sdn1
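The force-fail and remove steps described earlier correspond to mdadm's manage mode: mdadm /dev/md8 --fail /dev/sdn1, then mdadm /dev/md8 --remove /dev/sdn1. The POSIX shell sketch below picks the faulty member out of mdadm --detail output and prints (rather than runs) those two commands. The function names faulty_members and print_fail_remove are hypothetical helpers, not part of mdadm. The sketch assumes that the device table lists one member per line with the device path in the last column. It does not address the hang itself, and on this system the --remove step is the one reported as busy.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): read `mdadm --detail` output on
# stdin and print the device path of every member whose State mentions "faulty".
faulty_members() {
  awk '$0 ~ /faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Hypothetical dry-run helper: print, rather than execute, the manage-mode
# commands that fail and then remove each faulty member of array $1.
print_fail_remove() {
  array=$1
  faulty_members | while read -r dev; do
    echo "mdadm $array --fail $dev"
    echo "mdadm $array --remove $dev"
  done
}

# Usage on a live system (mdadm --detail needs root):
#   mdadm --detail /dev/md8 | print_fail_remove /dev/md8
```

Fed the md8 device table shown above, it prints the --fail and --remove commands for /dev/sdn1 only, since /dev/sdr1 is still "active sync".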