One of my server's drives failed, so I removed the failed drive from all three relevant arrays, had the drive swapped out, and then added the new drive to the arrays. Two of the arrays worked perfectly. The third added the drive back as a spare, and there's an odd "removed" entry in the mdadm details.
I tried both
mdadm /dev/md2 --remove failed
and
mdadm /dev/md2 --remove detached
as suggested here and here, neither of which complained, but neither of which had any effect, either.
Does anyone know how I can get rid of that entry and get the drive added back properly? (Ideally without resyncing a third time; I've already had to do it twice and it takes hours. But if that's what it takes, that's what it takes.) The new drive is /dev/sda, and the relevant partition is /dev/sda3.
Here's the detail on the array:
# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Oct 26 12:27:49 2011
     Raid Level : raid1
     Array Size : 729952192 (696.14 GiB 747.47 GB)
  Used Dev Size : 729952192 (696.14 GiB 747.47 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Tue Nov 12 17:48:53 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
           UUID : 2fdbf68c:d572d905:776c2c25:004bd7b2 (local to host blah)
         Events : 0.34665
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       19        1      active sync   /dev/sdb3
       2       8        3        -      spare   /dev/sda3
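To see at a glance which slots are "removed" and which members are only spares, the device table at the bottom of that output can be boiled down to one line per slot. This is a minimal sketch: `md_slots` is a hypothetical helper name (not an mdadm feature), and it assumes the column layout shown above (Number, Major, Minor, RaidDevice, State, then an optional device path).

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and print
# "slot state device" for each row of the member table.
# Assumes the table layout shown above; "-" means no device / no slot.
md_slots() {
    awk '
        # The member table starts after the "RaidDevice State" header line
        /RaidDevice[ \t]+State/ { in_table = 1; next }
        in_table && NF >= 5 {
            slot = $4
            # The last column is a device path only for real members
            dev = ($NF ~ /^\/dev\//) ? $NF : "-"
            last = (dev == "-") ? NF : NF - 1
            # State may be several words, e.g. "active sync"
            state = $5
            for (i = 6; i <= last; i++) state = state " " $i
            print slot, state, dev
        }
    '
}
```

Usage: `mdadm --detail /dev/md2 | md_slots`. For the array above, that should list slot 0 as "removed" with no device, slot 1 as "active sync /dev/sdb3", and /dev/sda3 as a spare with no slot ("-"), which is exactly the stuck state described.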
If it's relevant, it's a 64-bit server. It normally runs Ubuntu, but right now I'm in the data centre's "rescue" OS, which is Debian 7 (wheezy). The "removed" entry was there the last time I was in Ubuntu (it won't currently boot from the disk), so I don't think it's some Ubuntu/Debian conflict (and they are, of course, closely related).
Update:
Having done extensive tests with test devices on a local machine, I'm just plain getting anomalous behavior from mdadm with this array. For instance, with /dev/sda3 removed from the array again, I did this:
mdadm /dev/md2 --grow --force --raid-devices=1
And that got rid of the "removed" device, leaving me with just /dev/sdb3. Then I nuked /dev/sda3 (wrote a file system to it, so it didn't have the RAID metadata anymore), then:
mdadm /dev/md2 --grow --raid-devices=2
...which gave me an array with /dev/sdb3 in slot 0 and "removed" in slot 1, as you'd expect. Then
mdadm /dev/md2 --add /dev/sda3
...added it, as a spare again. (Another 3.5 hours down the drain.)
So with the rebuilt spare in the array, given that mdadm's man page says
RAID-DEVICES CHANGES
...
When the number of devices is increased, any hot spares that are present will be activated immediately.
...I grew the array to three devices, to try to activate the "spare":
mdadm /dev/md2 --grow --raid-devices=3
What did I get? Two "removed" devices, and the spare. And yet when I do this with a test array, I don't get this behavior.
So I nuked /dev/sda3 again, used it to create a brand-new array, and am copying the data from the old array to the new one:
rsync -r -t -v --exclude 'lost+found' --progress /mnt/oldarray/* /mnt/newarray
This will, of course, take hours. Hopefully when I'm done, I can stop the old array entirely, nuke /dev/sdb3, and add it to the new array. Hopefully it won't get added as a spare!
Well, since all of the usual options (listed in my question) failed, I had no choice but to:
1. Remove /dev/sda3 from the array
2. Nuke it
3. Create a new degraded array containing it and an empty slot
4. rsync the files from the old array to the new one
5. Stop the old array
6. Nuke /dev/sdb3
7. Add /dev/sdb3 to the new array
It started off saying "spare, rebuilding", but once it was rebuilt, it got added to the array as an active drive.
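The steps above can be sketched as a shell script. This is an illustration under assumptions, not necessarily the exact commands used: the new array name /dev/md3, the ext4 file system, and the helper names `rebuild_md` and `run` are hypothetical, and the "nuke" steps use `mdadm --zero-superblock` rather than writing a file system over the partition. It also swaps the `rsync -r -t -v` above for archive mode (`-aHAX`) with a trailing-slash source, on the assumption that ownership, permissions, symlinks, hard links, ACLs and xattrs must survive (they do for a root file system) and that top-level dotfiles, which `/*` skips, should be copied too. By default it only prints the commands, and it has to run from a rescue system, since the old array can't be stopped while it is the mounted root.

```shell
#!/bin/sh
# Sketch of "rebuild into a fresh array". ASSUMPTIONS (hypothetical):
# new array /dev/md3, ext4, mount points /mnt/oldarray and /mnt/newarray.
set -eu

OLD_MD=/dev/md2
NEW_MD=/dev/md3        # hypothetical name for the new array
NEW_DISK=/dev/sda3     # replacement drive's partition (currently a spare)
OLD_DISK=/dev/sdb3     # surviving active member of the old array
DRY_RUN=${DRY_RUN:-1}  # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_md() {
    # 1-2. Remove the spare from the old array and wipe its md superblock
    run mdadm "$OLD_MD" --remove "$NEW_DISK"
    run mdadm --zero-superblock "$NEW_DISK"

    # 3. Degraded RAID1: the new partition plus an empty ("missing") slot
    run mdadm --create "$NEW_MD" --level=1 --raid-devices=2 "$NEW_DISK" missing
    run mkfs.ext4 "$NEW_MD"   # hypothetical: match the old array's fs
    run mkdir -p /mnt/newarray
    run mount "$NEW_MD" /mnt/newarray

    # 4. Copy everything, preserving ownership, permissions and links
    run rsync -aHAX --exclude 'lost+found' /mnt/oldarray/ /mnt/newarray/

    # 5. Stop the old array
    run umount /mnt/oldarray
    run mdadm --stop "$OLD_MD"

    # 6-7. Wipe the old member and add it; it resyncs as a spare first
    run mdadm --zero-superblock "$OLD_DISK"
    run mdadm "$NEW_MD" --add "$OLD_DISK"
}
```

Source the file and call `rebuild_md` to review the printed commands; only with `DRY_RUN=0` set in the environment, and the device names checked against your own system, would it actually run them. Because the new array gets a new UUID, anything that refers to the old one (fstab, mdadm.conf, the bootloader configuration) will likely need updating afterwards.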
Of course, this meant dealing with the knock-on effects of the array having changed (and as this was the root file system, those were a royal pain).
As far as I can tell, something had got corrupted in the definition of the previous array, because:
A) Adding the drive should have Just Worked(tm) like it did with the other two,
and
B) If not, shrinking and growing the array should have worked.