I have (had, it's already been swapped out, asking this for future use) a drive that is indicating pending failure with internal SMART tests and bad block remaps.
It is straightforward to mdadm --fail the soon-to-be-bad drive and rebuild to a hot spare, or to pull the drive and put a new one in, then rebuild to that drive.
The problem is this takes the array to degraded state for the entire period of that resync, incurring both the additional failure risk and the performance overhead of running degraded. That's expected if you actually have a drive failure, but it is an unnecessary exposure if the drive hasn't actually failed yet.
How can I pre-emptively replace/rebuild that single drive to a hot spare without taking it out of service first?
I'm not sure how resilient this technique is, but it "should work". I'd want to give this procedure some test runs on other drives before doing it for real.
If you have a two-disk RAID-1, you can use mdadm --grow to transform it to a three-disk RAID-1. This is a triple mirror, not a RAID-1E. Then, you can fail out the drive you're worried about and --grow it back to two disks.
If you do this, you'll always have at least one mirrored copy of your data.
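A minimal sketch of that sequence, untested on real hardware, so treat it as an outline rather than a recipe. The array name /dev/md0, the suspect member /dev/sdb1, the replacement /dev/sdc1, and the RUN dry-run switch are all assumptions made up for illustration. By default the script only prints the mdadm commands it would run.

```shell
#!/bin/sh
# Dry run by default: every command is echoed instead of executed.
# Set RUN to an empty string (RUN= ./script.sh) to run the commands for real.
RUN=${RUN-echo}

replace_mirror_member() {
    # $1 = array, $2 = suspect member, $3 = replacement device (all hypothetical names)

    # Add the new disk, then widen the mirror so it becomes an active third copy.
    $RUN mdadm "$1" --add "$3" &&
    $RUN mdadm --grow "$1" --raid-devices=3 || return 1

    # Block until the resync finishes. --wait can exit non-zero when there
    # was nothing to wait for, so its status is deliberately ignored here.
    $RUN mdadm --wait "$1" || true

    # Before this point, check /proc/mdstat by hand and confirm that all
    # three members are active. Only then drop the suspect disk and shrink
    # the mirror back to two devices.
    $RUN mdadm "$1" --fail "$2" --remove "$2" &&
    $RUN mdadm --grow "$1" --raid-devices=2
}

replace_mirror_member /dev/md0 /dev/sdb1 /dev/sdc1
```

The suspect disk stays in the array as a full mirror until the new disk has finished syncing, so the array never runs with fewer than two copies.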
Reportedly, you can --grow an array from RAID-5 to RAID-6, but I have never heard of anyone going back to RAID-5 afterwards. At any rate, that approach is much riskier, because you'll have to rewrite all your data on all the disks.
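For reference, the RAID-5 to RAID-6 direction might look roughly like the sketch below. It assumes a four-disk RAID-5 at /dev/md0, an extra disk /dev/sde1, and a backup file at /root/md0-reshape.bak, all of which are hypothetical names. As written it only prints the commands, and it makes no attempt to convert back to RAID-5.

```shell
#!/bin/sh
# Dry run by default: commands are echoed, not executed.
# Set RUN to an empty string to run them for real.
RUN=${RUN-echo}

raid5_to_raid6() {
    # $1 = array, $2 = extra device, $3 = device count after the reshape,
    # $4 = backup file, which must live on a disk outside the array

    # Add the extra disk as a spare, then reshape to RAID-6 across all disks.
    # The reshape rewrites every stripe, which is why it is the riskier path.
    $RUN mdadm "$1" --add "$2" &&
    $RUN mdadm --grow "$1" --level=6 --raid-devices="$3" --backup-file="$4"
}

raid5_to_raid6 /dev/md0 /dev/sde1 5 /root/md0-reshape.bak
```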