I have an 8-bay NAS running Fedora 29 (kernel version 4.20.8) and zfs version 0.7.12. All of the drive bays are used for a zfs pool named “tank.” Here is the zpool layout:
tank
  mirror-0
    sda
    sdb
  mirror-1
    sdc
    sdd
  mirror-2
    sde
    sdf
cache
  sdg
spare
  sdh
One of the drives (sdb) is failing SMART tests with uncorrected offline reallocated sectors, but ‘zpool status’ still reports it as “ONLINE.” I want to physically replace sdb with a new drive (sdi).
Since there are no available physical bays, I plan to use the following to replace the drive:
zpool offline tank sdb
zpool replace tank sdb sdh
zpool detach tank sdb
echo 1 | sudo tee /sys/block/sdb/device/delete
# Remove the physical hard drive associated with sdb and plug in new physical drive mapped to sdi
I do not know how to best proceed from here. Is it better to:
(a) just add sdi as a new spare (leaving sdh as a permanent replacement for sdb)
zpool add tank spare sdi
(b) replace sdh with sdi and have sdh go back to the spare list?
zpool replace tank sdi sdh
zpool detach tank sdh
In this case “better” means less administrative complexity going forward (e.g., if sdh later goes bad under option (a), would a ‘detach’ or other command fail or produce unexpected results because sdh used to be a spare?). Also, I’m not sure whether I am missing steps or have something incorrect under option (b).
Notes:
- Pool names are simplified (e.g., vdevs are mapped IDs)
- I know the kernel and zfs are ancient, but I am solving this failing drive ahead of the upgrade
- Controller card and bays support hot swapping
- Searched and read topics on ZoL disk replacement as well as Oracle’s docs, but haven’t seen a topic on best practice (sorry if I missed it)
First of all, create a good backup.
Then:
1: Offline the failing drive:
zpool offline tank sdb
2: Replace sdb with sdi:
zpool replace tank sdb sdi
3: Detach the spare (sdh), if it was activated in the meantime:
zpool detach tank sdh
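Between steps 2 and 3 it is usually worth waiting for the resilver to finish. A minimal sketch of such a check follows. It assumes that `zpool status` marks a running resilver with the phrase "resilver in progress"; the helper name `resilver_running` is made up for illustration, and the sample lines are hand-written rather than real pool output.

```shell
#!/bin/sh
# Sketch only: exits 0 while the status text on stdin reports a running resilver.
# Assumption: `zpool status` uses the phrase "resilver in progress" for that state.
resilver_running() {
    grep -q 'resilver in progress'
}

# Intended use on a live system (not executed here):
#   while zpool status tank | resilver_running; do sleep 60; done
#   ...then continue with step 3

# Self-check with hand-written sample lines instead of real pool output.
if printf '  scan: resilver in progress\n' | resilver_running; then
    echo "still resilvering"
fi
if ! printf '  scan: none requested\n' | resilver_running; then
    echo "safe to continue"
fi
```

Reading the status text from stdin keeps the helper testable without touching a real pool.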
There's no difference between the options when all is said and done; drives are interchangeable. "Just add sdi as the new spare" requires fewer steps and minimizes the amount of time that you spend in a resyncing state, so it's the natural choice.
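For reference, the shorter path could be written out as a dry-run sketch like the one below. It only prints the commands, mirrors the steps from the question, and assumes the pool and device names used there (tank, sdb, sdh, sdi); the function name `plan_option_a` is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the zpool commands for option (a) instead of running them.
# Pool and device names come from the question; adjust them before real use.
plan_option_a() {
    pool=$1 failing=$2 spare=$3 new=$4
    echo "zpool offline $pool $failing"          # stop using the failing disk
    echo "zpool replace $pool $failing $spare"   # resilver onto the hot spare
    echo "zpool status $pool"                    # check that the resilver has completed
    echo "zpool detach $pool $failing"           # spare becomes a permanent mirror member
    # (physically swap the failed disk for the new one at this point)
    echo "zpool add $pool spare $new"            # new disk becomes the hot spare
}

plan_option_a tank sdb sdh sdi
```

Printing rather than executing lets you review the sequence before touching the pool.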