I have a 5 x 3TB raidz1 array on an Ubuntu 14.04.1 server. Last month, one of the drives died (audible clicking). I was able to replace the drive with zpool replace RAID <dead drive> <new drive>. That finished without issue and the pool was online and healthy again. Then another drive died. I attempted the same thing, but the pool is stuck in the following status:
# zpool status
pool: RAID
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: resilvered 29.1G in 6h3m with 1028 errors on Mon Jan 5 05:35:35 2015
config:
NAME                                   STATE     READ WRITE CKSUM
RAID                                   DEGRADED     0     0 1.00K
  raidz1-0                             DEGRADED     0     0 2.01K
    ata-ST3000DM001-9YN166_Z1F15FAV    ONLINE       0     0     0
    ata-ST3000DM001-9YN166_Z1F15FCJ    ONLINE       0     0     0
    replacing-2                        DEGRADED     0     0     4
      17164957131155215254             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F15TBH-part1
      ata-ST3000DM001-1ER166_W500JFME  ONLINE       0     0     0
    ata-ST3000DM001-1ER166_Z500765Z    ONLINE       0     0     3
    ata-ST3000DM001-1CH166_W1F1M2C6    ONLINE       0     0     0
errors: 1028 data errors, use '-v' for a list
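To make the non-zero counters in that output easier to spot, here is a minimal sketch of a filter for the config table. The function name nonzero_errors is my own and not part of ZFS; it only reads text on stdin, and it treats counters such as 1.00K as opaque non-zero strings rather than parsing them as numbers.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command): print every vdev row from
# `zpool status` text whose READ, WRITE, or CKSUM counter is not "0".
nonzero_errors() {
    awk '
        # The config table starts after the header row beginning with NAME.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" summary line ends the table.
        $1 == "errors:" { in_config = 0 }
        # Row fields: NAME STATE READ WRITE CKSUM [notes...]
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage (assumes the pool name RAID from the output above):
#   zpool status RAID | nonzero_errors
```

The full list of affected files is a separate matter; as the errors line says, zpool status -v prints it.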
The good news is the data is non-essential. I am not worried about the errors (the files are videos and still play fine). I have tried the following actions to remedy this, as suggested by other questions and forums.
# zpool offline RAID ata-ST3000DM001-9YN166_Z1F15TBH
cannot offline ata-ST3000DM001-9YN166_Z1F15TBH: no valid replicas
# zpool offline RAID 17164957131155215254
cannot offline 17164957131155215254: no valid replicas
# zpool detach RAID ata-ST3000DM001-9YN166_Z1F15TBH
cannot detach ata-ST3000DM001-9YN166_Z1F15TBH: no valid replicas
# zpool detach RAID 17164957131155215254
cannot detach 17164957131155215254: no valid replicas
I have also run zpool clear RAID and zpool scrub, which triggered resilvers but left the pool in the same status as above. I then tried to remove the new disk, but oddly got the same "no valid replicas" error.
# zpool offline RAID ata-ST3000DM001-1ER166_W500JFME
cannot offline ata-ST3000DM001-1ER166_W500JFME: no valid replicas
I am at a loss for how to proceed. It appears that the replace was successful, but ZFS won't let go of the original disk.
# dkms status -v
spl, 0.6.3, 3.13.0-43-generic, x86_64: installed
zfs, 0.6.3, 3.13.0-43-generic, x86_64: installed
Update: I removed the zpool cache at /etc/zfs/zpool.cache and rebooted. Resilvering again, will report back.
Update 2: Still in the same status as above. If there is no way to finish the replace, is there any way to rebuild the pool without losing any data?
Update 3: Here is the most recent status:
# zpool status
pool: RAID
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: resilvered 29.1G in 6h1m with 1028 errors on Wed Jan 7 03:49:13 2015
config:
NAME                                   STATE     READ WRITE CKSUM
RAID                                   DEGRADED     0     0 1.00K
  raidz1-0                             DEGRADED     0     0 2.01K
    ata-ST3000DM001-9YN166_Z1F15FAV    ONLINE       0     0     0
    ata-ST3000DM001-9YN166_Z1F15FCJ    ONLINE       0     0     1
    replacing-2                        DEGRADED     0     0     0
      17164957131155215254             UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F15TBH-part1
      ata-ST3000DM001-1ER166_W500JFME  ONLINE       0     0     0
    ata-ST3000DM001-1ER166_Z500765Z    ONLINE       0     0     0
    ata-ST3000DM001-1CH166_W1F1M2C6    ONLINE       0     0     0
errors: 1028 data errors, use '-v' for a list
The smartctl data for all 5 drives is here.