Today I have found this on a server (FreeBSD 8.2 STABLE):
        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED  1.38K     0     0
          raidz1-0  DEGRADED  1.38K     0     0
            ad10    ONLINE    1.38K     0     0
            ad12    ONLINE        0     0     0
            ad14    ONLINE        0     0     0
            ad16    REMOVED       0     0     0
I quickly pulled out the wrong HDD and put in a new one. After that, I typed this unlucky command:
zpool add data ad16
The result was that a new ad16 device appeared in the pool:
        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED      0     0     0
          raidz1-0  DEGRADED      0     0     0
            ad10    ONLINE        0     0     0
            ad12    ONLINE        0     0     0
            ad14    ONLINE        0     0     0
            ad16    FAULTED       0     0     0  corrupted data
          ad16      ONLINE        0     0     0
The first ad16 device is FAULTED and is part of the raidz1-0 vdev. The second ad16 device is ONLINE and is not part of any vdev. The problem is that they share the same name, so the replace command does not work:
gw# zpool replace -f data ad16 ad16
invalid vdev specification
the following errors must be manually repaired:
/dev/ad16 is part of active pool 'data'
I think I need to remove the ONLINE ad16 disk before I can replace the FAULTED one. But this doesn't work: I can neither offline it nor remove it:
gw# zpool offline data ad16
gw# zpool status
  pool: data
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Thu Apr 18 03:23:06 2013
        26.1G scanned out of 3.13T at 50.7M/s, 17h52m to go
        0 repaired, 0.81% done
config:

        NAME        STATE     READ WRITE CKSUM
        data        DEGRADED      0     0     0
          raidz1-0  DEGRADED      0     0     0
            ad10    ONLINE        0     0     0
            ad12    ONLINE        0     0     0
            ad14    ONLINE        0     0     0
            ad16    OFFLINE       0     0     0
          ad16      ONLINE        0     0     0

errors: No known data errors
gw# zpool remove data ad16
cannot remove ad16: only inactive hot spares, cache, top-level, or log devices can be removed
I guess the offline command is targeting the FAULTED ad16, but I would like to offline the other one. I have also tried booting the system into single-user mode with the new disk removed, but that resulted in both ad16 devices being UNAVAIL and the whole pool unusable (which is strange, because there should still be enough disks for it to work...)
The 'zpool add' command is for adding new devices (vdevs) to pools. When you ran that command, you added a new top-level vdev (consisting of only ad16) to the pool. You started with a 4-disk raidz with one failed disk, but now you have a pool where data is striped across the raidz and ad16. Losing that ONLINE ad16 disk will FAULT the entire pool.
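For reference, the command that replaces a failed member in place is 'zpool replace', not 'zpool add'. A minimal sketch, assuming the replacement disk shows up under the same device name:

```shell
# Replace the failed raidz member in place; ZFS then resilvers
# the new disk from the remaining members.
zpool replace data ad16

# If the replacement disk had appeared under a different name
# (ad18 here is hypothetical), the old and new names are given:
# zpool replace data ad16 ad18
```

Run alone, without the preceding 'zpool add', this would have kept the pool a single 4-disk raidz.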
As it's not possible to remove ad16 now (you cannot remove a vdev from a pool), and I doubt you want your data striped between a 4-disk raidz and a single disk, I don't think it's worth your time trying to sort out that FAULTED disk. You'd be better off looking at getting your data off onto a separate pool/disk/server and destroy/recreate this pool.
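One way to do that migration is a recursive snapshot plus 'zfs send'/'zfs receive'. A sketch, assuming a destination pool exists and is named "backup" (the pool name is hypothetical):

```shell
# Take a recursive snapshot of everything in the source pool.
zfs snapshot -r data@migrate

# Send the whole pool as a replication stream (-R) into the
# destination pool; -u keeps the received datasets unmounted.
zfs send -R data@migrate | zfs receive -u backup/data

# After verifying the copy, destroy and recreate the source pool:
# zpool destroy data
# zpool create data raidz ad10 ad12 ad14 ad16
```

Between two machines, the stream can be piped over ssh instead of a local pipe.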
Suggestion: reference the underlying devices by label or GUID instead of the logical device name, so that two entries can never collide on a name like ad16.
See: http://forums.freebsd.org/showthread.php?t=37394
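On FreeBSD, glabel(8) is one way to get such stable names. A sketch, with hypothetical label names, to be done before data is put on the disks (glabel stores its metadata in the disk's last sector):

```shell
# Write a permanent label to each disk; the labeled providers
# then appear under /dev/label/.
glabel label disk0 /dev/ad10
glabel label disk1 /dev/ad12
glabel label disk2 /dev/ad14
glabel label disk3 /dev/ad16

# Build the pool on the labels, not the raw device names, so the
# pool survives disks moving to different controller ports.
zpool create data raidz label/disk0 label/disk1 label/disk2 label/disk3
```

GPT partition labels (gpart -l) achieve the same thing for partitioned disks.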