I recently migrated a bulk data storage pool (ZFS On Linux 0.6.2, Debian Wheezy) from a single-device vdev configuration to a two-way mirror vdev configuration.
The previous pool configuration was:
NAME STATE READ WRITE CKSUM
akita ONLINE 0 0 0
ST4000NM0033-Z1Z1A0LQ ONLINE 0 0 0
Everything was fine after the resilver completed. I then ran a scrub as well, just to have the system go over everything once more and make sure it was all good:
pool: akita
state: ONLINE
scan: scrub repaired 0 in 6h26m with 0 errors on Sat May 17 06:16:06 2014
config:
NAME STATE READ WRITE CKSUM
akita ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ST4000NM0033-Z1Z1A0LQ ONLINE 0 0 0
ST4000NM0033-Z1Z333ZA ONLINE 0 0 0
errors: No known data errors
However, after rebooting I got an email notifying me that the pool was not fine and dandy. I had a look, and this is what I saw:
pool: akita
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub in progress since Sat May 17 14:20:15 2014
316G scanned out of 1,80T at 77,5M/s, 5h36m to go
0 repaired, 17,17% done
config:
NAME STATE READ WRITE CKSUM
akita DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ST4000NM0033-Z1Z1A0LQ ONLINE 0 0 0
ST4000NM0033-Z1Z333ZA UNAVAIL 0 0 0
errors: No known data errors
The scrub is expected; there is a cron job set up to initiate a full system scrub on reboot. However, I definitely wasn't expecting the new HDD to fall out of the mirror.
I define aliases that map to the /dev/disk/by-id/wwn-* names, and for both of these disks I have given ZFS free rein to use the full disk, including handling partitioning:
# zpool history akita | grep ST4000NM0033
2013-09-12.18:03:06 zpool create -f -o ashift=12 -o autoreplace=off -m none akita ST4000NM0033-Z1Z1A0LQ
2014-05-15.15:30:59 zpool attach -o ashift=12 -f akita ST4000NM0033-Z1Z1A0LQ ST4000NM0033-Z1Z333ZA
#
These are the relevant lines from /etc/zfs/vdev_id.conf (I do notice now that the Z1Z333ZA line uses a tab character for separation whereas the Z1Z1A0LQ line uses only spaces, but I honestly don't see how that could be relevant here):
alias ST4000NM0033-Z1Z1A0LQ /dev/disk/by-id/wwn-0x5000c500645b0fec
alias ST4000NM0033-Z1Z333ZA /dev/disk/by-id/wwn-0x5000c50065e8414a
When I looked, /dev/disk/by-id/wwn-0x5000c50065e8414a* were there as expected, but /dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA* were not.
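For anyone wanting to repeat that check without eyeballing directory listings, here is a minimal sketch. The function name missing_vdev_links is made up for illustration, and it assumes the conf file only needs its "alias NAME TARGET" lines looked at. It prints every alias name that has no symlink in the by-vdev directory. Because the shell's read splits on any run of spaces or tabs, the tab-versus-space difference mentioned above does not affect this check (which says nothing about how vdev_id itself parses the file).

```shell
# Hypothetical helper: report vdev_id.conf aliases that have no matching symlink.
# Usage: missing_vdev_links [conf_file] [vdev_dir]
missing_vdev_links() {
    conf=${1:-/etc/zfs/vdev_id.conf}
    vdev_dir=${2:-/dev/disk/by-vdev}
    # read splits on runs of spaces or tabs, so both separator styles parse alike.
    while read -r keyword name target _; do
        # Skip comments, blank lines and any non-alias keywords.
        [ "$keyword" = "alias" ] || continue
        # -L is true for a symlink even when its target is currently absent.
        [ -L "$vdev_dir/$name" ] || printf '%s\n' "$name"
    done < "$conf"
}
```

Called with no arguments it uses the default paths shown in the code; no output means every alias has a symlink.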
Issuing sudo udevadm trigger caused the symlinks to show up in /dev/disk/by-vdev. However, ZFS doesn't seem to just realize that they are there (Z1Z333ZA still shows as UNAVAIL). That much I suppose can be expected.
I tried replacing the relevant device, but had no real luck:
# zpool replace akita ST4000NM0033-Z1Z333ZA
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA-part1 is part of active pool 'akita'
#
Both disks are detected during the boot process (dmesg log output showing the relevant drives):
[ 2.936065] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.936137] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.937446] ata4.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[ 2.937453] ata4.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.938516] ata4.00: configured for UDMA/133
[ 2.992080] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3.104533] ata6.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[ 3.104540] ata6.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 3.105584] ata6.00: configured for UDMA/133
[ 3.105792] scsi 5:0:0:0: Direct-Access ATA ST4000NM0033-9ZM SN03 PQ: 0 ANSI: 5
[ 3.121245] sd 3:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[ 3.121372] sd 3:0:0:0: [sdb] Write Protect is off
[ 3.121379] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 3.121426] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.122070] sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[ 3.122176] sd 5:0:0:0: [sdc] Write Protect is off
[ 3.122183] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 3.122235] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Both drives are connected directly to the motherboard; there is no off-board controller involved.
On impulse, I did:
# zpool online akita ST4000NM0033-Z1Z333ZA
which appears to have worked; Z1Z333ZA is now at least ONLINE and resilvering. At about an hour into the resilver it's scanned 180G and resilvered 24G with 9.77% done, which points to it not doing a full resilver but rather only transferring the dataset delta.
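The manual steps that got the disk back (udevadm trigger, then zpool online) could be strung together roughly as below. This is only a sketch of a stopgap, not a fix for the underlying cause. The function names unavail_vdevs and reonline_pool are invented for illustration, and it assumes the pool is already imported and that UNAVAIL only ever appears in the STATE column of a leaf device.

```shell
# Print the names of vdevs that `zpool status` output (read from stdin)
# lists with state UNAVAIL. Assumes the NAME STATE READ WRITE CKSUM layout.
unavail_vdevs() {
    awk '$2 == "UNAVAIL" { print $1 }'
}

# Hypothetical stopgap: re-run udev so the by-vdev symlinks get created,
# wait for udev to finish, then ask ZFS to online every UNAVAIL vdev.
# Needs root. Usage: reonline_pool akita
reonline_pool() {
    pool=$1
    udevadm trigger
    udevadm settle
    zpool status "$pool" | unavail_vdevs | while read -r dev; do
        zpool online "$pool" "$dev"
    done
}
```

Running reonline_pool from a boot-time job would only paper over the problem, so I'd still like to understand why the symlinks are missing in the first place.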
I'm honestly not sure whether this issue is related to ZFS On Linux or to udev. It smells a bit like udev, but then why would one drive be detected just fine and not the other? My question is: how do I make sure the same thing doesn't happen again on the next reboot?
I'll be happy to provide more data on the setup if necessary; just let me know what's needed.
This is a udev issue that seems to be specific to Debian and Ubuntu variants. Most of my ZFS on Linux work is with CentOS/RHEL.
Similar threads on the ZFS discussion list have mentioned this.
See: "scsi and ata entries for same hard drive under /dev/disk/by-id" and "ZFS on Linux/Ubuntu: Help importing a zpool after Ubuntu upgrade from 13.04 to 13.10, device IDs have changed".
I'm not sure what the most deterministic pool device approach for Debian/Ubuntu systems is. For RHEL, I prefer to use device WWNs for general pool devices, although at other times the device name/serial is useful, too. Either way, udev should be able to keep all of this in check.
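To see which names udev has actually created for a given disk (WWN, ata, scsi and so on), something like the following may help. It is a rough sketch: the function name links_for_device is made up, and it relies on readlink -f as provided by GNU coreutils.

```shell
# Hypothetical helper: list every symlink in a directory that resolves to the
# same device node as the given path.
# Usage: links_for_device /dev/sdc [/dev/disk/by-id]
links_for_device() {
    want=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Ignore anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        [ "$(readlink -f "$link")" = "$want" ] && printf '%s\n' "$link"
    done
    return 0
}
```

Comparing the output for /dev/disk/by-id and /dev/disk/by-vdev right after boot should show whether the by-id names exist while the alias symlinks are missing.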