ZFS is reporting some read issues, so it would seem this disk is failing, even though none of the failure scenarios described in the ZFS-8000-9P document have occurred as far as we are aware. These disks are fairly new; the only issue we had recently was a full ZFS pool.
The ZFS pool runs on top of an LSI MegaRAID 9271-8i, with every disk configured as a single-disk "RAID 0" virtual drive. I am not very familiar with this RAID card, so I used a script that returns data derived from the megacli command-line tool. I added one drive below to show the setup; they are all set up the same. (The system disks are different.)
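For reference, the same information can be pulled straight from MegaCli; the binary path below is the usual install location and may differ on your system:

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL   # per-virtual-drive details
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL         # physical drives, incl. error counters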
zpool status output
  pool: data
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        data          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            br0c2     ONLINE       0     0     0
            br1c2     ONLINE       0     0     0
            br2c2     ONLINE       0     0     0
            br0c3     ONLINE       0     0     0
            br1c3     ONLINE       0     0     0
            br2c3     ONLINE       0     0     0
            r2c1      ONLINE       0     0     0
            r1c2      ONLINE       0     0     0
            r5c3      ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdf       ONLINE       0     0     0
            sdg       ONLINE       0     0     0
            r3c1      ONLINE       0     0     0
            r4c1      ONLINE       2     0     0
... cut raidz2-1 ...
errors: No known data errors
The output of the LSI script:
Virtual Drive: 32 (Target Id: 32)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 3.637 TB
Sector Size : 512
Is VD emulated : No
Parity Size : 0
State : Optimal
Strip Size : 512 KB
Number Of Drives : 1
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
PI type: No PI
Is VD Cached: No
The script doesn't report any faulty disk, nor does the RAID controller mark the drive as faulty. I found some other topics about zpool errors that gave the advice to clear the error and run a scrub. Now my questions are: at what threshold should I run a scrub, and how long would it take (assuming this ZFS RAID will take a performance hit while the scrub runs)? Also, when this disk really is faulty, will hot-swapping it initiate a rebuild? All the disks are "Western Digital RE 4TB, SAS II, 32MB, 7200rpm, enterprise 24/7/365". And is there a system that will check for ZFS errors automatically? This was just a routine manual check.
ZFS version: 0.6.4.1 (zfsonlinux)
I know 2 read errors are not a lot, but I'd prefer to replace disks too early rather than too late.
I'd do what ZFS tells you to do in this case. Please run a scrub.
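In concrete terms that would be something like the following (pool name taken from your zpool status output; only clear the counters once the scrub finishes clean):

zpool scrub data     # start the scrub
zpool status data    # watch progress and the error counters
zpool clear data     # only after the scrub completes without new errors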
I scrub my systems weekly on a schedule. I also use the zfswatcher daemon to monitor the health of Linux ZFS installs.
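As a minimal sketch of such a schedule, assuming an /etc/cron.d-style crontab and the pool name data from the question:

# /etc/cron.d/zfs-scrub -- scrub the 'data' pool every Sunday at 02:00
0 2 * * 0 root /sbin/zpool scrub data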
Your ZFS array is probably untuned, so there are some values that can help improve scrubbing performance, but at this point, you should just run it.
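On ZoL 0.6.x those values are exposed as kernel module parameters under /sys/module/zfs/parameters; as an illustration of the mechanism (the values below are examples only, not tuning advice):

# speed up scrub at the cost of foreground I/O -- example values only
echo 0    > /sys/module/zfs/parameters/zfs_scrub_delay       # ticks to delay between scrub I/Os
echo 5000 > /sys/module/zfs/parameters/zfs_scrub_min_time_ms # min ms of scrub work per txg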
And for the other question, your hot swap probably won't do what you expect it to... See rant below.
rant:
Having a bunch of RAID-0 virtual drives behind a hardware controller is a bad idea!
You have the worst of both worlds. Recoverability and error checking are limited. A failed disk is essentially a failed virtual drive, and there are hot-swap implications. Let's say you remove the disk(s) in question: you'd likely need to create a new virtual disk, or you may end up with different drive enumeration.
At a certain point, it's better to get a real HBA and run the disks as true passthrough devices (with no RAID metadata), or just run ZFS on top of vdevs protected by hardware arrays. E.g. run a RAID-6 on your controller and install ZFS on top, or run multiple RAID-X groups and have ZFS mirror or stripe the resulting vdevs.
zpool scrub is the "system that will check for zfs errors". It will take as long as it takes to read all the data stored in the pool (going in sequential txg order, so it can seek a lot, depending on how full the pool is and how the data was written). Once started, zpool status will show some estimate. A running scrub can also be stopped.

If you want something to periodically check zpool status, the simplest way would be to run something like zpool status | grep -C 100 Status periodically (say, once every 6 hours) and email the output if there is any. You could probably find a plugin for your favourite monitoring system, like Nagios, or it'd be pretty straightforward to write one yourself; a minimal sketch follows.
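For example (this assumes a working mail command; zpool status -x prints "all pools are healthy" when there is nothing to report):

#!/bin/sh
# mail the full status whenever any pool is not healthy
if ! /sbin/zpool status -x | grep -q 'all pools are healthy'; then
    /sbin/zpool status | mail -s "zpool errors on $(hostname)" root
fi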
Just hot-swapping the drive will not trigger a resilver; you will have to run zpool replace for that to happen.
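For example, with the device names from your pool (the replacement device name is a placeholder):

zpool replace data r4c1 <new-device>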
The read error you are seeing may just as well be some kind of controller mishap. Even though it's enterprise hardware, these (HW RAID) controllers sometimes behave strangely, and these errors may, for example, be the result of a command taking too long because the controller was busy with something else. That's why I try to stay away from them unless necessary.
I'd go with checking the SMART data on the drive (see man smartctl) and scrubbing the pool. If both look OK, clear the errors and do not mess with your pool, because if the pool is near full, reading all the data during a resilver can actually trigger another error. Start panicking once you see errors on the same drive again ;).

By the way, for best performance you should use 2^n + 2 drives in RAIDZ2 vdevs (a power-of-two number of data disks plus the two parity disks).
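Note that for disks hidden behind a MegaRAID controller, smartctl typically has to address the physical drive through the controller; N here stands for the drive's device ID on the controller, which you would look up per disk:

smartctl -a -d megaraid,N /dev/sda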