On our FreeNAS server, zpool status gives me:
  pool: raid2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        raid2                                           ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/5f3c0517-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca  ONLINE       3 1.13M     0
            gptid/60570005-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/60ebeaa5-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/61925b86-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0

errors: No known data errors
What should I do? Scrub the pool?
Type

zpool clear raid2

to clear the error counters, then start a scrub. If the errors reappear after the scrub completes, replace the disk.
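A minimal command sequence, assuming the pool name raid2 from your output (the trailing comments are just annotations):

zpool clear raid2        # reset the READ/WRITE/CKSUM counters on all devices
zpool scrub raid2        # read and verify every block in the pool
zpool status -v raid2    # once the scrub finishes, see whether new errors appeared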
More details about the hardware would help, so this is generic advice. My recommendation for a bunch of consumer disks connected to a PC motherboard is different from what I'd do for enterprise-level gear.
The tool tells you what you need to do: "Determine if the device needs to be replaced".
The tools are only so intelligent and need you, as the human administrator, to figure some things out. The steps required are specific to your hardware and your setup, so you will need to make some decisions based on your knowledge of the system.
Take a look at the output from the command. It looks like device
gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca
is experiencing 'WRITE' errors. 1.13M is a very high error count, and I suspect the problem has been occurring for a while without you noticing. See if you can figure out why, and then replace the disk. If you have a hardware controller, that controller might have additional tools to help you determine the nature of the failure.
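If the disks sit directly on the motherboard or a plain HBA, the drive's own SMART data is the quickest way to see what it is reporting about itself. A sketch, where /dev/ada1 is only a placeholder for whichever device backs that gptid:

smartctl -H /dev/ada1                                              # overall SMART health self-assessment
smartctl -A /dev/ada1 | egrep -i 'Reallocated|Pending|Uncorrect'   # sector-level warning signs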
ZFS can deal with corrupt sectors, so there is no need to panic. But don't ignore the problem either.
As a preventative measure, you should also run a ZFS scrub regularly. See http://doc.freenas.org/index.php/ZFS_Scrubs . This will alert you when ZFS first encounters a problem, well before you hit the 1.13M mark.
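On FreeNAS itself the scrub schedule lives in the GUI, as that page describes. If you are on plain FreeBSD instead, the stock periodic(8) scripts can start scrubs for you; a minimal sketch for /etc/periodic.conf, assuming the standard scrub-zfs periodic script is present:

# /etc/periodic.conf
daily_scrub_zfs_enable="YES"             # let the daily periodic run start scrubs
daily_scrub_zfs_default_threshold="7"    # days between scrubs for each pool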
Use the following command to get a drive's serial number; change out /dev/adaX for your drives.
[blackout@freenas ~]# smartctl -a /dev/ada0 | grep "Serial"
Serial Number: WD-WCC4EXXXXXXXX
Also a helpful command, since it maps the gptid/... labels shown in zpool status to the underlying adaX devices:

[blackout@freenas ~]# glabel status
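Putting the two together, a rough workflow (ada1 here is only a placeholder; use whatever device glabel reports for the gptid with errors):

glabel status | grep 5fe33556             # find which adaXpY partition carries the failing gptid
smartctl -a /dev/ada1 | grep "Serial"     # read the serial number so you can find the physical disk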
Although the question is old, it may still be seen by other people. If so, remember that the output of

zpool status

and

zpool status -v

relates to all errors experienced, not just the disks: that includes errors caused by your motherboard's SATA ports (if used), the HBA card (if used), and the SATA cables themselves. Three quick diagnostic tests are: check the disk quickly with smartctl, check that the card is correctly seated and not loose, and try a different port or SATA cable (the cable is a common cause of read/write errors).
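A quick way to rule the disk itself in or out is a SMART short self-test; a sketch, with /dev/ada1 standing in for whichever device the errors point at:

smartctl -t short /dev/ada1     # start the short self-test (usually a couple of minutes)
smartctl -l selftest /dev/ada1  # read the self-test log once it has finished
smartctl -l error /dev/ada1     # the drive's own recent error log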