A presentation states that ZFS has this feature:
ZFS can detect and correct silent data corruption.
(e.g. from here: http://www.eecis.udel.edu/~bmiller/DE-OSUG/ECECIS-ZFS.pdf)
- But do you need to allocate a spare disk or a separate ZFS pool to do this manually, or is it intrinsic to ZFS?
- Does a single-disk ZFS file system have this feature, or do you need RAIDZ?
1. Do you need to allocate a spare disk or ZFS pool to do this manually, or is it intrinsic to ZFS?
The affected data needs to be redundant for this to happen. This redundancy can be achieved without extra disks (e.g. with the copies property discussed under question 2). Multiple disks don't imply redundancy either.
ZFS supports spare devices, but they exist to replace other devices that have failed. They are not used for data redundancy.
2. Does a single-disk ZFS file system have this feature, or do you need RAIDZ?
Whatever the pool configuration, ZFS always detects corrupted data unless you explicitly disable checksums, which would be a very bad idea.
A single-disk pool can recover a rotten block when it contains metadata, because ZFS stores metadata in multiple copies (ditto blocks) by default. Blocks containing file data can only be recovered if the copies property is set to 2 or higher.
Multiple-disk pools in a striped configuration behave like single-disk pools: metadata can survive disk rotting, but file data can only self-heal if ditto blocks are present (copies set to 2 or higher).
Multiple-disk pools in a redundant configuration (mirror, raidz, raidz2, raidz3) can recover from any disk rotting issue, unless, of course, the errors are massive, e.g. several disks failing at once.
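For a single-disk or striped pool, the copies property mentioned above is set per dataset with zfs set. A minimal sketch, assuming a hypothetical dataset named tank/home (substitute your own pool and dataset names):

```shell
# "tank/home" is a hypothetical dataset name; replace it with yours.
# Store two copies (ditto blocks) of each file data block written from now on.
zfs set copies=2 tank/home

# Check the current value of the property.
zfs get copies tank/home
```

Keep in mind that the property only affects data written after it is changed, that each extra copy costs the corresponding disk space, and that it does not protect against losing the whole disk.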
Errors are detected when the affected file (or metadata or zvol block) is read. If ZFS can recover from the error, the block is fixed transparently and the correct data is returned; otherwise, a read error is reported. Note that the checksum isn't an ECC: it can only detect broken blocks, not recover them.
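The read path described above can be modeled with a small toy script. This is only an illustration under stated assumptions, not ZFS code: each "block" is stored as one or more files (the copies, standing in for ditto blocks or mirror sides), its SHA-256 checksum is kept separately (ZFS keeps checksums in the parent block pointer), and bit rot is simulated by overwriting a copy. The helpers write_block and read_block are hypothetical names, and the script assumes bash with GNU coreutils (sha256sum). It shows that the checksum only tells which copy is bad; the repair itself comes from an intact copy.

```shell
#!/usr/bin/env bash
# Toy model of checksum-based self healing. This is NOT how ZFS is coded;
# it only mimics the behaviour described above.
set -eu

dir=$(mktemp -d)              # stands in for the disk
trap 'rm -rf "$dir"' EXIT

# write_block NCOPIES DATA: store NCOPIES identical copies of DATA and,
# separately, its SHA-256 checksum.
write_block() {
  local n=$1 data=$2 i=1
  rm -f "$dir"/copy*
  while [ "$i" -le "$n" ]; do
    printf '%s' "$data" > "$dir/copy$i"
    i=$((i + 1))
  done
  printf '%s' "$data" | sha256sum | cut -d' ' -f1 > "$dir/checksum"
}

# read_block: return the first copy whose checksum matches, rewriting any
# damaged copy from it; report a read error if no copy is intact.
read_block() {
  local want good="" f
  want=$(cat "$dir/checksum")
  for f in "$dir"/copy*; do
    if [ "$(sha256sum < "$f" | cut -d' ' -f1)" = "$want" ]; then
      good=$f
      break
    fi
  done
  if [ -z "$good" ]; then
    echo "read error: checksum mismatch and no intact copy" >&2
    return 1
  fi
  for f in "$dir"/copy*; do
    if ! cmp -s "$f" "$good"; then
      cp "$good" "$f"
      echo "healed ${f##*/}" >&2
    fi
  done
  cat "$good"
}

echo "copies=1:"
write_block 1 "important data"
printf 'importent data' > "$dir/copy1"   # simulate silent corruption
read_block || true

echo "copies=2:"
write_block 2 "important data"
printf 'importent data' > "$dir/copy1"   # same corruption, one copy only
read_block
echo
```

With a single copy the corruption is detected but the read fails; with two copies the damaged copy is rewritten from the intact one and the correct data is returned.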
If you want to verify a whole pool without waiting for a read to occur, use the scrub mechanism: ZFS checks all used blocks and self-heals the rotten ones when possible.
To detect such rotting on an entire disk, you have to run periodic data scrubbing. Some distributions do this in a cron job, so have a look at yours.
The command is
zpool scrub techrx
(where techrx is the pool name). Only one scrub can run at a time on a given pool. Each time data is read from the disk, ZFS checks it for rotting, so you can be fairly sure that the data you read is clean. It is advisable to run a complete check between once a week and once a month (as is usually done for RAID arrays).
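If your distribution doesn't already schedule scrubs, a cron job can do it. A minimal sketch, assuming root's crontab, the pool techrx from the command above, a weekly run (Sundays at 03:00) and zpool installed as /sbin/zpool (the path differs between systems; `command -v zpool` shows it):

```shell
# Append a weekly scrub of the pool "techrx" (Sundays at 03:00) to root's
# crontab, keeping the existing entries. /sbin/zpool is an assumed path.
( crontab -l 2>/dev/null; echo '0 3 * * 0 /sbin/zpool scrub techrx' ) | crontab -
```

For a monthly scrub instead, a schedule such as `0 3 1 * *` (03:00 on the first day of each month) works. Scrub progress and any repaired or unrecoverable errors can then be checked with `zpool status techrx`.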
1. You don't need to allocate disk space. The correction is done within the available space (I remember seeing that ZFS first rewrites the erroneous sectors and, if that fails, the data is rewritten elsewhere, but this may be inaccurate). Of course, if your disk is already 100% full, this may not be possible.
2. The process is based on checksums already built into ZFS.
3. Most of the time, scrubbing corrects the data. Strictly speaking, the checksum is not an error-correcting code: it identifies the damaged copy, and ZFS repairs it from a redundant copy (mirror, raidz parity or ditto block). If the damage is too extensive, ZFS can't recover the data, but by then your disk can probably be considered dead.
You can find more information here:
Running periodic data scrubbing: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
ZFS self-healing against silent corruption: http://hub.opensolaris.org/bin/view/Community+Group+zfs/selfheal