I am backing up data stored in a zpool consisting of a single raidz vdev with 2 hard disks. During this operation, I got checksum errors, and now the status looks as follows:
  pool: tmp_zpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        tmp_zpool       ONLINE       0     0     2
          raidz1-0      ONLINE       0     0     4
            tmp_cont_0  ONLINE       0     0     0
            tmp_cont_1  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /some/file
What I find confusing is that the checksum errors appear at the vdev level, but not at the disk level. Perhaps I should note that one of the hard disks is internal and the other is external (this is a temporary situation). Could this be an issue with the hard drive controllers?
Is there anything I could try to get back the affected file, such as clearing the error and importing the pool degraded with only one of the disks? I didn't even try to read the file again to see what happens. (Not sure if it would affect anything.)
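If controller or cabling problems are a suspect, a few read-only checks can help narrow it down. This is only a sketch; /dev/sdX and /dev/sdY stand in for whatever device nodes the internal and external disks actually use:

    # Per-event detail from the pool's error log; each checksum event
    # records which vdev it was charged to.
    zpool events -v tmp_zpool

    # SMART health and error counters for both drives.
    smartctl -a /dev/sdX
    smartctl -a /dev/sdY

    # Kernel messages often show USB-bridge resets or SATA link errors
    # that never surface as per-disk checksum errors in ZFS.
    dmesg | grep -iE 'ata|usb|i/o error'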
Update: I gave up waiting for an explanation of what might go wrong if I cleared the errors and retried, so I went ahead and tried it. I first ran zpool clear, after which zpool status showed no errors. Then I tried to read the files with errors (two of them in the end), but the respective blocks were still reported as bad/unreadable. This time, zpool status no longer showed increasing checksum errors. Next, I tried offlining one of the disks in the raidz1 vdev and repeating the process, but the results did not change. In total, I lost two 128K blocks out of 1.6T.
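In command form, the steps above amount to roughly the following (tmp_cont_0 is just the disk I offlined as an example, and dd is only one way to force the blocks to be read):

    zpool clear tmp_zpool                     # reset the error counters
    zpool status -v tmp_zpool                 # counters back to zero, no files listed
    dd if=/some/file of=/dev/null bs=128k     # re-read; the bad blocks still fail
    zpool offline tmp_zpool tmp_cont_0        # retry degraded, using only the other disk
    dd if=/some/file of=/dev/null bs=128k     # same blocks still unreadable
    zpool online tmp_zpool tmp_cont_0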
Answer Status: Currently, I find there is no comprehensive answer to this question. If somebody wants to write one up or edit an existing one, please address the following:
- What could have caused this situation.
- What could be done about it.
- How it could have been prevented.
For 1, the theories and their problems seem to be:
- Choice of raidz1 over raidz2. Problem: one needs a minimum of 4 disks for raidz2. While the need for redundancy is clear, it is not useful to repeatedly suggest that the cure for failing redundancy is more redundancy. It would be much more useful to understand how to best use the redundancy you have.
- Choice of raidz1 over mirror. Problem: at first sight, the difference between these seems to be efficiency, not redundancy. This might be wrong, though. Why: ZFS saves a checksum with each block on each disk, but neither disk reported individual checksum errors. This seems to suggest that for every bad block, the 2 disks contained different block payloads, each with a matching checksum, and ZFS was unable to tell which one is correct. This suggests there were 2 different checksum calculations, and that the payload somehow changed between them. This could be explained by RAM corruption, and maybe (this needs confirmation) with a choice of mirror over raidz1 only one checksum would have been needed.
- RAM corruption during writing, not reading. As explained above, this seems plausible. Problem: why was this not detected as an error at write time? Can it be that ZFS doesn't check what it writes? Or rather, that the block payloads written to the different disks are the same?
For 2:
- Since the disks have no individual checksum errors, is there some low-level way in ZFS to gain access to the 2 different copies of such bad blocks? (One possible avenue is sketched below.)
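One avenue that might answer this, though I have not verified it on this pool, is zdb. The sketch below assumes the usual ZFS-on-Linux behaviour that a file's inode number equals its ZFS object number; <dataset>, <object-number>, <offset> and <size> are placeholders to fill in from the previous step, and the -R flags should be checked against the zdb man page:

    # 1. Object number of the damaged file.
    ls -i /some/file

    # 2. Dump that object's block pointers: each block's DVA (vdev:offset),
    #    its sizes, and the checksum ZFS expects.
    zdb -ddddd tmp_zpool/<dataset> <object-number>

    # 3. Read a block back by DVA; 0 is the top-level raidz vdev id, and
    #    ':d' asks zdb to decompress what it reads.
    zdb -R tmp_zpool 0:<offset>:<size>:d

Whether this can separate the two on-disk versions of a block under raidz, rather than returning the reconstructed block, is something I have not been able to confirm.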
For 3:
- Is it clear that mirror over raidz1 would have prevented this situation?
- I assume a scrub of this zpool would have detected the problem. In my case, I was moving some data around, and I destroyed the source data before I actually read this zpool, thinking that I had 2-disk redundancy. Would the moral here be to scrub a zpool before trusting its contents? Surely scrubbing is useful, but is it necessary? For instance, would a scrub be necessary with mirror instead of raidz1? (See the sketch below.)
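Whatever the layout, a scrub before destroying the source data would at least have surfaced the problem while the source still existed. A minimal example on this pool:

    zpool scrub tmp_zpool         # read every block and verify it against its checksum
    zpool status -v tmp_zpool     # scrub progress, plus any files found to be damaged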
This is the problem with raidz1 (and also RAID5). If the data on the disk changes but no drive fault occurs to let ZFS or the RAID controller know which drive caused the error, then it can't know which drive is correct. With raidz2 (and higher) or RAID6, you get a quorum of drives that can decide which drive to ignore for reconstruction.
Your only solution here is to overwrite the file, either by restoring a backup copy or by writing /dev/null to the file.

I'm running into a similar issue. I'm not sure if it's helpful, but I found this relevant post about vdev-level checksum errors from a FreeBSD developer:
https://lists.freebsd.org/pipermail/freebsd-hackers/2014-October/046330.html
I myself am considering deleting my zpool.cache file and importing my pool to regenerate that zpool.cache file.
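In case it helps anyone, the regeneration I have in mind is roughly this (assuming the default /etc/zfs/zpool.cache location on Linux; add -d <dir> to the import if the pool sits on files or non-standard device nodes):

    zpool export tmp_zpool          # release the pool
    rm /etc/zfs/zpool.cache         # drop the stale cache file
    zpool import tmp_zpool          # rescan devices; the import rewrites zpool.cache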