I've been looking at my raid setups and I am beginning to really hate raid 1. During a drive failure, you don't know if the data on the other drive is correct or not. And what if one drive reads 1 and the other 0 without failure? How do you know which drive is correct?
Sure, you could go with raid 6, but it's a minimum of 4 drives. I think you can do the same with just 2 drives.
I've come up with a few raid levels, but why don't they exist?
- Raid on a single drive that also uses forward error correction like par2
- Like #1 but also mirrored (now you can ensure data is correct during failure)
This would require some custom hardware to perform the par2 calculation quickly. Also, since it's par2, the par2 files can get smaller and smaller with each drive you add to the array, because the amount of redundancy is the sum of the sizes of all the par2 files. See this to learn more about par2: http://www.quickpar.org.uk/AboutPAR2.htm
You just need a ZFS mirror. You're guaranteed consistent data based on COW and constant checksumming.
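To make the checksumming point concrete, here is a toy sketch of how a checksum kept apart from the data lets a mirror decide which copy to trust. It is not ZFS's actual code or on-disk format; `checksum` and `read_mirrored` are made-up names for illustration, and SHA-256 is just an assumed choice of hash.

```python
import hashlib

def checksum(block: bytes) -> str:
    # Hypothetical helper: any strong hash would do for this sketch.
    return hashlib.sha256(block).hexdigest()

def read_mirrored(copy_a: bytes, copy_b: bytes, expected: str) -> bytes:
    """Return the first copy that matches the checksum recorded at write time."""
    for copy in (copy_a, copy_b):
        if checksum(copy) == expected:
            return copy
    raise IOError("both copies fail their checksum")

good = b"important data"
expected = checksum(good)        # stored separately from the data block
flipped = b"important dat\x00"   # one drive silently returns a corrupted copy

# The mirror sides disagree, but the checksum says which one is right.
recovered = read_mirrored(flipped, good, expected)
```

A plain RAID 1 has no `expected` value to compare against, which is exactly the "which drive is correct?" problem from the question.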
If the question is "why can you not have RAID with a single drive that has error correction", the answer is in the "R" of RAID (Redundant Array of Independent Disks)... There would be no redundancy for disk failure. RAID is not designed to protect against data corruption (as par2 is), it is designed to protect against disk failure. A disk failure on a single disk with par2 would take the error correcting checksum down with the data, leaving you with no data at all.
RAID by definition can't be done on a single drive as RAID is "Redundant Array of Independent Disks" or "Redundant Array of Inexpensive Disks" depending on who you ask.
A proper raid controller won't write different data to each drive. As the data is written to one disk, it is also written to the other disk. If one disk doesn't accept the write then the block should be marked as bad. If the disk still isn't usable it should be marked as failed.
As smearp wrote, RAID isn't designed to protect against data problems. It is a hardware redundancy solution.
You're right: with a mirror, it's hard to know which side is correct if they disagree.
There's an analogous problem with parity RAID: parity inconsistency. If a data block is damaged, you want to be able to reconstruct it from parity, but what if a parity block is damaged? Nothing looks wrong until a drive fails, and then the block reconstructed from that bad parity comes back damaged.
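A minimal sketch of that failure, assuming simple RAID5-style XOR parity over two data blocks (`xor_blocks` and the block contents are made up for illustration):

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Healthy parity: if the drive holding d1 dies, d1 can be rebuilt.
rebuilt_ok = xor_blocks(d0, parity)

# Silently damaged parity (one flipped bit): the rebuild still "succeeds",
# but the reconstructed block is wrong and nothing flags it.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
rebuilt_bad = xor_blocks(d0, bad_parity)
```

Here `rebuilt_ok` equals `d1`, while `rebuilt_bad` differs from `d1` in its first byte. XOR alone gives no way to tell the two cases apart.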
Strong checksums over blocks spanning multiple sectors can help: if the data block checksum fails but the parity block checksum succeeds, you can confidently reconstruct the data block. Strong checksums are no guarantee by themselves, however: if a drive misses out on a whole block write, it'll still carry its last validating checksum despite being out of date.
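The lost-write caveat can be sketched like this, assuming each drive stores the checksum alongside its own block (`make_block` and `validates` are hypothetical helpers, not any vendor's format):

```python
import hashlib

def make_block(payload: bytes):
    # Store the payload together with its own checksum, as one on-disk unit.
    return payload, hashlib.sha256(payload).digest()

def validates(block) -> bool:
    payload, digest = block
    return hashlib.sha256(payload).digest() == digest

# Both mirror sides start out holding the same block.
drive_a = make_block(b"version 1")
drive_b = make_block(b"version 1")

# A new write reaches drive A, but drive B silently discards it.
drive_a = make_block(b"version 2")

# Each side still passes its own checksum, yet the two disagree.
both_valid = validates(drive_a) and validates(drive_b)
sides_differ = drive_a[0] != drive_b[0]
```

Both flags come out true: the stale block on drive B is internally consistent, so a per-block checksum cannot expose it. Catching this needs something kept outside the block itself, such as a checksum or generation number stored elsewhere.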
After a decade working for an enterprise storage vendor, I'm under no illusions: RAID is harder than it looks. It's easy to survive basic failure modes like full drive loss. It takes considerably more work and experience to survive more obscure failure modes like drives discarding writes or putting them in the wrong place.
Finally: RAID6 requires at least three drives in principle (one data, two parity), as it's designed to survive two simultaneous drive failures, though most implementations insist on four. It also protects you from the more common problem of media errors preventing full RAID5 reconstruction.