I have been looking into RAID 5 vs. RAID 6 lately, and I keep seeing that RAID 5 is not considered safe enough anymore because of the URE ratings and the increasing size of drives. Basically, most of the content I found says that in RAID 5, if you have a disk failure and the rest of your array is 12TB, then you have an almost 100% chance of hitting a URE and losing your data.
The 12TB figure comes from the fact that disks are rated at one URE per 10^14 bits read.
Well, there is something I do not get here. A read is done by the head passing over the sector, so what can make the read fail is either the head dying or the sector dying. It could also be that the read simply does not work for some other reason (I don't know, say a vibration made the head jump...). So let me address all 3 situations:
- the read does not work: that is not unrecoverable, right? It can be tried again.
- the head dies: this would for sure be unrecoverable, but that also means the full platter (or at least one side of it) would be unreadable, which would be far more alarming, no?
- the sector dies: also totally unrecoverable, but here I do not understand why a 4TB disk is rated at 10^14 for the URE and an 8TB disk is rated at 10^14 as well. That would mean the sectors on the 8TB (most likely newer tech) are half as reliable as the ones on the 4TB, which does not make sense.
As you can see, none of the 3 failure points I identified makes sense of the rating. So what exactly is a URE, I mean concretely?
Is there somebody who can explain that to me?
Edit 1
After the first wave of answers, it seems the reason is the sector failing. The good thing is that the firmware, RAID controller, and OS + filesystem have procedures in place to detect that early and reallocate sectors.
Well, I now know what a URE is (actually, the name is quite self-explanatory :) ).
I am still puzzled by the underlying causes, and mostly by the stable rating manufacturers give.
Some attributed the failing sector to external sources (cosmic rays). I am then surprised that the URE rate is based on the read count and not on age: cosmic rays should impact an older disk more, simply because it has been exposed longer. I think this is more of a fantasy, though I might be wrong.
Now comes the other reason, which relates to the wear of the disk: some pointed out that higher densities give weaker magnetic domains. That totally makes sense and I would follow the explanation. But as it is nicely explained here, the different sizes of newer disks are obtained mostly by putting more or fewer of the same platters (and therefore the same density) in the HDD chassis. The sectors are the same and should all have the very same reliability, so bigger disks should then have a higher rating than smaller disks, each sector being read less often. This is not the case; why? It would, though, explain why the newer disks with newer tech get no better rating than the old ones: the gain from better tech is simply offset by the loss due to higher density.
A URE is an Unrecoverable Read Error. Something has happened that has caused the reading of a sector to fail in a way the drive cannot fix. The drive electronics are sophisticated; they will only pass data up if they have been able to read it correctly from the disk, and they will try multiple times to read a bad sector before declaring it damaged.
What causes the read error? I'm not an expert here (arm waving ensues), but drive aging can cause manufacturing tolerances to become relevant, magnetic domains can weaken, cosmic rays can cause damage, etc. Essentially it is a random failure.
How does this affect RAID 5?
A RAID 5 consists of block-level striping with distributed parity. The parity blocks are calculated by XORing the bits from the data blocks together. The XOR of two bits is 0 if they are the same and 1 if they differ. When calculating parity you take the first 2 bits and XOR them, then XOR the result with the next bit, and so on; e.g. 1 XOR 0 = 1, then 1 XOR 1 = 0.
The nature of the XOR function is such that if any disk dies and is replaced, the data that should be on it can be reconstructed from the remaining disks.
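Here is a minimal sketch of that in Python, with made-up block values and a hypothetical three-data-disk layout (nothing here is tied to any particular controller):

```python
# Sketch of RAID 5 parity: XOR the data blocks together to get the parity
# block, then XOR the survivors to rebuild whatever block was lost.
from functools import reduce

data = [0b10110010, 0b01101100, 0b11100001]      # blocks on three data disks
parity = reduce(lambda a, b: a ^ b, data)        # block on the parity disk

lost = data[1]                                   # pretend disk 2 just died
rebuilt = data[0] ^ data[2] ^ parity             # XOR everything that is left
print(rebuilt == lost)                           # True
```

Because XOR is associative and order does not matter, the same trick works for any number of data disks, which is why a single disk's worth of parity is enough.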
As you can see, the damaged data can be reconstructed by XORing the remaining data and parity.
How does a URE affect this?
A URE is only significant during a RAID 5 rebuild.
When you rebuild a RAID 5 there is a large amount of reading to be done: every data block on every surviving disk needs to be read in order to reconstruct the data on the new disk. If a URE occurs, the data for the relevant block cannot be recovered, so your data is inconsistent. For sufficiently large disks in a sufficiently large RAID 5, the number of bits read to reconstruct the replaced disk exceeds the URE rating of, for example, 1 bit in 10^14 bits read.
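To put a rough number on that, here is a back-of-the-envelope sketch (my own illustration, assuming every bit read is an independent trial at the quoted rate, which real drives certainly are not) of the chance of hitting at least one URE while reading a given amount of data during a rebuild:

```python
# Naive model: each bit read fails independently with probability 1/1e14,
# so the chance of at least one URE over N bits is 1 - (1 - rate)^N.
rate = 1 / 1e14                      # 1 URE per 10^14 bits read

def p_at_least_one_ure(terabytes_read):
    bits = terabytes_read * 1e12 * 8
    return 1 - (1 - rate) ** bits

for tb in (2, 6, 12, 24):
    print(f"{tb:>2} TB read -> {p_at_least_one_ure(tb):.0%} chance of a URE")
```

Under this over-simplified model a 12TB rebuild already has better-than-even odds of tripping over a URE, which is where the scary RAID 5 articles come from.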
Hard disks do not simply store the data that you ask them to. Because of the ever-decreasing magnetic domain sizes, and the fact that hard disks store data in an analog rather than binary fashion (the hard disk firmware gets an analog signal from the platter, which is translated into a binary signal, and this translation is part of the manufacturer's secret sauce), there is virtually always some degree of error in a read, which must be compensated for.
To ensure that data can be read back, the hard disk also stores forward error correction data along with the data you asked it to store.
Under normal operations, the FEC data is sufficient to correct the errors in the signal that is read back from the platter. The firmware can then reconstruct the original data, and all is well. This is a recoverable read error which is exposed in SMART as the read error rate attribute (SMART attribute 0x01) and/or Hardware ECC Recovered (SMART attribute 0xc3).
If for some reason the signal degrades below a certain point, the FEC data is no longer sufficient to reconstruct the original data. At that point, the theory goes, the firmware will still be able to detect that the data could not be read back reliably, but it can't do anything about it. If multiple such reads fail, the disk has to somehow inform the rest of the computer that the read couldn't be performed successfully. It does so by signalling an unrecoverable read error. This also increases the Reported Uncorrectable Errors (SMART attribute 0xbb) counter.
An unrecoverable read error, or URE, is simply a report that for whatever reason, the payload data plus the FEC data was insufficient to reconstruct the originally stored data.
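If it helps to see the mechanism, here is a toy sketch in Python using a Hamming(7,4) code. This is purely my own simplification: real drives use much longer Reed-Solomon or LDPC codes, and nothing below reflects actual firmware. The point is only that a small amount of redundancy can fix one bad bit but not two:

```python
# Toy Hamming(7,4) code: 4 data bits plus 3 parity bits.  Any single flipped
# bit can be located and corrected; this stands in (very loosely) for the far
# longer FEC a drive stores with each sector.

def encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]          # codeword positions 1..7

def decode(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]               # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]               # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]               # parity check over positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3                   # 0 = looks clean, else suspect position
    c = list(c)
    if pos:
        c[pos - 1] ^= 1                          # flip the suspect bit back
    return [c[2], c[4], c[5], c[6]]              # recovered data bits

word = (1, 0, 1, 1)
good = encode(*word)

single = list(good); single[4] ^= 1              # one weak bit read wrong
print(decode(single) == list(word))              # True: recoverable read error

double = list(good); double[1] ^= 1; double[5] ^= 1   # two bits read wrong
print(decode(double) == list(word))              # False: beyond this code's power
```

In the two-flip case this toy code quietly mis-corrects; a drive's real FEC is strong enough to notice that it cannot recover the sector, and that refusal is what gets reported upward as the URE.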
Keep in mind that URE rates are statistical. You won't encounter any hard disk where you can read exactly 10^14 (or 10^15) - 1 bits successfully and then the next bit fails. Rather, it's a statement by the manufacturer that on average, if you read (say) 10^14 bits, then at some point during that process you will encounter one unreadable sector.
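One way to picture "statistical" is to draw the bits-read-until-first-URE from the geometric distribution that the quoted rate implies (again, just my illustration of the averaging, not a model of real drive behaviour):

```python
import math, random

rate = 1 / 1e14                          # 1 URE per 10^14 bits read
random.seed(1)                           # reproducible illustration

# Inverse-transform sampling of a geometric distribution: how many bits a
# given run manages to read before its first URE.
def bits_until_first_ure():
    return math.floor(math.log(1 - random.random()) / math.log(1 - rate))

for _ in range(5):
    print(f"{bits_until_first_ure() / 8e12:6.1f} TB read before the first URE")
```

The average of this distribution is 10^14 bits, roughly 12.5 TB, but individual draws scatter widely; that scatter is exactly the "on average" the manufacturer means.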
Also, following on the last few words above, keep in mind that URE rates are given in terms of sectors per bits read. Because of how data is stored on the platters, the disk cannot tell which part of a sector is bad, so if a sector fails the FEC check, then the entire sector is considered to be bad.
The specification is usually "on average, 1 error is detected while reading n bits", so the drive size does not matter. Drive size matters if you calculate the risk that an error will happen for your drive and workload, but the manufacturer only states that it takes n bits read to find an error (on average, not guaranteed).
Example: If you buy a 1TB drive, you would have to read it about 12 times to find an error (10^14 bits is roughly 12.5TB), while an 8TB drive might experience it on the second read - but the number of bits read is the same both times, so the quality of the magnetic platters is roughly the same.
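A quick sanity check of those numbers (assuming 1 TB = 10^12 bytes, which is how drive makers count):

```python
# How many complete reads of a drive add up to the 10^14 bits of the rating.
ure_bits = 1e14
for size_tb in (1, 4, 8):
    reads = ure_bits / (size_tb * 1e12 * 8)
    print(f"{size_tb} TB drive: about {reads:.1f} full reads per expected URE")
```

That gives about 12.5 full reads for the 1TB drive and about 1.6 for the 8TB drive, matching the "on the second read" figure above.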
What you pay for in the increased price are other factors: the ability to cram 8TB into the physical space of 1TB, greatly reduced energy consumption, fewer head crashes while moving the drive, etc.
I think @Michael Kjörling answered clearly.
When the disk reads, the head detects the direction of the magnetic domain and sends out an electronic signal, which is analog. Say the firmware should output a 1 when it receives a voltage higher than 0.5V; if the magnetic field is too weak, the head sends a signal of only 0.499V, and an error is encountered. We need the FEC to correct this error.
Here's an example: a sector's data should be 0x0F23. We encode it as 0*1 + F*2 + 2*3 + 3*4 = 0x30; now we have the FEC, and we write it after the sector. When we read back 0x0E23 with FEC 0x30, they don't match; after some calculation, we find the data should have been 0x0F23. But if we get 0x0E13 and 0x30, or 0x0E23 and 0x32, we cannot work out the correct value.
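The same toy checksum, written out in Python so you can poke at it (purely illustrative; real drive FEC works nothing like this):

```python
# Weight each hex digit of the 16-bit sector value by its position (1..4)
# and sum; a mismatch between stored and recomputed values flags an error.
def toy_fec(word):
    digits = [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]
    return sum(d * w for d, w in zip(digits, (1, 2, 3, 4)))

print(hex(toy_fec(0x0F23)))   # 0x30 -> matches the stored FEC, read is good
print(hex(toy_fec(0x0E23)))   # 0x2e -> mismatch, try to work out the fix
```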
This rate is so low that the HDD manufacturer would probably have to read petabytes or even exabytes of data to get a stable value, so they give a probability instead: when you read 10^14 bits of data, you may encounter one URE. Since it is a probability, maybe you encounter one after reading just one sector, maybe not until you have read 50TB. And this value has nothing to do with the disk capacity; it only concerns the amount of data you read. If you read a 4TB disk full of data 6 times, the chance equals reading a 6TB disk 4 times, or an 8TB disk 3 times.