Bonus Tip - Unrecoverable Errors in RAID 5

There is a known and widely discussed issue with RAID 5. If one drive in the array fails completely, data may be lost during the rebuild if one of the remaining drives encounters an unrecoverable read error (URE). These errors are relatively rare, but the sheer size of modern arrays has led to speculation that one cannot even read an entire array reliably (that is, without encountering a read error).

There are some pretty scary calculations available on the Internet. Some conclude that there is as much as a 50% probability of a failed rebuild on a 12TB (6x2TB) RAID 5.

The calculation goes like this: let's say we have a probability p of not being able to read a bit off the drive. Then q = 1 - p is the probability of a successful read, per bit. To rebuild a RAID 5 array of N disks, C terabytes each, one needs to read C*(N-1) terabytes of data. Let's denote the number of bits to read as b = C * (N-1) * 8 * 10^12, and we arrive at the probability of successfully completing the rebuild, P = q^b.

The value of p is provided in the hard drive specification sheet, typically around 10^-15 errors per bit read.

[Table: specified URE rate vs. probability of rebuild failure for 6x 2TB drives]
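
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the naive model, P = q^b (q raised to the power b), not a statement about real drives. The function name and the three URE rates tried are assumptions chosen for the example; only the 6x 2TB layout comes from the text.

```python
import math

def rebuild_failure_probability(p, n_disks, tb_per_disk):
    """Naive model: every bit read fails independently with probability p.

    Returns 1 - P, where P = (1 - p)^b and b is the number of bits that
    must be read from the surviving disks.
    """
    # b = C * (N - 1) * 8 * 10^12 bits
    bits = tb_per_disk * (n_disks - 1) * 8 * 10**12
    # log1p/expm1 keep precision when p is tiny; this equals 1 - (1 - p)**bits
    return -math.expm1(bits * math.log1p(-p))

# Hypothetical URE rates to try (errors per bit read)
for p in (1e-14, 1e-15, 1e-16):
    failure = rebuild_failure_probability(p, n_disks=6, tb_per_disk=2)
    print(f"URE {p:.0e}: rebuild failure probability {failure:.1%}")
```

Under this model, a rate of 10^-14 gives roughly a 55% chance of failure, 10^-15 gives about 7.7%, and 10^-16 gives about 0.8%. The "50%" figures quoted online are therefore consistent with the 10^-14 case.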

These calculations are based on somewhat naive assumptions, which make the problem look worse than it actually is. The unstated assumptions behind these calculations are that:

  • read errors are distributed uniformly over hard drives and over time,
  • a single read error during the rebuild kills the entire array.

Neither of these is true, which makes the result useless. Moreover, the whole concept of specifying a bit-level, stream-based error rate for a block-based device that cannot read less than 512 bytes of data per transaction seems doubtful.

The original statement can be transformed into something more practical:

The statement "There is a 50% probability of not being able to rebuild a 12TB RAID 5" is the same as "If you have a 10TB RAID 0 array, there is a 50% probability of not getting back what you write to it, even if you write the data and then read it back immediately." That's assuming the same amount of user data on both arrays and 2TB hard drives: in both cases, 10TB must be read from five drives without a single error. Still, nobody declares RAID 0 dead.

This can be reformulated even further: assuming a sustained read speed of 100MB/sec, we can say "There is a 50% chance that a hard drive cannot sustain a continuous sequential read operation for 30 hours non-stop", which just does not look right. 30 hours is the approximate time to read 10TB of data at 100MB/sec (about 28 hours, to be more exact).
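
The read-time figure is easy to check. The short sketch below assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes), and the function name is made up for the example.

```python
def hours_to_read(terabytes, mb_per_sec):
    """Time in hours to read the given amount of data sequentially."""
    total_bytes = terabytes * 10**12      # decimal terabytes
    bytes_per_sec = mb_per_sec * 10**6    # decimal megabytes per second
    return total_bytes / bytes_per_sec / 3600

print(f"{hours_to_read(10, 100):.1f} hours")  # prints "27.8 hours"
```

So "30 hours" is a rounded-up figure for what is closer to 28 hours of continuous reading.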

Copyright © 2011