Although the performance gains from RAID are important, the primary goal of RAID for
most businesses is to protect both their critical data and their employees'
productivity. In a properly implemented RAID system, downtime due to hardware problems
can be dramatically reduced, and data loss due to drive failures all but eliminated.
To understand how RAID really affects reliability, you first need to understand the
various issues bundled into that somewhat nebulous term. For example, a commonly heard
claim is that "RAID improves hard disk reliability", but that is not accurate as stated.
The truth depends on how you define reliability: do you mean the reliability of the
individual drives, or of the whole system? Are you talking about the data, or the
hardware itself? And of course, some RAID implementations do not improve reliability at
all.
In this section, I take a closer look at the key issues surrounding reliability under
RAID: the concepts of fault tolerance, reliability and availability; the reliability of
the other system components that affect the system as a whole; and backups and data
recovery.