All of the discussions concerning RAID performance--both here and elsewhere--are based
on the assumption that the array is operating normally, with all drives functioning.
Redundancy-enabled RAID solutions allow the system to keep running even if one
of the drives in the array has failed. However, when this occurs, performance is
negatively affected; the array is said to be operating in a degraded state
(some manufacturers may call this a critical state or use another
synonym). The impact on performance depends on the type of RAID used by the array, and
also on how the RAID controller reacts to the drive failure.
When an array enters a degraded state, performance is reduced for two main reasons. The
first is that one of the drives is no longer available, and the array must compensate for
this loss of hardware. In a two-drive mirrored array, you
are left with an "array of one drive", and therefore, performance becomes the
same as it would be for a single drive. In a striped array
with parity, performance is degraded due to the loss of a
drive and the need to regenerate its lost information from the surviving data and
parity, on the fly, as data is read back from the array.
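The on-the-fly regeneration described above can be illustrated with a minimal sketch. It assumes simple XOR parity of the kind used by single-parity striped arrays; the four-drive layout, the block contents and the helper name xor_blocks are hypothetical and chosen only for illustration. Real controllers work on much larger blocks and rotate parity across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical four-drive array: three data blocks plus parity.
data = [b"RAID", b"disk", b"test"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate the failure of drive 1: its block can no longer be read.
failed = 1
survivors = [block for i, block in enumerate(stripe) if i != failed]

# Degraded read: every surviving block in the stripe is read and XORed
# together to regenerate the block that lived on the failed drive.
regenerated = xor_blocks(survivors)
print(regenerated == data[failed])
```

Note that serving one block from the failed drive requires reading the matching block from every surviving drive, which is where the extra work in degraded mode comes from.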
The second reason for degraded operation after a drive failure is that after the
toasted drive is replaced, the data that was removed from the array with its departure
must be regenerated on the new disk. This process is called rebuilding. A
mirrored array must copy the contents of the good drive over to the replacement drive. A
striped array with parity must fill the entire replacement drive by recalculating the
missing data and parity from the contents of the good drives. Clearly, these procedures are
going to be time-consuming--they can take several hours. During
this time, the array will function properly, but its performance will be greatly
diminished. The impact of rebuilding on performance depends on the RAID
level and the nature of the controller, but it is usually significant. Hardware
RAID will generally do a faster job of rebuilding than software RAID. Fortunately,
rebuilding doesn't happen often (or at least, it shouldn't!).
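To see why a rebuild can take hours, a rough back-of-the-envelope estimate helps: the whole replacement drive has to be rewritten, so the time is roughly its capacity divided by the sustained rebuild rate. The sketch below uses made-up figures (a 500 GB drive, 50 MB/s on an idle array, 10 MB/s when the array is also serving normal requests); they are assumptions for illustration, not measurements of any particular controller.

```python
def rebuild_hours(capacity_gb, rebuild_mb_per_s):
    """Estimate the hours needed to rewrite an entire replacement drive."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    seconds = capacity_gb * 1000 / rebuild_mb_per_s  # 1 GB taken as 1000 MB
    return seconds / 3600

# Hypothetical figures, chosen only to show the arithmetic.
idle_hours = rebuild_hours(capacity_gb=500, rebuild_mb_per_s=50)
busy_hours = rebuild_hours(capacity_gb=500, rebuild_mb_per_s=10)

print(f"idle array: about {idle_hours:.1f} hours")
print(f"busy array: about {busy_hours:.1f} hours")
```

With these assumed numbers the rebuild takes just under three hours on an idle array and close to fourteen when it has to share the drives with regular traffic.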
Many RAID systems give the administrator control over whether the system does automatic
or manual rebuilds. In an automatic configuration, the array will detect the
replacement of the dead drive and begin rebuilding automatically on the new one--or it may
start the rebuild as soon as the bad drive fails if the array is equipped with hot spares. In manual mode, the administrator must tell
the system when to do the rebuild. Manual is not necessarily "worse" than
automatic, because if the system is not one that runs 24 hours a day, 7 days a week, the
administrator will often prefer to wait until after hours to rebuild the array, thereby
avoiding the performance hit associated with the rebuild. However, take the following
warning into account as well...
Warning: Most regular RAID arrays using mirroring or striping with parity are in a
vulnerable state when they are running in degraded mode. Until the failed drive is
replaced and rebuilt, they provide no data protection. Do not put off rebuilding a
degraded array for long; even if you are just waiting for the end of the day, recognize
that you are taking risks in doing so.
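The automatic and manual rebuild modes described earlier can be summarized as a small decision function. This is only a sketch of the logic as the text describes it, not any vendor's implementation; the function name next_action, its parameters and the returned strings are all hypothetical.

```python
def next_action(degraded, mode, hot_spare_available,
                replacement_installed, admin_requested=False):
    """Return what a hypothetical controller would do next.

    mode is "automatic" or "manual"; every name here is illustrative.
    """
    if not degraded:
        return "none"
    # A rebuild needs somewhere to write: a hot spare or a new drive.
    if not (hot_spare_available or replacement_installed):
        return "wait for replacement drive"
    if mode == "automatic":
        return "start rebuild"
    if mode == "manual":
        return "start rebuild" if admin_requested else "wait for administrator"
    raise ValueError(f"unknown rebuild mode: {mode!r}")

# Automatic mode with a hot spare: the rebuild begins as soon as a drive fails.
print(next_action(True, "automatic", hot_spare_available=True,
                  replacement_installed=False))

# Manual mode: the array stays degraded, and unprotected, until told otherwise.
print(next_action(True, "manual", hot_spare_available=False,
                  replacement_installed=True))
```

In the manual case, the time spent in the "wait for administrator" state is exactly the window of exposure the warning above refers to.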
Note: None of this section applies when you are doing striping without parity (RAID 0).
When a drive in a RAID 0 array fails, performance doesn't degrade; it comes to a
screeching halt. The reason is that RAID 0 includes no redundancy information, so the
failure of any drive in the array means all the data in the array is lost, short of heroics.