No discussion of managing a hardware system would be complete without mentioning
maintenance. At least, it shouldn't be!
Here I will only discuss maintenance as it applies specifically to RAID arrays.
That said, there isn't a lot to say.
RAID arrays don't generally require a lot in
the way of regular preventive maintenance. You do need to maintain your server hardware,
and the array should be part of that, but that's really all you need to do under normal
circumstances: No special maintenance is required for RAID controllers or drives. At the
same time, many high-end controllers do offer advanced maintenance features which can be
useful, such as the following:
- Consistency Checking: This important feature proactively checks the
data on a RAID array to ensure that it is consistent, meaning that the array data is
correct and has not become corrupted. It is especially useful for RAID levels that use
striping with parity, as it checks for any situation where the parity information in
a stripe has become "out of sync" with the data it is supposed to match (which
shouldn't happen in practice, but you know how Mr. Murphy works...). Controllers that
offer this feature will generally also correct any problems they discover, typically by
regenerating the parity from the data.
- Spare Drive Verification: If you are using hot spares, they will tend
to sit there for weeks or months on end unused. This feature, if present, will check them
to ensure they are in good working order.
- Internal Diagnostics: Some better RAID controllers may include routines
to periodically check their own internal functions and ensure that they are working
properly.
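To make the consistency check described above more concrete, here is a minimal sketch in Python. It assumes a RAID 5-style layout in which the parity block of each stripe is the byte-wise XOR of that stripe's data blocks. The function names and the in-memory representation of a stripe are hypothetical and chosen purely for illustration; a real controller does this work in firmware, against the physical drives. Note also that the repair step simply trusts the data blocks and regenerates the parity from them, which is an assumption: a parity mismatch alone does not reveal which block is actually wrong.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """Return True if the stored parity matches the parity computed from the data."""
    return xor_blocks(data_blocks) == parity_block


def consistency_check(stripes):
    """Scan every stripe and regenerate parity where it is out of sync.

    `stripes` is a list of (data_blocks, parity_block) pairs (hypothetical layout).
    Returns the indices of the inconsistent stripes and a repaired copy of the list.
    The repair assumes the data blocks are the trustworthy copy.
    """
    bad_indices = []
    repaired = []
    for index, (data_blocks, parity_block) in enumerate(stripes):
        if not stripe_is_consistent(data_blocks, parity_block):
            bad_indices.append(index)
            parity_block = xor_blocks(data_blocks)  # rewrite parity from the data
        repaired.append((data_blocks, parity_block))
    return bad_indices, repaired


if __name__ == "__main__":
    stripes = [
        ([b"\x01\x02", b"\x03\x04"], b"\x02\x06"),  # parity matches the data
        ([b"\xff\x00", b"\x0f\xf0"], b"\x00\x00"),  # parity is out of sync
    ]
    bad, fixed = consistency_check(stripes)
    print("Inconsistent stripes:", bad)
    print("Regenerated parity:", [fixed[i][1].hex() for i in bad])
```

Running a scan like this on a schedule is what makes the feature proactive: a stale parity block is found and rewritten before a drive failure forces the array to rely on it for reconstruction.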
Now, let's take a look at service and support issues with RAID.
The service and support of RAID hardware is in itself no different from that of
any other hardware. What differs is that RAID arrays are usually employed in
servers used by many people, or in other critical situations which require a minimum of
down-time. This means a failure that takes down an array can quickly cost a lot of money,
lending an urgency to RAID array service that may not be present for other PCs.
There is no way to avoid down-time entirely unless you spend a truly staggering amount
of money (and even then, you should "expect the unexpected"). If a failure
occurs, you want to get it corrected as soon as possible, and that means you are reliant
to some extent on whatever company is supporting your hardware. If uptime is critical to
your application, then in addition to incorporating fault tolerance into your RAID setup,
you should purchase an on-site service contract covering your system(s). Read all the
conditions of a service contract carefully so that you understand what it covers, and
what it doesn't. If you need immediate response in the event of a hardware fault, be sure
that the contract specifies that. You'll certainly pay more for it, but you have to weigh
that against the cost of an entire company "waiting for the system to come back
up".
Note: Another important issue to keep in mind when considering service and support of
your RAID array is that some arrays "insist" upon having failed drives replaced with
identical models. You want the company that supplies your hardware to be able to provide
new drives of the appropriate type for the life of the array.