Just as different RAID implementations vary in how much they improve read and write performance, they also differ in
how much they affect different types of accesses to data.
Data access to a single hard disk involves two discrete steps: first, getting the heads of
the hard disk to exactly where the data is located, which I call positioning; and
second, transporting the data to or from the hard disk platters, which I call transfer.
Hard disk performance depends on both of these; the importance of one relative to the
other depends entirely on the types of files being used and how the data is organized on
the disk. For more on positioning and transfer, and related issues, see the section covering those topics; if you don't understand these two
concepts reasonably well, what follows may not mean as much to you as it should.
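To make the two steps concrete, here is a minimal sketch that models a single-disk access as positioning time plus transfer time. The 12 ms positioning figure and the 50 MB/s transfer rate are illustrative assumptions, not measurements of any particular drive, and the function names are hypothetical.

```python
def access_time_ms(size_kb, positioning_ms=12.0, transfer_mb_per_s=50.0):
    """Estimate one access as positioning time plus transfer time.

    The default figures are assumed values chosen only for illustration.
    """
    transfer_ms = (size_kb / 1024.0) / transfer_mb_per_s * 1000.0
    return positioning_ms + transfer_ms


def positioning_share(size_kb, positioning_ms=12.0, transfer_mb_per_s=50.0):
    """Fraction of the total access time spent on positioning."""
    total = access_time_ms(size_kb, positioning_ms, transfer_mb_per_s)
    return positioning_ms / total


if __name__ == "__main__":
    # Compare a few access sizes, from 4 KB up to 100 MB.
    for size_kb in (4, 64, 1024, 102400):
        total = access_time_ms(size_kb)
        share = positioning_share(size_kb)
        print(f"{size_kb:>7} KB: {total:9.2f} ms total, "
              f"{share:6.1%} spent positioning")
```

Under these assumed figures, a 4 KB access is almost entirely positioning, while a 100 MB access is almost entirely transfer. Real drives will differ, but the shape of the trade-off is the point.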
The use of multiple hard disks in a RAID environment means that in some cases
positioning or transfer performance can be improved, and sometimes both. RAID levels are
often differentiated in part by the degree to which they improve either positioning or
transfer performance compared to other levels; this becomes another factor to be weighed
in matching the choice of RAID level to a particular application. It's also important to
realize that the relative performance of positioning or transfer accesses depends on
whether you are doing reads or writes; writes often have overheads that can greatly reduce
the performance of random writes by requiring additional reads and/or writes to disks that
normally wouldn't be involved in a random write.
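As one illustration of where such write overhead can come from, here is a sketch of a small write in an array that protects data with XOR parity. The use of parity is an assumption made for this example, the helper names are hypothetical, and the I/O count reflects the common read-modify-write approach, not the behavior of any specific controller.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(stripe, parity, index, new_block):
    """Replace one data block and update parity via read-modify-write.

    Returns (new_stripe, new_parity, io_count). The I/O count assumes
    nothing is cached: two reads and two writes for one logical write.
    """
    old_block = stripe[index]   # read 1: the old data block
    old_parity = parity         # read 2: the old parity block
    # Remove the old data's contribution from parity, then add the new data's.
    new_parity = xor_blocks(old_parity, old_block, new_block)
    new_stripe = stripe[:index] + [new_block] + stripe[index + 1:]
    io_count = 2 + 2            # write 1: new data, write 2: new parity
    return new_stripe, new_parity, io_count


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(*stripe)
    stripe, parity, ios = small_write(stripe, parity, 1, b"\xaa\xbb")
    # The updated parity must still equal the XOR of all data blocks.
    print(parity == xor_blocks(*stripe), ios)
```

In this sketch a single logical write costs four disk operations, and two of them touch a drive (the one holding parity) that a plain write would never have involved.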
In general terms, here's how the basic storage techniques used in RAID vary with
respect to positioning and transfer performance, for both writes and reads:
- Mirroring: During a read, only one drive is typically accessed, but the
controller can use both drives to perform two independent accesses. Thus, mirroring
improves positioning performance. However, once the data is found, it will be read off one
drive; therefore, mirroring will not really improve sequential performance. During a
write, both hard disks are used, and performance is generally worse than it would be for a
single drive.
- Striping: Large files that are split into enough blocks to span every
drive in the array require each drive to position to a particular spot, so positioning
performance is not improved; once the heads are all in place, however, data is read from
all the drives at once, greatly improving transfer performance. On reads,
small files that don't require reading from all the disks in the array can allow a smart
controller to actually run two or more accesses in parallel (and if the files are in the
same stripe block, then it will be even faster). This improves both positioning and
transfer performance, though the increase in transfer performance is relatively small.
Performance improvement in a striping environment also depends on stripe size and stripe width. In striped arrays that use parity, random writes are
often degraded by the need to read all the data in a stripe to recalculate parity; this
can turn a random write into enough operations that it more closely resembles a
sequential operation.
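The read behavior described for mirroring and striping can be sketched with a toy model. It assumes identical drives, a controller that spreads independent requests evenly, and the same illustrative 12 ms positioning time and 50 MB/s transfer rate per drive; none of these names or figures come from a real product.

```python
import math

POSITIONING_MS = 12.0   # assumed per-access positioning time
TRANSFER_MB_S = 50.0    # assumed per-drive transfer rate


def transfer_ms(size_mb, drives=1):
    """Transfer time when the data is read from `drives` disks at once."""
    return size_mb / (TRANSFER_MB_S * drives) * 1000.0


def single_read_ms(size_mb):
    """One read on one drive: position, then transfer."""
    return POSITIONING_MS + transfer_ms(size_mb)


def mirrored_read_ms(size_mb):
    """A single read on a mirror comes off one drive, so no transfer gain."""
    return single_read_ms(size_mb)


def striped_read_ms(size_mb, drives):
    """A large striped read: every drive positions, then all transfer together."""
    return POSITIONING_MS + transfer_ms(size_mb, drives)


def batch_small_reads_ms(count, size_mb, drives):
    """`count` independent small reads spread over drives that seek independently.

    Applies to a mirror (each drive holds all the data) or to a stripe set
    where each small file happens to sit on a different drive.
    """
    rounds = math.ceil(count / drives)
    return rounds * single_read_ms(size_mb)


if __name__ == "__main__":
    print("100 MB, single drive :", single_read_ms(100))
    print("100 MB, 2-way mirror :", mirrored_read_ms(100))
    print("100 MB, 4-drive stripe:", striped_read_ms(100, 4))
    print("8 x 4 KB, one drive  :", batch_small_reads_ms(8, 0.004, 1))
    print("8 x 4 KB, two drives :", batch_small_reads_ms(8, 0.004, 2))
```

The model deliberately ignores write behavior, stripe size, and caching; it only illustrates that independent drives help a batch of small accesses, while striping helps the transfer portion of a large one.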
Which is more important: positioning or transfer? This is a controversial subject in
the world of PCs, and one I am not going to be able to answer for you. The discussion of ranking performance specifications
is one place to start, but really, you have to look at your particular application. A
simple (but often too simple) rule of thumb is that the larger the files you are
working with, the more important transfer performance is; the smaller the files, the more
important positioning is. I would also say that too many people overvalue the importance
of greatly increasing sequential transfer rates through the use of striping. A lot of
people seem to think that implementing RAID 0 is an easy way to storage nirvana, but for
"average use" it may not help performance nearly as much as you might think.