May 16th, 2007 by eugene
Stripe Width and Stripe Size
RAID arrays that use striping improve performance by splitting up files into small pieces and distributing them to multiple hard disks. Most striping implementations give the creator of the array control over two critical parameters that define how the data is broken into chunks and sent to the various disks. Each of these parameters has an important impact on the performance of a striped array.
The first key parameter is the stripe width of the array. Stripe width refers to the number of parallel stripes that can be written to or read from simultaneously. This is of course equal to the number of disks in the array, so a four-disk striped array has a stripe width of four. Read and write performance of a striped array increases as stripe width increases, all else being equal. The reason is that adding more drives to the array increases its parallelism, allowing access to more drives simultaneously. You will generally get better transfer performance from an array of eight 18 GB drives than from an array of four 36 GB drives of the same drive family. Of course, eight 18 GB drives cost more than four 36 GB drives, and there are other concerns, such as power supply, to deal with.
The second important parameter is the stripe size of the array, sometimes also referred to by terms such as block size, chunk size, stripe length or granularity. This term refers to the size of the block written to each disk in a stripe. RAID arrays that stripe in blocks typically allow the selection of block sizes ranging from 2 kiB to 512 kiB (or even higher) in powers of two (meaning 2 kiB, 4 kiB, 8 kiB and so on). Byte-level striping (as in RAID 3) uses a stripe size of one byte or perhaps a small number of bytes such as 512, usually not selectable by the user.
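To make the two parameters concrete, here is a minimal sketch of how block striping might map a logical byte offset onto a disk. It assumes plain RAID 0-style round-robin placement with no parity; the function name `locate` and its parameters are hypothetical, chosen for illustration, and real controllers may lay data out differently.

```python
KIB = 1024

def locate(offset, stripe_size, stripe_width):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumption: blocks of stripe_size bytes are dealt round-robin across
    stripe_width disks, with no parity blocks in between.
    """
    block = offset // stripe_size      # index of the block within the array
    disk = block % stripe_width        # round-robin choice of disk
    row = block // stripe_width        # which stripe (row) the block sits in
    return disk, row * stripe_size + offset % stripe_size

# Four-disk array with an 8 kiB stripe size.
print(locate(0, 8 * KIB, 4))           # (0, 0)
print(locate(8 * KIB, 8 * KIB, 4))     # (1, 0)
print(locate(33 * KIB, 8 * KIB, 4))    # (0, 9216)
```

The third call lands in the fifth block of the array, which wraps around to the first disk and sits 1 kiB into that disk's second block.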
Warning: Watch out for
sloppy tech writers and marketing droids who use the term "stripe width" when
they really mean "stripe size". Since stripe size is a user-defined parameter
that can be changed easily, and about which there is lots of argument, it is far more
often discussed than stripe width (which, once an array has been set up, is really a
static value unless you add hardware.) Also, watch out for people who refer to stripe size
as being the combined size of all the blocks in a single stripe. Normally, an 8
kiB stripe size means that each block of each stripe on each disk is 8 kiB. Some people,
however, will refer to a four-drive array as having a stripe size of 8 kiB, and mean that
each drive has a 2 kiB block, with the total making up 8 kiB. This latter meaning
is not commonly used.
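The two readings of a quoted "stripe size" can be compared with a little arithmetic. The sketch below is only an illustration; the helper `interpret` and its `quoted_is_full_stripe` flag are hypothetical names, not terms used by any controller vendor.

```python
def interpret(quoted_kib, stripe_width, quoted_is_full_stripe=False):
    """Return (block size per disk, size of one full stripe), both in kiB.

    quoted_is_full_stripe=False is the usual meaning: the quoted figure
    is the block on each disk. True is the less common meaning: the
    quoted figure is the combined size of all blocks in one stripe.
    """
    if quoted_is_full_stripe:
        if quoted_kib % stripe_width:
            raise ValueError("full stripe does not divide evenly across disks")
        return quoted_kib // stripe_width, quoted_kib
    return quoted_kib, quoted_kib * stripe_width

# "8 kiB stripe size" on a four-drive array, read both ways.
print(interpret(8, 4))                               # (8, 32)
print(interpret(8, 4, quoted_is_full_stripe=True))   # (2, 8)
```

The same quoted number therefore describes per-disk blocks that differ by a factor equal to the stripe width, which is why it pays to check which meaning a given source intends.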
The impact of stripe size upon performance is more difficult to quantify than the effect of stripe width:
- Decreasing Stripe Size: As stripe size is decreased, files are broken into smaller and smaller pieces. This increases the number of drives that an average file will use to hold all the blocks containing the data of that file, theoretically increasing transfer performance, but decreasing positioning performance.
- Increasing Stripe Size: Increasing the stripe size of the array does the opposite of decreasing it, of course. Fewer drives are required to store files of a given size, so transfer performance decreases. However, if the controller is optimized to allow it, the requirement for fewer drives allows the drives not needed for a particular access to be used for another one, improving positioning performance.
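The trade-off in the two points above can be illustrated by counting how many drives hold part of a single file at different stripe sizes. This is a simplified sketch: it assumes the file occupies one contiguous range of the array's logical address space and that blocks are placed round-robin with no parity. The function `drives_touched` is a hypothetical helper.

```python
KIB = 1024

def drives_touched(file_size, stripe_size, stripe_width, start_offset=0):
    """Count the distinct disks holding at least one block of a file.

    Assumption: the file is contiguous in the array's logical address
    space, starting at start_offset, and blocks are dealt round-robin.
    """
    if file_size <= 0:
        return 0
    first_block = start_offset // stripe_size
    last_block = (start_offset + file_size - 1) // stripe_size
    blocks = last_block - first_block + 1
    # Consecutive blocks go to consecutive disks, so the count is
    # capped by the number of disks in the array.
    return min(blocks, stripe_width)

# A 64 kiB file on a four-disk array at several stripe sizes.
for size_kib in (4, 16, 32, 64, 256):
    print(size_kib, drives_touched(64 * KIB, size_kib * KIB, 4))
# prints: 4 4, 16 4, 32 2, 64 1, 256 1 (one pair per line)
```

At 4 kiB and 16 kiB the file spreads over all four drives, which favors transfer performance; at 64 kiB and above it sits on one drive, leaving the other three free for other requests if the controller can take advantage of that.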
Tip: For a graphical
illustration showing how different stripe sizes work, see the discussion of RAID 0.
Obviously, there is no "optimal stripe size" for everyone; it depends on your performance needs, the types of applications you run, and in fact, even the characteristics of your drives to some extent. (That's why controller manufacturers reserve it as a user-definable value!) There are many "rules of thumb" that are thrown around to tell people how they should choose stripe size, but unfortunately they are all, at best, oversimplified. For example, some say to match the stripe size to the cluster size of FAT file system logical volumes. The theory is that by doing this you can fit an entire cluster in one stripe. Nice theory, but there's no practical way to ensure that each stripe contains exactly one cluster. Even if you could, this optimization only makes sense if you value positioning performance over transfer performance; many people do striping specifically for transfer performance.
[Figure: A comparison of different stripe sizes. On the left, a four-disk RAID 0 array with a stripe size ...]
So what should you use for a stripe size? The best way to find out is to try different
values: empirical evidence is the best for this particular problem. Also, as with most
"performance optimizing endeavors", don't overestimate the difference in
performance between different stripe sizes; it can be significant, particularly if
contrasting values from opposite ends of the spectrum like 4 kiB and 256 kiB, but the
difference often isn't all that large between similar values. And if you must
have a rule of thumb, I'd say this: transactional environments where you have large
numbers of small reads and writes are probably better off with larger stripe sizes (but
only to a point); applications where smaller numbers of larger files need to be read
quickly will likely prefer smaller stripes. Obviously, if you need to balance these
requirements, choose something in the middle.
Note: The improvement
in positioning performance that results from increasing stripe size to allow multiple
parallel accesses to different disks in the array depends entirely on the controller's
smarts (as do a lot of other things in RAID). For example, some controllers are designed
to not do any writes to a striped array until they have enough data to fill an entire
stripe across all the disks in the array. Clearly, this controller will not
improve positioning performance as much as one that doesn't have this limitation. Also,
striping with parity often requires extra reads and writes to maintain the integrity of
the parity information, as described here.
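As a rough illustration of why parity adds work, the sketch below updates one data block in a stripe protected by XOR parity. It assumes a simple single-parity scheme in which the parity block is the byte-wise XOR of the data blocks; the helper `xor_blocks` and the sample byte values are made up for this example and do not describe any particular controller.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus a parity block (assumed layout).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(*data)

# Small write that changes only the second data block.
new_block = b"\xaa\xbb"
# Two extra reads: the old data block and the old parity block.
# Two writes: the new data block and the recomputed parity block.
new_parity = xor_blocks(parity, data[1], new_block)
data[1] = new_block

# The shortcut gives the same parity as recomputing over the whole stripe.
assert new_parity == xor_blocks(*data)
```

A controller that instead waits for a full stripe of new data can compute parity from the new blocks alone and skip the two reads, which is one reason some designs buffer writes in that way.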
