Benchmarking hardware is a form of objective performance measurement; it is
measurement based on logic and analysis, as opposed to subjective measurement, which is more of a "feel" method of
gauging performance. Benchmarking is typically done using benchmark programs specifically
developed for the purpose of measuring hard disk performance.
There are many different programs used to benchmark hard drives, and they
generally fall into the following categories:
- High-Level (Application-Derived) Benchmarks: These are programs that
use code from popular applications software--usually office applications, web browsers and
the like--to simulate the impact of hard disk performance on the use of those applications
in the real world. The basic idea is to run a suite of tests composed of typical
actions that a user would take while using those applications, and measure how much time
elapses with the hardware in question. Then the hardware is changed and the test is run
again. This is generally a good concept for benchmarking, but it only has relevance if you
are actually using the types of applications around which the benchmark is designed. If
you are primarily a gamer, for example, what do you care about the performance of
spreadsheet software? Also, since the benchmark is running at a high level, there is a lot
of room for interference from operating system and file system issues. One of the most
common benchmarks of this type is the ZDNet WinBench series.
- Low-Level (Synthetic) Benchmarks: These programs attempt to test the
hard disk directly, isolating it as much as possible from the rest of the system. They are
often called "synthetic" because they don't try to reproduce the access patterns
of real applications, instead using artificial patterns created by the programmers
specifically for the benchmark, to test different types of hard disk use. They are often
derided as being unrealistic because of their synthetic nature, and much of this criticism
is, in my opinion, accurate. At the same time, however, they provide much more control over
the test process than application-derived benchmarks. This control lets you better
"focus in" on one particular aspect of performance and more accurately compare
different hardware units in a number of different areas. Common disk benchmarks of this
variety include Adaptec's Threadmark and Intel's IOMeter.
- "Real-World" Benchmarks: These are not benchmarks in the formal sense,
but are commonly used by hardware enthusiasts to compare real-world
performance of hard disks. The idea is simple: take something that you do often, measure
how long it takes with one drive, and then how long it takes with another. For example, if
you have a system with an old hard disk that is very slow to boot up, measure how long it
takes and then repeat the process with a newer disk to see how much things improve. In
some ways these are the most realistic, and also the most relevant benchmarks.
However, they are entirely system-dependent and therefore of little use in
communicating anything objective about the power of the hardware in question: the
improvement you see between hard disk "A" and hard disk "B" on your
system may be very different from what the same hardware delivers on a friend's PC. Also, these
measurements are usually fairly crude and can't be done on activities that take relatively
little time, since the timing is often done with a regular clock or wristwatch.
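
The "real-world" approach can be made somewhat less crude than a wristwatch by letting the computer do the timing. What follows is only a minimal sketch in Python, not an established benchmark tool; the names time_task and read_whole_file, and the choice of "read a large file" as the task, are hypothetical and chosen purely for illustration.

```python
import time


def time_task(task):
    """Run task() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def read_whole_file(path, chunk_size=1024 * 1024):
    """Hypothetical example task: read a file from start to end in 1 MiB chunks."""
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
```

You would call it as time_task(lambda: read_whole_file(path)), with path pointing at a large file of your own on the drive being tested. The result is still entirely system-dependent, exactly as described above, and a second read of the same file may be served from the operating system's cache rather than from the disk, so treat the number as a rough guide only.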
As I've said elsewhere, I'm not a big
fan of benchmarks, especially when it comes to hard disks. While they have their uses,
it's too easy to succumb to the temptation to view them as absolute indicators of
performance, to overvalue them and not consider what they really mean, bottom line, for
the typical end user. Small differences in hard disk performance have virtually no
impact on the typical hard disk user. Some people really get carried away, sweating
over every percentage point of their favorite benchmark, as if it were a competition of
some sort (and for some people, I suppose it is--a competition for bragging rights). Even
leaving the matter of over-emphasizing benchmarks aside, there are some common
"benchmark traps" I see all the time:
- Poor Control Of Environmental Factors: The only way to properly compare
two pieces of hardware is to test them under identical conditions. Even seemingly
irrelevant issues can influence the outcome. Most of the better hardware sites understand this,
but many individual enthusiasts do not. The exact number you get from testing one drive on
your system can be very different from the number someone else gets with the same drive,
without this meaning anything is "wrong".
- Small Sample Size: All benchmarks have a tendency to produce different
numbers if you run them more than once. To use a benchmark properly, run it several
times and average the results. It's even better to run it at least five times and discard
both the highest and lowest score for each piece of hardware before averaging.
- Paying No Attention To Cost: You will frequently see people talk about
the "benchmark X" score of one drive versus another, but when's the last time
you saw anyone divide each drive's benchmark score by its current market
price? I've seen people recommend "drive A" over "drive B" due to a
difference in performance of well under 10% despite "drive A" costing 50% more
than "drive B". That's rarely money well spent.
- Benchmark (In)Validity: It's not uncommon to see a particular benchmark
used for a long time by many people... and then it is discovered that, due to a flaw in
how it is written or in the manner in which it interacts with the hardware, operating system
or drivers, its results were inaccurate or misleading. This is another reason to use
benchmarks only as guidelines.
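
Two of the traps above, small sample size and ignoring cost, come down to simple arithmetic. Here is a rough sketch of that arithmetic in Python. The function names trimmed_mean and score_per_dollar are my own hypothetical choices, and every score and price below is invented purely to illustrate the calculation; none of it is a measurement of a real drive.

```python
def trimmed_mean(scores):
    """Average the scores after discarding the single highest and lowest run.

    Three runs is the bare minimum for this to work; at least five is better.
    """
    if len(scores) < 3:
        raise ValueError("need at least three runs to discard the extremes")
    kept = sorted(scores)[1:-1]  # drop one lowest and one highest value
    return sum(kept) / len(kept)


def score_per_dollar(score, price):
    """Benchmark score divided by market price (higher means better value)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return score / price


# Invented numbers: five runs per drive, and made-up prices in dollars.
drive_a = trimmed_mean([108, 110, 109, 131, 95])   # 109.0
drive_b = trimmed_mean([100, 101, 99, 120, 90])    # 100.0

value_a = score_per_dollar(drive_a, 150.0)  # "drive A" costs 50% more
value_b = score_per_dollar(drive_b, 100.0)
```

With these made-up figures "drive A" is 9% faster but delivers noticeably less benchmark score per dollar than "drive B", which is the situation described under "Paying No Attention To Cost". Like any benchmark result, the ratio is a guideline and not a verdict.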