Operating Systems and Benchmarks - Part 2
  March 13, 2000 Author: Eugene Ra  

The Importance of Benchmarks

We've often received this question, whether through e-mail, survey forms, or the Discussion Forum:

"Guys, why don't you use some real-world tests like [insert your choice of: bootup, application load times, or file copy tests]?"

The implication, of course, is that those tests are somehow more reflective of "real world usage" than, say, WinBench 99's Disk Suites.

Since when have simplistic operations such as file copies ever represented "real world performance?" File copies rely so heavily on the OS's caching and read/write strategy that actual drive performance is eclipsed. Bootup is dominated by driver initialization. In the course of booting Windows 2k, there are many pauses where the drive is obviously undergoing little activity, so drive speed is not the dominant factor. The same is often true when loading an application. Any application that truly relies on pure drive speed opens so fast these days that the error from human reflexes on a stopwatch becomes significant.
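To put rough numbers on the stopwatch problem, here is a minimal sketch. The 0.2-second reflex error and the load times are our own illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def stopwatch_relative_error(event_seconds, reflex_error_seconds=0.2):
    """Share of a hand-timed result that may be human reflex error.

    reflex_error_seconds is an assumed ballpark figure, not a measured one.
    """
    if event_seconds <= 0:
        raise ValueError("event_seconds must be positive")
    return reflex_error_seconds / event_seconds


def can_tell_apart(time_a, time_b, reflex_error_seconds=0.2):
    """True if two hand-timed results differ by more than the assumed error.

    Both readings carry their own reflex error, so the gap between them
    has to exceed twice that error before it means anything.
    """
    return abs(time_a - time_b) > 2 * reflex_error_seconds


# Hypothetical load times in seconds for the same application on two drives.
print(stopwatch_relative_error(1.0))   # a one-second load
print(stopwatch_relative_error(60.0))  # a one-minute load
print(can_tell_apart(1.0, 1.2))        # drives 0.2 s apart
```

Under these assumptions, a one-second application load carries a possible 20% timing error, and two drives whose load times differ by 0.2 seconds cannot be separated by hand at all.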

The best tests, of course, are common operations performed in a given application set. Timing them, again with human reflexes and a stopwatch, is not as easy as folks initially assume. In addition to standardizing the human influence, normalizing the tests themselves so that the exact same operations occur under the exact same conditions is daunting and, in our opinion, not feasible. The choice of applications and the weighting of scores into an aggregate total would also no doubt start a fiery discussion.

Macroing a standardized set of applications with a competent program for playback in the same sequence under the same conditions would be the best way to measure total system performance. After all, we assume that folks want to know how fast their machine will run applications and not just copy files. Believe it or not, pre-packaged programs doing just that are available for use: Content Creation Winstone 2k and Winstone 99. Yes, the benchmarks much maligned by file-copy advocates.

One can take it a step further: While Winstone 99 offers high-level benchmarks, it measures total system responsiveness, somewhat obscuring a drive's individual impact within the larger gestalt. It can be argued that this is the only way to measure the addition of any component. Many, however, wish to see the individual component isolated for ideal comparison. After all, a few minutes of drive activity diluted over an hour of application use will result in disk performance differences being washed away. Those same few minutes, on the other hand, will be quite noticeable to the user when disk access occurs. A program such as Intel's IPEAK Toolbox can be used on an application sequence to isolate disk activity for standardized playback later. Of course, something like this already exists too: WinBench 99's Disk WinMarks, which are simply the disk access patterns of the Winstone WinMarks in isolation.
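The dilution argument above comes down to simple arithmetic, sketched below. The figures (3 minutes of disk activity in a 60-minute session, one drive 25% faster than another) are hypothetical values chosen for illustration, and the function name is our own.

```python
def total_time_gain(total_minutes, disk_minutes, disk_speedup):
    """Fraction of total run time saved when only the disk-bound part speeds up.

    disk_speedup is the ratio of new to old drive throughput (1.25 = 25% faster).
    """
    if not 0 <= disk_minutes <= total_minutes or total_minutes <= 0:
        raise ValueError("need 0 <= disk_minutes <= total_minutes, total_minutes > 0")
    if disk_speedup <= 0:
        raise ValueError("disk_speedup must be positive")
    new_disk_minutes = disk_minutes / disk_speedup
    new_total = total_minutes - disk_minutes + new_disk_minutes
    return 1 - new_total / total_minutes


# Whole-session view: 3 disk-bound minutes inside a 60-minute run.
whole_session = total_time_gain(60, 3, 1.25)

# Isolated view: the same 3 minutes measured on their own.
disk_only = total_time_gain(3, 3, 1.25)

print(f"whole session: {whole_session:.1%}")
print(f"disk only:     {disk_only:.1%}")
```

With these assumed numbers, the faster drive trims about 1% from the whole session but 20% from the disk-bound portion. That gap is why a system-level score and an isolated disk score can tell such different stories about the same pair of drives.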

Thus, while far from perfect, WinBench 99's Disk WinMarks are among the best-available scientific, standardized approaches to drive testing. Over the last two years, we've used over 90 ATA and SCSI drives in our personal systems. We can attest through this sheer experience that performance and responsiveness as a whole correlate much more closely with WinBench than they do with file copies or other so-called "real world" measures.

That said, there are some legitimate concerns that WinBench, while accurate for testing the application workload it purports to measure, places too light a load on the system to represent performance on a more general basis. Further, as a given release of WinBench ages, manufacturers become better at "tuning" drives to produce high numbers without a corresponding increase in actual performance. It is for this reason that we've decided to deploy comprehensive IOMeter tests alongside the more traditional WinBench 99.

