At long last, StorageReview.com debuts its third-generation testbed! This comprehensive overhaul brings changes to the hardware, operating systems, benchmarks, and methodology behind SR. Consistent test platforms featuring a minimum of change have been the hallmark of SR over the last three years. This article is as close as it gets to required reading for site regulars and occasional visitors alike. How will future drives be judged? What motives exist behind the changes? What do the new benchmarks measure? How do more than 20 of today’s drives stack up under this new scrutiny? All this and more is answered in this sweeping article. You can’t afford to miss this one!
When we initially opened up shop here at StorageReview.com, our “testbed,” the constant machine that we center our hard drive reviews around, was a 266 MHz Pentium II on a 440LX chipset equipped with 64 MB of PC66 SDRAM. At that time (1Q 1998), the 333 MHz P2 was just hitting the streets as the premier desktop CPU. ATA-33 and Ultra2 SCSI were new standards… we were blissfully on our way. This machine, affectionately called “Hoss” due to its incredibly heavy (and ridiculously expensive) steel case, carried us through 1998 and much of the next year. As autumn ’99 approached, however, it became clear that “Testbed1” was getting a bit long in the tooth. Since we prefer to change as many variables as possible when a single switch is necessary, we decided to hold off upgrading the hardware in anticipation of three key elements: the OS that was to merge Microsoft’s Windows 9x and NT series (Windows 2000), Intel’s i820 chipset (the first to feature Intel’s ATA-66 controller) and WinBench 2000. Our projected deployment was set for late 1999.
Things didn’t turn out quite the way we planned. In addition to a February 2000 delay, Win2k ended up being an NT-successor rather than the unifier of kernels as Redmond originally touted. The i820 chipset, designed from the ground up to use outrageously expensive RDRAM memory, was incredibly buggy. And perhaps worst of all, there was no WinBench 2000. These setbacks delayed the introduction of our second-generation platform. Hushed whispers spoke of SR being too set in its ways to upgrade, that SR didn’t care and was too lazy to bother. True? No way! As it became increasingly clear that WinBench 2000 was non-existent and that the i820 would never mature into a viable platform, we decided to center Testbed2’s release on Windows 2000’s commercial debut. The site’s second anniversary, March 12, 2000, set a perfect target.
This second-generation testbed, dubbed “Millennium,” was built on a 700 MHz Pentium III. Oh sure, to counter AMD’s impending 1 GHz Athlon, Intel had “released” the 1 GHz P3 around then, but it was nowhere to be seen… nor would it be for half a year. Even the 800 MHz part, ostensibly available, was in short supply. We paid dearly for 700 MHz, the best we could obtain at the time while still prepping for a March 12 release. This testbed was supported by a 440BX-based motherboard and 128 MB of PC100 SDRAM. The BX, unfortunately, remained equipped with a southbridge that only supported ATA-33 standards. Thus, we turned to Promise’s Ultra66, the most popular add-on solution, for our reviews. On the SCSI side of things, an upgrade to Adaptec’s Ultra160 card kept us rolling.
A more significant change, however, arrived in our software and methodology. Windows 2000, while not the unified OS it was hyped to be, represented the future of PC computing. To streamline tests and maintain consistency we chose to disregard Windows 98 SE and the impending Windows ME. Since we (as well as many readers) felt that the aging, industry-standard WinBench 99 was getting poorer at assessing hard drive performance, we set out with an additional tool: Intel’s IOMeter.
We regarded TB2 as something that would hopefully carry us for another two full years after which, centered on SR’s fourth anniversary, we would debut the third-generation testbed… or testbeds. The name “Testbed3” carried a dual meaning. It would not only be the third successive hard disk platform but would also mark the time when we deployed identical configurations and installations to all our reviewers (along with myself in hard disks, associate editors Tim Zakharov and Terry Baranski later joined us to cover CD/CDR/DVD and controller performance respectively).
Setup-wise, a platform upgrade again hinged upon several key features. The first, of course, was the release of MS’s next OS, Windows XP… this time (really!) purported to unify their consumer and professional kernels. Second was the availability of consumer level motherboards that supported 64-bit cards and perhaps 66 MHz operation. Finally, the ever-tantalizing Serial ATA and Ultra320 interface upgrades made up criterion number three.
About eight months ago, right around SR’s third anniversary, we embarked on an extended research project. What factors are involved in hard drive speed? Were the requirements for optimum desktop and server performance more similar to or more different from each other? Were SR’s current tools delivering dependable performance predictions? We (as well as many readers) believed we already had the answers to these questions… and boy, were we off. Let us state right up front that we’ve gained a tremendous amount of knowledge in hard drive performance assessment over these past eight months. In the interests of truth, objectivity, and accuracy, we’ve thrown aside pride, preconceptions, and biases to get to the bottom of drive performance. We hope readers will be just as willing to accept new measurement ideas founded in the pool of evidence that follows later in this article.
As our experimental research progressed, it became clear that we had a duty to revise our methodologies even if the hardware and OS to support Testbed3 weren’t quite there yet. We hatched a plan for “Testbed 2.5” where we’d refresh our methodologies while keeping hardware and possibly the OS the same. As it would be, however, fate intervened. A month ago, the imaged software installation that we came to depend on over the course of more than 1.5 years became corrupted. That image contained updates, driver revisions, etc. that were no longer available and thus contained a software installation that could no longer be replicated. Then, incredibly, while flashing the machine’s motherboard in preparation for a Testbed 2.5 update, TB2’s power supply failed, leaving the motherboard’s flash ROM in an unsalvageable state. The message is clear: The future is now. It’s time for Testbed3!
Due to both financial realities and industry delays, TB3 is not the originally-intended unified platform deployed across three different reviewers. Rather, it represents a necessary and overdue migration to a new hard drive test platform. Revised optical and controller testbeds are to follow in the near future.
We’re very proud of the evaluation suite that we’ve assembled for Testbed3. The research, methodology, design, and programs that we’re debuting represent over 1000 man-hours of effort between just two individuals. As you read over the changes we’ve made, we hope you’ll agree that SR and readers alike are destined for a new era in hard drive judgment.
Oh, the name of our third-generation Testbed — “Renaissance.”
A qualified apology to the enthusiasts within our community: Testbed3 is centered on a 2 GHz Pentium 4. Believe us, at the time of this writing, StorageReview.com is financially strapped and it’s painful to shell out the big bucks required to assemble a platform that may not be the fastest around. These same enthusiasts, however, should realize that they do not make up the majority of SR readers and that in the corporate IT world, Intel’s platforms command a dominant market share. Regardless of its comparative performance (or lack thereof) against AMD’s Athlon XP, the fastest P4-based system stands the best chance of being most representative of the majority of our readers over the next two years. While the Intel southbridge undoubtedly makes its presence felt, it’s otherwise hard to believe that a decision on a top Intel CPU rather than an AMD processor creates a noticeable divergence in results. With that said, let’s take a closer look at Testbed3’s components.
Motherboard: Intel Desktop Board D850MV; BIOS Revision P03; Native Windows XP Driver
At the time of purchase, Intel was the only manufacturer offering a Socket-478 motherboard that utilized the i850 chipset. Since we’re interested in stability rather than overclocking and other enthusiast-related features, Intel’s board harbors no drawbacks. As is the case with all of TB3’s hardware, the motherboard utilizes Windows XP’s native chipset driver.
Processor: Intel Pentium 4 2.0 GHz, Retail Heatsink and Fan
The 2 GHz Pentium 4 is currently the fastest non-server chip available from Intel. Yes, the Athlon XP is faster. Yes, we paid a bundle for this chip. Yes, the Northwood core is right around the corner. Since the bulk of StorageReview.com’s readership comes from the IT/enterprise sector, and for whatever it’s worth, since Intel still rules in that field, the fastest Intel chip available today will hopefully carry us through Testbed4 while remaining as representative of reader systems as possible.
RAM: 2 x 256 MB PC800 RDRAM RIMMs
Two expensive sticks of RDRAM equip Testbed3 with 512 MB of RAM, a standard quantity for today’s mid- to high-level machines.
Display Adapter: Visiontek GeForce2 MX200; Native Windows XP Driver
Though we were planning to stick with Testbed2’s ancient Voodoo3 AGP display adapter, readers pointed out that Intel’s board requires a more contemporary 1.5V video card. We went with Visiontek’s MX200, about as inexpensive as display adapters get these days.
ATA Host Adapter: Intel 82801BA; Native Windows XP Driver
Built into the motherboard, Intel’s standard southbridge includes two ATA-100 channels. This updates Testbed3 to the latest currently available and universally recognized ATA interface. Important Note: We’ve stuck with Windows XP’s native driver rather than installing Intel’s “Application Accelerator,” i.e., Intel’s busmastering drivers.
SCSI Host Adapter: Adaptec ASC-29160; BIOS Revision v3.10; Native Windows XP Driver
The trusty Adaptec 29160 that carried us through Testbed2 returns in TB3. We’ve utilized the break to update the adapter to the currently available BIOS revision.
Sound Card: Creative Labs SoundBlaster Live!; Native Windows XP Driver
Though the Intel motherboard features built-in AC97 sound, we stuffed in a spare SB Live!. Perhaps it’ll be useful in downtimes where the testbed isn’t active and the machine is used for other purposes.
Network Interface Card: Intel Pro/100+ Management Adapter; BIOS v2.2; Native Windows XP Driver
Again, while not actually used in system setup or benchmark tests, an installed NIC best simulates a machine built for use in the typical IT workplace and is useful in non-testbed situations.
Boot Drive: Seagate Barracuda ATA IV ST380021
Let’s face it: while it may not be the performance star that Seagate’s press releases claim, the ‘Cuda IV lives up to every bit of hype when it comes to noise, or lack of it. Subjectively speaking, it’s the quietest drive in our possession.
CD-ROM: Plextor UltraPlex Wide, Firmware Revision v1.04
Another TB2 holdover, the UltraPlex Wide remains today’s premier SCSI reader. It serves its purpose of installing Windows and a few benchmarks well enough.
Floppy Drive: Sony FDD 1.44
Originally Testbed2’s floppy drive. It’s primarily used to transfer data files generated by various benchmarks.
Keyboard: NMB RT8255TW+
This one isn’t a remnant from TB2… it’s from Testbed1! Its “clicky” feel can’t be beat.
Mouse: Microsoft IntelliMouse w/IntelliEye
The basic ball-less USB mouse. And yes, it was used on TB2 also.
Power Supply: Antec PP412X
Pentium 4 CPUs require, among other things, a new power supply. We chose Antec’s unit for its reputation of silence. With its variable-speed fan, it’s not much louder than the ultra-quiet supply used in Testbed2.
Case: PC Power & Cooling Personal Mid-Tower
Though it’s a bit cramped, we decided to stick with the PC Power & Cooling Personal Mid-Tower case used for Testbed2. Keeping the same case preserves as seamless as possible a setup for our subjective heat and noise judgments.
Drive Cooler: California PC Products 5.25″ Bay Cooler
Though we originally tried to obtain more PC Power & Cooling BayCool units, astoundingly bad customer service issues drove us to California PC Products, one of PCP&C’s OEM manufacturers. The coolers we received from them sport fans that are a bit louder than the originals.
Assembly and Installation
Assembling Testbed3 was a piece of cake. The two RIMMs went into slots 0 and 1 on the D850MV motherboard. Visiontek’s GeForce2 MX was, of course, installed in the AGP slot. Into the second PCI slot down from the AGP card went the Adaptec 29160. Four slots down, our Intel NIC was installed. Finally, the SoundBlaster Live found a home in the fifth slot.
Testbed3’s boot drive, a Barracuda ATA IV, was installed in the lower-most 3.5″ drive bay. We divided its 80 gigs of capacity into two partitions: a 27 GB NTFS installation partition and a 53 GB FAT32 archives partition set to hold backup images of the machine’s software install as well as miscellaneous data files. Both partitions were formatted with default cluster sizes. The machine’s floppy drive rests two bays above the ‘Cuda. To effect consistent drive temperature measurements (more on this later), the machine’s CD-ROM was moved from the top-most 5.25″ bay to the bottom of the three. Evaluation hard disks, installed in our CalPC cooler, go into the top slot.
After rebooting, we disabled virtual memory. Three boots later (i.e., the fifth bootup), we installed Ziff-Davis’ WinBench 99, Intel’s IOMeter, and SCSI Explorer, a component of Adaptec’s EZ-SCSI 5.0. Upon the 6th bootup, we set Windows Explorer to unhide file extensions and to show hidden files. Finally, one reboot later, we defragmented the installation.
This final install was then sector-imaged utilizing PowerQuest’s DriveImage 5.0 to the archives partition. Though it’s been touched on before, we’d like to emphasize the following key decision: while many updates were available, all devices utilize stock Windows XP drivers. Neither XP itself nor any peripherals were updated to newly available software. This permits easy recreation of TB3’s software installation… even a year or two later.
The Importance of Benchmarks
We’d like to take a moment to quote a piece that was originally found in Testbed2’s introduction:
We’ve often received this question, whether through e-mail, survey forms, or the Discussion Forum:
“Guys, why don’t you use some real-world tests like [insert your choice of: bootup, application load times, or file copy tests]?”
The implication being, of course, that somehow, those tests are more reflective of “real world usage” than, say, WinBench 99’s Disk Suites.
Macroing a standardized set of applications with a competent program for playback in the same sequence under the same conditions would be the best way to measure total system performance. After all, we assume that folks want to know how fast their machine will run applications and not just copy files. Believe it or not, pre-packaged programs doing just that are available for use: Content Creation Winstone 2k and Winstone 99. Yes, the benchmarks much maligned by file-copy advocates.
One can take it a step further: While Winstone 99 offers high-level benchmarks, it measures total system responsiveness, somewhat obscuring the drive’s impact on a larger gestalt. It can be argued that this is the only way to measure the addition of any component. Many, however, wish to see the individual component isolated for ideal comparison. After all, a few minutes of drive activity diluted over an hour of application use will result in disk performance differences being washed away. Those same few minutes, on the other hand, will certainly be noticed by the user when disk access occurs. A program such as Intel’s IPEAK Toolbox can be used on an application sequence to isolate disk activity for standardized playback later. Of course, something like this already exists too: WinBench 99’s Disk WinMarks, which are simply the disk access patterns of the Winstone WinMarks in isolation.
Thus, while far from perfect, WinBench 99’s Disk WinMarks are among the best-available scientific, standardized approaches to drive testing. Over the last two years, we’ve used over 90 ATA and SCSI drives in our personal systems. We can attest through this sheer experience that performance and responsiveness as a whole correlate much more to WinBench than they do to file copies or other so-called “real world” measures.
That said, there are some legitimate concerns raised that WinBench, while accurate for testing the application workload it purports to measure, provides too light of a load on the system to represent performance on a more general basis. Further, as a given release of WinBench ages, manufacturers become better at “tuning” drive performance to reflect high numbers without a corresponding increase in actual performance. It is for this reason that we’ve decided to deploy comprehensive IOMeter tests alongside the more traditional WinBench 99.
With 20/20 hindsight, it’s easy for us to see that we were on the correct path all the way up to the final key paragraph that introduces IOMeter. But, painful as it is to admit, IOMeter suffers from a critical drawback that prevents it from accurately assessing single-user machine performance: lack of locality restrictions. “Locality” is a phenomenon of today’s modern operating systems and software that describes the tendency of very short disk accesses to far outweigh the occasional broad stroke. In other words, a drive’s actuator remains positioned within a very tight group of cylinders, completes as much business as it can in the area, and then and only then turns to servicing another request that demands significant repositioning.
What ramifications does locality have on drive performance? It means that short-seek performance and especially buffer size in conjunction with caching strategy have a much larger effect on net hard drive performance than most folks realize. In fact, a drive’s buffer and its accompanying read-ahead and write-back caching strategies exert more influence on net single-user performance than spindle speed, seek time, or transfer rates! A corollary accompanies this admission: If buffers and caching strategies effect significant performance increases, then buffer hits often occur. And if buffer hits occur often, buffer-to-host transfer rates, otherwise known as burst transfer rates, are significant. Whew, still with us?
Now then, where does this leave us with benchmarks? Surely there must be a tool out there that has a good shot at appraising single-user hard drive performance in today’s environments. IOMeter, unfortunately, is out of the picture. Those who’ve used IOMeter are undoubtedly familiar with the various sliders that allow the user to set certain attributes of an access pattern, such as the percentage of sequential accesses vs. random accesses and the percentage of reads vs. writes. IOMeter unfortunately lacks one key setting: “Restrict xx% of access to within yyyyy sectors.” Such a parameter would permit, say, 50% of accesses to occur within 8 megabytes of the last request… effectively simulating locality. As it stands, however, IOMeter is tilted towards accesses that span the entire drive without regard to locality, a pattern that much more closely simulates multi-user server environments than a single-user desktop/workstation.
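To make the missing setting concrete, here is a minimal Python sketch of an access-pattern generator with a locality restriction. It is not IOMeter code: the function name `generate_accesses`, its parameters, and the 512-byte sector size are all assumptions chosen for illustration.

```python
import random

SECTOR_BYTES = 512  # assumed sector size
WINDOW_SECTORS = 8 * 1024 * 1024 // SECTOR_BYTES  # 8 MB expressed in sectors


def generate_accesses(n, drive_sectors, local_fraction=0.5,
                      window_sectors=WINDOW_SECTORS, seed=0):
    """Return n starting sectors for a synthetic access pattern.

    Roughly local_fraction of the requests land within window_sectors of the
    previous request; the rest are drawn uniformly from the whole drive.
    """
    rng = random.Random(seed)
    position = rng.randrange(drive_sectors)
    accesses = []
    for _ in range(n):
        if rng.random() < local_fraction:
            # Local request: stay close to the previous position.
            low = max(0, position - window_sectors)
            high = min(drive_sectors - 1, position + window_sectors)
            position = rng.randint(low, high)
        else:
            # Broad stroke: anywhere on the drive.
            position = rng.randrange(drive_sectors)
        accesses.append(position)
    return accesses
```

Setting `local_fraction=0.0` reproduces the whole-drive random pattern described above, while `local_fraction=0.5` approximates the “50% within 8 megabytes” example.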
What about that old standby, WinBench 99? Drawn from actual high-level application runs, WB99 in many ways remains a compelling performance measure. Two key drawbacks, unfortunately, hamper WB99’s effectiveness in today’s settings. First, WinBench’s Disk WinMarks, while representing actual disk usage, were drawn from dated operating systems utilizing older Winstone tests (i.e., old applications). Further, and perhaps more alarming, WB99 is “double impacted” by system configuration in both its one-time record stage and every playback iteration. The Disk WinMarks, drawn from the Business and High-End Winstone 99s, were molded and shaped by the recording machine’s CPU, RAM, OS, cache, and file system. Then, when played back, the Disk WinMarks are again affected by the playback system’s CPU, RAM, OS, OS cache, and file system. This “double counting” undermines WB99’s well-meaning intentions of exacting application-level disk playback.
Where can we turn now? Coincidentally, another tool from Intel paves the way to a new level of storage subsystem performance measurement: the Intel Performance Evaluation and Analysis Kit: Storage Performance Toolbox v3.0… or IPEAK SPT for short. Though it was an expensive program ($800) and though it was discontinued in February 2001 (to be integrated in an upcoming new performance evaluation suite from Intel), IPEAK nevertheless opens up a new world of benchmarking. Let us now take a look at the latest SR tool…
Introduction to IPEAK SPT
IPEAK SPT delivers a wide-ranging suite of utilities that assist in assessing both drive performance and workload characteristics. It consists of five primary components:
WinTrace32 – IPEAK SPT’s fundamental tool is WinTrace32, a memory-resident background program that captures all OS and file system calls to a disk controller’s driver. It permits the capture of any arbitrary workload and dumps the results into a raw “trace file” that may then be utilized by several other IPEAK components.
AnalyzeTrace – Traces captured through WinTrace32 may be opened with AnalyzeTrace, a utility that delivers comprehensive reports on the various attributes of the given workload. Information on characteristics such as average transfer sizes, queue depths, seek distances, and more are all revealed by this nifty utility.
AnalyzeLocality – Raw trace files may also be run through another program, AnalyzeLocality, to reveal a given workload’s “Locality” characteristics: the relationship between disk accesses, their relative locations, and the time it takes for them to complete.
RankDisk – In our view, this is the most exciting tool of all. RankDisk takes raw trace files generated through WinTrace32 and systematically plays back the requests from the controller on downwards. By doing so, this benchmark permits comparative evaluation of the storage subsystem (driver, controller, and drive) while avoiding WinBench 99’s drawbacks.
AnalyzeDisk – A bit different from the above components in that it doesn’t utilize WinTrace32 captures, AnalyzeDisk is a benchmark that looks at various low-level characteristics of hard drives.
Note that IPEAK SPT, while delivering reliable results under Windows 2000 and Windows XP, nonetheless brings some quirks to the table. Most notable is that once IPEAK is installed within a given setup, a new drive cannot be added without an inexplicable Windows error. As a result, IPEAK SPT tests (as well as all other per-drive tests) are conducted with a clean re-image of the boot drive.
Let’s first take a look at AnalyzeDisk (AD). As a low-level benchmark, AD delivers some of the results that readers are used to seeing from WinBench 99’s Disk Inspection tests. We’ll cover AD’s more advanced results (not expressible as simple mean values) in an upcoming article. In the meantime, however, AD displaces WB99 as a measure for three key areas.
Distribution of Read Service Times (RST) – This test measures the time it takes for 25,000 random single-sector read requests to complete across the breadth of the entire drive. It is, in effect, an access time test and yields a nice, clean, single-number score through its “Average Value” statistic. It also features two key advantages over WB99. First, a 25,000 request base delivers a much larger sample size. Second, RST sorts the accesses into various “bins” depending on how long the accesses take. SR’s tests will standardize on a 0.5 ms breakdown. An example of a Read Service Time graph:
Western Digital Caviar WD1000BB-SE – Distribution of Read Service Times
Average Value = 13.6 ms
Note: Scores on top are better.
There are no big surprises here. As one would expect, read service times equate very closely to the access times reported by WinBench 99.
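The binning and averaging that a service-time distribution involves can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not AnalyzeDisk’s implementation; the function name and the sample values in the usage note are hypothetical.

```python
import math
from collections import Counter


def service_time_distribution(times_ms, bin_ms=0.5):
    """Return (average_ms, {bin_start_ms: count}) for a list of service times."""
    bins = Counter()
    for t in times_ms:
        # Each access falls into the bin whose lower edge is the nearest
        # multiple of bin_ms at or below its service time.
        bins[math.floor(t / bin_ms) * bin_ms] += 1
    average = sum(times_ms) / len(times_ms)
    return average, dict(sorted(bins.items()))
```

For instance, calling `service_time_distribution([13.2, 13.6, 13.9, 14.1])` places one access in the 13.0 ms bin, two in the 13.5 ms bin, and one in the 14.0 ms bin. A real run would feed in all 25,000 measured times.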
Distribution of Write Service Times (WST) – A counterpart to the RST test, WST conducts 25,000 random single-sector writes to deliver an average write access time… something WB99 doesn’t do. Unlike random reads, random writes are significantly impacted by today’s write-back caching strategies. To deliver a pure random write access time score, the test unit’s write caching is disabled through driver features in Windows XP’s system properties. An example of a Write Service Time graph:
Western Digital Caviar WD1000BB-SE – Distribution of Write Service Times
Average Value = 14.4 ms
Note: Scores on top are better.
Write access times tend to lag read access times by slight margins, though in the case of Seagate’s U6 the gap expands to a whopping 5 milliseconds. One drive, Quantum’s Fireball Plus AS, actually features an average write access time lower than its reads.
Read CPU Utilization – For Testbed2 CPU utilization tests, WinBench 99 by default issued consecutive 16 KB read requests to maintain a rate of 4 MB/sec. The CPU’s average load was then sampled and presented as WB99’s CPU Utilization score. Now over 1.5 years later, today’s slowest drives still maintain over 15 MB/sec on their inner tracks. We were all set to raise WB99’s constant transfer rate during this test to 17.5 MB/sec when we thought: why stop there? In today’s DMA world, it’s merely the issuing of a request rather than the actual transfer of data that occupies the processor. Neither the size of the requested blocks nor the achieved transfer rate affects CPU utilization. AnalyzeDisk’s Read CPU Utilization test measures the CPU’s load at a user-specified request rate. We’ve standardized on 10,000 requests per second… a load equivalent to reading nearly 160 MB/sec in 16 KB blocks.
Note: Scores on top are better.
A marked contrast arises between SCSI and ATA drives here, with SCSI drives averaging less than half the CPU utilization of the average ATA drive. Interesting indeed! Note that we couldn’t obtain consistent results from Seagate’s U6 or Samsung’s drives.
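The relationship between request rate and equivalent transfer rate in the Read CPU Utilization test is simple arithmetic. The sketch below works it through, assuming binary units (1 KB = 1024 bytes, 1 MB = 1024 KB); the helper names are ours.

```python
KB = 1024
MB = 1024 * KB


def requests_per_second(rate_mb_per_s, block_kb):
    """Requests per second needed to sustain a transfer rate at a block size."""
    return rate_mb_per_s * MB / (block_kb * KB)


def throughput_mb_per_s(requests_per_s, block_kb):
    """Transfer rate implied by a request rate at a given block size."""
    return requests_per_s * block_kb * KB / MB


print(requests_per_second(4, 16))       # WB99 default load: 256.0 requests/sec
print(throughput_mb_per_s(10_000, 16))  # 10,000 requests/sec: 156.25 MB/sec
```

In other words, the new standard issues roughly 39 times as many requests per second as WB99’s default 4 MB/sec load.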
Before running these tests, the system’s bootup process and startup sequence are cleaned up utilizing WinBench 99’s Startup Manager. This step prevents extraneous processes from interfering with the results.
Let’s turn now to the exhilarating possibilities delivered by another IPEAK component, RankDisk!
|WB99’s Disk WinMarks vs. IPEAK’s WinTrace32-RankDisk|
|Attribute||Business Disk WinMark 99||SR’s IPEAK SPT Tests|
|Recorded Operating System||Windows 98 / Windows NT||Windows XP|
|Recorded File System||FAT-16 / NTFS4||NTFS5|
|Recorded Applications||MS Office 97 + assorted utilities||MS Office XP + assorted utilities|
|Recorded Partition Size||3 GB||30 GB|
|System affects Recording Stage?||Yes||Yes|
|System affects Playback Stage?||Yes||No|
Since our RankDisk tests span a 30 GB partition (formatted with 4 KB clusters), it’s not possible to reliably run the tests on smaller drives. We’ve had to regretfully exclude some current-generation drives (most notably Fujitsu’s 20 GB/platter units… the manufacturer sent us 20- rather than 40-gigabyte versions) from retests. It’s important to keep in mind that RankDisk (like any drive benchmark) measures the driver-controller-drive combo as a whole. Even so, drive performance differences may be isolated by simply keeping the driver and controller constant. SR will present performance differences between hard disks through four separate indices:
StorageReview.com Office DriveMark 2002 – this figure represents a score drawn from playback of a trace file that captured 30 minutes of typical PC use by yours truly. My average, everyday use consists of various office and internet applications. These programs include: Outlook XP, Word XP, Excel XP, PowerPoint XP, Calypso (a freeware e-mail client), SecureCRT v3.3 (a telnet/SSH client), CuteFTP Pro v1.0 (an FTP/SSH client), ICQ 2000b (an instant messenger), Palm Hotsync 4.0 (a utility to update my PDA with info on my PC), Gravity 2.3 (a usenet/newsgroups client), PaintShop Pro v7.0 (an entry-level image editor), Media Player v8 for the occasional MP3, and last but certainly not least, Internet Explorer 6.0. These applications were task-switched/multitasked (take your pick between these terms… the distinction has blurred over the last several years) in a typical fashion that mirrors my activity and, we suspect, the activity of many users around the world.
Let’s take a closer look at some of the characteristics of everyday productivity usage when serviced by the recording drive, a Maxtor DiamondMax D740X:
|Transfer Sizes: Average Value = 23.0 KB||Queue Depths: Average Value = 1.34 I/Os|
The upper-left graphic represents the transfer sizes found in the SR Office DriveMark 2002. It illustrates that productivity use is highly characterized by small-block accesses, with 4k transfer sizes dominating the chart. Equally interesting information may be found in the upper-right graphic, a distribution of queue depths as a function of % during active drive time. In a finding that may surprise some folks, a drive’s queue depths remain relatively low even when a drive is quite busy. This finding, in fact, has resulted in some key changes in our IOMeter methodology, discussed later in this article. The lower two images are related. The first is a chart that reveals seek distance as a % of total accesses. Note that while nearly 20% of accesses took the actuator more than sixteen-million sectors (in other words, 8 GB) away from its current location, approximately half of all accesses occurred within sixteen-thousand sectors (8 MB) of the previous one… a small distance indeed given today’s areal densities. Further note that 16% of these accesses were serviced as “0” sectors away… in other words, in this representation of typical office/productivity work that features a high percentage of small-block accesses, 16% of requests were sequential in nature. The final chart, representing the percentage of data transferred vs. seek distance, extends on this a bit. Over 25% of requested data was sequential.
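The kind of statistics quoted above can be derived from a list of requests with a short script. The Python sketch below is not AnalyzeTrace; it assumes 512-byte sectors, assumes a trace is simply a list of `(start_sector, sector_count)` pairs in issue order, and assumes seek distance is measured from the end of the previous request to the start of the current one.

```python
SECTOR_BYTES = 512     # assumed sector size
NEAR_SECTORS = 16_000  # "sixteen-thousand sectors", roughly 8 MB


def summarize_trace(requests):
    """Summarize a list of (start_sector, sector_count) requests."""
    total_requests = len(requests)
    total_sectors = sum(count for _, count in requests)
    sequential = near = sequential_sectors = 0
    previous_end = None
    for start, count in requests:
        if previous_end is not None:  # the first request has no seek distance
            distance = abs(start - previous_end)
            if distance == 0:
                # Request begins exactly where the last one ended.
                sequential += 1
                sequential_sectors += count
            if distance <= NEAR_SECTORS:
                near += 1
        previous_end = start + count
    return {
        "avg_transfer_kb": total_sectors * SECTOR_BYTES / total_requests / 1024,
        "sequential_fraction": sequential / total_requests,
        "near_fraction": near / total_requests,
        "sequential_data_fraction": sequential_sectors / total_sectors,
    }
```

The distinction between `sequential_fraction` (share of requests) and `sequential_data_fraction` (share of data) mirrors the difference between the two lower charts: a small number of large sequential requests can carry a disproportionate share of the data.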
StorageReview.com High-End DriveMark 2002 – I personally do not regularly use heavy-duty image, sound, or video editing programs… so we can’t capture my “typical use” with these programs. Ziff Davis/etestinglabs.com, however, has been offering its free High-End (now “Content Creation”) Winstone suite for many years. We decided to capture the disk accesses from Content Creation Winstone 2001 v1.0.2 (the latest version available at TB3’s deployment) to represent our High-End DriveMark test. CCWS2001 delivers disk access from Adobe Photoshop v5.5, Adobe Premiere v5.1, Macromedia Director v8.0, Macromedia Dreamweaver v3.0, Netscape Navigator v4.73, and Sonic Foundry Sound Forge v4.5. The benchmark runs these applications in a predominately serial fashion… not too much of a concern as true high-end editing of large data files requires so many resources that one does not multitask with them as often as with everyday productivity apps.
|Transfer Sizes: Average Value = 69.5 KB||Queue Depths: Average Value = 1.40 I/Os|
There are a few notable differences between this “high-end” usage when contrasted with typical office use. First is the preponderance of sequential reads and writes as evidenced by the large number of 128-sector accesses in the transfer size chart. Windows breaks up large file requests into 64k segments; the reading and saving of large data files within the applications that comprise CCWS2001 really makes an impact. Queue depths, in spite of the “heavier” access, remain similar. The percentages of data transferred vs. seek distance chart reveals that in these content-creation applications, nearly 40% of data transferred was sequential in nature.
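The link between 64 KB segments and 128-sector accesses is just 65,536 bytes divided by 512 bytes per sector. The Python sketch below illustrates the splitting arithmetic only; it does not model the actual Windows I/O path, and the function name, the 512-byte sector size, and the sector-aligned inputs are assumptions.

```python
SECTOR_BYTES = 512           # assumed sector size
SEGMENT_BYTES = 64 * 1024    # 64 KB segments, i.e. 128 sectors each


def split_request(offset_bytes, length_bytes, segment_bytes=SEGMENT_BYTES):
    """Split one large, sector-aligned request into (start_sector, sector_count) pieces."""
    segments = []
    end = offset_bytes + length_bytes
    while offset_bytes < end:
        size = min(segment_bytes, end - offset_bytes)
        segments.append((offset_bytes // SECTOR_BYTES, size // SECTOR_BYTES))
        offset_bytes += size
    return segments


print(split_request(0, 200 * 1024))
# [(0, 128), (128, 128), (256, 128), (384, 16)]
```

A 200 KB read thus shows up as three back-to-back 128-sector requests plus a 16-sector remainder, which is why large-file work pushes both the 128-sector bar and the sequential share upward.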
StorageReview.com Bootup DriveMark 2002 – Since we founded SR, we’ve been constantly bombarded with e-mails requesting timings of an OS’s boot procedure, an allegedly “real-world” measure. Personally, we believe that timing a system’s startup just indicates, well, how fast a machine (and, to a lesser extent, a drive) is at bootup. Even so, Windows XP’s boot procedure involves significantly different access patterns and queue depths than found in other procedures. As a result, just for fun, we decided to utilize WinTrace32’s special “startup capture” option to trace my personal machine’s boot pattern. In this procedure, WinTrace32 places a driver at the earliest possible point within the startup procedure. The disk accesses that follow the driver’s load (i.e., virtually the whole startup process) are then logged to a buffer and written out upon the user’s request. The SR Bootup DriveMark 2002 is a capture of my personal machine’s startup into Windows XP after having been used for many days and having been defragmented several times in the period. This trace also includes the initialization and loading of the following memory-resident utilities: Dimension4 (a time synchronizer), Norton Antivirus 2002 AutoProtect, Palm Hotsync v4.0, and ICQ 2000b.
|Average Value = 33.3 KB||Average Value = 2.69 I/Os|
Despite the startup optimizations that Windows XP performs, transfer sizes remain predominately small-block in nature. What’s interesting here are the marked increases in queue depth. While it’s still a far cry from, say, 256 I/Os, the Bootup trace’s average of 2.69 roughly doubles those of the Office and High-End patterns. According to the seek distance graphs, buffer hits come in a bit lower than in other patterns, with long-stroke seeks comprising a large percentage of both the number of seeks as well as the amount of data transferred.
StorageReview.com Gaming DriveMark 2002 – A significant number of readers enjoy applications that don’t fall into any of the categories above: Games! In earlier days we enjoyed sinking hours into the latest strategy game or RPG. These days, however, SR is more than a full time job… our days of avid gaming are sadly over. WinTrace32-RankDisk nonetheless offers an opportunity to examine drive performance in this neglected sector. Unlike the other DriveMarks presented above, the SR Gaming DriveMark 2002 is a normalized average of five different traces rather than the results of a single trace. We played through the following games for approximately half an hour each to obtain five distinct traces: Lionhead’s Black & White v1.1, Valve’s Half-Life: Counterstrike v1.3, Blizzard’s Diablo 2: Lord of Destruction v1.09b, Maxis’ The Sims: House Party v1.0, and Epic’s Unreal Tournament v4.36.
As the Gaming DriveMark consists of five unique traces that are averaged together, detailed analysis of each pattern is beyond the scope of this article. For now, assume that many of these gaming traces mix attributes found within the Office and High-End DriveMarks. Look for charts and comments in an upcoming article.
All of the above DriveMarks were sampled through WinTrace32 on my personal system and software installation built on the following main components:
- Processor: 800 MHz Pentium III
- Memory: 128 MB of PC100 SDRAM
- Motherboard: Abit BF6
- ATA Controller: Built-in ATA-33
- Hard Drive: 40 GB Maxtor DiamondMax D740X, 30 GB partition, NTFS, 4 KB clusters
- Operating System: Windows XP Professional
The amount of RAM will undoubtedly be a point of controversy. Like most other enthusiasts, I regularly use far more than 128 MB of RAM. Even the higher-end machines in today’s department-store brands come equipped with 512 megabytes of memory. Nonetheless, we decided to go with 128 megs to increase the number of disk accesses. Yes, a machine with more RAM will in all likelihood access the drive much less. Even so, the object here is to present relative performance differences between drives. As applications grow and begin to tax today’s 256 MB and 512 MB systems, the “overflow” disk usage should be similar to the patterns generated on this machine.
Playbacks of these patterns on test drives, of course, occur on Testbed3. RankDisk’s precision and repeatability mean that only a single trial of each DriveMark is necessary. Though RankDisk reports results in terms of average service times (single-digit milliseconds), we’ve chosen to print results in IOs per second, the quotient of 1000 and a drive’s given score. These units offer two advantages. First, they maintain consistency with IOMeter’s primary metric, IOs/sec. Second, they perhaps make the results easier to swallow for those who irrationally object that RankDisk output measured in “tiny, insignificant” milliseconds somehow can’t be applicable to real use.
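The conversion from average service time to IOs per second is simple enough to sketch. The function names below are illustrative only and are not part of RankDisk; the snippet just shows the "1000 divided by milliseconds" relationship and its inverse.

```python
def ios_per_second(avg_service_time_ms: float) -> float:
    """Convert an average service time in milliseconds to IOs per second."""
    if avg_service_time_ms <= 0:
        raise ValueError("service time must be positive")
    return 1000.0 / avg_service_time_ms


def service_time_ms(ios_per_sec: float) -> float:
    """Inverse conversion: IOs per second back to milliseconds per IO."""
    if ios_per_sec <= 0:
        raise ValueError("IO rate must be positive")
    return 1000.0 / ios_per_sec


# A drive averaging 2 ms per request would be reported as 500 IO/sec.
print(ios_per_second(2.0))
# Working backwards, a published score of 485 IO/sec implies roughly 2.06 ms.
print(round(service_time_ms(485), 2))
```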
Let’s turn to the results and see what we can learn from this new cornerstone of performance measurement!
Desktop DriveMarks, continued
First up, and arguably the most important of the four, is the SR Office DriveMark 2002.
Note: Scores on top are better.
Even in the face of a new testbed, new operating system, and new benchmarks, Seagate’s mighty Cheetah X15-36LP retains its performance crown. Its score of 485 IO/sec keeps it at the top of the heap of all current-generation drives. Flexing its muscles and making strong headway is Maxtor’s Atlas 10k III. Its DriveMark of 455 surpasses 15,000 RPM offerings from both Fujitsu and IBM.
On the ATA side of things, Western Digital proves once again that it remains today’s premiere ATA drive manufacturer. While the modernization of tests leaves it trailing today’s top SCSI units, the WD1000BB-SE’s DriveMark of 397 blows away the next closest non-WD competitor, IBM’s venerable Deskstar 60GXP, by over 30%. Maxtor’s DiamondMax Plus D740X, a formidable drive in Testbed2’s Business Disk WinMark 99, loses major ground here. Under this new test the D740X slips into parity with the 60GXP and Seagate’s Barracuda ATA IV.
Overall, segregation in the SR Office DriveMark 2002 is pronounced. 15k RPM drives rule the roost with 10k RPM drives making up the next segment. The distinction between an average 10k drive and a 7200 RPM ATA drive blurs, however. Finally, 7200 RPM SCSI drives and 5400 RPM ATA drives bring up the rear when it comes to contemporary desktop performance.
Next up is the SR High-End DriveMark 2002:
Note: Scores on top are better.
No earth-shattering differences between High-End and Office results. Once again Seagate’s monster exerts its dominance by claiming the top spot with its DriveMark of 422. Fujitsu’s 15k MAM slips by the Atlas 10k III to take second place.
Again dominating within ATA drives is the Caviar WD1000BB-SE. Its score of 376 is untouchable by other ATA drives and even gives the top SCSI disks a run for their money. 7200 RPM SCSI drives along with 5400 RPM ATA units once more bring up the rear.
How do things change in the SR Bootup DriveMark 2002?
Note: Scores on top are better.
Changes smatter the chart here and there. Most interesting is the Fujitsu MAM, which displaces the X15-36LP to claim the Bootup crown. The best showing in ATA drives, unsurprisingly, remains the WD1000BB-SE.
Finally, the SR Gaming DriveMark 2002:
Note: Scores on top are better.
When it comes to games, the X15-36LP again is top dog, though by the narrowest of margins with the Atlas 10k III nipping at its heels. These two drives, along with Fujitsu’s MAM, stand head and shoulders above all comers. Fourth place goes to the ever-consistent ATA leader, WD’s Caviar WD1000BB-SE. We’re not hardcore gamers, so springing for an ultra-expensive SCSI drive to play games seems a bit excessive. Western Digital’s drive, with its significantly lower price tag, not to mention its high capacity, seems to be the ideal “gamer’s pick”.
Unfortunately we do not have any access to servers running Windows 2000 or XP. And if we did, it would be a risky proposition to install the idiosyncratic IPEAK SPT on a machine that’s more or less mission-critical in nature. Even so, we have a good tool to measure server performance: IOMeter. Though its value in assessing workstation and desktop performance is admittedly dubious, the highly random, non-localized patterns exhibited by servers can be simulated well through Intel’s other benchmark.
The StorageReview.com Server DriveMarks
IOMeter will remain SR’s choice to measure server performance… with some changes. First, the ill-conceived Workstation and relatively arbitrary Database access patterns have been dropped. We’ll retain the Intel predefined File Server Pattern and add another, Web Server.
File Server: 80% Read, 100% Random
Web Server: 100% Read, 100% Random
The Web Server access pattern differs from the File Server access pattern most significantly with its 100% read pattern (since, with enough memory to avoid swaps, a web server ostensibly does not write much when simply serving up HTML and graphics) and the inclusion of a small % of large block accesses (presumably to simulate large graphics).
We’re going to drop the 256 I/O load. Even busy servers achieve this figure only in burst moments. When considering steady-state loads, 64 I/Os easily represent a very heavy depth indeed.
Though IOMeter delivers a vast array of results, the fact is that most reports are redundant and probably serve to intimidate the casual reader more than they help. Therefore, we’re going to report just one result in our standard performance database: average I/Os per second across the various load depths.
These “at a glance” normalized index figures will now incorporate the lighter I/O depths to paint a more balanced picture of server performance. Since heavier loads always turn out higher absolute IO/sec figures (OS-level as well as controller-level reordering permit shorter stroke seeks, though locality doesn’t even begin to approach the situation of a single-user machine), lower load scores will be multiplied up by the average difference exhibited between a given lighter load and the 64 I/O figure. Consider our base of over 20 drives under a 64 I/O File Server load. The sum of all scores is 3943. Then take that base and add up the 1 I/O results… only 2065. Clearly then, if these varying loads were linearly averaged, the 64 I/O score of a given drive would impact the drive’s average score by nearly a 2:1 margin over the 1 I/O figure. The solution? Normalization. Multiply up the single I/O score by a coefficient that equalizes the average single I/O score with 64 I/O results. Taken to its logical conclusion, here are the coefficients (locked from this point forward) drawn from our foundation of 22 drives:
|File Server||Web Server|
Confusing? Don’t worry, we’ll be doing all the math for you. The resulting composite scores will be called the SR File Server DriveMark 2002 and the SR Web Server DriveMark 2002 respectively.
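For those who do want to follow the math, here is a minimal sketch of the normalization idea. Only the two sums (3943 at 64 I/Os, 2065 at 1 I/O) come from the File Server figures above; the function names, the two-depth example, and the per-drive scores are hypothetical, and the real composite spans more load depths than the two shown.

```python
def normalization_coefficient(sum_at_reference: float, sum_at_load: float) -> float:
    """Coefficient that scales a lighter load's scores up to the 64 I/O level."""
    return sum_at_reference / sum_at_load


def composite_score(scores_by_load: dict, coefficients: dict) -> float:
    """Average a drive's scores after multiplying each by its load's coefficient."""
    normalized = [scores_by_load[load] * coefficients[load] for load in scores_by_load]
    return sum(normalized) / len(normalized)


# File Server sums quoted in the text: 3943 (64 I/Os) and 2065 (1 I/O).
coeff_1 = normalization_coefficient(3943, 2065)
print(round(coeff_1, 3))  # roughly 1.909, the "nearly 2:1" gap

# Hypothetical drive scoring 100 IO/sec at 1 I/O and 200 IO/sec at 64 I/Os.
coefficients = {1: coeff_1, 64: 1.0}  # the reference load keeps a coefficient of 1
drive_scores = {1: 100.0, 64: 200.0}  # illustrative numbers, not measured results
print(round(composite_score(drive_scores, coefficients), 1))
```

With the coefficient applied, the light-load and heavy-load results contribute on an equal footing instead of the 64 I/O figure dominating the average.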
Let’s see how drives stack up in the FS DriveMark:
Note: Scores on top are better.
Fujitsu’s 15k RPM MAM bests the Cheetah X15-36LP to claim the top spot in the SR File Server DriveMark 2002. Seagate’s drive nonetheless turns in a close second-place finish. These two drives together separate themselves from the pack in offering today’s premiere file server performance. On downwards, it’s easy to tell where 10k RPM drives end and 7200 RPM disks begin. Drives such as the Quantum Atlas V and the Seagate Barracuda 180 deliver file server performance a decent notch above that of 7200 RPM ATA drives. Only the Barracuda 36ES lags, delivering little more performance than top ATA drives. In the ATA realm, Maxtor’s DiamondMax D740X, a drive that delivered great IOMeter performance in Testbed2, is the best ATA drive you can buy for a file server today. Even the Maxtor is naught but a pale shadow when compared to the 10k RPM drives though.
Note: Scores on top are better.
Again by a close margin the MAM proves to be the best drive around in the SR Web Server DriveMark 2002. This time around, however, IBM’s Ultrastar 36Z15 joins the other 15k RPM drives in setting themselves apart from the pack. ATA drives, on the other hand, display remarkable similarity. While the IBM Deskstar 60GXP manages to pull ahead, every other current-generation 7200 RPM drive ends up in a tight 128-129 corridor.
An Undying Legacy: WinBench 99 v2.0
Now three years old, WinBench 99 was at one time our mainstay benchmark. With Testbed2, it (perhaps unjustly) fell out of favor as IOMeter took its place. A year later, however, we realized that WB99 still carried its relative merit vs. IOMeter and weighted it more equally against Intel’s benchmark. Now, with Testbed3, we believe that IPEAK’s WinTrace32 + RankDisk offer an unequivocally superior way to measure performance over ZD’s venerable benchmark. With AnalyzeDisk delivering some low-level results and the SR DriveMarks, through RankDisk and IOMeter, definitely covering desktop and server performance, is there any room left for WB99?
The answer is a qualified yes. First, despite all its strengths, AnalyzeDisk just does not deliver a tool to measure sequential transfer rates as reliably and simply as WinBench 99’s Disk Inspection suite. Thus, for both a graphical representation as well as quantitative outer- and inner-zone transfer rate measurements, Testbed3 will continue with WB99.
More importantly, for better or for worse, WinBench 99’s Disk WinMarks continue to enjoy an unshakable hold within the hard drive industry. Even those manufacturers who stumble and do poorly in the WinMarks have little bad to say about the tests. Though we firmly believe that the RankDisk-based SR DriveMarks are a much more accurate way to measure contemporary desktop drive performance, we’re going to acquiesce to WB99’s industry-standard status and continue to offer WinMark figures on Testbed3. WinMark results, however, will carry no weight in our final judgment of drives.
Due to its precision and stability, the Disk Transfer test will remain at three iterations, just as it was with Testbed2. To offset the variability often encountered in the Disk WinMarks, however, we’ve started to use ZD’s repetition manager to run a total of nine trials. The first two are discarded as “training runs” while the remaining seven are averaged together to yield a single score.
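The scoring rule for the Disk WinMarks can be sketched in a few lines. This is only an illustration of "nine trials, drop the first two, average the rest"; the function name and the sample values are hypothetical and do not come from ZD's repetition manager.

```python
def winmark_score(trials, training_runs=2):
    """Average the trials that remain after discarding the leading training runs."""
    if len(trials) <= training_runs:
        raise ValueError("need more trials than training runs")
    scored = trials[training_runs:]
    return sum(scored) / len(scored)


# Nine hypothetical trial results; the first two serve as training runs.
sample_trials = [9.1, 9.6, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.0]
print(round(winmark_score(sample_trials), 2))
```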
Since they’ve remained for all intents and purposes identical on TB3, we have no comments on transfer rate results. Instead, let’s take a brief look at the Business Disk WinMark 99.
Note: Scores on top are better.
Strangely, with Testbed3’s improved hardware, most SCSI drives exhibit large gains in the Business Disk WinMark while ATA drives either show no gain or even regress a bit. Ironically, the result is a hierarchy that would likely make the benchmark more credible in the eyes of those who believe the SCSI interface is invincible. Western Digital’s WD1000BB-SE, a drive that trailed only the X15-36LP (by a minuscule margin at that!), gets displaced by just about every 15k RPM and 10k RPM drive in our sample. Also notable is IBM’s Deskstar 60GXP. This ATA drive was falling behind in WinMarks run on Testbed2. Curiously, Testbed3 allows the 60GXP to excel beyond the norm and grab a place back at the top of the ATA heap.
Note: Scores on top are better.
The High-End Disk WinMark 99’s order remains somewhat more Testbed2-like with the WD1000BB-SE tenaciously retaining its spot among top SCSI drives. Of note again is the 60GXP, taking advantage of TB3’s contemporary hardware to climb towards the top in ATA performance.
Again, RankDisk does everything that the WinMarks do… just better. As an example, RankDisk is impervious to the playback system’s hardware, something WB99 is not as evidenced by the results above. WinMarks are presented for those who are curious and for diehards only.
Performance, whether it relates to a single-user desktop machine/workstation or multi-user servers, is but one part of the equation for many prospective drive purchasers. For home users especially, heat and noise are also critical factors. Throughout testbeds 1 and 2, we’ve been content to limit our observations on heat and sound to subjective means only. Why? The fact is that a host of problems arise when one tries to measure heat and noise in an objective manner.
What problems exist in measuring noise? First is the dichotomy between sound power and sound pressure. The relationship between these two measurements may be a bit likened to the difference between mass and weight. The former is an absolute measure of matter while the latter is a relative measure of a resulting force based on other external factors. Likewise, sound power is an absolute figure of energy output while sound pressure is sound power funneled through a host of mitigating factors. Sound pressure is what we hear, yet it is sound power that is more easily generalized into relatable specifications.
These days, most manufacturers cite sound power in a comprehensive list of drive specifications. Measuring sound power, especially for a small outfit such as StorageReview.com, is fraught with difficulties in expense and practicality. Anechoic chambers, spherical arrays of microphones, and sophisticated sound analysis equipment put this endeavor beyond our reach.
This leaves us with much more modest yet much more feasible sound pressure measurements. Here too, however, problems exist. What’s most desirable? Measuring a drive outside of a case and thus outside of the varying interactions that different cases would introduce? Or measuring a drive mounted in a chassis to obtain a more “real world” measurement? Should ancillary noises such as CPU and power supply fans be present to dilute the absolute difference between the sound pressures of different drives?
Interaction between the drive and the environment doesn’t stop with the case. Where should the case be placed in relation to the microphone? Should there be a table located in the midst of things to more properly simulate ear placement vs. drive placement? What about other ambient noises that could be expected to vary the equation? Different frequencies and types of noise bother different people. What about the tendency of otherwise identical units to have different sound profiles? The list goes on and on.
Believe us when we claim we’ve devoted many hours of thought to measuring noise. We’ve had many readers blithely write in chastising us for limiting our discussion of noise strictly to subjective means. We suppose that whether these readers don’t know about the obstacles involved or whether they simply don’t care is irrelevant. They have been demanding objective measurement for some time now. With Testbed3 we’ll acquiesce.
At this time, measuring seek noises as well as delivering high-quality recordings of drive noise aren’t feasible without introducing a significant amount of ambient noise. We’re going to stick strictly with idle noise measurements.
In a recent StorageReview.com reader poll, a near 2/3rds majority indicated that idle noise is the most bothersome emission from a hard drive. We’re going to err on the side of simplicity when it comes to making the hard decisions associated with objective noise measurement. Sound assessment will consist of the following:
- Taking the bare drive into a quiet spare room devoid of all but the quietest ambient noise.
- Connecting the drive to an AT power supply that has its fan disabled.
- Placing the drive upside down on the floor (i.e., with the flat top plate, rather than the electronics, resting on the berber carpeting).
- Placing our type II sound pressure level meter, an Extech 407750, 18 millimeters away from the long side of the drive to the left of the backside that contains the power and data connectors. This near-field placement renders insignificant any remaining external interactions.
- Powering up the drive and allowing it to initialize for 30 seconds.
- Publishing the highest recorded A-weighted measurement reading utilizing slow response time within the following 30 seconds.
Current-generation drives turn in the following results:
Note: Scores on top are better.
When listening to a drive being measured in our quiet room, outside of a case with little ambient noise, our subjective impressions closely match the objective readings presented above. When listening to a drive mounted into our testbed chassis, however, discrepancies arise between subjective feelings and these objective measurements. Operating away in a case, for example, Seagate’s Barracuda ATA IV seems quieter than their U6. Similarly, the Maxtor Atlas 10k III simply doesn’t sound like one of the loudest drives we’ve measured. Remember the dangers of measuring sound pressure and its applicability (or lack thereof) towards more realistic scenarios.
Seagate’s current-generation ATA drives, the U6 and the ‘Cuda ATA IV, respectively weigh in as the quietest 5400 RPM and 7200 RPM drives in our objective tests. Maxtor’s 100 GB DiamondMax D536X also delivers an enviable low noise floor. On the SCSI side of things, Seagate’s Cheetah 36ES exhibits a surprising showing, quieter than most 7200 RPM ATA drives. On the other end of the spectrum, the Ultrastar 36Z15 whirls away as the loudest drive we measured. The general trend, unsurprisingly, is quiet ATA drives and loud SCSI drives.
Note that the introduction of objective noise measurement does not signal the end of our subjective impressions. At the very least, seek noise will be addressed subjectively. We’ll continue to provide such comments as we have in the past… these comments will be penned before objective measurements are taken to ensure that bias is not introduced.
What problems arise when attempting objective heat measurements? Thankfully, only a handful when contrasted with objective sound assessment. Our primary concern is maintaining a sufficiently constant ambient temperature to provide a level playing field for heat measurement. And, as it turns out, the air conditioner and heater that service Testbed3’s room have proven up to the task.
Another question: where should measurements be drawn? On the top? From the sides? Which measurement should we report? What about the labels that sometimes appear with various drives that restrict metal-to-metal contact?
Though we briefly entertained the idea of thermal imaging, in the interests of precision we’ve settled on utilizing a Fluke 51 Series II Thermometer with a Fluke 80PK-3A type K thermocouple surface probe to take multiple measurements off of the top of tested hard drives. To permit easy access, tested drives are mounted in a 3.5 – 5.25″ bracket (in this case, simply our CalPC drive cooler with the fans unplugged) and installed in the top-most drive slot of our testbed case. The drive then runs through our 81 minute and 20 second IOMeter DriveMark suite with the case’s top and side panels removed. Why? The top panel must eventually come off to permit temperature measurement. Doing so releases accumulated hot air within the case and makes timing a critical component of the measurement… not good when we’re fishing around the top of the drive in search of the highest recordable temperature. And without the top panel in place, the testbed’s side panels rattle away due to vibration from not only the drive but also the power supply.
After the drive has churned away in IOMeter for 80 minutes (and while continuing to do so), we use our surface probe to take key temperature measurements from a variety of positions off of the drive’s top panel. The reported measurement represents the highest temperature. We’ve been able to keep ambient temperature within a tight range of one degree Celsius. As a result, we can subtract recorded ambient temperature from recorded drive temperature to obtain a “net drive temperature” of sorts.
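The "net drive temperature" calculation amounts to taking the hottest probe reading and subtracting ambient. A minimal sketch follows; the function name and the sample readings are hypothetical and are not measurements from any tested drive.

```python
def net_drive_temperature(probe_readings_c, ambient_c):
    """Highest surface reading minus ambient, both in degrees Celsius."""
    if not probe_readings_c:
        raise ValueError("at least one probe reading is required")
    return max(probe_readings_c) - ambient_c


# Hypothetical readings taken at several spots on a drive's top panel.
readings = [40.0, 42.5, 41.0]
print(net_drive_temperature(readings, ambient_c=22.0))
```

Reporting the difference rather than the raw reading is what lets results taken on slightly warmer or cooler days sit side by side.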
Note: Scores on top are better.
Samsung’s 2-platter drives, both the 5400 RPM V30 and the 7200 RPM P20, weigh in as the coolest of today’s disks. Not unexpectedly, the other end of the spectrum is held down by hot-running 15k RPM drives… with IBM’s drive in particular generating enormous amounts of heat. Here too a strong dichotomy emerges between ATA and SCSI drives. In fact, the hottest-running ATA drive, the Deskstar 60GXP, runs cooler than the coolest-running SCSI drive, Seagate’s Barracuda 36ES.
Unlike subjective noise comments, we’re going to phase out subjective observations on heat. We hold enough confidence in these results to let them stand alone.
Note, by the way, that our Server DriveMark results (IOMeter derived) aren’t drawn from the same IOMeter runs from which heat measurements are taken. Server DriveMark numbers are taken from a drive that completes the IOMeter suite while being actively cooled by the drive cooler with the case’s top and side panels in place.
StorageReview.com’s Reliability Database
Over the course of the past three years various readers have pointed out that the fastest drive in the world doesn’t do folks much good if they can’t depend on it to protect their data. While we concur in principle, the fact remains that it’s impossible to judge the overall reliability of a drive given a single sample… or even one-hundred of them. As a result, we’ve had to remain on the sidelines when it came to evaluating drive reliability.
Recently, however, we’ve embarked on an ambitious project to survey readers around the globe on all the various drives that SR has reviewed. The result is the StorageReview.com Reliability Database, a project that aims to combine the experiences of thousands of readers into meaningful metrics.
We’ll admit right up front that certain inherent flaws exist in a voluntary survey when contrasted with one drawn from a strict, random representative sample. This method nonetheless represents the only chance that enthusiasts and IT professionals have at reliability ratings, something that for years has been a closely-guarded industry secret. Further, our comprehensive, multi-layer filter, which qualifies or disqualifies entries based on several telling variables, and our robust analysis engine take the survey’s voluntary nature into account.
Reliability ratings for a review drive’s predecessor will be included in its write-up. As results accumulate, a score for the review drive itself will also be placed within the article. These ratings will provide readers with a rough projection on the reliability of reviewed drives. Greater details, individual model resolution, and participant comments may be accessed through the database itself.
To ensure that results may be traced, filtered, and analyzed, registration is required to participate in the ongoing survey and access results directly from the database. Registration requires only a valid e-mail address and about 30 seconds of your time.
For more information on SR’s reliability database and to review the results collected thus far, click here.
Some Myths Debunked
1. ATA features CPU utilization every bit as low as SCSI’s. This is something that we’ve assumed for years with the advent of DMA ATA extensions. However, AnalyzeDisk consistently reports that ATA CPU utilization is approximately double that of SCSI’s. This in fact mirrors a trend that Testbed2 revealed with WB99’s CPU utilization. Inconsistencies in that test, however, left us unclear on the credibility of its results. Note that what Testbed3 is technically telling us is that Intel’s ATA-100 controller + driver features double the CPU utilization of Adaptec’s 29160 Ultra160 host adapter + driver. Since both arguably represent the most common iteration of their respective standard, however, these findings may with some merit be generalized to the interfaces as a whole.
2. SCSI drives are always better than ATA drives for most applications when it comes to performance. ATA drives warrant consideration only due to price. This will be a bitter pill for many readers to swallow. We’re not immune either; it’s this assumption that’s largely responsible for our attempt to deploy IOMeter to represent desktop and workstation performance. After all, how could a class of drives that always featured significantly lower access times and in many cases higher transfer rates not consistently outscore drives of “lesser caliber?” Again, the answer lies in the nuances of locality and caching strategy.
When designing the algorithms to accompany a given drive’s buffer, engineers must make some tradeoffs, due both to scarcity of memory as well as design man-hours. These engineers must first and foremost keep the target market of the drive in mind. Is the drive destined to end up mainly in a desktop/workstation scenario or in a multi-user server? The primary market for ATA drives is the former while for SCSI drives it’s the latter. Optimizing for one side often means that performance on the other will suffer. Oh sure, there are exceptions- the Atlas 10k III, for example, which delivers top-end 10k RPM server performance combined with desktop performance that outraces even the most ambitious ATA drives. More often than not, however, SCSI drives simply do not exhibit the proportional gains in desktop performance that one would expect from their superior spindle speeds, access times, and transfer rates because the manufacturer has invested its resources, time, and effort into crafting the drive into a server-destined design. Overcoming this mental hurdle is critical to the objective understanding and assessment of hard disks. We’re doing our best to get by it; we hope our readers will too.
3. All drives within a given family perform the same, regardless of size. We’re not going to accept complete credit for this one. When taken in proper context, this statement stands. The problem with this assertion in a literal sense is that data is addressed absolutely, not relatively. For example, a 10 GB partition occupies only one-tenth of a 100 GB drive but a full half of a 20 gig unit. When performing a full stroke across the partition, the 100 GB disk in reality performs a relatively short seek while the poor 20 GB drive has to pull out a half-stroke. Because of this, a 160 GB Maxtor DiamondMax D540X can outperform a 40 GB version of the same drive.
Now, that said, the larger drive’s advantage is mitigated by a few factors. First, locality’s influence will restrict the majority of either drive’s movements to a relatively tiny area. Second, keeping in mind this same locality in conjunction with buffer strategies, the impact of seek time on overall drive performance is not nearly as significant as we’ve stated in the past. Finally, though it may not be published in specs, a smaller drive’s lighter actuator may be able to travel physical (i.e., centimeters or inches… as opposed to tracks and cylinders) distances slightly faster.
In fact, this last phenomenon sometimes results in manufacturers specifying a lower average seek time for a smaller capacity drive within a given family. Seagate’s Barracuda ATA IV is the latest example. Two-platter versions of the drive feature a 9.5 millisecond seek time while the one-disk units receive a 9.0 ms claim. Remember, though, in an absolute sense that while the 40 gig drive may be able to move its head a distance of one centimeter slightly faster than the 80 gig drive can, the 80 gig drive will have to move such distances less often. In many cases, it all balances out.
At this time it is not feasible for SR to test every member of every family… we’re not sure how many manufacturers would go for consistently shipping us every capacity point within every family. Further, we’re having a hard enough time keeping up as is- something that’ll be exacerbated by Testbed3’s more comprehensive (and thus more lengthy) tests. We’re going to stick to our policy of formally testing only the flagship (largest) drive within each family. Remember, however, that if an 80 GB drive of a given family outperforms an 80 GB competitor, the 40 gig version of the same drive will in all likelihood outperform the 40 GB version of the competition. So, within this context, drives within the same family truly do perform the same.
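The partition example under the third myth can be put into numbers. The sketch below assumes, purely for illustration, that capacity maps evenly onto actuator travel; real drives use zoned recording, so this is only a rough approximation. The function name is hypothetical.

```python
def stroke_fraction(partition_gb: float, drive_gb: float) -> float:
    """Approximate share of the full stroke needed to span a partition.

    Simplifying assumption: capacity is spread evenly across the actuator's
    travel, which zoned recording makes only roughly true in practice.
    """
    if partition_gb > drive_gb:
        raise ValueError("partition cannot be larger than the drive")
    return partition_gb / drive_gb


# The same 10 GB partition on the two drives from the example.
print(stroke_fraction(10, 100))  # one-tenth of the stroke
print(stroke_fraction(10, 20))   # half of the stroke
```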
Best of Breed Drives
What would such a long discourse be without some concise observations and conclusions? According to Testbed3, in the SCSI realm (and the entire realm of hard drives, for that matter), Seagate’s Cheetah X15-36LP reigns supreme as the fastest, most balanced hard drive around. This performance comes at the expense of heat, cash, and relatively small capacities. When it comes to the absolute best in server performance, Fujitsu’s series of enterprise-class drives remains second to none. The MAN3735 delivers the best server-oriented performance of any 10k RPM drive while the company’s 15k RPM MAM3367 bests all comers. Let’s also not forget the Maxtor Atlas 10k III, a “mini X15-36LP” that provides great all-around performance, desktop or server, not to mention a 73 GB capacity.
Even under Testbed3, Western Digital retains the crown in the ATA world. The Caviar WD1000BB-SE continues to outrace all comers in desktop/workstation performance and delivers server performance that rivals the best. The standard WD1000BB also continues to combine leading capacity with great desktop performance. Though updated methodologies cause Maxtor’s latest 7200 RPM ATA drive to stumble in the desktop arena, the DiamondMax D740X nonetheless continues to deliver the best server performance one can get from an ATA drive. Finally, Seagate’s Barracuda ATA IV and U6 deliver the lowest noise floors around.
The Future of Testbed3
The hardware, benchmarks, and methodologies of Testbed3 open up a revolutionary new world in the examination and evaluation of hard disk performance. In particular, RankDisk and the associated SR DriveMarks clear the way for unrivalled accuracy and relevance when it comes to hard drive benchmarks.
We’re well aware that the preeminence of the SR Desktop DriveMarks in our final assessment of drives will be controversial to many. So, once again, we’d like to point out that it is we, not readers, who had the most “face” to lose in reversing positions on several key hard drive misconceptions. We also realize that we stand to gain and pass on a great deal: truth, accuracy, relevance, and a new understanding of how drives work and of what makes them fast. And again, we hope that readers will join us in swallowing bias, misconception, and outdated ideas in favor of a relentless pursuit of data, facts, and knowledge. Thanks for being with us for the past 3.5 years… here’s to some more, better than ever!