Many people don't realize that it is normal for a hard disk to encounter errors
during reading, as part of its regular operation. As hard disks are pushed to the limits
of technology, with tracks and sectors spaced closer together, weaker signals used to
prevent interference, and faster spin rates produced by the spindle motor, the chance of
an error occurring while reading the disk rises dramatically. In fact, the technology has
advanced to the point where it is no longer practical even to try to avoid read errors entirely.

Of course, having errors actually show up while you are using the hard disk is
unacceptable, since you count on your disk to reproduce the data you store on it
reliably for years. Hard disk manufacturers know how important this is, and so they
incorporate special techniques that allow the drive to detect and correct read errors.
This allows them to make faster, higher-capacity drives that appear to the user to be
error-free. The more the technology for storing data is pushed, the more sophisticated the
error correction protocols must be to maintain the same level of reliability.

Making a drive that produced read errors so rarely that error detection and correction
were unnecessary would mean greatly reducing performance and capacity. This is sort of
like touch-typing: there's a school of thought that says "if you aren't making any
mistakes at all, you're going too slow". If correcting mistakes is easy, as it is with a
word processor, it's better to type 100 words per minute and correct an error or two than
to type 75 words per minute error-free. As long as the errors are detectable and
correctable and don't occur too often, it's better to plan for them and deal with them
than to design so conservatively that they never occur at all.

Note: The errors we are talking about here are those related to reading correct
information off the disk, not issues like head crashes, motor burnouts or other hardware
problems. Similarly, I don't discuss general reliability and failure issues here, or
related technologies such as SMART. See the section on hard disk quality for more on these
topics.