Since each file is stored as a linked list of clusters, the data that is contained in a
file can be located anywhere on the disk. If you have a 10 MB file stored on a disk using
4,096-byte clusters, it is using 2,560 clusters. These clusters can be on different
tracks or on different platters of the disk; in fact, they can be anywhere.
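The cluster count quoted above is a ceiling division of the file size by the cluster size. Here is a minimal sketch in Python; the function name `clusters_needed` is made up for illustration, and 1 MB is assumed to mean 1,048,576 bytes:

```python
def clusters_needed(file_size_bytes: int, cluster_size_bytes: int) -> int:
    """Whole clusters required to hold a file (a partly used cluster still counts)."""
    if cluster_size_bytes <= 0:
        raise ValueError("cluster size must be positive")
    # Integer ceiling division: round up whenever there is a remainder.
    return (file_size_bytes + cluster_size_bytes - 1) // cluster_size_bytes


ten_megabytes = 10 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
print(clusters_needed(ten_megabytes, 4096))  # 2560
```

Note that a file of 4,097 bytes would need two full clusters under the same calculation, since a cluster cannot be shared between files.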
However, even though a file can be spread all over the disk, this is far from the
preferred situation. The reason is performance. As discussed in the section describing performance, hard disks are
relatively slow devices, mainly because they are mechanical (they have moving parts--your
processor, chipset, memory and system bus do not). Each time the hard disk has to move the
heads to a different track, it takes time that is equivalent to thousands and thousands of
processor cycles.
Therefore, we want to minimize the degree to which each file is spread around the disk.
In the ideal case, every file would in fact be completely contiguous--each cluster it uses
would be located one after the other on the disk. This would enable the entire file to be
read, if necessary, without a lot of mechanical movement by the hard disk. There are in
fact utilities that can optimize the disk by rearranging the files so that they are
contiguous. This process is called defragmentation or defragmenting. The
utilities that do this are, unsurprisingly, called defragmenters. The most famous
one is Norton Speed Disk, and Microsoft now includes a DEFRAG program with DOS and a
built-in defragmenter with Windows 95 as well.
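To make "contiguous" concrete: a file is contiguous when every cluster in its chain is numbered exactly one higher than the cluster before it, and each break in that sequence starts a new fragment. The following is only an illustrative sketch; `count_fragments` is a hypothetical helper, and the cluster chain is assumed to be given as a plain list of cluster numbers in file order.

```python
def count_fragments(chain: list[int]) -> int:
    """Count the contiguous runs in a file's cluster chain (given in file order)."""
    if not chain:
        return 0
    fragments = 1
    for previous, current in zip(chain, chain[1:]):
        # A new fragment starts wherever the next cluster is not the adjacent one.
        if current != previous + 1:
            fragments += 1
    return fragments


print(count_fragments([2, 3, 4, 5]))       # 1: fully contiguous
print(count_fragments([1, 6, 7, 11, 12]))  # 3: three separate runs
```

A defragmenter's goal, in these terms, is to bring every file's fragment count down to 1.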
So the big question is: how does fragmentation occur anyway? Why not just arrange the
disk so that all the files are always contiguous? Well, it is in many cases a gradual
process--the file system starts out with all or most of its files contiguous, and becomes
more and more fragmented as a result of the creation and deletion of files over a
period of time.
To illustrate, let's consider a very simple example using a teeny hard disk that
contains only 12 clusters. The table below represents the usage of the 12 clusters.
Initially, the table is empty:
(cluster 1) |
(cluster 2) |
(cluster 3) |
(cluster 4) |
(cluster 5) |
(cluster 6) |
(cluster 7) |
(cluster 8) |
(cluster 9) |
(cluster 10) |
(cluster 11) |
(cluster 12) |
OK, now let's suppose that we create four files: file A takes up 1 cluster, file B
takes 4, file C takes 2, and file D takes 3. We store them in the free available space,
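and they start out all contiguous, as follows:
(cluster 1) | A
(cluster 2) | B
(cluster 3) | B
(cluster 4) | B
(cluster 5) | B
(cluster 6) | C
(cluster 7) | C
(cluster 8) | D
(cluster 9) | D
(cluster 10) | D
(cluster 11) |
(cluster 12) |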
Next, we decide that we don't need file C, so we delete it. This leaves the disk
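looking like this:
(cluster 1) | A
(cluster 2) | B
(cluster 3) | B
(cluster 4) | B
(cluster 5) | B
(cluster 6) |
(cluster 7) |
(cluster 8) | D
(cluster 9) | D
(cluster 10) | D
(cluster 11) |
(cluster 12) |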
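Then, we create a new file E that needs 3 clusters. Well, there are no contiguous
blocks on the disk left that are 3 clusters long, so we have to split E into two
fragments, one of which goes into the space formerly occupied by C. Assuming E fills
the lowest-numbered free clusters first, here's what the "disk" looks like now:
(cluster 1) | A
(cluster 2) | B
(cluster 3) | B
(cluster 4) | B
(cluster 5) | B
(cluster 6) | E
(cluster 7) | E
(cluster 8) | D
(cluster 9) | D
(cluster 10) | D
(cluster 11) | E
(cluster 12) |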
Next, we delete files A and E and create file F which takes up 5 clusters. The disk now
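looks like this:
(cluster 1) | F
(cluster 2) | B
(cluster 3) | B
(cluster 4) | B
(cluster 5) | B
(cluster 6) | F
(cluster 7) | F
(cluster 8) | D
(cluster 9) | D
(cluster 10) | D
(cluster 11) | F
(cluster 12) | F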
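The sequence of creations and deletions in this example can be replayed with a toy model of the 12-cluster disk. This is only a sketch under a loud assumption: new files simply take the lowest-numbered free clusters, which is a simplification chosen for illustration and not a description of how any particular FAT driver picks clusters. All function names here are hypothetical.

```python
FREE = None  # marker for an unused cluster


def allocate(disk, name, size):
    """Give `name` the lowest-numbered free clusters (simplifying assumption)."""
    free = [i for i, owner in enumerate(disk) if owner is FREE]
    if len(free) < size:
        raise ValueError(f"not enough free clusters for file {name}")
    for i in free[:size]:
        disk[i] = name


def delete(disk, name):
    """Mark every cluster owned by `name` as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = FREE


def clusters_of(disk, name):
    """Cluster numbers used by `name`, 1-based to match the text."""
    return [i + 1 for i, owner in enumerate(disk) if owner == name]


def replay():
    """Run the steps of the example; return the final disk and E's clusters."""
    disk = [FREE] * 12
    for name, size in [("A", 1), ("B", 4), ("C", 2), ("D", 3)]:
        allocate(disk, name, size)
    delete(disk, "C")
    allocate(disk, "E", 3)
    e_clusters = clusters_of(disk, "E")
    delete(disk, "A")
    delete(disk, "E")
    allocate(disk, "F", 5)
    return disk, e_clusters


final_disk, e_clusters = replay()
print(e_clusters)                    # [6, 7, 11]
print(clusters_of(final_disk, "F"))  # [1, 6, 7, 11, 12]
```

Under that allocation assumption, E lands in clusters 6, 7 and 11, and F ends up in clusters 1, 6, 7, 11 and 12. Nothing went wrong at any single step; the fragmentation is just the accumulated effect of filling holes left by deleted files.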
As you can see, file F ends up being broken into three fragments. This is a highly
simplified example of course, because real disks have thousands of files and thousands of
clusters, so the problem there is magnified. This gives you the general idea of what
happens though. What a defragmentation program does is to rearrange the disk to get the
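files back into contiguous form. After running the utility, the disk would look something
like this (the order of the files may differ):
(cluster 1) | F
(cluster 2) | F
(cluster 3) | F
(cluster 4) | F
(cluster 5) | F
(cluster 6) | B
(cluster 7) | B
(cluster 8) | B
(cluster 9) | B
(cluster 10) | D
(cluster 11) | D
(cluster 12) | D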
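The rearrangement itself can be sketched on the same kind of toy model, where the disk is a list recording which file owns each cluster. This is an assumption-laden illustration: `defragment` is a hypothetical function, packing files in the order they first appear on the disk is an arbitrary choice, and a real defragmenter also has to move the data itself and update the FAT and directory entries as it goes.

```python
def defragment(disk):
    """Return a new layout with every file contiguous and free space at the end."""
    # Remember each file once, in the order it first appears on the disk.
    order = []
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)

    # Lay the files out back to back, each with its original number of clusters.
    packed = []
    for name in order:
        packed.extend([name] * disk.count(name))

    # Whatever is left over becomes one block of free clusters at the end.
    packed.extend([None] * (len(disk) - len(packed)))
    return packed


fragmented = ["F", "B", "B", "B", "B", "F", "F", "D", "D", "D", "F", "F"]
print(defragment(fragmented))
# ['F', 'F', 'F', 'F', 'F', 'B', 'B', 'B', 'B', 'D', 'D', 'D']
```

Collecting all the free clusters into one block at the end matters as much as making the existing files contiguous, because it gives newly created files room to start out contiguous too.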