One issue related to the FAT file system that has gained a lot more attention recently
is the concept of slack. As larger and larger hard disks are shipped on new systems,
especially low-end systems whose disks are not divided into multiple partitions, users
have begun noticing that large amounts of their disk space seem to "disappear". In many
cases this can amount to hundreds of megabytes.
Of course, the space doesn't really disappear. (In some cases lost clusters really can
make space unusable until a scanning utility is used to recover it, but that is a separate
problem.) The space is simply wasted as a result of the cluster system that FAT uses. A
cluster is the smallest unit of space that can be allocated to a file; under the FAT file
system, a file is always given whole clusters, never part of one.
This means, essentially, that the amount of space a file uses on the disk is
"rounded up" to an integer multiple of the cluster size. If you create a file
containing exactly one byte, it still uses an entire cluster's worth of space. You can
then expand that file until it reaches the size of one cluster, and it will take up no
additional space. As soon as the file grows larger than a single cluster can hold, a
second cluster is allocated and the file's disk usage doubles, even though the file grew
by only one byte. Think of this in terms of collecting rainwater in quart-sized glass
bottles (a quart holds 32 ounces). Even if you collect just one ounce of water, you have
to use a whole bottle. Once the bottle is in use, however, you can add 31 more ounces
until it is full. Then you'll need another whole bottle to hold the 33rd ounce.
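The rounding-up rule can be sketched in a few lines of Python. This is an illustration only, not code from any FAT implementation; the helper name `allocated_size` is hypothetical, and treating an empty file as using no clusters is an assumption of the sketch.

```python
def allocated_size(file_size: int, cluster_size: int) -> int:
    """Disk space (in bytes) a file occupies when space is handed out
    only in whole clusters. Hypothetical helper for illustration."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size

CLUSTER = 32 * 1024  # 32 KB clusters, as in the example later in this section
print(allocated_size(1, CLUSTER))            # one byte still takes a whole cluster
print(allocated_size(CLUSTER, CLUSTER))      # a completely full cluster, no extra space
print(allocated_size(CLUSTER + 1, CLUSTER))  # one more byte doubles the disk usage
```

Ceiling division (`-(-a // b)`) stays in integer arithmetic, so there is no floating-point rounding to worry about even for very large files.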
Since files are always allocated whole clusters, the larger the cluster size of the
volume, the more space will be wasted on average. (If minimizing wasted storage is the
concern, smaller, cup-sized bottles are more efficient than quart-sized ones.) If we take
a disk with a truly random distribution of file sizes, each file wastes half a cluster on
average: it uses some number of full clusters plus a random portion of the last one, so on
average half of that last cluster goes unused. This means that if you double the cluster
size of the disk, you double the amount of storage that is wasted. Storage wasted in this
manner, as space left unused at the end of the last cluster allocated to a file, is
commonly called slack.
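The half-a-cluster average can be checked with a small simulation. This sketch assumes file sizes drawn uniformly between 1 byte and 100 clusters; both the distribution and the range are illustrative assumptions, and the function names are hypothetical.

```python
import random

def slack(file_size: int, cluster_size: int) -> int:
    """Bytes left unused at the end of a file's last cluster."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder

def average_slack(cluster_size: int, n_files: int = 100_000, seed: int = 1) -> float:
    """Mean slack in bytes for files whose sizes are drawn uniformly at random.

    The uniform distribution and the 1..100-cluster size range are
    assumptions of this sketch; real disks skew toward small files."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0
    for _ in range(n_files):
        size = rng.randint(1, 100 * cluster_size)
        total += slack(size, cluster_size)
    return total / n_files

for kb in (4, 8, 16, 32):
    c = kb * 1024
    avg = average_slack(c)
    print(f"{kb:>2} KB clusters: {avg / 1024:5.1f} KB slack per file "
          f"({avg / c:.2f} of a cluster)")
```

The fraction column stays near 0.50 for every cluster size, while the absolute slack per file doubles each time the cluster size doubles.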
In reality, the situation is usually worse than this theoretical average. File sizes on
most hard disks are not randomly distributed; in fact, most files tend to be small. (Take
a look in your web browser's cache directory sometime.) A disk holding many small files
wastes far more space, because a small file can leave most of its only cluster empty.
There are utilities you can use to analyze the amount of wasted space on your disk
volumes, such as the fantastic Partition Magic. It is not uncommon for very large disks
set up as a single FAT partition to waste up to 40% of their space to slack, although
25-30% is more common.
Let's take an example to illustrate the situation. Consider a hard disk volume that uses
32 KB clusters and holds 17,000 files. If we assume that each file has half a cluster of
slack, we are wasting 16 KB of space per file. Multiply that by 17,000 files, and we get a
total of about 265 MB of slack space. If we instead assume that most of the files are
small, so that each file averages around two-thirds of a cluster of slack rather than
one-half, the total jumps to 354 MB!
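The arithmetic behind these two figures can be written out explicitly. Here N is the number of files, f the average fraction of a cluster lost per file, and C the cluster size (symbols introduced only for this illustration), with 1 MB taken as 1,024 KB:

```latex
S = N \cdot f \cdot C

f = \tfrac{1}{2}: \quad S = 17{,}000 \times \tfrac{1}{2} \times 32\ \text{KB}
  = 272{,}000\ \text{KB} \approx 265.6\ \text{MB}

f = \tfrac{2}{3}: \quad S = 17{,}000 \times \tfrac{2}{3} \times 32\ \text{KB}
  \approx 362{,}667\ \text{KB} \approx 354.2\ \text{MB}
```

Because S is linear in C, halving the cluster size halves the slack for the same set of files.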
If we could use a smaller cluster size for this disk, the amount of wasted space would drop
dramatically. The table below compares the slack for various cluster sizes in this
example; the more files on the disk, the worse the slack gets. To find the percentage of
disk space wasted, divide the slack figure by the size of the disk. So if this were a full
1.2 GB disk using 32 KB clusters, the 354 MB figure would mean that roughly 30% of its space
is slack. If the disk were 2.1 GB in size, the slack percentage would be about 17%:
Cluster Size | Sample Slack Space (50% Cluster Slack per File) | Sample Slack Space (67% Cluster Slack per File)
2 KB | 17 MB | 22 MB
4 KB | 33 MB | 44 MB
8 KB | 66 MB | 89 MB
16 KB | 133 MB | 177 MB
32 KB | 265 MB | 354 MB
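The table and the percentages above can be regenerated with a short script. The function name `sample_slack_mb` is hypothetical. Treating 1 GB as 1,000 MB is an assumption that reproduces the 30% and 17% figures. The printed values round to the table entries, except that the 32 KB, 50% value (265.6 MB) appears in the table as 265 MB.

```python
FILES = 17_000    # file count from the example
KB_PER_MB = 1024  # binary megabytes for the slack totals

def sample_slack_mb(cluster_kb: int, fraction: float, files: int = FILES) -> float:
    """Total slack in MB, assuming each file wastes `fraction` of a cluster."""
    return files * fraction * cluster_kb / KB_PER_MB

print("Cluster | 50% slack | 67% slack")
for kb in (2, 4, 8, 16, 32):
    half = sample_slack_mb(kb, 1 / 2)
    two_thirds = sample_slack_mb(kb, 2 / 3)
    print(f"{kb:>4} KB | {half:6.1f} MB | {two_thirds:6.1f} MB")

# Share of a full disk lost to slack with 32 KB clusters (two-thirds figure),
# treating 1 GB as 1,000 MB (an assumption of this sketch).
worst = sample_slack_mb(32, 2 / 3)
for disk_gb in (1.2, 2.1):
    print(f"{disk_gb} GB disk: {100 * worst / (disk_gb * 1000):.0f}% slack")
```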
As you can see, the larger the cluster size, the more of the disk's space is wasted to
slack. It is therefore better to use smaller cluster sizes whenever possible.
Unfortunately, this is often easier said than done. The number of clusters a volume can
have is limited by the design of the FAT file system, so a larger volume needs larger
clusters, and there are also performance tradeoffs in using smaller cluster sizes. It
isn't always possible, then, to use the absolute smallest cluster size in order to
maximize free space.
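To see why a cap on the cluster count forces larger clusters on larger volumes, here is a sketch that picks the smallest power-of-two cluster size keeping the count under a limit. The default limit of 65,524 clusters is an assumption modeled on FAT16. The 512-byte to 32 KB range is also an assumption. The sketch ignores the space taken by the FAT itself and other reserved areas, and `smallest_cluster` is a hypothetical name.

```python
def smallest_cluster(volume_bytes: int, max_clusters: int = 65_524) -> int:
    """Smallest power-of-two cluster size (512 bytes to 32 KB) that keeps
    the number of clusters at or below `max_clusters`.
    The default limit is an assumption modeled on FAT16."""
    cluster = 512
    while cluster <= 32 * 1024:
        if volume_bytes // cluster <= max_clusters:
            return cluster
        cluster *= 2
    raise ValueError("volume too large for this cluster-count limit")

for gb in (0.5, 1.2, 2.1):
    size = int(gb * 10**9)  # decimal gigabytes
    print(f"{gb} GB volume -> {smallest_cluster(size) // 1024} KB clusters")
```

Under this assumed limit, both the 1.2 GB and 2.1 GB volumes from the example end up with 32 KB clusters, which is exactly the case that produces the largest slack in the table.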
Also realize that some space will always be wasted, regardless of the cluster size
chosen. Most people consider the slack on partitions with 4 KB or 8 KB clusters
acceptable and the slack with 32 KB clusters excessive; opinion on 16 KB clusters is
divided. Personally, the only cluster size I avoid like the plague is 32 KB, but I (more
than many others) also dislike having my disk broken into many pieces. See the discussion
of the tradeoffs between slack space waste and "end of volume" space waste for more
perspective on choosing cluster sizes.
Tip: Remember not to go overboard in your efforts to avoid slack. To keep it all in
perspective, take the worst case above, where 354 MB of space is wasted. With disk
storage now costing less than 10 cents per megabyte, the "cost" of this problem is well
below $50 (354 MB at 10 cents per megabyte comes to about $35). That doesn't mean that
wasting hundreds of megabytes of storage is smart; obviously I don't think so, or I
wouldn't have written so much about slack and partitioning. But on the other hand,
spending 20 hours and $100 on utility software may not be too smart either. Moderation is
often the key to using partitioning to reduce slack, so don't be taken in by the
"partitioning fanatics" who seem to have lost sight of the fact that disk space is really
very cheap today.