FAT Partition Efficiency: Slack
One issue related to the FAT file system that has gained a lot more attention recently is the concept of slack. As larger and larger hard disks ship in systems, especially low-end systems whose disks are not divided into multiple partitions, users have begun noticing that large quantities of their hard disk space seem to "disappear". In many cases this can amount to hundreds of megabytes.
Of course the space doesn't really disappear. (In some cases lost clusters can make space truly unusable on a disk until you run a scanning utility to recover it, but that is a separate problem.) The space is simply wasted as a result of the cluster system that FAT uses. A cluster is the minimum amount of space that can be assigned to any file. Under the FAT file system a file cannot be given part of a cluster; it always receives whole clusters.
This means, essentially, that the amount of space a file uses on the disk is "rounded up" to an integer multiple of the cluster size. If you create a file containing exactly one byte, it will still use an entire cluster's worth of space. Then, you can expand that file in size until it reaches the maximum size of a cluster, and it will take up no additional space. As soon as you make the file larger than what a single cluster can hold, a second cluster will be allocated, and the file's disk usage will double, even though the file only increased in size by one byte. Think of this in terms of collecting rain water in quart-sized glass bottles. Even if you collect just one ounce of water, you have to use a whole bottle. Once the bottle is in use, however, you can fill it with 31 more ounces, until it is full. Then you'll need another whole bottle to hold the 33rd ounce.
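The rounding-up rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not real file system code: the function names `allocated_bytes` and `slack_bytes` are made up for this example, the 32 KB cluster size is simply the one used later in this section, and the sketch assumes that an empty file occupies no clusters at all.

```python
def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Disk space consumed by a file when space is handed out in whole clusters."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size must be > 0")
    # Ceiling division: any partly used cluster counts as a whole cluster.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not holding any of its data."""
    return allocated_bytes(file_size, cluster_size) - file_size


CLUSTER = 32 * 1024  # illustrative 32 KB cluster

for size in (1, CLUSTER, CLUSTER + 1):
    used = allocated_bytes(size, CLUSTER)
    print(f"{size:>6} byte file -> {used:>6} bytes on disk, "
          f"{slack_bytes(size, CLUSTER):>6} bytes of slack")
```

The third case is the "33rd ounce": one extra byte forces a second cluster, so the space used on disk doubles.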
Since files are always allocated whole clusters, the larger the cluster size of the volume, the more space will be wasted on average. (It's more efficient to use smaller, cup-sized bottles instead of quart-sized ones, if minimizing the amount of storage space is a concern.) If we take a disk whose file sizes are truly random, that is, evenly spread with no bias toward any particular size, then on average each file wastes half a cluster. (Each file uses some number of whole clusters and then a random amount of the last cluster, so on average half of that last cluster is wasted.) This means that if you double the cluster size of the disk, you double the amount of storage that is wasted. Storage space that is wasted in this manner, due to space left at the end of the last cluster allocated to the file, is commonly called slack.
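The "half a cluster per file" average can be checked with a quick simulation. The sketch below is an experiment under stated assumptions, not a measurement of any real disk: file sizes are drawn evenly between 1 byte and 1 MB, and the sample size, the seed and the names `slack_bytes` and `mean_slack` are arbitrary choices made for this example.

```python
import random


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Unused bytes at the end of the file's last cluster."""
    return -file_size % cluster_size


def mean_slack(file_sizes, cluster_size: int) -> float:
    """Average slack per file, in bytes, for the given cluster size."""
    return sum(slack_bytes(s, cluster_size) for s in file_sizes) / len(file_sizes)


random.seed(1)          # fixed seed so repeated runs give the same numbers
MAX_SIZE = 1024 * 1024  # assumption: sizes spread evenly from 1 byte to 1 MB
file_sizes = [random.randint(1, MAX_SIZE) for _ in range(100_000)]

for cluster_kb in (2, 4, 8, 16, 32):
    cluster = cluster_kb * 1024
    avg = mean_slack(file_sizes, cluster)
    print(f"{cluster_kb:>2} KB clusters: average slack {avg:8.0f} bytes "
          f"({avg / cluster:.2f} of a cluster)")
```

With evenly spread sizes, the fraction printed for each cluster size should come out close to 0.50, and the average slack in bytes should roughly double each time the cluster size doubles.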
In reality the situation is usually worse than this theoretical average. The files on most hard disks don't follow a random size pattern; in fact, most files tend to be small. (Take a look in your web browser's cache directory sometime.) A hard disk that holds many small files will waste far more space. There are utilities that you can use to analyze the amount of wasted space on your disk volumes, such as the fantastic Partition Magic. It is not uncommon for very large disks that are in single FAT partitions to waste up to 40% of their space due to slack, although 25-30% is more common.
Let's take an example to illustrate the situation. Consider a hard disk volume that is using 32 KB clusters, with 17,000 files in the partition. If we assume that each file has half a cluster of slack, then we are wasting 16 KB of space per file. Multiply that by 17,000 files, and we get a total of 265 MB of slack space. If we assume instead that most of the files are small, so that on average each file has slack of around two-thirds of a cluster instead of one-half, this jumps to 354 MB!
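The arithmetic behind those two figures is easy to reproduce. The short sketch below assumes 1 MB = 1024 KB, and the function name `total_slack_mb` is just a label chosen for this example.

```python
def total_slack_mb(num_files: int, cluster_kb: int, slack_fraction: float) -> float:
    """Estimated total slack in MB, taking 1 MB = 1024 KB."""
    return num_files * cluster_kb * slack_fraction / 1024


half = total_slack_mb(17_000, 32, 1 / 2)        # half a cluster wasted per file
two_thirds = total_slack_mb(17_000, 32, 2 / 3)  # two-thirds of a cluster wasted per file

print(f"{half:.1f} MB")        # 265.6 MB
print(f"{two_thirds:.1f} MB")  # 354.2 MB
```

The figures quoted in the text are these values with the fraction dropped.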
If we were able to use a smaller cluster size for this disk, the amount of space wasted would be reduced dramatically. The table below compares the slack for various cluster sizes in this example. The more files on the disk, the worse the slack gets. To find the percentage of disk space wasted, divide the slack figure by the size of the disk. So if this were a (full) 1.2 GB disk using 32 KB clusters, the 354 MB of slack from the two-thirds case would be about 30% of that space. If the disk is 2.1 GB in size, the slack percentage is about 17%:
Cluster Size | Sample Slack Space, 50% Cluster Slack Per File | Sample Slack Space, 67% Cluster Slack Per File
2 KB | 17 MB | 22 MB
4 KB | 33 MB | 44 MB
8 KB | 66 MB | 89 MB
16 KB | 133 MB | 177 MB
32 KB | 265 MB | 354 MB
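For readers who want to try other file counts or disk sizes, the table and the percentages can be regenerated with a short script. This is a sketch under a few assumptions: 1 MB is taken as 1024 KB, the 1.2 GB and 2.1 GB disks are treated as 1200 MB and 2100 MB (which is what reproduces the rounded percentages quoted above), and the percentages use the 354 MB worst case. All names in the code are made up for this example.

```python
NUM_FILES = 17_000
CLUSTER_SIZES_KB = (2, 4, 8, 16, 32)


def total_slack_mb(num_files: int, cluster_kb: int, slack_fraction: float) -> float:
    """Estimated total slack in MB, taking 1 MB = 1024 KB."""
    return num_files * cluster_kb * slack_fraction / 1024


def slack_percent(slack_mb: float, disk_mb: float) -> float:
    """Slack as a percentage of the disk's size."""
    return 100 * slack_mb / disk_mb


def slack_rows(num_files=NUM_FILES, cluster_sizes_kb=CLUSTER_SIZES_KB):
    """One (cluster KB, slack at 1/2 cluster, slack at 2/3 cluster) tuple per cluster size."""
    return [
        (kb,
         total_slack_mb(num_files, kb, 1 / 2),
         total_slack_mb(num_files, kb, 2 / 3))
        for kb in cluster_sizes_kb
    ]


for kb, half, two_thirds in slack_rows():
    print(f"{kb:>2} KB | {half:6.1f} MB | {two_thirds:6.1f} MB")

worst = total_slack_mb(NUM_FILES, 32, 2 / 3)
for disk_mb in (1200, 2100):  # assumption: 1.2 GB and 2.1 GB taken as 1200 MB and 2100 MB
    print(f"{disk_mb} MB disk: {slack_percent(worst, disk_mb):.0f}% slack")
```

The script prints one decimal place, so its values differ slightly from the whole-megabyte figures in the table.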
As you can see, the larger the cluster size used, the more of the disk's space is wasted due to slack. Therefore, it is better to use smaller cluster sizes whenever possible. This is, unfortunately, often easier said than done. The number of clusters we can use is limited by the nature of the FAT file system, and there are also performance tradeoffs in using smaller cluster sizes. Therefore, it isn't always possible to use the absolute smallest cluster size in order to maximize free space.
Also realize that there will always be some space wasted regardless of the cluster size chosen. Most people consider the amount of slack on partitions with 4 KB or 8 KB clusters to be acceptable; most consider the slack on partitions with 32 KB clusters excessive; and partitions with 16 KB clusters seem to go both ways. Personally, the only ones I avoid like the plague are the 32 KB cluster partitions, but I (more than many others) also dislike having my disk broken into many pieces. See this discussion of the tradeoffs between slack space waste and "end of volume" space waste as well for more perspective on choosing cluster sizes.
Tip: Do remember not to go overboard in your efforts to avoid slack. To keep it all in perspective, let's take the worst case above, where 354 MB of space is wasted. With the cost per megabyte of disk now below 10 cents, this means that the "cost" of this problem is well below $50. That doesn't mean that wasting hundreds of megabytes of storage is smart; obviously I don't think that or I wouldn't have written so much about slack and partitioning. But on the other hand, spending 20 hours and $100 on utility software may not be too smart either. Moderation is often the key to using partitioning to reduce slack, so don't be taken in by some of the "partitioning fanatics" who seem to have lost sight of the fact that disk space is really very cheap today.