Almost all of each disk managed by Swarm is devoted to storing “streams” – either replicated objects or segments of erasure-coded objects. This “data region” can be categorized as:
· capacity - the size of the entire region, usually close to the physical size of the disk
· used - the portion of the disk consumed by streams and segments
· free - the portion of the disk immediately available for new object and segment writes
· trapped - space in the data region that requires processing before it can be returned to free space
used + free + trapped = capacity
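The accounting above can be sketched as a small model. This is illustrative only; the class, method names, and numbers are not Swarm's, they just demonstrate the invariant used + free + trapped = capacity:

```python
from dataclasses import dataclass

@dataclass
class DataRegion:
    """Toy model of a Swarm disk's data region (illustrative, not Swarm code)."""
    capacity: int      # bytes in the data region (close to physical disk size)
    used: int = 0      # bytes consumed by live streams/segments
    trapped: int = 0   # bytes in "holes" left by deletes/relocations

    @property
    def free(self) -> int:
        # Free space is whatever is neither used nor trapped.
        return self.capacity - self.used - self.trapped

    def write(self, size: int) -> None:
        # New writes consume free space and create used space.
        if size > self.free:
            raise IOError("not enough free space")
        self.used += size

    def delete(self, size: int) -> None:
        # Deletes do not free space immediately; they trap it.
        self.used -= size
        self.trapped += size

    def recover_trapped(self, amount: int) -> None:
        # Trapped space recovery (defrag) converts trapped space back to free.
        amount = min(amount, self.trapped)
        self.trapped -= amount

region = DataRegion(capacity=1_000_000)
region.write(600_000)
region.delete(200_000)
assert region.used + region.free + region.trapped == region.capacity
print(region.used, region.trapped, region.free)  # 400000 200000 400000
```

Note that `delete` never touches `free`: only `recover_trapped` returns space for new writes, which is exactly why a high delete load without recovery starves writes.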
New writes to a disk consume free space and create used space as object replicas are written sequentially on the disk to maximize write performance. When objects or segments are deleted or relocated, they leave “holes” in the data region that are not immediately usable; these holes are termed trapped space.

The process of converting trapped space back into free space is called trapped space recovery, or sometimes just defrag. It requires scanning the disk sequentially and repacking (moving) objects to eliminate the trapped space, not unlike defragmentation in other storage systems. Moving most of the objects on a drive is quite expensive, and because defrag is a background task whose available resources vary widely with load, its duration is not very predictable. All other things being equal, the process is much more expensive in scenarios with many small objects than in scenarios with a smaller number of large objects; this is why increasing the Veeam block size is important, for example.

Defrag can also significantly impact performance in a busy cluster, so while sufficient free space remains, Swarm gives the process a low priority. As the urgency increases with decreasing free space, Swarm raises the priority of trapped space recovery. Swarm is tuned to balance the need for free space (to accept new writes) against the load of this essential maintenance activity, but in a busy system this balance is difficult to attain, and trapped space recovery can fall behind.
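The repacking step can be illustrated with a toy model (hypothetical layout, not Swarm's on-disk format): live objects are copied downward in scan order, so the holes left by deleted objects collapse into reclaimed space at the tail:

```python
def repack(objects):
    """Toy defrag: objects is a list of (offset, size, alive) tuples in one
    sequential data region. Scans in offset order, rewrites live objects
    contiguously, and returns (new_layout, freed_bytes)."""
    new_layout, cursor = [], 0
    for offset, size, alive in sorted(objects):
        if alive:
            # Move the live object down to the current write cursor.
            new_layout.append((cursor, size, True))
            cursor += size
        # Dead objects are simply skipped; their space is reclaimed.
    old_end = max((off + sz for off, sz, _ in objects), default=0)
    freed = old_end - cursor  # trapped space returned to free
    return new_layout, freed

# Two live objects with two deleted "holes" between/after them:
layout = [(0, 100, True), (100, 50, False), (150, 200, True), (350, 50, False)]
packed, freed = repack(layout)
print(packed)  # [(0, 100, True), (100, 200, True)]
print(freed)   # 100
```

The cost of this pass scales with the number of live objects that must be moved, which is why many small objects make recovery so much more expensive than a few large ones.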
Trapped space recovery takes significantly more time and system resources when disks are nearly full. In clusters with a high delete load, the best strategy is to maximize object size and to keep disks no more than 70-80% full; denser or slower disks are also problematic here. Allowing disks to get too full results in situations where trapped space recovery can no longer keep up with the write load. Erasure coding, which converts a large object into many smaller segments, compounds the problem; in high-turnover situations it may be preferable to use replication rather than erasure coding.
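A rough back-of-the-envelope model can show why this is so. All parameters below (overhead per object, throughput, segment counts) are illustrative assumptions, not Swarm measurements; the point is only that fixed per-object overhead makes many small objects, or many erasure-coded segments, costlier to repack than a few large objects:

```python
def repack_cost_ms(total_bytes, object_size, per_object_overhead_ms=1.0,
                   throughput_mb_s=100.0, ec_segments=1):
    """Hypothetical repack cost: sequential move time plus a fixed
    per-object overhead, multiplied by segment count for erasure coding."""
    objects = (total_bytes / object_size) * ec_segments
    move_ms = total_bytes / (throughput_mb_s * 1_000_000) * 1000
    return move_ms + objects * per_object_overhead_ms

one_tb = 10**12
# Same quantity of data, different object sizes (e.g. Veeam block size):
small = repack_cost_ms(one_tb, 512 * 1024)        # many 512 KiB objects
large = repack_cost_ms(one_tb, 8 * 1024 * 1024)   # fewer 8 MiB objects
ec    = repack_cost_ms(one_tb, 8 * 1024 * 1024, ec_segments=7)  # e.g. 5+2 EC
print(small > large)  # True: small objects make defrag more expensive
print(ec > large)     # True: EC multiplies the per-object overhead
```

The sequential move time is identical in all three cases; only the object count changes, and it dominates once objects are small or numerous.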
While it is possible to allocate more system resources to trapped space recovery, doing so both impacts performance and reduces resources available for data protection verification, so this isn’t desirable except in dire emergencies.
In future releases, Swarm will extend support to dual-actuator drives, which may become the disk of choice for high-turnover use cases. These are currently not supported because Swarm’s data protection algorithms do not account for the fact that what appear to the OS to be two drives actually share a single physical device and thus represent a single potential unit of failure.