When we implemented the versioning feature, we anticipated that many operations on versioned objects would be based on previous versions. For example, a COPY operation on a previous version creates a new version. At that time, we chose to allow the reuse of EC segments between versions. This means that a COPY of an EC version creates a new version, but only a new manifest is written; the underlying segments are shared. For many use cases, this segment reuse yields a data savings in the cluster when using EC with versioning.
Everything comes at a cost. In this case, the health processor (HP) needs to do more work to determine whether any manifest version still references a particular segment. Without versioning, the presence of a “final” delete marker is a sufficient signal that no manifest references the segment. With versioning, there is an element of doubt: some version of the manifest may not have been linked properly into the versioning chain, may have been offline at the wrong time, or may exist but have been unreachable because of a network error. To account for this, we age out segments: a segment must be examined some number of consecutive times without a referencing manifest being found before it is actually deleted. This means segment reclamation can take 5 or more HP cycles in a versioned bucket. For many clusters, that can take a year or more.
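As a rough back-of-the-envelope check, the wall-clock time to reclaim a segment follows from the miss count and the HP cycle length. The sketch below is illustrative only; the function name and the 60-day cycle length are hypothetical, not Swarm values:

```python
def estimate_reclamation_days(miss_count: int, hp_cycle_days: float) -> float:
    """Rough lower bound on wall-clock time to reclaim an EC segment in a
    versioned bucket: the segment must be examined miss_count consecutive
    times with no manifest found, and each examination happens at most
    once per HP cycle."""
    return miss_count * hp_cycle_days

# With the default miss count of 5 and a hypothetical 60-day HP cycle,
# reclamation takes at least 300 days -- close to a year.
print(estimate_reclamation_days(5, 60))
```

This is a lower bound: recoveries that pause HP, or examinations that miss a cycle, stretch the real elapsed time further.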
This segment cleanup issue disproportionately impacts clusters with a high turnover of objects in one or more versioned buckets. Most Veeam buckets fit this profile.
We don’t have a silver bullet solution here, but there are a number of things to consider and monitor.
First, make sure the Veeam instance(s) are using the recommended configuration. This generally means larger object sizes and fewer objects. Consider longer backup retention policies and less frequent backups.
The main setting that controls segment reclamation is health.segManifestGCMissCount. It defaults to 5, but older versions of Swarm have a larger default that can simply be overridden with the new default. In a healthy cluster, a value of 3 is acceptable, but lower values are not recommended. Reducing this setting reduces the number of HP cycles needed to reclaim EC segments. There are other settings governing how frequently a segment's examination looks for its manifest and how much time must have passed since the manifest was last found, but in most clusters those defaults do not impede segment cleanup.
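For deployments that manage cluster-wide settings in a configuration file, the change might look like the following. This is a sketch: the dotted setting syntax shown here is an assumption, and the exact mechanism (UI, SNMP, or config file) depends on your Swarm version:

```ini
# Reduce the number of consecutive "manifest not found" examinations
# required before an EC segment is reclaimed.
# Default is 5; values below 3 are not recommended.
health.segManifestGCMissCount = 3
```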
HP cycle time is also a consideration. Remember that recoveries pause HP examinations, so frequent or stuck recoveries should be monitored in phone home. Likewise, if any node shows an abnormally long predicted HP cycle time in phone home, it may be worth looking for a poorly performing disk. HP can be sped up by reducing health.examDelay. The caveat is that versions prior to 15.3 are susceptible to the main process becoming overloaded, causing cluster instability. Ideally, main process CPU utilization should be below 80% throughout the cluster across multiple samples before speeding up HP.
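Before lowering health.examDelay, it can help to codify the CPU check. The following is a hypothetical sketch: the 80% threshold comes from the guidance above, but the function name and sample data are illustrative, not part of any Swarm tooling:

```python
def safe_to_speed_up_hp(cpu_samples_by_node: dict[str, list[float]],
                        threshold: float = 80.0) -> bool:
    """Return True only if every sampled main-process CPU reading,
    on every node, is below the threshold."""
    return all(
        sample < threshold
        for samples in cpu_samples_by_node.values()
        for sample in samples
    )

# A single node spiking above 80% means health.examDelay should not
# be reduced yet.
samples = {
    "node1": [42.0, 55.5, 61.2],
    "node2": [78.9, 83.1, 70.0],  # 83.1 exceeds the threshold
}
print(safe_to_speed_up_hp(samples))  # prints False
```

Requiring every sample on every node to pass is deliberately conservative: a single overloaded main process is enough to destabilize the cluster on versions prior to 15.3.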
There may be some situations where not using EC in these buckets is appropriate. The main concern is that Veeam adjusts retention policies on various objects, causing a variant COPY. In a non-versioned bucket, this immediately creates trapped space for the object being replaced, and again later when that object is removed, but neither of those operations requires multiple HP cycles for reclamation, so the tradeoffs are complicated. Best practice recommendations are still in progress here.