Products Affected:
Veeam Backup and Replication v11 and higher
Veeam Backup for Office 365 v5 and higher
Description:
During heavy recursive S3 delete operations, Swarm storage nodes can fail to delete all of the metadata entries from Elasticsearch.
This leads to listing requests returning ghost entries – objects or versions that were already deleted.
Any attempt to access those ghost entries returns a 404 error, which causes Veeam jobs to fail.
Neither refreshing the Search Feed nor attempting to delete the object again fixes this.
Problem Symptom:
Failed Veeam offload jobs show the following error message (truncated for readability):
REST API error: S3 error: The specified key does not exist. Failed to load object.
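To identify which log files contain this error before opening a support ticket, a search along the following lines may help. This is only a sketch: LOG_DIR is a hypothetical variable pointing at a copy of the Veeam logs on a host with a POSIX shell, and find_affected_logs is an illustrative helper name, not part of any Veeam or Swarm tooling.

```shell
#!/bin/sh
# Sketch: list log files that contain the "key does not exist" error.
# LOG_DIR is a hypothetical location; point it at your own copy of the Veeam logs.
LOG_DIR="${LOG_DIR:-./veeam-logs}"
PATTERN='The specified key does not exist'

find_affected_logs() {
    # $1 = directory to search
    # -r: recurse, -l: print only the names of matching files, -F: fixed-string match
    grep -rlF -- "$PATTERN" "$1"
}

if [ -d "$LOG_DIR" ]; then
    find_affected_logs "$LOG_DIR"
else
    echo "Log directory not found: $LOG_DIR" >&2
fi
```

The files it lists are candidates to attach to the support ticket.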
Solution:
If you are already experiencing this issue, please open a support ticket and include the Veeam logs containing the 404 errors, because resolving it requires a manual cleanup procedure.
Whether or not you have experienced this issue, please apply the settings changes below to avoid new occurrences of this bug.
Run the following four commands on your Swarm Cluster Services node, replacing SwarmStorageIP, <swarm_admin>, and <swarm_password> with your own values. They enable synchronous indexing in Swarm and increase the wait time so that multi-delete operations are more likely to keep Elasticsearch in sync.
/root/dist/swarmctl -d SwarmStorageIP -C scsp.autoSynchronousIndex -V 1 -p <swarm_admin>:<swarm_password> -a
scsctl storage config set -d "scsp.autoSynchronousIndex=true"
/root/dist/swarmctl -d SwarmStorageIP -C scsp.defaultSynchronousIndexWait -V 60 -p <swarm_admin>:<swarm_password> -a
scsctl storage config set -d "scsp.defaultSynchronousIndexWait=60"
*The swarmctl commands update the running storage nodes, and the scsctl commands record the settings so that they are used after storage node reboots.
*The scsp.autoSynchronousIndex change can be reverted after upgrading to Gateway 7.10.3, which automatically enables synchronous indexing on multi-delete operations, as it already does for other read/write/delete requests. Gateway 7.10.3 will be released later in March 2023 with Swarm 15.2; please plan to upgrade.
*The scsp.defaultSynchronousIndexWait change should be kept until the default is increased in a future Swarm release.
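The four commands above could be wrapped in a small script so that each setting is applied to the running nodes and persisted in one step. The following is a minimal sketch, not an official tool: SWARM_IP, SWARM_ADMIN, SWARM_PASSWORD, and DRY_RUN are hypothetical variable names, and the defaults shown are placeholders. With DRY_RUN=1 (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: apply both settings to running nodes (swarmctl) and persist them (scsctl).
# All variable names and default values below are assumptions for illustration.
SWARM_IP="${SWARM_IP:-SwarmStorageIP}"
SWARM_ADMIN="${SWARM_ADMIN:-swarm_admin}"
SWARM_PASSWORD="${SWARM_PASSWORD:-swarm_password}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode; otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

apply_setting() {
    # $1 = setting name, $2 = value passed to swarmctl, $3 = value passed to scsctl
    run /root/dist/swarmctl -d "$SWARM_IP" -C "$1" -V "$2" \
        -p "$SWARM_ADMIN:$SWARM_PASSWORD" -a
    run scsctl storage config set -d "$1=$3"
}

apply_setting scsp.autoSynchronousIndex 1 true
apply_setting scsp.defaultSynchronousIndexWait 60 60
```

Review the printed commands first, then run the script with DRY_RUN=0 on the Swarm Cluster Services node to apply them.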