Veeam Backup and Replication v11 and higher
Veeam Backup for Office 365 v5 and higher
During heavy recursive S3 delete operations, Swarm storage nodes can fail to delete all metadata entries from Elasticsearch.
This leads to listing requests returning ghost entries – objects or versions that were already deleted.
Any attempt to access these ghost entries returns an HTTP 404 response, which causes Veeam jobs to fail.
Neither refreshing the Search Feed nor attempting to delete the object again fixes this.
Failed Veeam offload jobs show the following error message (truncated for readability):
REST API error: S3 error: The specified key does not exist. Failed to load object.
If you are already experiencing this issue, a manual cleanup procedure is required. Please open a ticket with DataCore Swarm support and include the Veeam logs containing the 404 errors.
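When collecting logs for the support ticket, a quick scan can show which log files contain the error. The following is a minimal sketch, assuming the Veeam logs have been copied to a directory on a Linux host. The function name find_ghost_errors and its directory argument are hypothetical, and the search matches only the error text quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the log files that contain the
# "key does not exist" error so they can be attached to the ticket.
# Usage: find_ghost_errors <log_directory>
find_ghost_errors() {
    local log_dir="$1"
    # -r: search recursively, -l: print only the names of matching files,
    # -F: treat the pattern as a fixed string rather than a regex
    grep -rlF "The specified key does not exist" "$log_dir" | sort
}
```

For example, find_ghost_errors /path/to/copied/veeam/logs prints one matching file per line, or nothing if no file contains the error.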
Whether or not you have experienced this issue, apply the settings changes below to avoid new occurrences of this bug.
Run the following four commands on your Swarm Cluster Services (SCS) server. They enable synchronous indexing in Swarm and increase the wait time so that multi-delete operations are more likely to keep Elasticsearch in sync.
/root/dist/swarmctl -d SwarmStorageIP -C scsp.autoSynchronousIndex -V 1 -p <swarm_admin>:<swarm_password> -a
scsctl storage config set -d "scsp.autoSynchronousIndex=true"
/root/dist/swarmctl -d SwarmStorageIP -C scsp.defaultSynchronousIndexWait -V 60 -p <swarm_admin>:<swarm_password> -a
scsctl storage config set -d "scsp.defaultSynchronousIndexWait=60"
The swarmctl commands update the settings on the currently running storage nodes, and the scsctl commands record the settings so they persist after the storage nodes are rebooted.
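To keep the runtime and persisted values in step, the four commands can be collected into a small script. This is a sketch only: the variable names (SWARM_IP, SWARM_ADMIN, SWARM_PASSWORD), the DRY_RUN switch, and the function names are assumptions for illustration. The commands and values are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: apply each setting to the running nodes (swarmctl) and
# persist it for future reboots (scsctl).
# All variable and function names here are illustrative, not part of Swarm.
SWARM_IP="${SWARM_IP:-SwarmStorageIP}"
SWARM_ADMIN="${SWARM_ADMIN:-<swarm_admin>}"
SWARM_PASSWORD="${SWARM_PASSWORD:-<swarm_password>}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
# Note: the printed command includes the credentials.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Arguments: setting name, value for swarmctl, value for scsctl
apply_setting() {
    local name="$1" live_value="$2" persisted_value="$3"
    run /root/dist/swarmctl -d "$SWARM_IP" -C "$name" -V "$live_value" \
        -p "$SWARM_ADMIN:$SWARM_PASSWORD" -a
    run scsctl storage config set -d "$name=$persisted_value"
}

apply_settings() {
    apply_setting scsp.autoSynchronousIndex 1 true
    apply_setting scsp.defaultSynchronousIndexWait 60 60
}
```

With the defaults, calling apply_settings only prints the four commands for review. Setting DRY_RUN=0 and the three SWARM_ variables before calling it runs them on the SCS server.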
The scsp.autoSynchronousIndex change can be reverted after upgrading to Gateway 7.10.3, which automatically enables synchronous indexing on multi-delete operations, as it already does for other read/write/delete requests. Gateway 7.10.3 was released in March 2023 with Swarm 15.2.
The scsp.defaultSynchronousIndexWait change should be kept until the default is increased in a future Swarm release.