...

The purpose of this article is to provide sizing guidance on how much data a Veeam Backup & Replication or Veeam Backup for Microsoft 365 server can store in a Swarm domain.

With Swarm 16.1.4 and Gateway 8.1.0 we introduced a new feature called “Index per Domain”. This means that within our Elasticsearch database we now store the metadata of each domain in a separate index. It is enabled by setting search.perDomainIndex=true before creating a Search Feed.

Code Block
scsctl storage config set -d "search.perDomainIndex=true"
scsctl storage config set -d "search.numberOfShards=5"

Data in Elasticsearch is organized into indices. Each index is made up of one or more shards. Each shard is an instance of a Lucene index, which you can think of as a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster.
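
You can see this layout in practice by listing the shards behind an index with the _cat API. A sketch assuming Elasticsearch is reachable on http://localhost:9200; <domain-index> is a placeholder for the actual index name:

Code Block
# List every shard of one index: primary/replica, doc count, and hosting node
curl -s "http://localhost:9200/_cat/shards/<domain-index>?v"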

There are some sizing considerations an administrator needs to review first.

...

The total number of shards planned at the start will be 10 (5 primaries + 5 replicas).

This is set as follows:

Code Block
# 5 primary shards per domain index; with 1 replica each, the index uses 10 shards in total
scsctl storage config set -d "search.numberOfShards=5"

So one index (per domain) can hold about 10 shards × 46.5 million docs per shard = 465 million documents (including the replicas).
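
To track how close a domain index is to this ceiling, you can watch its document count. Again a sketch against a local Elasticsearch endpoint, with <domain-index> as a placeholder:

Code Block
# docs.count covers the primary shards only; multiply by (1 + rep) for the total including replicas
curl -s "http://localhost:9200/_cat/indices/<domain-index>?v&h=index,pri,rep,docs.count,store.size"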

...

If you are using the default storage optimization (block size) of 1 MB, this means that in the best-case scenario, where Veeam writes data as perfectly sized 1 MB objects, you will be able to write roughly 200 TB to the domain (465 million documents correspond to about 232 million unique 1 MB objects once replicas are excluded, i.e. a little over 200 TB). This includes the storage footprint of immutability, which depends on your chosen backup retention settings.

...

Panel

Option 3: You split the index, which amounts to increasing the number of shards for that domain.

Note: this option requires a scheduled maintenance window, as the index needs to be in read-only mode during the split process (a sketch of the underlying Elasticsearch calls follows at the end of this panel).

An object storage backup repository can only be put into maintenance mode if it is a member of a SOBR (scale-out backup repository); maintenance mode does not exist when using the Direct to S3 backup mode.

If you do not use SOBR, you will need to suspend all backup jobs and disable background retention before proceeding with the split procedure.

Veeam v12.1 introduced a new feature to temporarily disable background retention; you will need to activate it during the split operation (see https://helpcenter.veeam.com/docs/backup/vsphere/backup_background_retention_disable.html?ver=120).

For more details see: TODO final link [DRAFT] Steps to split or reduce the number of shards in perdomain index /wiki/spaces/KBI/pages/3696558130
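
At its core, the split uses the Elasticsearch split index API. A minimal sketch, assuming Elasticsearch on http://localhost:9200 and placeholder index names; follow the draft article linked above for the fully supported procedure:

Code Block
# 1. Make the source index read-only (required before a split)
curl -X PUT "http://localhost:9200/<domain-index>/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.blocks.write": true}}'

# 2. Split into a new index; the target primary shard count must be a
#    multiple of the source's (here 5 -> 10)
curl -X POST "http://localhost:9200/<domain-index>/_split/<domain-index>-split" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 10}}'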

Note

Keep in mind that splitting an existing index increases the number of shards it uses, and those shards count toward the maximum of 600 shards per node.
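
To check how many shards each node currently carries against that 600-shard budget (a sketch against a local Elasticsearch endpoint):

Code Block
# Shows the shard count and disk usage per data node
curl -s "http://localhost:9200/_cat/allocation?v"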

Conclusion

In the example above we had 5 data nodes; at 600 shards per node that is 3,000 shards, and at 10 shards per domain a theoretical ceiling of 300 domains. In practice Swarm can support up to about 250 domains, since some indices are needed for csmetrics, Kibana, etc.

If you require more domains than your existing Elasticsearch cluster can support, you will need to grow the cluster, specifically by adding nodes with the “data” role.
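
You can confirm which nodes carry the data role with the _cat nodes API (same local-endpoint assumption as above):

Code Block
# The node.role column contains "d" for nodes holding the data role
curl -s "http://localhost:9200/_cat/nodes?v&h=name,node.role"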

If you are projecting to exceed 200 TB per domain, then you need to either increase the storage optimization (block size) to our recommended size of 4 MB to 8 MB, or think about how to distribute this workload over multiple domains for optimal performance.