Introduction
The purpose of this article is to provide sizing guidance on how much data a Veeam Backup and Replication or Veeam Backup for Microsoft 365 server can store in a single Swarm domain.
With Swarm 16.1.4 and Gateway 8.1.0 we introduced a new feature called “Index per Domain”: within the Elasticsearch database, metadata from each domain is now stored in a separate index. It is enabled by setting search.perDomainIndex=True before creating a Search Feed:
scsctl storage config set -d "search.perDomainIndex=true"
scsctl storage config set -d "search.numberOfShards=5"
Data in Elasticsearch is organized into indices. Each index is made up of one or more shards. Each shard is an instance of a Lucene index, which you can think of as a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster.
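To see how this looks in practice, you can list the per-domain indices together with their primary/replica shard counts, document counts, and on-disk size (ESIP is a placeholder for the address of one of your Elasticsearch nodes, as in the monitoring examples later in this article):
curl -s "http://ESIP:9200/_cat/indices/index_*?v&h=index,pri,rep,docs.count,store.size"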
There are some sizing considerations an administrator needs to review first.
Elastic recommends the following regarding shards in their official documentation:
The size of any shard should be kept between 20GB and 50GB to achieve optimal performance.
The number of shards should be a multiple of the number of nodes (with the data role) in the cluster to achieve an equal spread of data across nodes.
When increasing or decreasing the number of shards, the new count should be a multiple of the number of shards previously present in the index.
At most 20 shards per GB of available heap gives optimal usage of heap and RAM.
30.5GB of heap (i.e. 61GB of RAM, since the heap should be no more than half of the RAM) is the optimal resource to have per data node.
A node can therefore hold a maximum of 30.5GB of heap * 20 shards per GB of heap, which gives us about 600 shards (you can check the configured heap and current shard count per node with the commands shown below).
Note: Elastic has a default limit of 1000 shards per node; it is recommended to stay under 600 for optimal performance.
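To verify the heap configured on each data node and how many shards are currently allocated per node, the _cat APIs can be queried directly (ESIP is again a placeholder for an Elasticsearch node):
curl -s "http://ESIP:9200/_cat/nodes?v&h=name,node.role,heap.max"
curl -s "http://ESIP:9200/_cat/shards?h=node" | sort | uniq -c
The first command shows each node's role and configured heap; the second counts the shards currently allocated to each node.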
Example capacity planning
Let’s assume we have an Elasticsearch cluster with 5 data nodes and 5 shards per index.
Veeam Backup and Replication uses long path names for its objects:
Example: Veeam/Backup/MyCustomFolderName/Clients/{cfc1ab5f-6188-428c-9af8-7eac88d4d38d}/dfa8c857-1336-4aa1-8a91-20281bc5048f/CloudStg/Data/{45b88683-1c15-4149-ad43-b3d47331834d}/{57b53906-43b8-46d2-be9d-ff9c2bb8306e}/449485_77f4217c4a064129e8f55d708d48027d_0a2448d71034f0e9b931ed5fadb0e51c
This is roughly 275 characters on average (the exact length depends on the custom folder name chosen when creating the backup repository).
On average we have measured that the metadata for an object written by VBR v12 consumes about 1.1KB on disk in the Elasticsearch datastore.
One shard can therefore hold about 50GB / 1.1KB ≈ 45 million documents.
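As a quick sanity check of that figure (assuming the measured ~1.1KB of metadata per object and the 50GB shard target):
echo "50 * 1000^3 / 1100" | bc
This returns 45454545, i.e. roughly 45 million documents per 50GB shard.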
The total number of shards planned per index at the start will be 10 (5 primaries + 5 replicas).
This is set as follows:
scsctl storage config set -d "search.numberOfShards=5"
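Once the Search Feed has been created, you can confirm the shard and replica counts Elasticsearch applied to the per-domain indices (ESIP is again a placeholder for an Elasticsearch node):
curl -s "http://ESIP:9200/index_*/_settings?filter_path=*.settings.index.number_of_shards,*.settings.index.number_of_replicas"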
So one index (per domain) can hold about 10 shards * 45 million docs per shard = 450 million documents (including the replicas).
This means that for optimal use of a domain, a maximum of about 225 million unique objects is recommended, after which performance will start to degrade.
We recommend not exceeding 200 million objects per domain.
If you are using the default storage optimization (block size) of 1MB, then in the best-case scenario where Veeam writes data as perfectly sized 1MB objects, you will be able to write about 200 TB to the domain. This includes the storage footprint of immutability (which depends on your chosen backup retention settings).
This is another reason why using a larger storage optimization / block size of 4MB produces fewer blocks and therefore increases how much data can be written to a domain while keeping shard sizes below 50GB.
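For a rough feel of the best-case capacity per domain at different block sizes (assuming the 200 million object ceiling and perfectly sized Veeam blocks):
echo "200 * 10^6 * 1 / 10^6" | bc    # 1MB blocks -> ~200 TB per domain
echo "200 * 10^6 * 4 / 10^6" | bc    # 4MB blocks -> ~800 TB per domain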
How can you monitor the size of your shards?
Using Curl via a shell:
curl -s "http://ESIP:9200/_cat/shards/index_*?h=index,shard,d,prirep,sto,ip&v"
Example output:
index                     shard d       prirep sto   ip
index_swarm.sollab.local1 6     3833537 r      3gb   172.29.10.21
index_swarm.sollab.local1 6     3833537 p      3gb   172.29.10.23
index_swarm.sollab.local1 5     3830333 r      3gb   172.29.10.22
index_swarm.sollab.local1 5     3830333 p      3gb   172.29.10.23
index_swarm.sollab.local1 2     3828835 r      3.1gb 172.29.10.21
index_swarm.sollab.local1 2     3828835 p      3.1gb 172.29.10.22
index_swarm.sollab.local1 1     3832547 r      3.1gb 172.29.10.21
index_swarm.sollab.local1 1     3832547 p      3gb   172.29.10.20
etc...
The column “sto” shows the datastore disk space usage for the shard.
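To flag shards that are approaching the limit, the same endpoint can be combined with a filter, for example (this asks for raw byte values and prints only shards larger than 40GB):
curl -s "http://ESIP:9200/_cat/shards/index_*?h=index,shard,prirep,store&bytes=b" | awk '$4 > 40 * 1000^3'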
Using Grafana:
Metric name: elasticsearch_indices_shards_store_size_in_bytes{index=~"index_.*"}
Our dashboard “Swarm Search v8.1” has a panel showing this metric.
You can build an alert on top of this metric to be warned when it exceeds 50GB.
Example Alert Rule configuration:
alert: elasticsearch_shard_size
expr: max by (index, instance) (elasticsearch_indices_shards_store_size_in_bytes{index=~"index_.*"}) > 50000000000
for: 5m
labels:
  severity: warning
annotations:
  summary: "Elasticsearch Cluster Shard Size Limit Exceeded"
  identifier: "{{ $labels.instance }}"
  description: "Elasticsearch cluster shard size has exceeded the recommended size of 50GB."
What can I do if my shard exceeds 50GB?
You have 3 options:
Option 1: Do nothing, and accept that the performance of queries (listings) and shard recovery will degrade as the shard grows.
Option 2: Stop writing new data to this domain until expired backups have been cleaned up. Reduce backup retention or split your backup jobs over more domains to avoid future bottlenecks.
Option 3: You split the index, which amounts to increasing the number of shards for that domain.
Note: this option requires a scheduled maintenance window, as the index needs to be in read-only mode during the split process.
An object storage backup repository can only be put into maintenance mode if it is a member of a SOBR (scale-out backup repository); maintenance mode does not exist when using the Direct to S3 backup mode.
If you do not use a SOBR, you will need to suspend all backup jobs and disable background retention before proceeding with a split procedure.
Veeam v12.1 introduced a feature to temporarily disable background retention; you will need to activate it during the split operation (see https://helpcenter.veeam.com/docs/backup/vsphere/backup_background_retention_disable.html?ver=120).
For more details see: TODO final link [DRAFT] Steps to split index or reduce the number of shards in perdomain index
Keep in mind that splitting an existing index increases the number of shards it uses, which counts towards the maximum of 600 shards per node; a minimal sketch of the underlying Elasticsearch split API is shown below.
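For orientation only (the full supported procedure is in the article linked above), the split relies on the Elasticsearch _split API: the source index is first made read-only, then split into a target index whose shard count is a multiple of the current one. The index and target names below are purely hypothetical:
# make the source index read-only for the duration of the split
curl -s -X PUT "http://ESIP:9200/index_swarm.example.local1/_settings" -H 'Content-Type: application/json' -d '{"index.blocks.write": true}'
# split into a new index with 10 shards (a multiple of the current 5)
curl -s -X POST "http://ESIP:9200/index_swarm.example.local1/_split/index_swarm.example.local1-split" -H 'Content-Type: application/json' -d '{"settings": {"index.number_of_shards": 10}}'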
Conclusion
In the example above we had 5 data nodes; at 600 shards per node that is 3000 shards in total, and with 10 shards per domain index this works out to roughly 300 domains. Allowing for the other indices Swarm needs (csmetrics, kibana, etc.), this means Swarm can support up to about 250 domains.
If you require more domains than can be supported by your existing Elasticsearch cluster, you will need to grow the size of your Elasticsearch cluster, specifically the nodes with the “data” role.
If you are projecting to exceed 200 TB per domain, then you need to either increase the storage optimization (block size) to our recommended size of 4MB to 8MB, or think about how to distribute this workload over multiple domains for optimal performance.