Cluster Protection Planning
With the density-friendly architecture introduced in Swarm 10, the cluster structure and its protections have changed:
Node = Chassis: Swarm is no longer multi-process: the new architecture assigns one and only one IP address to each chassis, simplifying management.
Fewer Nodes: With one Swarm node per chassis (physical or virtual machine), clusters are now smaller in terms of the number of Swarm processes.
No Auto-Subclusters: Automatic subclustering by chassis is no longer needed; keep using explicit (named) subclusters to optimize protection across specific locations or networks.
Multiple Segments Per Level: By default, Swarm allows segments to double up per level if needed, deprecating the old ec.subclusterLossTolerance setting.
Settings Checker: To ease migration and upgrades, Swarm includes a Storage Settings Checker to run before installation, identifying settings issues to resolve with Support.
Cluster-in-a-Box: Swarm supports a cluster-in-a-box configuration, requiring at least four nodes in VMs or containers, each with its own IP address and memory index to keep track of replicas.
Requirements and Guidelines
Observe the following data protection requirements and guidelines when designing the Swarm cluster:
Small Clusters: Verify the following settings if running 10 or fewer Swarm nodes (minimum of three required in production).
policy.replicas: The min and default values for the number of replicas kept in the cluster must not exceed the number of nodes. For example, a 4-node cluster may have only min=3 or min=4.
EC Encoding: For EC encoding, verify enough nodes exist to support the cluster's encoding (policy.ecEncoding). For EC k:p encoded writes to succeed with fewer than (k+p)/p nodes, use the lower protection level, ec.protectionLevel=volume.
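As an illustration of the small-cluster checks above, here is a minimal sketch (the setting names come from this document; the helper function and its messages are assumptions, not a Swarm API):

```python
# Sketch: sanity-check small-cluster protection settings before deployment.
# check_small_cluster() is an illustrative helper, not part of Swarm itself.
import math

def check_small_cluster(nodes, replicas_min, replicas_default, k, p):
    """Return a list of configuration problems for a cluster of `nodes` nodes."""
    problems = []
    # Replica counts may not exceed the node count (one replica per node).
    if replicas_min > nodes:
        problems.append(f"policy.replicas min={replicas_min} exceeds {nodes} nodes")
    if replicas_default > nodes:
        problems.append(f"policy.replicas default={replicas_default} exceeds {nodes} nodes")
    # EC k:p writes need at least ceil((k+p)/p) nodes at protection level 'node';
    # below that, only ec.protectionLevel=volume can succeed.
    if nodes < math.ceil((k + p) / p):
        problems.append(f"{k}:{p} EC writes need ec.protectionLevel=volume "
                        f"with only {nodes} nodes")
    return problems

# A 4-node cluster: min=3 is fine, but default=5 replicas is not.
print(check_small_cluster(nodes=4, replicas_min=3, replicas_default=5, k=5, p=2))
# → ['policy.replicas default=5 exceeds 4 nodes']
```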
Best Practice
Keep at least one physical machine in the cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.
Important
If any of these settings must change, do so before upgrading to Swarm 10.
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in three or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. For two physical machines, use the "cluster in a box" configuration; with three or more, move to direct booting of Swarm.
Subclusters: All nodes remain in the single, default subcluster unless they are manually grouped into named subclusters by setting node.subcluster across the nodes. Do this to let Swarm distribute content according to groupings of machines with a shared failure mode, such as being in the same building in a widely distributed cluster.
Caution
Setting ec.protectionLevel=subcluster without creating subclusters causes a critical error and lowers the protection level to 'node'.
Replication: For data protection reasons, Swarm does not store multiple replicas of an object on the same node. If using fewer physical machines than are required for the replication scheme, use a virtualization/containerization technology to run multiple Swarm nodes on the same hardware appliance. Do not specify too many replicas: setting the number of replicas equal to the number of storage nodes can lead to uneven loading when responding to volume recoveries.
Erasure-coding: Best practice is to use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless subclusters are defined and enough nodes (machines) exist to support the specified EC encoding. The lowest level, ec.protectionLevel=volume, allows EC writes to succeed in a small cluster with fewer than (k+p)/p nodes. See the next section for details.
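A hypothetical configuration excerpt tying the guidance above together (the setting names appear in this document; the file name, section layout, and values are illustrative assumptions only):

```ini
# Illustrative Swarm configuration excerpt (values are examples, not defaults)
policy.ecEncoding = 5:2          # k:p encoding for EC writes
ec.protectionLevel = node        # best practice: distribute segments across machines
node.subcluster = building-A     # only if named subclusters are actually in use
```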
Choosing EC Encoding and Sizing
The EC encoding defines the way Swarm divides and stores large objects:
k:p: Defines the encoding, where:
k (data segments) drives the footprint: an EC object's data footprint in the cluster approximates size × (k+p)/k.
p (parity segments) is the protection: choose the protection level needed, two or higher; p=2 and p=3 are the most common.
k+p (total segments) is the count of segments: the original object can be reconstructed even if any p segments are lost.
Manifests: Segments are tracked in a manifest, which is itself protected with p+1 replicas, distributed across the cluster.
Sets of Sets: Very large EC objects (or incrementally written objects) are broken up into multiple EC sets because any segment that's over the size limit triggers another level of EC. Each set has its own k:p encoding, and the overall request combines them all in sequence.
See Elastic Content Protection
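The footprint formula above can be checked with a short sketch (the function is illustrative, not a Swarm API):

```python
# Sketch: approximate the data footprint of an EC-encoded object,
# per the formula size * (k+p)/k described above.
def ec_footprint(size_bytes, k, p):
    """Approximate cluster footprint of an object EC-encoded as k:p."""
    # Each of the k data segments holds size/k bytes; the p parity
    # segments add the protection overhead.
    return size_bytes * (k + p) / k

# A 1 GB object under 5:2 encoding occupies roughly 1.4 GB.
print(ec_footprint(1_000_000_000, 5, 2))  # → 1400000000.0
```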
How Many Nodes are Needed?
The number of nodes required in the cluster depends on both the encoding scheme and the protection profile being targeted:
| EC Profile | Formula | Example: 5:2 | Notes |
|---|---|---|---|
| Manifest minimum | p+1 | 2 + 1 = 3 | Basic requirement for storing manifests. |
| Segment minimum | ceil((k+p)/p) | ceil((5 + 2) / 2) = 4 | Objects can be read (but not written) if one node is lost or offline. For 5:2, four nodes allow a 2+2+2+1 segment distribution because Swarm allows two segments per node. |
| Recommended protection | ceil((k+p)/p + p) | ceil((5 + 2) / 2 + 2) = 6 | Objects can be read and written if one node is lost or offline. |
| High protection | k+p | 5 + 2 = 7 | Objects can be read and written even if two entire nodes are lost or offline. |
| High performance | (k+p)*2 | (5 + 2) × 2 = 14 | Recommended for best performance and load distribution (load balancing becomes easier as clusters expand). |
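The table's formulas generalize to any encoding; a minimal sketch (the function name is an assumption, the formulas are taken from the table):

```python
# Sketch: node-count requirements for a k:p EC encoding,
# reproducing the formulas in the table above.
import math

def node_requirements(k, p):
    """Return the node counts each protection profile requires for k:p."""
    return {
        "manifest_minimum": p + 1,                    # store manifest replicas
        "segment_minimum": math.ceil((k + p) / p),    # readable after 1 node loss
        "recommended": math.ceil((k + p) / p + p),    # read/write after 1 node loss
        "high_protection": k + p,                     # read/write after 2 node losses
        "high_performance": (k + p) * 2,              # best load distribution
    }

print(node_requirements(5, 2))
# → {'manifest_minimum': 3, 'segment_minimum': 4, 'recommended': 6,
#    'high_protection': 7, 'high_performance': 14}
```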
How Many Volumes are Needed?
A minimum of k+p volumes is needed in a cluster, assuming ec.protectionLevel=volume, which is not recommended. For ec.protectionLevel=node, a minimum of p volumes is needed per node. For recommended volume protection, use at least p+1 volumes per node.
Optimizing Erasure Coding
What Improves EC Performance?
Good-Enough Encoding: Do not over-protect. The more nodes involved, the more constraints there are on an EC write succeeding, and the more overhead is created. Keeping k+p small reduces the overhead of EC writes; keeping k small reduces the overhead of EC reads.
Consistent Scaling: The rule of thumb for scaling erasure coding is to add one additional node for each ceil((k+p)/p)+1 nodes.
Faster Nodes: As a rule, an EC read/write is limited by the slowest node, and there is a significant constant expense to set up connections.
More Nodes: Having more nodes in the cluster than needed for an encoding allows the cluster to better load-balance.
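The trade-off between encoding size and overhead can be made concrete with a short sketch (function name and chosen encodings are illustrative assumptions):

```python
# Sketch: compare the write-footprint ratio (k+p)/k across encodings,
# illustrating why keeping k+p small reduces EC overhead while p
# independently sets how many lost segments can be tolerated.
def ec_overhead(k, p):
    """Raw-to-usable storage ratio for a k:p encoding."""
    return (k + p) / k

for k, p in [(5, 2), (9, 3), (4, 2)]:
    print(f"{k}:{p}: {k + p} segments, footprint x{ec_overhead(k, p):.2f}, "
          f"tolerates {p} lost segments")
```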
What Helps Balancing?
Do Not Run Full: This is the most important principle, so be ready to proactively scale the cluster. Imbalance typically happens if a cluster is allowed to fill up before additional, empty nodes are provisioned.
More Nodes: Larger clusters have an easier time load balancing, and not all nodes need to be involved in an EC write. A cluster with exactly k+p nodes fills those nodes at the same rate, but if a node loses a volume, that node fills faster and stops fully-distributed writes, even though there may be ample space on other nodes.
It takes Swarm a long time to rebalance a cluster that is heavy on EC objects, several times longer than if they are fully replicated, because inadequately distributed EC segments can only be moved by health processors on other nodes, and there are many constraints.
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.