Capacity Planning
The following is a high-level view of the factors to consider when determining the hardware capacity needed for a Swarm implementation.
Storage Capacity Factors
Expected Object Count and Average Object Size
Object count and object size are the primary drivers for capacity planning
Object count drives storage cluster memory requirements: more objects require more memory for the cluster's overlay index
Average object size multiplied by object count provides the logical storage footprint (the amount of content uploaded to the cluster), but it does not account for the space taken by replicas/segments from the protection scheme
Average object size is the key factor (along with cluster size) for which protection scheme to use (replication vs. erasure coding)
See Elastic Content Protection
Choice of Protection Scheme
The chosen protection scheme drives the memory requirements for the storage cluster
Erasure coding (EC) requires more memory than Replication (which uses more space)
Erasure coding also impacts CPU performance requirements (because parity must be calculated)
Required volume footprint is derived from the combination of (object count) x (average object size) x (protection scheme overhead)
Replication example: (1 million objects) x (1 megabyte/object) x (2 replicas) = 2 TB
EC example: (1 million objects) x (1 megabyte/object) x (7/5 overhead for a 5:2 EC scheme) = 1.4 TB
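The footprint arithmetic above can be captured in a short sketch. The function and parameter names below are illustrative only (not part of Swarm), and the results are raw capacity before any free-space or failure headroom:

```python
def replication_footprint_tb(object_count, avg_object_mb, replicas):
    """Raw footprint (decimal TB) when storing full replicas."""
    return object_count * avg_object_mb * replicas / 1_000_000

def ec_footprint_tb(object_count, avg_object_mb, data_segments, parity_segments):
    """Raw footprint (decimal TB) for a k:p erasure coding scheme,
    where the overhead factor is (k + p) / k."""
    overhead = (data_segments + parity_segments) / data_segments
    return object_count * avg_object_mb * overhead / 1_000_000

# The two examples above: 1 million objects at 1 MB each
print(replication_footprint_tb(1_000_000, 1, 2))   # 2.0 TB
print(ec_footprint_tb(1_000_000, 1, 5, 2))         # ~1.4 TB (5:2 EC)
```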
| RAM per Node | 16 GB | 32 GB | 64 GB | 128 GB |
|---|---|---|---|---|
| Storage Node RAM Index Slots | 268M | 536M | 1073M | 2146M |
| Immutable Objects | 268M | 536M | 1073M | 2146M |
| Mutable Objects | 134M | 268M | 536M | 1073M |
| 5:2 Erasure Coded Objects | 26M | 53M | 107M | 214M |
See Configuring Content Policies
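The table can also be read in reverse: given a target object count, pick the smallest RAM size whose per-node figure covers it. The snippet below is only a rough planning aid that assumes objects spread evenly across nodes; actual sizing should follow the table and DataCore guidance:

```python
# Per-node figures from the table above (values are in millions of objects).
NODE_INDEX_TABLE_M = {
    # RAM per node (GB): {object kind: millions of objects per node}
    16:  {"immutable": 268,  "mutable": 134,  "ec_5_2": 26},
    32:  {"immutable": 536,  "mutable": 268,  "ec_5_2": 53},
    64:  {"immutable": 1073, "mutable": 536,  "ec_5_2": 107},
    128: {"immutable": 2146, "mutable": 1073, "ec_5_2": 214},
}

def min_ram_per_node_gb(objects_millions, node_count, kind="immutable"):
    """Smallest RAM size in the table whose per-node capacity covers
    objects_millions objects spread evenly across node_count nodes."""
    per_node_needed = objects_millions / node_count
    for ram_gb in sorted(NODE_INDEX_TABLE_M):
        if NODE_INDEX_TABLE_M[ram_gb][kind] >= per_node_needed:
            return ram_gb
    raise ValueError("Target exceeds the table; add nodes or more RAM")

# e.g. 2,000 million mutable objects on 8 nodes -> 250M per node -> 32 GB nodes
print(min_ram_per_node_gb(2000, 8, kind="mutable"))
```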
Need for High Availability
Knowing what failure scenarios can and cannot be tolerated helps with design optimization:
A requirement for high availability (HA) drives extra capacity needed to cover more catastrophic disk and server failures
Designs typically account for either multiple-volume or multiple-server failure scenarios
Availability requirements vary in complexity and feed back into the choice of protection scheme; see the sketch below
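As a rough way to compare schemes during this analysis, the sketch below (an illustration, not a Swarm tool) counts how many simultaneous volume losses a single object can survive, assuming each replica or segment sits on a different volume: replication with r copies survives r - 1 losses, while a k:p erasure code survives any p losses.

```python
def failures_tolerated(scheme):
    """Simultaneous volume losses a single object can survive.

    scheme: ("replication", replica_count) or ("ec", data_segments, parity_segments)
    Assumes each replica/segment resides on a different volume.
    """
    kind = scheme[0]
    if kind == "replication":
        return scheme[1] - 1        # all but one replica may be lost
    if kind == "ec":
        return scheme[2]            # any p of the k+p segments may be lost
    raise ValueError("unknown scheme")

print(failures_tolerated(("replication", 2)))  # 1
print(failures_tolerated(("ec", 5, 2)))        # 2
```

Surviving a failure is not the same as recovering from it: the cluster also needs enough spare capacity to rebuild the lost replicas or segments.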
Best Practice
Start expanding the cluster when the cluster capacity reaches 80%.
Memory for Overlay Index
A cluster may have other features enabled that require additional resources
Example: Overlay Index for large clusters (32+ nodes)
Always consider and account for the resource impact of a given feature/setting before enabling it in a cluster
Best Practice
Allow an additional 25% of cluster memory to support the Overlay Index.
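Applied to whatever total the earlier memory sizing produced, the guideline is a simple multiplier; the helper below is illustrative only:

```python
def ram_with_overlay_headroom_gb(base_cluster_ram_gb):
    """Apply the +25% Overlay Index guideline to a cluster memory total."""
    return base_cluster_ram_gb * 1.25

# e.g. a cluster sized at 640 GB of aggregate node RAM should plan for 800 GB
print(ram_with_overlay_headroom_gb(640))  # 800.0
```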
Elasticsearch (Search and List)
Provides the ability to search for and list objects based on metadata
Always assume a full index of object metadata (including custom metadata)
Memory: 64 GB RAM per 1 billion distinct objects
Disk: 1.5 TB required for 1 billion distinct objects
Networking: 1 Gb Ethernet minimum
A minimum of 3 to 4 servers for redundancy and performance
Scale out as needed by adding more Elasticsearch servers
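Those rules of thumb (64 GB RAM and 1.5 TB of disk per 1 billion distinct objects, with at least 3 servers) can be combined into a quick estimate. The per-server hardware figures in the sketch below are assumptions chosen for the example, not requirements:

```python
import math

def es_server_estimate(objects_billions, ram_per_server_gb=64, disk_per_server_tb=2.0):
    """Estimate the Elasticsearch server count for metadata search and listing,
    using 64 GB RAM and 1.5 TB of disk per 1 billion distinct objects and a
    floor of 3 servers for redundancy."""
    ram_needed_gb = 64 * objects_billions
    disk_needed_tb = 1.5 * objects_billions
    return max(
        math.ceil(ram_needed_gb / ram_per_server_gb),
        math.ceil(disk_needed_tb / disk_per_server_tb),
        3,  # minimum server count for redundancy and performance
    )

# e.g. 2 billion distinct objects on 64 GB / 2 TB servers -> 3 servers
print(es_server_estimate(2))
```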
Gateway (Including S3)
Provides a reverse proxy into storage with added protocol conversion support (S3), authentication, and authorization policy enforcement
Best treated with a “scale out” approach (think “web farm” behind a load balancer)
The underlying engine is Java (Jetty)
Tuned out of the box to account for large session counts based on field feedback
Memory/CPU/Disk requirements are light for a single Gateway server (4 GB RAM/multi-core x86-64/4 GB Disk)
Networking needs to align with the choice used for the Storage Cluster (use 10 Gb interfaces for the Gateway servers if the Storage Cluster is using 10 Gb interfaces)
SwarmFS
Provides a protocol gateway for NFS clients (NFS v4.1 to SCSP+)
The level of concurrent write requests drives the resource requirements
Best Practice
Split up the different NFS client workloads across multiple SwarmFS servers (“scale out”).
Memory/CPU/Disk requirements are higher than Gateway (recommended baseline of 16 GB RAM/multi-core x86-64/40 GB Disk)
As with Gateway, align networking choices with those of the Storage Cluster to ensure adequate throughput
FileFly
Provides a transparent tiering mechanism to move data from Windows or NetApp file servers into a Swarm storage cluster
Deployments can range from “single server” configurations to multi-server/high-availability architectures
Agent software has a small footprint (a minimal server requires 4 GB RAM, an x86-64 CPU, 2 GB of disk for logs, etc.)
Treat as a “scale out” solution to support multiple Windows/NetApp file servers (multiple migration agents, multiple fpolicy servers)
Verify the servers under FileFly source management are “close” to Swarm on the network (avoid routing)
Align network interface choice for FileFly components with those used in Storage Cluster for best throughput/latency characteristics
Capacity planning for the FileFly source servers becomes important when performing a large de-migration from Swarm
Verify this scenario is planned for when assigning storage shares from the source servers to clients