It is common for the data footprint of a cluster to grow over time and eventually exceed its original sizing. Consider the following to make expansion a smooth process:
Cluster administrators need to monitor space usage over time and should not delay adding capacity. Supply-chain disruptions (COVID being a recent example) can delay hardware delivery even after the purchase order is cut. If the cluster is growing and is more than 80% full, start adding capacity.
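The 80% rule above can be automated with a simple check. This is a minimal sketch, not part of any Swarm API: the function name and the per-volume `(used, total)` inputs are assumptions, and the numbers would come from whatever monitoring source you already collect.

```python
# Hypothetical capacity check -- names and inputs are illustrative,
# not a Swarm API. Feed it per-volume (used_bytes, total_bytes) pairs
# from your existing monitoring.

def cluster_percent_full(volumes):
    """volumes: list of (used_bytes, total_bytes) tuples."""
    used = sum(u for u, _ in volumes)
    total = sum(t for _, t in volumes)
    return 100.0 * used / total

# Three 10 TB volumes, mostly full
volumes = [(8e12, 10e12), (9e12, 10e12), (7.5e12, 10e12)]
pct = cluster_percent_full(volumes)
if pct > 80:
    print(f"Cluster is {pct:.1f}% full -- start adding capacity now")
```

Running such a check on a schedule gives the lead time needed to order hardware before the cluster hits a critical fill level.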
Check the License
Licensed capacity and hardware capacity are different; update the license as part of adding capacity.
Trapped space does NOT count against the license, so it is fine to have more hardware capacity than you are licensed for.
Most customers start with their nodes fully populated. If you can afford extra server capacity up front, leave some disk slots empty. When more space is required, hot-plug a new disk into a node without downtime. Failed or retired drives can likewise be swapped for empty disks over time. None of these operations requires a reboot.
Adding multiple servers at a time is a common way to add capacity. Adding nodes to a cluster is straightforward, so those steps are not listed here. Instead, let's discuss what happens when a new, empty node is added to a cluster.
Client writes go to all nodes/volumes that have space, so the "start early" advice is about avoiding the situation where many volumes are full and the few remaining volumes with space become overloaded. New writes must be spread across different nodes/volumes so that objects stay protected from hardware failures.
In the background, Swarm rebalances the cluster, moving objects from full volumes to less-full ones; a couple of settings control this behavior. Relocation is a necessary load on the cluster that can impact client writes. Ideally, the cluster has free space on all volumes so it can write objects during the inevitable recoveries that happen when disks go bad; the same applies to new writes. Multiple disks with free space provide better performance and load balancing. If cluster performance is important to the business, add space early enough to allow a longer rebalancing window with less performance impact.
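The trade-off between rebalancing window and performance impact can be sized with a back-of-the-envelope estimate. This sketch is illustrative only: the relocation throughput figure is an assumption about what your cluster can sustain without hurting client traffic, not a Swarm setting.

```python
def rebalance_hours(bytes_to_move, relocation_mb_per_sec):
    """Rough rebalancing-window estimate. Throughput is whatever rate
    the cluster can sustain alongside client load (an assumption,
    not a measured or configured Swarm value)."""
    seconds = bytes_to_move / (relocation_mb_per_sec * 1e6)
    return seconds / 3600

# Moving 50 TB at a throttled 200 MB/s takes roughly 69 hours
print(round(rebalance_hours(50e12, 200), 1))
```

Halving the sustained relocation rate doubles the window; adding capacity earlier lets you pick a gentler rate and still finish before volumes fill up.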
Another technique, when all volumes in the cluster are getting full, is to "sprinkle" empty volumes throughout all the nodes in the cluster, so that when the operation is complete both "old" and "new" nodes contain a mixture of "old" and "new" disks: fuller disks alongside disks with plenty of available capacity. This technique involves hands-on work with the cluster during the upgrade and requires the following steps:
Prepare a plan for how many new disks each node will have in the upgraded cluster; typically one or two new disks per node.
Add the new node(s) to the cluster, populated only with the number of empty disks each node will ultimately have.
Iterate over the remaining cluster nodes, taking each one down with a normal shutdown; this allows the cluster to clear any client requests from the node being rebooted. While the node is down, pull however many disks need to be replaced and put in empty disks, then bring the node back up. Meanwhile, hot-plug the pulled disks, one or two at a time, into the new node; these steps can be done in parallel. After each step, a handful of empty disks become visible while existing disks shift location.
Repeat for all the remaining nodes.
At the end of this process, the required rebalancing work is evenly spread throughout the cluster, and only minimal replica movement is needed to keep the existing data protected.
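The planning step above can be sketched in a few lines. This is a hypothetical helper, not a Swarm tool: node names and the round-robin assignment are assumptions, and the point is simply that new disks should be spread so each node gets one or two.

```python
def sprinkle_plan(node_names, new_disks):
    """Assign new (empty) disks round-robin across nodes so every node
    ends up with roughly the same count, per the 'sprinkle' technique.
    Purely illustrative -- not a Swarm utility."""
    plan = {n: 0 for n in node_names}
    for i in range(new_disks):
        plan[node_names[i % len(node_names)]] += 1
    return plan

nodes = ["node1", "node2", "node3", "node4"]
print(sprinkle_plan(nodes, 6))
# {'node1': 2, 'node2': 2, 'node3': 1, 'node4': 1}
```

A plan like this also doubles as the checklist for the node-by-node swap: each entry says how many disks to pull from that node and move to the new one.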