It’s common for the data footprint of a cluster to grow over time and eventually exceed its original sizing. There are a variety of things to consider to make expansion a smooth process.

Start Early

Cluster administrators need to monitor space usage over time and should not delay in adding capacity. Remember that COVID and other economic disruptions have delayed the hardware supply chain, so there are often delays in getting hardware even when someone is ready to cut a purchase order. If the cluster is growing and over 80% full, it’s already time to start adding capacity.
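As a rough sketch of that 80% rule of thumb, a monitoring script might compute utilization and flag when it is time to plan a purchase. The byte counts and the `needs_capacity` helper below are hypothetical; real numbers would come from Swarm’s own metrics.

```python
def utilization_pct(used_bytes: int, total_bytes: int) -> float:
    """Cluster utilization as a percentage of raw capacity."""
    return 100.0 * used_bytes / total_bytes

def needs_capacity(used_bytes: int, total_bytes: int, threshold: float = 80.0) -> bool:
    """True once usage crosses the planning threshold (hypothetical check)."""
    return utilization_pct(used_bytes, total_bytes) >= threshold

# Hypothetical cluster: 82 TB used out of 100 TB raw.
TB = 10**12
print(needs_capacity(82 * TB, 100 * TB))  # True — time to start adding capacity
```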

Check the License

Licensed capacity and hardware capacity are different things. You might need to update the license as part of the capacity add.

Info

Trapped space does NOT count against the license, so it is good to have more hardware capacity than you are licensed for.
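To illustrate the distinction, a license check would subtract trapped space before comparing against licensed capacity. All names and figures here are hypothetical and do not reflect Swarm’s actual licensing interface.

```python
def licensed_usage(raw_used_bytes: int, trapped_bytes: int) -> int:
    """Usage that counts against the license: trapped space is excluded."""
    return raw_used_bytes - trapped_bytes

def within_license(raw_used_bytes: int, trapped_bytes: int, licensed_bytes: int) -> bool:
    """True when license-countable usage fits under the licensed capacity."""
    return licensed_usage(raw_used_bytes, trapped_bytes) <= licensed_bytes

TB = 10**12
# Hypothetical: 95 TB on disk, 10 TB of it trapped, 90 TB licensed.
print(within_license(95 * TB, 10 * TB, 90 * TB))  # True — only 85 TB counts
```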

Adding Disks

Most customers start with their nodes fully populated, but if you can afford extra server capacity early, leaving disk slots empty is just fine. When more space is required, a new disk can be hot-plugged into a node without any downtime. Any failed or retired drive should also be swapped for an empty disk over time; this operation does not require a reboot.

Adding Nodes

Adding one or more servers at a time is the most common way to add capacity. Adding nodes to a cluster is relatively easy, so those steps won’t be covered here. Instead, let’s focus on what happens when a new, empty node is added to a cluster.

Client writes will go to all nodes/volumes that have space, so the “Start Early” advice is largely about avoiding the situation where many volumes are full and the remaining ones with space get overloaded. New writes must go to different nodes/volumes in the cluster to help protect objects from hardware failures.
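A minimal sketch of that placement idea, assuming a simplified model in which each replica must land on a distinct node and emptier volumes are preferred. The tuple format and the `place_replicas` helper are invented for illustration, not Swarm’s actual data model.

```python
def place_replicas(volumes, replicas=2):
    """Pick volumes for an object's replicas, one per node, preferring
    free space. `volumes` is a list of (node, volume, free_bytes) tuples."""
    # Sort by free space, descending, so fuller volumes are chosen last.
    candidates = sorted(volumes, key=lambda v: v[2], reverse=True)
    chosen, used_nodes = [], set()
    for node, vol, free in candidates:
        if node not in used_nodes and free > 0:
            chosen.append((node, vol))
            used_nodes.add(node)  # never two replicas on the same node
        if len(chosen) == replicas:
            break
    return chosen

vols = [("n1", "d1", 500), ("n1", "d2", 400), ("n2", "d1", 300), ("n3", "d1", 100)]
print(place_replicas(vols))  # [('n1', 'd1'), ('n2', 'd1')] — two distinct nodes
```

Note that the second volume on `n1` is skipped even though it has more free space than `n2`’s volume: surviving a node failure matters more than balancing bytes.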

In the background, Swarm re-balances the cluster by moving objects from full volumes to less full ones. A couple of settings control this behavior. Relocation is a necessary load on the cluster that can have some impact on client writes. Ideally, the cluster has free space on all volumes so that there are many places to quickly write objects during the inevitable recoveries that happen when disks go bad. The same applies to new writes: many disks with space enable better load balancing and better performance. If cluster performance is important to the business, proactively adding space allows for a longer re-balancing window with less performance impact.
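The re-balancing idea can be sketched as pairing over-full volumes with under-full ones. The thresholds below are illustrative stand-ins, not Swarm’s actual settings.

```python
def rebalance_moves(volumes, high=0.90, low=0.70, max_moves=2):
    """Plan moves from volumes above `high` utilization to ones below `low`.
    `volumes` maps a volume name to (used, capacity); a hypothetical model."""
    util = {v: used / cap for v, (used, cap) in volumes.items()}
    # Fullest sources first, emptiest targets first.
    sources = sorted((v for v in util if util[v] > high), key=util.get, reverse=True)
    targets = sorted((v for v in util if util[v] < low), key=util.get)
    moves = []
    for src, dst in zip(sources, targets):
        moves.append((src, dst))
        if len(moves) == max_moves:  # throttle to limit impact on client writes
            break
    return moves

vols = {"d1": (95, 100), "d2": (92, 100), "d3": (10, 100), "d4": (50, 100)}
print(rebalance_moves(vols))  # [('d1', 'd3'), ('d2', 'd4')]
```

The `max_moves` throttle mirrors the trade-off in the text: relocation is necessary work, but doing too much of it at once competes with client traffic.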

A more involved technique can be used if all volumes in the cluster are getting full by the time new nodes are added. The idea is to “sprinkle” empty volumes throughout all the nodes in the cluster so that when the operation is complete, “old” and “new” nodes alike have a mixture of “old” and “new” disks: every node has some full disks and some disks with plenty of available capacity. This technique isn’t commonly used because it involves a lot of hands-on work with the cluster during the upgrade, and it requires the following steps:

  1. Prepare a plan for how many new disks each node will have in the upgraded cluster. Each node should end up with one or two new disks.

  2. Add the new node(s) to the cluster, populated only with the empty disks each will keep.

  3. Take one of the remaining cluster nodes down using a normal shutdown operation. This allows the cluster to clear any client requests from the node being rebooted. While the node is down, pull however many disks need to be replaced, put in empty disks, and bring the node back up. Meanwhile, hot-plug the one or two pulled disks into the new node; these steps can be done in parallel. After each step, a handful of empty disks are visible while the existing disks shift location.
  4. Repeat for all the remaining nodes.

At the end of this process, there will still be re-balancing work, but it will be more evenly spread throughout the cluster, and only minimal replica movement will be needed to protect the existing data.
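The steps above can be sketched as a simple plan generator; node names and disk counts are hypothetical, and the output is only a checklist, not an automated procedure.

```python
def sprinkle_plan(old_nodes, swaps_per_node=1):
    """Sketch of the disk "sprinkle" procedure: each old node gives up a few
    full disks to the new node and receives empty ones, so every node ends
    with a mix of full and empty disks."""
    steps = [f"add new node populated with its planned empty disk(s)"]
    for node in old_nodes:
        steps.append(
            f"shut down {node}, swap {swaps_per_node} full disk(s) for "
            f"empty one(s), boot {node}"
        )
        steps.append(f"hot-plug the disk(s) pulled from {node} into the new node")
    return steps

for step in sprinkle_plan(["node-a", "node-b"]):
    print(step)
```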