SNMP Commands
Storage cluster nodes are controlled through SNMP action commands. The following OIDs let you disable nodes, and volumes within nodes, in a storage cluster:
castorShutdownAction. Disable nodes and volumes within nodes for servicing.
castorRetireAction. Disable nodes and volumes within nodes for retirement.
Shutdown Action for Nodes
To gracefully shut down a Swarm node, the string shutdown is written to the castorShutdownAction OID. Similarly, writing the string reboot to this OID causes a Swarm node to reboot.
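As a minimal sketch of the write described above, the following Python helper builds an argument list for the Net-SNMP snmpset tool without executing it. Several details are assumptions rather than facts from this page: SNMP v2c is used, the Swarm MIB is loaded so that the symbolic name resolves, the scalar instance is addressed as castorShutdownAction.0, and the host address and community string are placeholders.

```python
from typing import List

# The two strings this page says the castorShutdownAction OID accepts.
ALLOWED_ACTIONS = ("shutdown", "reboot")


def build_shutdown_command(host: str, action: str, community: str,
                           oid: str = "castorShutdownAction.0") -> List[str]:
    """Build (but do not run) an snmpset argument list.

    Assumptions: SNMP v2c, a MIB that resolves the symbolic OID name,
    and a ".0" scalar instance suffix. Adjust all three for your site.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {ALLOWED_ACTIONS}, got {action!r}")
    # "s" tells snmpset that the value being written is a string.
    return ["snmpset", "-v", "2c", "-c", community, host, oid, "s", action]


if __name__ == "__main__":
    # Placeholder host (documentation address range) and community string.
    print(" ".join(build_shutdown_command("192.0.2.10", "reboot", "example-community")))
```

The list can be passed to subprocess.run once the host, community string, and OID spelling have been confirmed for the target cluster.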
When a node receives a shutdown or reboot action, it initiates a graceful stop by unmounting all of its volumes and removing itself from the cluster. For a shutdown, the node is powered off if the hardware supports this action. For a reboot, the node reboots the machine, re-reads the node or cluster configuration files, and starts up Swarm.
A graceful shutdown is required to perform a quick reboot. Performing an ungraceful shutdown forces the node to perform consistency checks on all its volumes before it can rejoin the cluster.
Tip
Before shutting down or rebooting a node, check the node status page or the SNMP castorErrTable OID for critical error messages. Any logged critical messages will be cleared upon reboot.
Note
If you are rebooting more than one node at a time but not the whole cluster, wait at least 10 seconds between node reboots. This pause ensures that each node can communicate its rebooting state to the rest of the cluster, so that other nodes do not initiate recovery for the rebooting node.
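The pause between reboots can be enforced in a script. The sketch below is illustrative only: reboot_staggered and its send_reboot callback are hypothetical names, and the callback is expected to perform the actual SNMP write for one host. The sleep function is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, List

# Minimum gap between node reboots, taken from the note above.
MIN_PAUSE_SECONDS = 10


def reboot_staggered(hosts: Iterable[str],
                     send_reboot: Callable[[str], None],
                     pause_seconds: float = MIN_PAUSE_SECONDS,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call send_reboot for each host, pausing between consecutive hosts.

    send_reboot is a hypothetical callback that writes "reboot" to the
    node's castorShutdownAction OID. Returns the hosts in the order
    they were processed.
    """
    if pause_seconds < MIN_PAUSE_SECONDS:
        raise ValueError(f"pause must be at least {MIN_PAUSE_SECONDS} seconds")
    processed: List[str] = []
    for index, host in enumerate(hosts):
        if index > 0:
            # Give the previous node time to announce its rebooting state.
            sleep(pause_seconds)
        send_reboot(host)
        processed.append(host)
    return processed
```

No pause is inserted before the first host or after the last one, since the wait only matters between consecutive reboots.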
Retire Action for Nodes and Volumes
The Retire action is used to permanently remove a node or a volume within a node from the cluster. This action is intended for retiring legacy hardware or pre-emptively pushing content away from a volume with a history of I/O errors. Retired volumes and nodes are visible in the Swarm Admin Console until the cluster is rebooted.
See Retiring Volumes.
Note
The Retire action may take an extended amount of time to complete and requires at least three health processor cycles.
Single volumes
When a volume is retired, all of its stored objects are moved to other nodes in the storage cluster. After you initiate a volume retirement, the volume becomes a read-only volume and no additional objects can be stored on it. After all of the objects are moved to other locations in the cluster, the volume is idled with no further read/write requests.
Each volume is given a unique name within its node – the device string from the vols line in the configuration file. To retire a volume, its name is written as a string to the castorRetireAction OID. The volume retirement process is initiated immediately upon receipt and the action cannot be aborted after it starts.
To manually retire a volume:
Open the Swarm UI (or legacy Admin Console).
Click the targeted chassis/node (IP address).
For the targeted disk/volume, select Retire.
Entire node
Retiring a node means all volumes on the node are retired at the same time. After all volumes in the node are retired and the node data is copied elsewhere in the cluster, the node is permanently out of service and will not respond to further requests.
To retire a node and all of its volumes, the all string is written to the castorRetireAction OID. The node retirement process is initiated immediately upon receipt and the action cannot be aborted after it starts.
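Because a retirement cannot be aborted once it starts, it can help to validate the string before writing it to castorRetireAction. The following sketch assumes the vols line has the form "vols = /dev/sda /dev/sdb" with space-separated device strings; that format, and the function names parse_vols_line and retire_value, are illustrative assumptions and not taken from the Swarm configuration reference.

```python
from typing import List


def parse_vols_line(line: str) -> List[str]:
    """Return the device strings from a vols line.

    Assumed format: "vols = <device> <device> ...". Check this against
    your actual configuration file before relying on it.
    """
    key, separator, value = line.partition("=")
    if not separator or key.strip() != "vols":
        raise ValueError(f"not a vols line: {line!r}")
    return value.split()


def retire_value(target: str, volumes: List[str]) -> str:
    """Return the string to write to castorRetireAction.

    "all" retires the node and every volume on it; any other target
    must match one of the node's configured volume names.
    """
    if target == "all" or target in volumes:
        return target
    raise ValueError(f"{target!r} is not 'all' or one of {volumes}")


if __name__ == "__main__":
    vols = parse_vols_line("vols = /dev/sda /dev/sdb")
    print(retire_value("/dev/sdb", vols))  # retire a single volume
    print(retire_value("all", vols))       # retire the whole node
```

Rejecting unknown names locally avoids sending a misspelled volume name to the node.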
Warning
Ensure that the cluster has enough free space and nodes to store the objects from the retiring volume. For subclusters, this applies to the subcluster where the retiring volume resides. If the nodes in the cluster or subcluster do not have enough space to store at least two replicas of all objects, the retiring node cannot complete the retirement process until you add nodes. The Retire action does not require that the configured default number of replicas (policy.replicas default) be maintained to complete retirement. If there are not enough nodes to maintain the minimum number of replicas, messages are logged stating that sufficient replicas cannot be created.