Platform's CLI (command-line interface) is installed by default on the Platform server and supports common administrative tasks.
Rebooting a Cluster
There are two CLI options for rebooting a cluster: full versus rolling.
Full reboot notifies every chassis in the cluster to reboot itself at the same time, meaning that the entire cluster will temporarily be offline as the chassis reboot.
Full reboot
platform restart storagecluster --full
Rolling reboot is a long-running process that keeps the cluster operational by rebooting the cluster one chassis at a time, until the entire cluster has been rebooted. A rolling reboot includes several options, such as to limit the reboot to one or more chassis:
Rolling reboot
platform restart storagecluster --rolling [--chassis <comma-separated system IDs>] [--skipConnectionTest] [--skipUptimeTest] [--continueWithOfflineChassis] [--stopOnNodeError]
Requirements
Before a rolling reboot can begin, these conditions must be met:
- All chassis targeted for rebooting must be running and reachable. If you have offline chassis, be sure to set a flag to have them ignored:
  - To skip the connection check altogether, add the flag --skipConnectionTest
  - To have the reboot process ignore currently offline chassis, add the flag --continueWithOfflineChassis
- All chassis must have an uptime greater than 30 minutes. To skip this requirement, add the flag --skipUptimeTest
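When rolling reboots are scripted, it can help to assemble the command from named options rather than typing the flags by hand. The following is a minimal Python sketch, not part of the Platform CLI: the function name rolling_reboot_command and its parameter names are hypothetical, and the system ID 8k2m1q is made up for illustration. Only the flags themselves come from the syntax shown above.

```python
import shlex


def rolling_reboot_command(chassis=None, skip_connection_test=False,
                           skip_uptime_test=False,
                           continue_with_offline_chassis=False,
                           stop_on_node_error=False):
    """Return the argument list for a rolling reboot (hypothetical helper)."""
    args = ["platform", "restart", "storagecluster", "--rolling"]
    if chassis:
        # The CLI takes one comma-separated list of system IDs.
        args += ["--chassis", ",".join(chassis)]
    if skip_connection_test:
        args.append("--skipConnectionTest")
    if skip_uptime_test:
        args.append("--skipUptimeTest")
    if continue_with_offline_chassis:
        args.append("--continueWithOfflineChassis")
    if stop_on_node_error:
        args.append("--stopOnNodeError")
    return args


# Build (but do not run) a reboot limited to two chassis.
command = rolling_reboot_command(chassis=["4y3h7p", "8k2m1q"],
                                 continue_with_offline_chassis=True)
print(shlex.join(command))
```

The sketch only prints the command line; running it is left to the operator so that the 10-second cancellation window and the requirements above still apply as usual.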
Managing Rolling Reboots
You have 10 seconds to cancel a rolling reboot before it begins. Once a rolling reboot has started, it will stop and report an error if any of these occur:
- A chassis is offline when it is selected for reboot. To have the reboot process ignore currently offline chassis, add the flag --continueWithOfflineChassis.
- A node goes into an error state and the flag --stopOnNodeError is set. Without this flag, the reboot process continues as long as the volumes come up.
- The chassis boots with a number of volumes that doesn't match the number present before the chassis was rebooted. A volume is considered up if it has a state of ok, retiring, retired, or unavailable.
- The chassis does not come back online within 3 hours.
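The volume check in the list above can be illustrated with a short Python sketch. It is an interpretation, not the Platform implementation: it assumes the comparison is between the number of volumes before the reboot and the number of volumes in an "up" state afterwards. The function names and the "offline" state used in the example are hypothetical.

```python
# States in which the documentation counts a volume as "up".
UP_STATES = frozenset({"ok", "retiring", "retired", "unavailable"})


def count_up_volumes(volume_states):
    """Count the volumes whose state is one of the 'up' states."""
    return sum(1 for state in volume_states if state in UP_STATES)


def volumes_match(count_before, states_after):
    """Return True if as many volumes are up after the reboot as before it."""
    return count_up_volumes(states_after) == count_before


# Four volumes before the reboot. In the first case one volume reports a
# hypothetical non-up state ("offline"), so the counts no longer match.
print(volumes_match(4, ["ok", "ok", "retiring", "offline"]))
print(volumes_match(4, ["ok", "ok", "retired", "unavailable"]))
```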
If a rolling reboot has stopped due to an error, you can resume the reboot using the resume command below after you have resolved the error.
Status check — To retrieve the status of a rolling reboot task, use the following commands for reboots remaining and reboots completed:
platform status rollingreboot
platform status rollingreboot --completed
Global states — When viewing the status for a rolling reboot, a rolling reboot task can have the following global states:
- in-progress: The rolling reboot is currently running.
- paused: The rolling reboot has been paused (using the pause command).
- completed: The rolling reboot finished successfully.
- cancelled: The rolling reboot was cancelled per a user request.
- error: The reboot has been stopped due to an error of some kind.
Chassis states — The status listing will also show the status for each chassis that is processed by the rolling reboot task. Each chassis can have one of the following states:
- pending: The rolling reboot task has yet to process the chassis.
- in-progress: The rolling reboot task is in the process of rebooting the chassis.
- completed: The chassis was successfully rebooted.
- removed: The chassis was removed from the list of chassis to process after the rolling reboot was started (using the delete rollingreboot command).
- error: The chassis encountered an error of some kind.
- abandoned: The chassis was being processed when a user cancelled the rolling reboot.
- dropped: The rolling reboot was waiting for the chassis to reboot when a user request was made to move to the next chassis (using the --skip flag).
- offline: The chassis was already offline when the reboot task attempted to reboot the chassis.
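The state names above lend themselves to a simple validation and summary step in monitoring scripts. The following Python sketch is illustrative only: it does not parse the output of platform status rollingreboot (whose format is not described here) and instead assumes the states have already been collected into a dictionary of system ID to state. The function name, the input shape, the system IDs other than 4y3h7p, and the choice to count pending plus in-progress as "remaining" are all assumptions.

```python
from collections import Counter

# State names as listed in the documentation.
GLOBAL_STATES = {"in-progress", "paused", "completed", "cancelled", "error"}
CHASSIS_STATES = {"pending", "in-progress", "completed", "removed",
                  "error", "abandoned", "dropped", "offline"}


def summarize(global_state, chassis_states):
    """Validate state names and count chassis per state (hypothetical helper)."""
    if global_state not in GLOBAL_STATES:
        raise ValueError(f"unknown global state: {global_state!r}")
    unknown = set(chassis_states.values()) - CHASSIS_STATES
    if unknown:
        raise ValueError(f"unknown chassis states: {sorted(unknown)}")
    counts = Counter(chassis_states.values())
    # Assumption: chassis still to be handled are those pending or in progress.
    remaining = counts["pending"] + counts["in-progress"]
    return {"global": global_state, "remaining": remaining,
            "counts": dict(counts)}


summary = summarize("in-progress", {
    "4y3h7p": "completed",
    "8k2m1q": "in-progress",   # made-up system ID
    "2b9x5c": "pending",       # made-up system ID
})
print(summary)
```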
Cancel reboot — To cancel (not pause) an active rolling reboot, issue the delete command, which stops the reboot process at the earliest moment. A cancelled rolling reboot cannot be restarted later.
platform delete rollingreboot --cancel
Exclude from reboot — To exclude one or more chassis that have not yet been rebooted from a currently running rolling reboot:
platform delete rollingreboot --chassis <comma-separated system IDs>
Pause reboot — To pause the current rolling reboot process so that it can be restarted later:
platform pause rollingreboot
Resume reboot — To resume a paused rolling reboot:
platform resume rollingreboot
No-wait reboot — Normally, the rolling reboot process will wait up to 3 hours for a rebooted chassis to come back online before it proceeds to the next. To force the process to stop waiting and move to the next chassis, use the --skip flag:
platform resume rollingreboot --skip
Adding a Chassis
Which version of Swarm a given node uses is set at the time of provisioning.
To add a single chassis as a new Swarm node, use the following process:
Create a node.cfg file and add any node-specific Swarm settings to apply, or leave it blank to accept all current settings.
Power on the chassis for the first time.
Wait until the chassis enlists and powers off.
Deploy the new server:
platform deploy storage -n 1 -v <#.#.#-version-to-deploy>
To deploy an individual chassis by system ID, use this process:
Create a node.cfg file and add any node-specific Swarm settings to apply, or leave it blank to accept all current settings.
Get a list of chassis that are available for deployment by using the following command:
platform list nodes --state New
Choose a System ID to deploy a single chassis using a command like the following:
platform deploy storage -y 4y3h7p -v 9.2.1
Service Proxy
If the Service Proxy is running on the Platform Server when you add or remove chassis, be sure to restart the service so that it can pick up the new chassis list:
platform restart proxy
Reconfiguring the Cluster
You can modify your cluster-wide Swarm configuration at any time using the CLI and a configuration file. The reconfiguration process is additive: all existing settings that are not referenced in your file are preserved. That is, if you define only two settings, Platform overwrites or adds only those two settings.
Create a supplemental .cfg file (such as changes.cfg) and specify any new or changed Swarm settings to apply.
To upload your configuration changes, use the following CLI command:
platform upload config -c {Path to .cfg}
The CLI will parse the uploaded configuration file for changes to make to Platform.
If Swarm was running during the upload, Platform Server attempts to communicate the new configuration to Swarm. Any settings that cannot be communicated to Swarm will require a reboot of the Swarm cluster in order to take effect. For each setting contained in the file, the CLI will indicate if the setting was communicated to the Storage cluster and if a reboot is required. The Swarm UI also indicates which settings require rebooting.
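The additive behavior of reconfiguration can be pictured as layering the uploaded settings over the existing ones. The Python sketch below models only that semantics and is not how Platform stores or applies configuration; the function name, the setting example.setting, and all values are hypothetical.

```python
def apply_changes(existing, changes):
    """Return a new settings mapping with `changes` layered over `existing`."""
    merged = dict(existing)   # settings not mentioned in `changes` are preserved
    merged.update(changes)    # mentioned settings are overwritten or added
    return merged


existing = {"chassis.processes": "4", "network.gateway": "10.0.0.1"}
changes = {"chassis.processes": "6", "example.setting": "on"}

merged = apply_changes(existing, changes)
for name in sorted(merged):
    print(f"{name} = {merged[name]}")
```

Here network.gateway is untouched because the uploaded file does not reference it, while chassis.processes is overwritten and example.setting is added.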
Example: Increase Swarm processes
Swarm 10
Swarm Storage 10 has a single-process architecture, so the configuration setting chassis.processes is no longer used and cannot be increased.
Option 1: Create a configuration file:
To set all chassis throughout the cluster to a higher number of processes, you would create a configuration file and upload it to Platform Server.
Create a text file, such as update.cfg, containing only the setting to be changed.
chassis.processes = 6
To upload your configuration changes, use the following CLI command:
platform upload config -c {Path to update.cfg}
Note
Include the -m <mac-address> parameter if you want to target the update to specific chassis.
Option 2: Use the CLI directly:
Add the configuration change directly:
platform add config --name "chassis.processes" --value 6
Reconfiguring a Chassis
You can modify the node-specific settings for a single chassis by the same process, but you need to specify the MAC address of any valid NIC on that chassis.
Create a .cfg file (such as changes.cfg) and specify any new or changed node-specific settings to apply.
To upload your configuration changes, use the following CLI command:
platform upload config -c {Path to .cfg} -m {mac address}
The CLI will parse the uploaded configuration file for changes to make to that chassis.
Releasing a Chassis
There may be times when you need to release a chassis from the Swarm cluster, either for temporary maintenance or for permanent removal.
Important
release commands.
Temporary release — Temporary release of a chassis assumes that the chassis will be added back into the cluster at a later time. Releasing a chassis lets you unallocate its cluster resources, such as IP addresses, or wipe and reset its configuration.
Once the chassis is powered off, you can release the chassis from the Swarm cluster:
platform release storagechassis -y <system-id>
Permanent removal — Permanent removal is for retiring a chassis altogether or changing the chassis' main identifying information, such as changing a NIC. Removing the chassis from management will cause the chassis to start the provisioning life cycle as if it were a brand new chassis, if it is powered on again.
Once the chassis is powered off, you can remove the chassis from Platform Server management permanently:
platform release storagechassis -y <system-id> --remove
Resetting to Defaults
If you would like to clear out all existing setting customizations from a given chassis or the entire cluster, you can issue the following commands.
platform delete allchassisconfig
platform delete allclusterconfig
Managing Subclusters
After all the chassis have been deployed and are running, you can assign chassis to subclusters.
To see the current subcluster assignments, use the list command:
platform subcluster list
To assign a chassis to a subcluster, use the assign command:
platform subcluster assign -y <system-id> --subcluster <subcluster-name>
Note
To remove a chassis from a subcluster, use the unassign command:
platform subcluster unassign -y <system-id>
Changing the Default Gateway
By default, the Platform Server configures Swarm Storage to use the Platform Server as its default gateway.
To override this behavior, either add a "network.gateway" to your cluster configuration file or issue the following command:
platform add config --name "network.gateway" --value "<ip-of-gateway>"
Managing Administrators
With one exception, modifying the admin users for the Storage cluster requires the Storage cluster to be up and running before the operations can be done. The one exception is the "snmp" user, which can have its password set while the cluster is down or before the cluster has been booted for the first time.
Adding or Updating Users
Important
Modifying passwords for the admin user will require you to restart the Service Proxy, if you have it installed. It could also require updates to Gateway configuration.
To add a new admin user, use the following CLI command:
platform add adminuser [--askpassword] [--username <username>] [--password <user password>] [--update]
The --askpassword flag lets you avoid specifying a password via the command line by providing the password via stdin. When this flag is used, you'll be prompted to enter a new/updated password for the user. You can also use the Linux pipe functionality:
cat password.txt | platform add adminuser --askpassword --username admin --update
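The same stdin approach can be used from a script instead of a shell pipe. The sketch below is a hedged Python example: the function name add_admin_user and its run parameter (which exists only so the call can be exercised without the CLI installed) are hypothetical, and the trailing newline on the password is an assumption modeled on what cat password.txt would typically send.

```python
import subprocess


def add_admin_user(username, password, update=False, run=subprocess.run):
    """Invoke 'platform add adminuser' with the password supplied on stdin."""
    args = ["platform", "add", "adminuser", "--askpassword",
            "--username", username]
    if update:
        args.append("--update")
    # The password travels over stdin, so it is not part of the command line.
    # Assumption: the CLI reads one line, so a newline is appended.
    return run(args, input=password + "\n", text=True, check=True)


# Example call (commented out because it requires the Platform CLI):
# add_admin_user("admin", "new-password", update=True)
```

With check=True, subprocess.run raises CalledProcessError if the CLI exits with a non-zero status, so a failed update does not pass silently.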
Important
--update flag.
To delete an admin user, use the following CLI command:
platform delete adminuser --username <username>
Upgrading Swarm Storage
To upgrade Swarm Storage in a live cluster, you can use the CLI to upload the version and then take steps to deploy it to your running nodes, either by restarting the entire cluster or each chassis in turn.
Note
The deploy storage --upgrade command is used for both upgrades and downgrades of Storage versions.
Upload the new version of the Swarm Storage software to Platform server, making sure that the <version-name> matches the version of Swarm Storage being uploaded:
platform upload storageimages -i <path-to-zip> -v <version-name>
platform upload storageimages -i ./storage-9.6.0-x86_64.zip -v 9.6
Note
The zip file to use above is contained within the Swarm-{version}-{date}.zip file that was downloaded. Inside this zip, there is a folder called Storage, which contains a file called storage-{version}-x86_64.zip. This is the zip file to use for the command above.
Get a full listing of all of the nodes and their IPs, MAC addresses, and system IDs:
platform list nodes --state Deployed
Using the list of system IDs, deploy the upgrade on each of the nodes. If you want to restart the node immediately after upgrade, run that command as well:
platform deploy storage --upgrade -v 9.2.1 -y <system-id>
platform restart storagenode -y <system-id>
If you did not restart each node individually, restart the cluster now, either full or rolling:
platform restart storagecluster --full
or
platform restart storagecluster --rolling [<options>]
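For clusters with many chassis, the per-node upgrade steps above can be generated rather than typed. The following Python sketch only produces the command lines for review and does not run them; the function name upgrade_commands and its parameters are hypothetical, and the choice of a rolling (rather than full) cluster restart when nodes are not restarted individually is an assumption made for the example.

```python
def upgrade_commands(version, system_ids, restart_each_node=True):
    """Return the CLI commands, in order, to upgrade the given nodes."""
    commands = []
    for system_id in system_ids:
        commands.append(["platform", "deploy", "storage", "--upgrade",
                         "-v", version, "-y", system_id])
        if restart_each_node:
            commands.append(["platform", "restart", "storagenode",
                             "-y", system_id])
    if not restart_each_node:
        # Nodes were not restarted one by one, so restart the cluster at the end.
        commands.append(["platform", "restart", "storagecluster", "--rolling"])
    return commands


# System IDs would come from 'platform list nodes --state Deployed'.
plan = upgrade_commands("9.2.1", ["4y3h7p"])
for command in plan:
    print(" ".join(command))
```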
Managing Service Proxy
Status — To check the status of the Service Proxy, use this command:
platform status proxy
Upgrade — To upgrade the Service Proxy on the Platform server, use the CLI to upload the version and deploy it:
platform deploy proxy -b <path-to-zip> --upgrade
Note
After a Service Proxy upgrade, it will take several minutes for the UI to come back up.
Configuring DNS
You may need to have the Storage nodes resolve names for outside resources, such as Elasticsearch or Syslog. To do so, configure the DNS server on the Platform Server to communicate with outside domains.
Option 1: Simple forwarding
A Slave/Backup DNS zone is a read-only copy of the DNS records; it can only receive updates from the Master zone of the DNS server.
If you have no DNS master/slave relationships configured, you can do simple forwarding by having the domain managed by the Platform server forward all lookups to outside domains:
Edit /etc/bind/named.conf.options and add the following line after the "listen-on-v6" line:
forwarders {172.30.0.202;};
Run the following command to restart bind9 on the Platform Server:
sudo systemctl restart bind9
Option 2: Configuring a Slave DNS Zone
If you have an external DNS Zone configured, you can have the Platform Server become a slave DNS of that zone; the reverse can be done to allow other systems to resolve names for servers managed by the Platform server.
This process assumes that the external DNS server has been configured to allow zone transfers to the Platform server. The DNS server on the Platform server is not configured to restrict zone transfers to other DNS slaves.
Edit /etc/bind/named.conf.local and add the following line at this location:
// slave other local zones
include "/etc/bind/named.conf.slaves";
Create a new file called /etc/bind/named.conf.slaves and add your settings in this format:
// local slave zones
zone "example.com" in {
    type slave;
    masters { 172.30.0.100; };
    file "/var/cache/bind/slave/zone-example.com";
};
Run the following command to restart bind9 on the Platform Server:
sudo systemctl restart bind9
Configuring Docker Bridge
To configure or modify the network information that is used by the default Docker (docker0) bridge, edit the file /etc/docker/daemon.json. You can add networking properties as properties to the root JSON object in the file:
{
  "log-opts": {
    "max-size": "5m",
    "max-file": "10"
  },
  "bip": "10.0.1.1/24"
}
The bip property sets the IP address and subnet mask to use for the default docker0 bridge. For details on the different properties, see the Docker documentation.
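Because a malformed bip value can keep the Docker daemon from starting, it may be worth validating the value before writing it to daemon.json. The Python sketch below is one way to do that with the standard library; the function name with_bridge_ip is hypothetical, and the sketch works on an in-memory dictionary rather than editing /etc/docker/daemon.json directly.

```python
import ipaddress
import json


def with_bridge_ip(config, bip):
    """Return a copy of a daemon.json mapping with "bip" set to `bip`."""
    if "/" not in bip:
        raise ValueError("bip must include a prefix length, e.g. 10.0.1.1/24")
    # ip_interface raises ValueError for a malformed address or prefix.
    ipaddress.ip_interface(bip)
    updated = dict(config)   # shallow copy; other root properties are kept
    updated["bip"] = bip
    return updated


current = {"log-opts": {"max-size": "5m", "max-file": "10"}}
print(json.dumps(with_bridge_ip(current, "10.0.1.1/24"), indent=2))
```

The printed JSON can then be reviewed and written to the file by whatever process normally manages it.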