Procedure for Shutting Down and Starting Up DataCore Swarm Cluster
- 1 Power Down Procedure
- 2 Power On Procedure
- 2.1 Step 1: Power on the SCS
- 2.2 Step 2: Power on Elasticsearch Nodes
- 2.3 Step 3: Power on Storage Nodes
- 2.4 Step 4: Monitor Swarm Cluster Rejoining
- 2.5 Step 5: Resume Recoveries
- 2.6 Step 6: Power on Gateway Nodes
- 2.7 Step 7: Validate System Health
- 2.8 Step 8: Power on Load Balancer (optional)
- 2.9 Step 9: Enable Client Access
- 3 Additional Notes
- 4 Conclusion
This document outlines the appropriate procedure to safely power down and power on a DataCore Swarm
Cluster, including the SCS, Elasticsearch cluster, Gateways, HAProxy and storage nodes.
Open a ticket with Swarm Support, collect a fresh support bundle from the SCS, all Elasticsearch nodes, and all Gateways, and upload the bundles to the ticket.
How to collect a support bundle
Power Down Procedure
Step 1: Disable Client Access via Load Balancers
Stop client access at the load balancers, typically using HAProxy:
sudo systemctl stop haproxy
Step 2: Stop and Power Down Gateways
Stop the cloudgateway services and power down the gateway nodes:
On each Gateway node:
sudo systemctl stop cloudgateway
sudo shutdown now
Step 3: Suspend Recoveries on Storage Nodes
Suspend recoveries
On the SCS
cd /root/dist/
./swarmctl -p admin:<adminPassword> -d <any_swarm_node_ip> -fsuspend
Wait approximately 6 minutes for the recoveries to suspend.
Step 4: Power Down All Storage Nodes
Power down all the storage nodes from the SCS:
Verify shutdown: Check that all nodes have powered down using IPMI or by pinging their network addresses.
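The ping-based check can be scripted. The following is a minimal sketch, assuming a Linux host with the iputils `ping` and a list of storage node addresses that you supply; the function name `check_nodes_down` is hypothetical and not part of any Swarm tooling.

```shell
# Report which storage nodes still answer ping after the shutdown.
# The addresses passed in are placeholders; substitute your own node IPs.
check_nodes_down() {
    still_up=0
    for ip in "$@"; do
        # -c 1: send one echo request; -W 2: wait up to 2 seconds for a reply
        if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip still responding"
            still_up=1
        else
            echo "$ip down"
        fi
    done
    # Non-zero exit status if any node is still reachable
    return "$still_up"
}
```

For example, `check_nodes_down 10.0.0.11 10.0.0.12` prints one line per address and returns non-zero while any node still replies. A node that blocks ICMP appears as down, so IPMI remains the more reliable check.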
Step 5: Shutdown Elasticsearch Cluster
Put Elasticsearch in maintenance mode:
On the SCS or any Elasticsearch node:
Stop Elasticsearch services and power down the Elasticsearch nodes:
On each Elasticsearch node:
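One common Elasticsearch practice before a planned full-cluster stop is to restrict shard allocation through the cluster settings API. The sketch below assumes that this is what "maintenance mode" means here, which this procedure does not confirm. It also assumes that Elasticsearch listens on plain HTTP without authentication. The function name `es_set_allocation` and the host address are hypothetical.

```shell
# Set the cluster-wide shard allocation mode on an Elasticsearch cluster.
# $1: host:port of any Elasticsearch node (placeholder)
# $2: allocation mode: all | primaries | new_primaries | none
es_set_allocation() {
    # "persistent" settings survive a full cluster restart, unlike "transient" ones
    curl -s -X PUT "http://$1/_cluster/settings" \
        -H 'Content-Type: application/json' \
        -d "{\"persistent\":{\"cluster.routing.allocation.enable\":\"$2\"}}"
}
```

For example, `es_set_allocation 10.0.0.21:9200 primaries` could be run before stopping the services. After the cluster is back up, `es_set_allocation 10.0.0.21:9200 all` restores the default behavior.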
Step 6: Power Down SCS Node
Power down the SCS
Power On Procedure
Powering the cluster back on follows roughly the reverse order of the power-down procedure.
Step 1: Power on the SCS
Begin by powering on the SCS node and wait approximately 5 minutes. Ensure all containers on the SCS are up and running.
Step 2: Power on Elasticsearch Nodes
Power on all Elasticsearch nodes and wait for them to become ready
On the SCS, verify Elasticsearch health:
Once the cluster reaches yellow status, take Elasticsearch out of ‘maintenance’ mode and, from the SCS, confirm that the cluster becomes green:
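The health check can be done with the standard Elasticsearch cluster health API. This is a minimal sketch, assuming Elasticsearch answers on plain HTTP without authentication; the function names `es_wait_for_status` and `es_status_of` and the host address are hypothetical.

```shell
# Ask the cluster to hold the request until it reaches at least the given
# health status, or until the timeout expires.
# $1: host:port (placeholder), $2: yellow | green, $3: timeout such as 60s
es_wait_for_status() {
    curl -s "http://$1/_cluster/health?wait_for_status=$2&timeout=$3"
}

# Extract the "status" field from the JSON health response read on stdin.
es_status_of() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}
```

For example, `es_wait_for_status 10.0.0.21:9200 yellow 60s | es_status_of` prints the status the cluster had when the call returned. If the timeout expires first, the response still carries the current status, so compare the printed value with the one requested.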
Step 3: Power on Storage Nodes
Power on the storage nodes, staggering them by about 30 seconds to avoid multiple nodes requesting PXE images simultaneously.
Monitor each storage node’s status via IPMI, or ping its IP address, to confirm that it is online.
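The staggered power-on can be sketched as a loop. It assumes each node's BMC is reachable over IPMI-over-LAN and that `ipmitool` is installed on the host running it. The function name `power_on_staggered`, the variables `IPMI_USER` and `STAGGER_SECONDS`, and the BMC addresses are hypothetical. The password is read from the `IPMI_PASSWORD` environment variable through ipmitool's `-E` option.

```shell
# Power on each node through its BMC, pausing between nodes so that they
# do not all request PXE images at once.
# STAGGER_SECONDS defaults to the 30 seconds suggested above.
power_on_staggered() {
    delay="${STAGGER_SECONDS:-30}"
    first=1
    for bmc in "$@"; do
        # Sleep before every node except the first
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        echo "powering on $bmc"
        ipmitool -I lanplus -H "$bmc" -U "$IPMI_USER" -E chassis power on
    done
}
```

For example, with `IPMI_USER` and `IPMI_PASSWORD` exported, `power_on_staggered 10.0.1.11 10.0.1.12 10.0.1.13` powers the three nodes on about 30 seconds apart.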
Step 4: Monitor Swarm Cluster Rejoining
Monitor the rejoining of storage nodes to the cluster from SCS:
Wait for the nodes to mount and show OK status
Step 5: Resume Recoveries
Once all nodes are up and operational, resume recoveries from SCS:
Step 6: Power on Gateway Nodes
Power on all Gateway nodes and ensure the Gateway service has started on each.
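The following sketch checks and, if needed, starts the service on a Gateway node. It uses the `cloudgateway` unit name from the power-down steps; the function name `ensure_service` is hypothetical.

```shell
# Start a systemd unit only if it is not already active, then print its state.
# $1: unit name, for example cloudgateway
ensure_service() {
    if ! systemctl is-active --quiet "$1"; then
        sudo systemctl start "$1"
    fi
    # Prints the state (for example "active") and returns non-zero if not active
    systemctl is-active "$1"
}
```

Run `ensure_service cloudgateway` on each Gateway node. The same function works for other units, such as `haproxy` on the load balancers.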
Step 7: Validate System Health
Check the cluster hardware status via the Storage UI
Ensure the Elasticsearch cluster and feeds are functioning correctly.
Verify content access through the Content UI for Tenants, Domains and Buckets, testing each Gateway individually.
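Before the Content UI checks, a quick reachability pass over each Gateway can confirm that it answers HTTP at all. This is a minimal sketch that does not replace the per-tenant checks above. The function name `check_gateways` and the URLs are hypothetical, and the status code a healthy Gateway returns to an unauthenticated request depends on your configuration.

```shell
# Print the HTTP status code each gateway returns for a simple request.
# The gateway URLs are placeholders; 000 means no HTTP response was received.
check_gateways() {
    for gw in "$@"; do
        # -o /dev/null discards the body; -w prints only the status code.
        # "|| true" keeps going when curl itself fails (code is then 000).
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$gw") || true
        echo "$gw $code"
    done
}
```

For example, `check_gateways http://gw1.example.com/ http://gw2.example.com/` prints one line per Gateway. Any code other than 000 shows that the service is listening on that address.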
Step 8: Power on Load Balancer (optional)
Power on all HAProxy nodes and ensure the HAProxy service has started on each.
Step 9: Enable Client Access
Re-enable client access
Additional Notes
Ensure that all services are started and stopped in the correct order as per the procedures above to avoid data inconsistency or service failure.
After the system is back online, monitor logs and check for any errors to ensure that all nodes and services are fully operational.
On the SCS, verify that Swarm is able to send Health Reports to DataCore:
Conclusion
By following this procedure, you can safely power down and power on a DataCore Swarm cluster, minimizing risks and ensuring a smooth recovery.
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.