Operational Problems
For disk-related events requiring user action (such as disk removal), Swarm helps locate the hardware by logging the SCSI locator (bus ID) and volume serial number at the CRITICAL and ANNOUNCE log levels, so that they are displayed in the UI. (v9.2)
Helpful Statistics
Swarm keeps statistics on incomplete read and write requests, which can help diagnose clients behaving incorrectly.
Tip
For disk-related events requiring user action (such as disk removal), Swarm helps locate the hardware by including the SCSI locator (bus ID) and volume serial number in the log message displayed in the UI. (v9.2)
Symptom | Action |
---|---|
A volume device failed. | Allow the node to continue running in a degraded state (with reduced storage). |
A node failed. | If a node fails but its volume storage devices are functioning properly, repair the hardware and return the node to service within 14 days. If a node is down for more than 14 days, all of its volumes are considered stale and cannot be used. A volume can be forced to remount by modifying the volume specification. See Managing Volumes. |
When viewing the UI or the legacy Admin Console from different nodes, the other cluster nodes appear offline and unreachable. | Check the Swarm network configuration settings on each node. If the network configuration appears to be correct, verify whether IGMP Snooping is enabled on the network switch. If it is enabled, an IGMP querier must also be enabled in the same network (broadcast domain). In multicast networks, the querier is normally enabled on the router leading to the storage cluster, which is usually the default gateway for the nodes. See IGMP Snooping. |
Access to the UIs is read-only, or the Swarm UI cannot be viewed. | This can occur after an operator (a read-only user) has been added. Add all administrator users as well. See Defining Swarm Admins, Swarm Users, and Swarm Passwords. |
The network does not connect to a node configured with multiple NIC ports. | Verify the network cable is plugged into the correct NIC. Depending on the bus order and the order in which the kernel drivers are loaded, the network ports may not match the external labeling. |
A node automatically reboots. | If the node is plugged into a reliable power outlet and the hardware is functioning properly, this issue may indicate a software problem. Swarm includes a built-in fail-safe that reboots the node if something goes wrong. Contact DataCore Support for guidance. |
A node is unresponsive to network requests. | Perform the following steps until the node responds to network requests. |
The cluster is using more data than expected. | Using Elasticsearch, enumerate the |
A node is not performing as expected. | Review the system utilization stats that the node logs, for example: 2015-11-05 16:13:22,898 NODE INFO: system utilization stats: pid_cpusys: 0.06, pid_cputot: 1.67, pid_cpuusr: 1.61, sys_contexts_rate: 5728.00, sys_cpubusy: 0.91, sys_cpubusy0: 0.37, sys_cpubusy1: 1.46, sys_cpuio: 0.02, sys_cpuirq: 0.01, sys_cpusys: 0.06, sys_cpuusr: 0.82 |
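
When comparing utilization stats across nodes or over time, it can help to turn a "system utilization stats" log line like the one in the table above into named numeric values. The following is a minimal Python sketch, not a Swarm tool. It assumes the entry is available as a single line of text in the `name: value` format shown above. The function names and the threshold are hypothetical and chosen only for illustration, and the meaning and units of each field are not defined here.

```python
import re

# Sample copied from the log excerpt in the table above, joined into one line.
SAMPLE = (
    "2015-11-05 16:13:22,898 NODE INFO: system utilization stats: "
    "pid_cpusys: 0.06, pid_cputot: 1.67, pid_cpuusr: 1.61, "
    "sys_contexts_rate: 5728.00, sys_cpubusy: 0.91, sys_cpubusy0: 0.37, "
    "sys_cpubusy1: 1.46, sys_cpuio: 0.02, sys_cpuirq: 0.01, "
    "sys_cpusys: 0.06, sys_cpuusr: 0.82"
)

# Text that introduces the stats in the sample line.
MARKER = "system utilization stats:"

# One "name: number" pair, for example "sys_cpubusy0: 0.37".
PAIR = re.compile(r"(\w+):\s*(-?\d+(?:\.\d+)?)")


def parse_utilization_stats(line):
    """Return {name: float} for a stats log line, or {} if the marker is absent."""
    _, found, tail = line.partition(MARKER)
    if not found:
        return {}
    # Only the text after the marker is scanned, so the timestamp is ignored.
    return {name: float(value) for name, value in PAIR.findall(tail)}


def stats_above(stats, threshold):
    """Hypothetical helper: sorted names whose value exceeds the threshold."""
    return sorted(name for name, value in stats.items() if value > threshold)


if __name__ == "__main__":
    parsed = parse_utilization_stats(SAMPLE)
    for name in sorted(parsed):
        print(f"{name:>20} {parsed[name]:10.2f}")
```

Splitting on the marker text before applying the pattern keeps the colons in the timestamp from being read as stats. A caller could feed each matching log line to `parse_utilization_stats` and compare the results between a slow node and a healthy one.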
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.