
...

Deprecated

The Legacy Admin Console (port 90) is still available but has been replaced by the Swarm Storage UI. (v10.0)

Viewing the Node Status Page

Click the IP address on the left side of the Swarm Admin Console to view the status of a node. If the cluster is configured to use subclusters, expand a subcluster node name to display IP addresses, then click an IP address to display the node information.

See Finding Nodes in the Cluster to find a particular node.

The top row of the Node Status page provides summary information about the node and the associated volumes, such as uptime and storage usage statistics:

...

  • Streams: Counts the total number of managed data components (such as replicas and segments), not logical objects (such as video files).

  • Trapped: Calculates the space pending reclamation by the Swarm defragmentation process. This process is controlled by several Swarm parameters (see Settings Reference).

...

Note

The Node Status page automatically refreshes every 30 seconds.

Shutting Down or Restarting a Node

Click Shutdown Node or Restart Node in the Swarm Admin Console to shut down or restart a node.

A node that is shut down or rebooted by an administrator appears with a Maintenance state on other nodes in the cluster.

See Finding Nodes in the Cluster.

Identifying a Drive

Identify one or all volumes on a node using the links on the right side of the Swarm Admin Console under Restart Node.

The Identify function allows selection of a particular volume and enables the corresponding LED drive light, which can be helpful in identifying a failed or failing drive. Select the targeted volume and the amount of time the light is enabled.

On the Node Status page, an Identify light displays next to the targeted volume for easy identification.

See Drive Identification Plugin for how to enable the drive light.

Swarm reverts to a default process to flash the light if a hardware-specific API is not used.

...

Retiring a Drive

Retire one or all volumes on a node using the links on the right side of the Swarm Admin Console under Restart Node.

On occasion, replacing Swarm volumes is required for regular maintenance or to upgrade the cluster nodes with higher-capacity drives. Best practice is to retire volumes one at a time if multiple volumes need to be replaced across multiple Swarm nodes. When initiating a retire, either choose a minimally disruptive retire that is limited to just the volume(s) being retired, or an accelerated retire using all nodes in the cluster to replicate objects on the retiring volume(s) as quickly as possible. Note that the cluster-wide retire may impact performance as it puts additional load on the cluster.

Clicking Retire Node retires all volumes on the node at the same time. Clicking Retire next to a volume retires only the selected volume. A volume is also retired automatically if a configurable number of errors occur.

See Retiring Volumes.

Verify the cluster meets the following conditions before retiring a node or volume:

...

  • Has enough capacity for the objects on the retiring node to replicate elsewhere.

  • Has enough unique nodes to replicate the objects with only one replica on any given node.
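As an illustration, these preconditions can be expressed as a small feasibility check. This is a hypothetical sketch, not a Swarm API; Swarm performs its own verification when a retire is initiated, and the function and parameter names here are invented.

```python
def can_retire(retiring_bytes, free_bytes_per_node, replicas):
    """Illustrative pre-retire feasibility check (not a Swarm API).

    retiring_bytes: amount of data that must move off the retiring volume(s).
    free_bytes_per_node: free space on each remaining node in the cluster.
    replicas: replicas required per object.
    """
    # Enough total capacity elsewhere for the retiring objects.
    enough_capacity = sum(free_bytes_per_node) >= retiring_bytes
    # Enough unique nodes to hold one replica each on distinct nodes.
    enough_nodes = len(free_bytes_per_node) >= replicas
    return enough_capacity and enough_nodes

# Three remaining nodes with 2 TB free each can absorb 1 TB at 2 replicas;
# a single remaining node cannot hold two replicas on distinct nodes.
print(can_retire(10**12, [2 * 10**12] * 3, replicas=2))  # → True
print(can_retire(10**12, [4 * 10**12], replicas=2))      # → False
```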

...

Note

Retire succeeds if objects can be replicated elsewhere in the cluster. The Retire action does not remove an object until it can guarantee at least two replicas exist in the cluster or the existing number of replicas matches the policy.replicas min parameter value.

A retiring node or volume accepts no new or updated objects. Retiring a node or volume means all of its objects, including replicas, are copied to other nodes in the cluster.

On the Swarm Admin Console's Node Status page, the Node Operations section includes a Retire Rate tracking the number of objects per hour that were removed from a retiring volume. The SNMP MIB includes this same value in the retireRatePerHour MIB entry. The value is 0 if no volumes on the node are retiring.

The node or volume's state changes to Retired and Swarm no longer uses the node or volume after all objects are copied. At this point, remove and repair the volume or discard it.

Errors and Announcements

The last 10 errors and announcements appear on the Node Status page. The page is blank if there are no errors or announcements. The error count in the node summary grid corresponds to the list of errors in the error section.

Tip

Control how long uncleared error messages continue to appear in the error table by configuring the Swarm setting console.messageExpirationSeconds, which defaults to two weeks.
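The default of two weeks equals 1,209,600 seconds. As an illustration, a deployment that wants uncleared errors to expire after one week could set the value to 604,800 seconds (the setting name comes from this page; expressing it as a configuration-file line is an assumption about how settings are managed in a given deployment):

```ini
# Illustrative configuration fragment - expire uncleared console
# error messages after one week (default: 1209600 seconds = 14 days)
console.messageExpirationSeconds = 604800
```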

Messages display in the node status area when removing or inserting a drive into a running node. This feature, referred to as hot plugging (adding a new drive) or hot swapping (replacing a failed drive), allows removal of failed drives for analysis or adding storage capacity to a node at any time.

The following message appears when adding a volume:

    mounted /dev/sdb, volumeID is 561479FB832DCC526B1D7EDCD06B83E1

The following message appears when removing a volume:

    removed /dev/sdb, volumeID was 561479FB832DCC526B1D7EDCD06B83E1

Note

These messages appear at the announcement level. Additional debug level messages appear in the syslog.
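The announcement lines above follow a simple, greppable format. As a hypothetical illustration (not a Swarm tool), a log-watching script could extract the device and volume ID from them:

```python
import re

# Pattern for the volume announcement lines shown above, e.g.
#   "mounted /dev/sdb, volumeID is 561479FB832DCC526B1D7EDCD06B83E1"
#   "removed /dev/sdb, volumeID was 561479FB832DCC526B1D7EDCD06B83E1"
ANNOUNCEMENT = re.compile(
    r"(?P<action>mounted|removed) (?P<device>\S+), "
    r"volumeID (?:is|was) (?P<volume_id>[0-9A-F]{32})"
)

def parse_volume_announcement(line):
    """Return (action, device, volumeID) for a volume announcement, or None."""
    m = ANNOUNCEMENT.search(line)
    return (m.group("action"), m.group("device"), m.group("volume_id")) if m else None

print(parse_volume_announcement(
    "mounted /dev/sdb, volumeID is 561479FB832DCC526B1D7EDCD06B83E1"))
# → ('mounted', '/dev/sdb', '561479FB832DCC526B1D7EDCD06B83E1')
```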

Node Status Reporting

Troubleshoot node errors and announcements by viewing the reporting sections at the bottom of the Node Status page. The information in each section can be helpful when working with Swarm Support to resolve an issue.

Node Info Section

The Node Info status section contains general information about the hardware installed on the node, as well as time server information and current uptime. Use this status information to determine if a node requires additional hardware resources.

The Swarm Admin Console generates an alert indicating the node may require additional RAM to maintain cluster performance if the Index Utilization and Buffer Utilization values rise to 80% or more. The node may not be communicating properly with an NTP server if the Time value does not match the same value in the remaining cluster nodes.

...

Additional Node Info Reports

Scroll to the bottom of the Node Info section to access these links to additional reports:

  • SNMP Repository (the SNMP repository dump)

  • Object Counts (the Python classes in use)

  • Uncollectable Garbage

  • HTML Templates

  • Loggers... (the settings window for changing the logging levels)

  • Dmesg dump (the last 1000 messages logged by the Linux kernel ring buffer, for diagnosing a Swarm issue when a system panic or error occurs)

  • Hwinfo dump (the Linux hardware detection tool output)

Node Configuration Section

The Node Configuration status section contains the cluster and network configuration settings assigned to the node. Use this status information to quickly verify the system configuration without using SNMP commands.

...

Node Operations Section

The Node Operations status section describes the state of the node. A Swarm Support representative can use the information in this page to assist in determining if the node is communicating effectively with other nodes and resources in the cluster if a problem is encountered in a storage cluster.

Some cluster features (such as the Capacity column value in the Swarm Admin Console) do not update until the HP cycles are completed separately on each node. The HP Cycle time parameter increases exponentially as the number of objects increases on the node. The node may not be servicing new requests if the SCSP Last read bid and SCSP Last write bid parameters are high.

...

Hardware Status Section

The Hardware Status section contains status and operational reporting (if available) for various hardware components installed on the node. Use this status information to retrieve node system data, such as the serial number and BIOS version.

Hardware status reporting is dependent on hardware supporting and populating IPMI sensors, SMART status, and, in some cases, manufacturer-specific components such as SAS. Not all status fields are populated, depending on the hardware. The hardware status values are independently scanned and populated for each node, allowing variations in supported utilities on a node-by-node basis.

...

Additional Hardware Status Reports

Scroll to the bottom of the Hardware Status section to access these links to additional reports:

  • Test Network - Pings all nodes in the cluster to verify all nodes can communicate with each other using TCP/IP and UDP (see details below).

  • Test Volumes - Pings the volumes on the local hard drives and provides a response time (in milliseconds).

  • Dmesg Dump - Displays the last 1000 messages logged by the Linux kernel ring buffer. These messages can help troubleshoot and diagnose a Swarm issue when a system panic or error occurs.

  • Hwinfo dump (the Linux hardware detection tool output)

  • Send Health Report (script that sends the hardware health report to the configured destination)

Test Network

Test Network performs two sets of tests:

  • First, it sends 100 UDP multicasts to the cluster and computes the results:
      • Which nodes responded
      • How many responses returned
      • How long the responses took, on average
  • Next, it fetches the status page (port 80) via TCP for all responding nodes (only once for each node). It tracks the total time for each of those round trips.

The data in the Network Test Results window allows comparing the responding nodes with the list of nodes expected in the cluster. Evaluate UDP packet loss and TCP connectivity within the cluster.
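The evaluation above can be sketched as simple bookkeeping over probe results. This is an illustrative simulation with invented function names and node addresses, not the console's implementation; the real test sends actual UDP multicasts and TCP fetches against live nodes:

```python
def summarize_probe(expected_nodes, responses):
    """Summarize simulated multicast probe results (illustrative only).

    expected_nodes: node addresses expected to be in the cluster.
    responses: (node, rtt_ms) tuples, one per UDP reply received.
    """
    responders = {node for node, _ in responses}
    return {
        "responders": sorted(responders),
        # Nodes that never replied; non-empty suggests a network issue.
        "missing": sorted(set(expected_nodes) - responders),
        "replies": len(responses),
        "avg_rtt_ms": (sum(r for _, r in responses) / len(responses)
                       if responses else None),
    }

expected = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
replies = [("192.168.1.11", 0.4), ("192.168.1.12", 0.6), ("192.168.1.11", 0.5)]
print(summarize_probe(expected, replies)["missing"])  # → ['192.168.1.13']
```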

Important

A network issue may exist in the cluster if one or more nodes do not appear in the display.