Replacing Failed Drives

Swarm volumes can be replaced after either an admin-initiated Retire (see Managing Chassis and Drives) or a Swarm-initiated failure resulting from I/O errors (see Retiring Volumes). A volume can be replaced after being marked either Retired or Unavailable.

Administrators can insert a drive into a running node without restarting the server, provided the server hardware supports this function and disk.volumes = all is configured. This feature (called hot plugging or hot swapping) allows adding storage capacity to a node at any time.

See Hot Swapping and Plugging Disks.
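As a quick pre-check before relying on hot plugging, the setting named above can be looked up in a copy of the node configuration. The following is a minimal sketch only: it assumes the configuration is available as plain `key = value` text, that `#` starts a comment, and that the last occurrence of a key wins. The function name `hot_plug_enabled` and the sample excerpt are hypothetical.

```python
def hot_plug_enabled(config_text: str) -> bool:
    """Return True if the last disk.volumes setting in the text is 'all'."""
    value = None
    for raw_line in config_text.splitlines():
        # Assumption: '#' begins a comment; drop it before parsing.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "disk.volumes":
            value = val.strip()  # assumption: a later setting overrides an earlier one
    return value == "all"


# Hypothetical configuration excerpt, for illustration only.
sample = """
# example node configuration
cluster.name = example
disk.volumes = all
"""
print(hot_plug_enabled(sample))
```

This check covers only the configuration side; whether the server hardware supports hot plugging still has to be confirmed separately.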

Identifying the Drive

The physical drive light turns on and stays on for one hour when a volume is marked Unavailable or Retired. Use the drive light features of the UI to identify a failed or failing drive:

  • Swarm UI: Click through Cluster > Hardware to view the chassis, and enable the drive light. Click the disk light toggle in the summary row to flash the drive light for a specific drive. Drive lights that are enabled manually remain lit until they are turned off.
    See Managing Chassis and Drives.

  • Legacy Admin Console: The Identify feature allows identification of a Retired volume that needs to be replaced. If the volume was marked Unavailable, use a process of elimination: identify each of the working volumes in the chassis to determine which one does not flash and therefore needs to be replaced.
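The process of elimination for an Unavailable volume amounts to a set difference: the drive to replace is the one in the chassis whose light did not flash. A minimal sketch, assuming an administrator records by hand which bays responded; the bay labels and the function name `find_unresponsive` are hypothetical.

```python
def find_unresponsive(all_volumes, flashed_volumes):
    """Return the volumes in the chassis whose lights did not flash on Identify."""
    flashed = set(flashed_volumes)
    # Preserve chassis order so the result maps directly onto physical bays.
    return [v for v in all_volumes if v not in flashed]


# Hypothetical bay labels, recorded by hand while running Identify on each working volume.
chassis = ["bay1", "bay2", "bay3", "bay4"]
flashed = ["bay1", "bay2", "bay4"]
print(find_unresponsive(chassis, flashed))  # ['bay3']
```

If the result contains more than one entry, not every working volume has been identified yet, or more than one drive needs attention.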

Once the correct drive has been identified, remove it and verify its serial number against the message in the UI. When a new drive is inserted, Swarm recognizes that a new volume is available and formats it for use.
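The serial number check can be done by eye, but a small helper avoids mismatches caused by letter case or spacing when the value is typed from the drive label. This is a sketch only: ignoring case, spaces, and hyphens is an assumption about how labels and UI messages may differ, and the serial values shown are made up.

```python
def serials_match(ui_serial: str, label_serial: str) -> bool:
    """Compare two serial numbers, ignoring case and non-alphanumeric characters."""
    def normalize(s: str) -> str:
        # Assumption: spaces and hyphens are formatting only, not part of the serial.
        return "".join(ch for ch in s.upper() if ch.isalnum())
    return normalize(ui_serial) == normalize(label_serial)


# Hypothetical serial numbers, for illustration only.
print(serials_match("wd-wcc4e1234567", "WD-WCC4E1234567 "))  # True
```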

See Drive Identification Plugin.

Suspending Volume Recovery

Suspend volume recovery while replacing a failed hard drive:

  • Swarm UI: Administrators can suspend an in-process volume recovery using the Suspend Recovery option under the settings (gear) icon in the Swarm UI. After the drive is replaced, resume the recovery using either the Enable Disk Recovery button in the banner message or the Enable Recovery option under the settings (gear) icon.

Tip

For drive-related events requiring user action (such as drive removal), Swarm helps locate the hardware by including the SCSI locator (bus ID) and volume serial number in the log message displayed in the UI. (v9.2)

  • Legacy Admin Console:

    1. Select Volume: Suspend Recovery in the Settings menu.

    2. Remove the defective drive and install the replacement drive.

    3. Verify the new drive appears in the Swarm Admin Console and has a non-zero stream count after several minutes of cluster activity.

    4. Turn off Volume: Suspend Recovery in the Settings menu.
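The verification in step 3 is essentially a poll-until-non-zero loop with a time limit. The sketch below shows that pattern under stated assumptions: `get_stream_count` is a hypothetical callable that the administrator supplies (for example, a function that returns the value read from the console), and the timeout and interval defaults are arbitrary. It does not call any Swarm API.

```python
import time
from typing import Callable


def wait_for_streams(
    get_stream_count: Callable[[], int],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the stream count is non-zero or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if get_stream_count() > 0:
            return True  # the new volume is receiving streams
        if clock() >= deadline:
            return False  # still empty after the allowed time
        sleep(interval_s)


# Simulated readings stand in for real observations; sleeping is disabled for the demo.
readings = iter([0, 0, 17])
print(wait_for_streams(lambda: next(readings), sleep=lambda s: None))
```

A `False` result only means that no streams arrived within the chosen window; on a quiet cluster, a longer timeout may be appropriate before turning off Volume: Suspend Recovery.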

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.