A Swarm volume can be replaced once it is marked Retired or Unavailable. This happens either after an administrator initiates a Retire (see Managing Chassis and Disks) or after Swarm fails the volume because of I/O errors (see Retiring Volumes).
Administrators can insert a drive into a running node without restarting the server, provided that the server hardware supports it and the node is configured with disk.volumes = all. This feature, called hot plugging or hot swapping, lets you add storage capacity to a node at any time.
See Hot Swapping and Plugging Drives.
Identifying the Drive
When a volume is marked Unavailable or Retired, the light on its physical drive turns on and stays on for one hour. To help identify a failed or failing drive, use the drive light features of the UI:
- Swarm UI: Navigate to Cluster > Hardware to view the chassis. To flash the light for a specific drive, click the disk light toggle in that drive's summary row. Drive lights that you enable manually stay lit until you turn them off. See Managing Chassis and Drives.
- Legacy Admin Console: The Identify feature lets you locate a Retired volume that needs to be replaced. If the volume was marked Unavailable, however, use process of elimination: run Identify on each working volume in the chassis; the drive whose light does not flash is the one to replace.
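The process of elimination for an Unavailable volume comes down to a set difference: every drive slot in the chassis, minus the slots that flashed when you identified the working volumes. The following is a minimal, hypothetical sketch of that reasoning; the slot names and the functions `find_unflashed_drives` and `drive_to_replace` are illustrative only and do not query Swarm or the Admin Console.

```python
"""Hypothetical sketch of the process-of-elimination check (Legacy Admin
Console): run Identify on every working volume, note which drive lights
flash, and the slot that stays dark holds the Unavailable volume.
Slot names are illustrative; nothing here talks to Swarm."""


def find_unflashed_drives(chassis_slots, flashed_slots):
    """Return the slots whose drive light did not flash.

    chassis_slots: all populated drive slots (bays) in the chassis.
    flashed_slots: slots observed to flash while identifying working volumes.
    """
    flashed = set(flashed_slots)
    return [slot for slot in chassis_slots if slot not in flashed]


def drive_to_replace(chassis_slots, flashed_slots):
    """Return the single unflashed slot, or raise if the result is ambiguous."""
    candidates = find_unflashed_drives(chassis_slots, flashed_slots)
    if len(candidates) != 1:
        # Zero or several dark slots means a volume was skipped (or an empty
        # bay was counted); re-check before pulling any drive.
        raise ValueError(f"expected exactly one unflashed drive, found {candidates}")
    return candidates[0]


if __name__ == "__main__":
    # Example observation: a four-bay chassis where bays 0, 1, and 3 flashed.
    slots = ["bay0", "bay1", "bay2", "bay3"]
    print(drive_to_replace(slots, ["bay0", "bay1", "bay3"]))  # bay2
```

Raising an error when more than one slot stays dark reflects the caution in the procedure: pull a drive only when the elimination points to exactly one candidate, and then confirm its serial number against the UI message.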
After identifying the correct drive, remove it and confirm that its serial number matches the one shown in the UI message. When you insert the new drive, Swarm recognizes that a new volume is available and formats it for use.
See Drive Identification Plugin.
Suspending Volume Recovery
While replacing a failed hard drive, be sure to suspend volume recovery:
- Swarm UI: Suspend an in-process volume recovery by selecting Suspend Recovery under the settings (gear) icon. After the drive is replaced, resume recovery by clicking either Enable Disk Recovery in the banner message or Enable Recovery under the settings (gear) icon.
- Legacy Admin Console:
  - In the Settings menu, select Volume: Suspend Recovery.
  - Remove the defective drive and install the replacement drive.
  - After several minutes of cluster activity, confirm that the new drive appears in the Swarm Admin Console with a non-zero stream count.
  - In the Settings menu, turn off Volume: Suspend Recovery.
Tip
For drive-related events that require user action (such as drive removal), Swarm helps you locate the hardware by including the SCSI locator (bus ID) and volume serial number in the log message displayed in the UI. (v9.2)
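The verification step in the Legacy Admin Console procedure is a simple gate: do not turn off Suspend Recovery until the replacement drive is listed and reports a non-zero stream count. The sketch below expresses that check under stated assumptions: the `VolumeStatus` record, the `replacement_ready` function, and the serial numbers are hypothetical, and the volume listing is hand-entered data read off the console rather than anything retrieved from Swarm.

```python
"""Hypothetical sketch of the 'new drive has a non-zero stream count' check
before turning off Volume: Suspend Recovery. The volume listing is
illustrative data typed in by hand; this sketch does not query Swarm."""

from dataclasses import dataclass


@dataclass
class VolumeStatus:
    serial: str        # drive serial number as shown in the console
    stream_count: int  # number of streams currently stored on the volume


def replacement_ready(volumes, new_serial):
    """Return True if the drive with new_serial is listed and holds at least one stream."""
    for vol in volumes:
        if vol.serial == new_serial:
            return vol.stream_count > 0
    return False  # the new drive has not appeared in the listing yet


if __name__ == "__main__":
    snapshot = [
        VolumeStatus("SN-A1", 120_431),
        VolumeStatus("SN-NEW", 0),  # just formatted; no streams yet
    ]
    print(replacement_ready(snapshot, "SN-NEW"))  # False: wait and re-check
    snapshot[1].stream_count = 57
    print(replacement_ready(snapshot, "SN-NEW"))  # True: proceed to the final step
```

A drive that is missing from the listing and a drive that is listed with zero streams are both treated as "not ready", since either case means the cluster has not yet started using the new volume.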