...

Code Block
snmpset -v2c -c ourpwdofchoicehere -m ./CARINGO-MIB.txt:./CARINGO-CASTOR-MIB.txt \
	192.168.99.100 castorCancelVolumeRetire s "all"

Fast vs. Slow Retire

...

The only way to get a slow retire (one without recovery) is to initiate it manually. A retire triggered by I/O errors is always a “fast” retire.

  • A retire can be initiated automatically, when Swarm detects hardware issues, or manually. A manually retired volume can be unretired to return it to normal service.

  • A “fast” retire initiates a cluster-wide recovery of the volume that attempts to rapidly replace, within the cluster, the replicas that were on the affected volume(s). This usually affects cluster performance during recovery. A “slow” retire has minimal performance impact but takes longer. After the recovery phase, “fast” and “slow” retires both run a checking phase that is largely the same.

  • Retiring a volume requires significant work and typically takes weeks to complete; three Health Processor (HP) cycles is a reasonable estimate. The retire rate per hour can be monitored via SNMP and is also visible in the management API under a node endpoint. Customers can watch the stream counts on the volume(s) go down.
    It is normal for the stream count to stay constant for long periods near the end of the process (progress is not linear), because the entire disk has to be scanned for the remaining streams.
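
As a rough illustration of watching stream counts go down, the sketch below turns two stream-count samples into an hourly rate. It assumes the counts have already been read from SNMP or the management API by whatever means is in use; the function name retire_rate_per_hour and the sample values are hypothetical and not part of Swarm.

```shell
#!/bin/sh
# Hypothetical helper, not part of Swarm: computes streams retired per hour
# from two stream-count samples taken some number of hours apart.
# Usage: retire_rate_per_hour <earlier_count> <later_count> <elapsed_hours>
retire_rate_per_hour() {
    earlier=$1
    later=$2
    hours=$3
    if [ "$hours" -le 0 ]; then
        echo "elapsed hours must be positive" >&2
        return 1
    fi
    # Integer division. A result of 0 late in a retire is expected,
    # because the count can stay flat while the disk is scanned.
    echo $(( (earlier - later) / hours ))
}

# Example with made-up sample values taken 12 hours apart.
retire_rate_per_hour 1200000 1140000 12
```

Because progress is not linear, a rate computed this way is only a snapshot and should not be extrapolated into a completion date.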

Retire a Chassis

Retiring a node/chassis is the same as retiring all of its volumes, and the related points above apply. Swarm retires all drives within the chassis. Drives that are in good condition can be reformatted and returned to service.

...