Maintenance mode

Swarm 7.0 introduced a node state called "Maintenance". When a node is gracefully rebooted or shut down (for example, to perform hardware upgrades or firmware updates), the other nodes in the cluster are informed of the state change. They then report a cluster status of "Maintenance" with a yellow background, and the affected node is labelled the same way. This happens only for graceful reboots or shutdowns initiated from the admin console or via SNMP.

While a node is in maintenance mode, volume recoveries are not attempted for its volumes until the maintenance timeout described below expires. Health processor replication of streams with hints to those volumes is also suspended. This reduces cluster-wide over-replication while nodes are intentionally taken offline for a short time, but still allows volume recoveries to run for other nodes that have genuinely failed.
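The distinction between volumes that are down for maintenance and volumes that have actually failed can be illustrated with a small planning function. This is a hypothetical sketch, not Swarm's implementation. The names `VolumeState`, `Stream`, `hinted_volumes` and `plan_repairs` are invented for illustration. The sketch also assumes that a stream needs re-replication only when one of its hinted volumes is truly offline.

```python
from dataclasses import dataclass
from enum import Enum


class VolumeState(Enum):
    OK = "ok"
    MAINTENANCE = "maintenance"  # node was shut down gracefully
    OFFLINE = "offline"          # unplanned failure (or maintenance expired)


@dataclass
class Stream:
    name: str
    hinted_volumes: list  # hypothetical: volume IDs the stream's hints point to


def plan_repairs(volumes: dict, streams: list):
    """Return (volumes to recover, streams to re-replicate).

    Hypothetical logic: volumes in MAINTENANCE are expected to return,
    so they trigger neither volume recovery nor stream replication.
    """
    # Only genuinely offline volumes are scheduled for volume recovery.
    recover = sorted(v for v, s in volumes.items() if s is VolumeState.OFFLINE)
    # A stream is re-replicated only if a hinted volume is actually offline;
    # streams whose hints point at maintenance volumes are left alone.
    replicate = [
        st.name
        for st in streams
        if any(volumes.get(v) is VolumeState.OFFLINE for v in st.hinted_volumes)
    ]
    return recover, replicate
```

For example, suppose volume `v2` is in maintenance and `v3` has failed. Then only `v3` is recovered, and only streams hinted at `v3` are re-replicated.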

The default timeout for maintenance mode is 10800 seconds (3 hours). It is set in the config files as recovery.volMaintenanceInterval, or via SNMP as volMaintenanceInterval. If a node stays offline longer than this interval, its status changes to "Offline", volume recovery is initiated, and health processor replication resumes for streams with hints to that node.
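The timeout rule can be expressed as a small state function. This is a minimal sketch, assuming that an ungraceful outage is reported as "Offline" immediately, while a graceful shutdown stays in "Maintenance" until the interval is exceeded. The function names `effective_state` and `recovery_allowed`, and the parameter `seconds_offline`, are hypothetical. Only the 10800-second default comes from the text above.

```python
# Default for recovery.volMaintenanceInterval, per the text above.
DEFAULT_VOL_MAINTENANCE_INTERVAL = 10800  # seconds (3 hours)


def effective_state(graceful: bool, seconds_offline: float,
                    interval: float = DEFAULT_VOL_MAINTENANCE_INTERVAL) -> str:
    """Hypothetical: the status other nodes would report for a down node."""
    if not graceful:
        # Unplanned outage: no maintenance grace period applies.
        return "Offline"
    if seconds_offline > interval:
        # Offline longer than the maintenance interval: treat as a failure.
        return "Offline"
    return "Maintenance"


def recovery_allowed(state: str) -> bool:
    """Volume recovery and hinted replication run only for Offline nodes."""
    return state == "Offline"
```

Under this sketch, a node gracefully rebooted for one hour remains in "Maintenance" and triggers no recovery. After 10801 seconds it flips to "Offline" and recovery proceeds. Lowering the interval shortens that grace period.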