Stale Volumes and the ":k" flag

Every volume in a Swarm cluster is checked during the startup of a node. If a volume has been disconnected from the cluster for more than 14 days*, it is considered "stale" and its contents are not used unless an administrator specifically overrides this behavior.

While a volume is online, Swarm regularly updates ("touches") a timestamp stored on it. If something prevents access to the volume for more than 2 weeks, such as a RAID card failure or simply leaving the node powered down, the timestamp is no longer touched. Once it has gone untouched for more than 2 weeks, Swarm considers the volume stale. That means that Swarm will not let the node back in the cluster unless you override the configuration with the ":k" modifier in the vols line in the cluster.cfg/node.cfg file (see examples below).

In this case you will see a message that looks like this:

Apr 2, 2013 19:30:48 [/dev/cciss/c0d6] is stale, last touched at Sun Mar 10 07:27:42 2013. Use 'k' modifier to mount, but note in that case deleted streams on it are resurrected.
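
As a rough illustration of the check described above, the following Python sketch parses a message of this shape and compares the "last touched" time against the 14-day default. The regular expression is inferred from the single example message only, and the names parse_stale_message and is_stale are hypothetical helpers, not part of Swarm.

```python
import re
from datetime import datetime, timedelta

# Default staleness window from the article (14 days).
STALE_AFTER = timedelta(days=14)

# Pattern inferred from the one example message; other Swarm versions
# may format this line differently.
STALE_RE = re.compile(
    r"^(?P<logged>\w{3} \d{1,2}, \d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<device>[^\]]+)\] is stale, last touched at "
    r"(?P<touched>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\."
)

def parse_stale_message(line):
    """Return (device, logged_at, last_touched), or None if no match."""
    m = STALE_RE.match(line)
    if m is None:
        return None
    logged = datetime.strptime(m.group("logged"), "%b %d, %Y %H:%M:%S")
    # Collapse repeated spaces (e.g. "Mar  3") before parsing.
    touched_text = " ".join(m.group("touched").split())
    touched = datetime.strptime(touched_text, "%a %b %d %H:%M:%S %Y")
    return m.group("device"), logged, touched

def is_stale(now, last_touched, threshold=STALE_AFTER):
    """A volume is stale if it was last touched more than `threshold` ago."""
    return now - last_touched > threshold

EXAMPLE = (
    "Apr 2, 2013 19:30:48 [/dev/cciss/c0d6] is stale, last touched at "
    "Sun Mar 10 07:27:42 2013. Use 'k' modifier to mount, but note in "
    "that case deleted streams on it are resurrected."
)

device, logged, touched = parse_stale_message(EXAMPLE)
print(device, logged - touched, is_stale(logged, touched))
```

For the example message, the gap between the log time and the last touch is a little over 23 days, well past the 14-day default.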

The purpose of this feature is to prevent older, invalid streams from being reintroduced into the cluster. Swarm normally remembers object deletions for 2 weeks. If a deleted object is still present on a volume that has been marked "stale", more than 2 weeks have passed, so when the volume is reintroduced (with the :k flag) the object is reintroduced into the cluster along with it. The record that it should have been deleted is lost, resulting in needless consumption of space by objects that were believed deleted and which may now be difficult to get rid of. Requiring the ":k" flag prevents deleted objects from being reintroduced unintentionally. This is not a problem for content that was deleted by automatic lifepoint policies, because that content is discovered and deleted by Swarm's continuous health processor (HP). In that case, HP will simply delete the reintroduced object again on its next run.
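
The timing argument can be sketched as a toy model in Python. This is a simplification of the paragraph above and not Swarm's actual logic; the function name deletion_still_remembered is hypothetical, and the 2-week value is the default stated in this article.

```python
from datetime import datetime, timedelta

# Default from the article: deletions are remembered for 2 weeks.
DELETE_MARKER_LIFESPAN = timedelta(weeks=2)

def deletion_still_remembered(deleted_at, volume_returns_at,
                              lifespan=DELETE_MARKER_LIFESPAN):
    """Toy model: True if the cluster would still remember the deletion
    at the moment the offline volume comes back."""
    return volume_returns_at - deleted_at <= lifespan

deleted = datetime(2013, 3, 12)

# Volume returns after 8 days: the deletion is still remembered, so the
# old copy on the volume can be recognized as deleted.
print(deletion_still_remembered(deleted, datetime(2013, 3, 20)))

# Volume returns after 21 days: the deletion has been forgotten, so a
# copy mounted with ":k" would reappear in the cluster.
print(deletion_still_remembered(deleted, datetime(2013, 4, 2)))
```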

Swarm's normal behavior when a volume goes offline is to use the failed volume recovery (FVR) process to replicate the data that the missing volume contained. Unless there were multiple simultaneous (or nearly simultaneous) drive failures, a node or volume that has been offline for more than 2 weeks should have already had its contents recovered, so there may be no good reason to reintroduce it. You could instead just format it and introduce it as a new drive.

Examples using the "keep" flag to ignore the timestamp (use "volumes" if your node.cfg is organized into sections such as [disk]):

To use the "keep" flag on all volumes:

vols = ALL:k

or better

disk.volumes = ALL:k

or if your configuration is already broken into sections

[disk]

volumes = ALL:k

To use the "keep" flag on only one volume:

vols = /dev/sda /dev/sdb:k

or

[disk]

volumes = /dev/sda /dev/sdb:k
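
If you generate configuration files with a script, a small helper along these lines can build the volumes line shown in the examples above. This is only a sketch: the function name volumes_line is hypothetical, and it covers only the space-separated device list and ":k" suffix shown in this article.

```python
def volumes_line(devices, keep=()):
    """Build a 'volumes = ...' line, appending ':k' to devices in `keep`."""
    keep = set(keep)
    unknown = keep - set(devices)
    if unknown:
        # Guard against a typo silently dropping the keep flag.
        raise ValueError(f"keep lists devices not in the volume list: {sorted(unknown)}")
    parts = [dev + ":k" if dev in keep else dev for dev in devices]
    return "volumes = " + " ".join(parts)

# Keep only /dev/sdb, matching the single-volume example above.
print(volumes_line(["/dev/sda", "/dev/sdb"], keep=["/dev/sdb"]))
```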

*Note that the 14-day interval is configurable.

You can use disk.deleteMarkerLifespan to change how long delete markers are retained; the default is 1209600 seconds (2 weeks). If you are already getting "stale" messages like the one above, it is too late to make this change, because your delete markers have already expired. The setting must be changed before a disk goes offline to be effective. Retaining delete markers for longer increases your index space utilization, which may be a problem if your nodes don't have sufficient memory.

For example, to retain markers for four weeks:

[disk]

deleteMarkerLifespan = 2419200
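
The value is given in seconds. A minimal Python sketch of the conversion follows, with lifespan_seconds as a hypothetical helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def lifespan_seconds(weeks=0, days=0):
    """Convert a retention period to the seconds value the setting expects."""
    return (weeks * 7 + days) * SECONDS_PER_DAY

print(lifespan_seconds(weeks=2))  # 1209600, the default
print(lifespan_seconds(weeks=4))  # 2419200, the four-week example above
```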