Time of Last Access - atime

Swarm can capture and persist the time of last access ("atime") on objects and add it to the search feed. This allows search queries to list objects that may be candidates for deletion or tiering (moving to cheaper storage). For example, an application can use atime values to purge "cold" objects that have not been read in the last three years. Swarm stores the atime as the Castor-System-Accessed header and indexes it as the accessed field in Elasticsearch, which is useful for bulk evaluations of content.
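As a minimal sketch of the "cold objects" example, the following builds an Elasticsearch search body that matches records whose accessed field is older than three years. The range query and the `now-3y` date math are standard Elasticsearch query DSL. The function name, the default page size, and the sort order are illustrative assumptions. The Elasticsearch host and index name depend on how the search feed is configured, so they are not shown.

```python
import json

# Hypothetical cutoff: three years, expressed in Elasticsearch date math.
COLD_AFTER = "now-3y"


def cold_objects_query(cutoff: str = COLD_AFTER, size: int = 100) -> dict:
    """Build a search body matching records whose 'accessed' date is older
    than the cutoff, i.e. candidates for purging or tiering."""
    return {
        "size": size,
        "query": {
            "range": {
                "accessed": {"lt": cutoff}
            }
        },
        # Oldest accesses first, so the coldest objects are reviewed first.
        "sort": [{"accessed": {"order": "asc"}}],
    }


if __name__ == "__main__":
    # The body would be sent to the _search endpoint of the feed's index;
    # host and index name are deployment-specific and omitted here.
    print(json.dumps(cold_objects_query(), indent=2))
```

Because accessed carries the creation time for objects that were never read, such objects also match once they are older than the cutoff.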

Performance Impacts

Tracking atime affects performance, so enable it only if needed. Tracking access times can incur long-tail latencies on first reads, particularly when disk demand is heavy. For around 90% of objects read for the first time, the added latency is negligible (<1 ms); the effects become noticeable only when requests queue on specific volumes. Subsequent reads within the disk.atimeGranularity window have no performance impact.

A high volume of small-object reads (such as thumbnail images) can cause the in-memory indexes to fill up.

Implementing atime

Because support for "atime" involves changes to the underlying Elasticsearch schema, existing feeds cannot be restarted after a Swarm upgrade: for some records, the written and accessed fields are not populated or have the incorrect type.

Tip

The atime feature requires rebuilding the search index, so take the opportunity to migrate to Elasticsearch 6 (Migrating from Older Elasticsearch) as part of the same reindexing.

  1. For intensive READ access scenarios, provision additional memory to support the load on the in-memory index.

  2. Finish upgrading the storage cluster to Swarm 10, and install the latest versions of the Swarm metrics and search RPMs in the Elasticsearch cluster.

  3. Enable the cluster setting for the feature, which is disabled by default: disk.atimeEnabled = true

  4. Create a new search feed, which uses the new Elasticsearch schema that supports atime.

  5. If a previous feed exists, complete these steps to transition to the new search feed:

    1. After the new feed completes processing, make the new feed the Primary.

    2. Pause the old feed.

    3. Delete the old feed and the old index data after verifying the new feed is working as expected.

Configuring atime

The public settings for "atime" are dynamic: update a value on one node and Swarm propagates it to all other nodes, and the values persist across reboots. The following settings control the gathering of atime information:

Settings for atime

disk.atimeEnabled (SNMP: accessedTimeEnabled)
Default: False
Type: bool
Whether to track the time of last access on GET requests, stored in the Castor-System-Accessed header and indexed as the search field 'accessed'. Increases the load in proportion to the GET load on the cluster.

disk.atimeGranularity (SNMP: accessedTimeGranularity)
Default: 86400
Type: int
The window of time, in seconds (default 1 day), during which atime is not updated; multiple reads may have occurred within that window.

Lowering the value affects GET performance. A 1-second granularity provides the most accurate accessed-time results but incurs a GET performance penalty due to increased disk access.

disk.atimeEnabledTime (SNMP: accessedTimeEnabledTime)
Default: 0
Type: float
Non-UI. Read-only. The Unix epoch timestamp recorded when disk.atimeEnabled was set to True.

This time is nulled out in SNMP, REST API, and phone home reports if the atime feature is later disabled.
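To illustrate what disk.atimeGranularity means for the recorded value, here is a simplified model, not Swarm's actual implementation. It assumes that the window is measured from the last recorded atime and that a read inside the window leaves the stored value unchanged. The function name `next_atime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GRANULARITY = 86400  # disk.atimeGranularity default, in seconds (1 day)


def next_atime(stored: datetime, read_at: datetime,
               granularity: int = DEFAULT_GRANULARITY) -> datetime:
    """Simplified model (assumption, not Swarm code): a read rewrites the
    stored atime only once the granularity window has elapsed since the
    last recorded atime."""
    if read_at - stored >= timedelta(seconds=granularity):
        return read_at  # window elapsed: the read is recorded
    return stored       # read falls inside the window: atime unchanged


if __name__ == "__main__":
    last = datetime(2018, 10, 2, 23, 3, 56, tzinfo=timezone.utc)
    print(next_atime(last, last + timedelta(hours=1)))  # unchanged
    print(next_atime(last, last + timedelta(days=2)))   # updated
```

Under this model, a lower granularity records more reads, which is why it increases disk access on GETs, while the default of one day means any read within a day of the last recorded atime goes unrecorded.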

Using atime with SCSP

When atime tracking is enabled for the cluster, Swarm records the request time of each object's last write or read (successful GET request) and sends that time to Elasticsearch as the accessed date field for use in search queries. HEAD operations do not change an object's atime. To access atime without Elasticsearch, check the SCSP headers that Swarm adds to the objects.

With atime enabled, both SCSP HEAD and GET requests include a Castor-System-Accessed header on the response when the verbose query argument is used. The Castor-System-Accessed response header contains either the value of Castor-System-Created (because the object has not been read since it was written or since the feature was enabled) or the time of the last read, in the same GMT-based time format as Castor-System-Created. Because atime is updated at most once per granularity window (1 day by default), additional reads may have occurred within that window.

Exceptions: GET requests trigger atime updates, except in these situations:

  • Administrative and authorized admin requests

  • Swarm requests for replication and other internal GET requests, such as for domains, settings, or manifests

  • Any request with the special query argument to suppress recording atime: notaccessed

  • Any request performing an integrity check or other specialized operation
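For a client that must read content without disturbing atime (for example, a hypothetical scanning or export tool), a minimal sketch of adding the notaccessed query argument to an object URL follows. It assumes notaccessed is passed as a bare, valueless argument, in the same way as verbose in the examples in this section. The helper name `without_atime` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def without_atime(url: str) -> str:
    """Return the URL with the notaccessed query argument appended, so a GET
    through it should not record atime (bare-argument form is assumed)."""
    parts = urlsplit(url)
    query = parts.query
    if "notaccessed" not in query.split("&"):
        query = f"{query}&notaccessed" if query else "notaccessed"
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    print(without_atime("http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6"))
```

The helper only rewrites the URL; the GET itself can then be issued with any HTTP client. It leaves an existing notaccessed argument alone, so calling it twice is harmless.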

Tip

The atime information is most useful on a HEAD request since the atime is returned without changing it. Although atime is returned on a GET request, it is simultaneously updated by the operation.

To determine if an object has been read, HEAD the object using the verbose query argument.

The Castor-System-Accessed value matches Castor-System-Created if no read has occurred:

> curl -I http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6?verbose
HTTP/1.1 200 OK
Castor-System-Alias: 5647f528ea85667a44dc754f975816c6
Castor-System-Cluster: Baker
Castor-System-Created: Wed, 19 Jul 2017 17:42:48 GMT
Castor-System-Accessed: Wed, 19 Jul 2017 17:42:48 GMT
...

The Castor-System-Accessed value is more recent than Castor-System-Created if a read has occurred:

> curl -I http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6?verbose
HTTP/1.1 200 OK
Castor-System-Alias: 5647f528ea85667a44dc754f975816c6
Castor-System-Cluster: Baker
Castor-System-Created: Wed, 19 Jul 2017 17:42:48 GMT
Castor-System-Accessed: Tue, 02 Oct 2018 23:03:56 GMT
...
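The same check can be scripted. The following is a minimal sketch using only the Python standard library: it sends a HEAD request with the verbose argument, which does not change the atime, and compares the two headers shown above. The function names are hypothetical, and `head_verbose` assumes the object URL has no existing query string.

```python
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Mapping


def has_been_read(headers: Mapping[str, str]) -> bool:
    """True if Castor-System-Accessed is later than Castor-System-Created,
    i.e. a tracked read has occurred since the object was written."""
    created = parsedate_to_datetime(headers["Castor-System-Created"])
    accessed = parsedate_to_datetime(headers["Castor-System-Accessed"])
    return accessed > created


def head_verbose(url: str):
    """HEAD the object with ?verbose (assumes no existing query string).
    Returns the response headers, which support case-insensitive lookup."""
    req = urllib.request.Request(url + "?verbose", method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers
```

Against the node in the examples, `has_been_read(head_verbose("http://192.168.1.12:80/5647f528ea85667a44dc754f975816c6"))` would return False for the first response and True for the second. The dates are compared as parsed datetimes rather than as strings, because the GMT date format does not sort lexically.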

Using atime with Elasticsearch

In Elasticsearch, the atime value is indexed as the accessed date field, which can be used in Swarm Search Queries. Both the written and accessed fields are populated in the Elasticsearch record:

Metadata Field