...

  • Cold storage offers the lowest monthly prices per byte stored compared to the standard storage classes.

  • Standard storage classes have low-latency retrieval times, which can allow a Swarm Restore to complete in a single run.

  • Cold storage has longer retrieval latency, as much as 12-48 hours for S3 Glacier Deep Archive, to pull content from archival storage. Depending upon how a restore is performed, the Swarm Restore tool may need to be run multiple times over several hours to complete a restoration.

  • Cold storage incurs additional charges for egress and API requests to access the backup, so it is best suited to low-touch use cases.

  • S3 Glacier Deep Archive rounds up small objects, so the overall footprint being charged may be larger because of Swarm's use of metadata objects.
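To make the rounding effect concrete, here is a minimal sketch that models it as a minimum billable size per object. The function name, the object sizes, and the 40 KiB minimum are placeholders for illustration only; check the provider's current pricing rules for the real figures.

```python
def billed_bytes(object_sizes, min_billable_bytes):
    """Total bytes charged when each object is billed at no less than a minimum size."""
    return sum(max(size, min_billable_bytes) for size in object_sizes)


# Hypothetical backup: 1,000 small metadata objects (2 KiB each)
# plus 10 content objects (50 MiB each).
sizes = [2 * 1024] * 1000 + [50 * 1024 * 1024] * 10

stored = sum(sizes)
# 40 KiB is an assumed minimum used only for this illustration.
billed = billed_bytes(sizes, min_billable_bytes=40 * 1024)

print(f"stored: {stored} bytes, billed: {billed} bytes")
```

In this model the many small objects, not the large ones, account for the whole difference between stored and billed bytes.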

...

These steps are written for AWS S3 (see also S3 Backup Feeds to Wasabi), but other S3-based public cloud providers have a similar setup process:

  1. Service — Sign up for Amazon S3 if needed.

    1. Navigate to aws.amazon.com/s3 and choose Get started with Amazon S3.

    2. Follow the on-screen instructions.

    3. AWS notifies you by email when the account is active and ready to use.

    4. Note: The S3 service is used to create the new bucket, but the separate IAM service is used to create the new user.

  2. Bucket — Create a bucket dedicated to backing up the Swarm cluster.

    1. Sign in and open the S3 console: console.aws.amazon.com/s3

    2. Choose Create bucket. (See S3 documentation: Creating a Bucket.) 

    3. On tab 1 - Name and region, make the initial entries:

      1. For Bucket name, enter a DNS-compliant name for the new bucket. This cannot be changed later, so choose well:

        1. The name must be unique across all existing bucket names in Amazon S3.

        2. The name must be a valid DNS name between 3 and 63 characters long, containing only lowercase letters, numbers, periods, and hyphens, and beginning and ending with a letter or number. (See S3 documentation: Rules for Bucket Naming.)
          Tip: For easier identification, incorporate the name of the Swarm cluster that this bucket is dedicated to backing up.

      2. For Region, choose the one that is appropriate for business needs. (See S3 documentation: Regions and Endpoints.)

    4. On tab 2 - Configure options, take the defaults. (See S3 documentation: Creating a Bucket, step 4.)
      Best practice: Do not enable versioning or any other optional features unless the organization requires them.

    5. On tab 3 - Set permissions, keep the default selection, Block all public access. The bucket owner account then has full access.
      Best practice: Do not use the bucket owner account to provide Swarm's access to the bucket; instead, create a new, separate IAM user that holds the credentials to share with Swarm. 

    6. Choose Create, and record the fully qualified bucket name (such as "arn:aws:s3:::example.cluster1.backup") for use later, in policies.

    7. Record these values for configuring the S3 Backup feed in Swarm:

      • Bucket Name

      • Region
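Because the bucket name cannot be changed after creation, it can help to check a candidate name before using the console. The sketch below assumes the current AWS rules for general-purpose buckets (3 to 63 characters; lowercase letters, digits, periods, and hyphens; no IP-address-style names). The function name is illustrative, and the check does not cover every AWS restriction, such as reserved prefixes.

```python
import re

# 3-63 characters: lowercase letters, digits, periods, and hyphens,
# beginning and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_LIKE_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic naming checks listed above."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IPV4_LIKE_RE.fullmatch(name):  # must not look like an IP address
        return False
    return True


for candidate in ("example.cluster1.backup", "Example.Backup", "my_backup", "ab"):
    print(candidate, is_valid_bucket_name(candidate))
```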

  3. User — Create a programmatic (non-human) user dedicated to Swarm access.

    1. On the Amazon S3 console, select the IAM (Identity and Access Management) service, then click Users.

    2. Add a dedicated user, such as caringo_backup, to provide Programmatic access for Swarm.

    3. The IAM console generates an access key (an access key ID + secret access key), which must be recorded immediately.
      (See S3 documentation: Managing Access Keys for IAM Users and Understanding and Getting Your Security Credentials.)

      • This is the sole opportunity to view or download the secret access key, so save it in a secure place.

    4. Record the fully qualified user (such as "arn:aws:iam::123456789012:user/caringo_backup") for use later, in policies.

    5. Record these values for configuring the S3 Backup feed in Swarm:

      • Access Key ID

      • Secret Access Key

  4. Policies — Create policies on both the user and the bucket so the programmatic user has exclusive rights to the S3 bucket. Use the policy generators provided or enter edited versions of the examples below.

    1. Create an IAM policy for this user, allowing it all S3 actions on the backup bucket. The bucket must be specified as a fully qualified Resource (recorded above), starting with arn:aws:s3:::

      IAM policy

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "s3:*",
                  "Resource": [
                      "arn:aws:s3:::example.cluster1.backup",
                      "arn:aws:s3:::example.cluster1.backup/*"
                  ]
              }
          ]
      }


    2. Create a matching bucket policy to grant access to the dedicated backup user. The user must be specified as a fully qualified Principal, which is the User ARN (recorded above), starting with arn:aws:iam:: (See S3 documentation: Using Bucket Policies.)
      Using the Policy Generator, allow all S3 actions for the bucket, using the full ARN name:

      Bucket policy

      {
        "Id": "Policy1560809845679",
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1560809828003",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
              "arn:aws:s3:::example.cluster1.backup",
              "arn:aws:s3:::example.cluster1.backup/*"
            ],
            "Principal": {
              "AWS": [
                "arn:aws:iam::123456789012:user/caringo_backup"
              ]
            }
          }
        ]
      }
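As an alternative to hand-editing the examples, both policy documents can be generated from the bucket name and user ARN recorded earlier. This is a minimal sketch; the function name is illustrative. It lists both the bucket ARN and the `/*` object pattern as resources, on the assumption that object-level actions need to match object ARNs in AWS.

```python
import json


def backup_policies(bucket: str, user_arn: str):
    """Build the IAM policy and the matching bucket policy as dictionaries."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # The bucket itself plus every object stored under it.
    resources = [bucket_arn, f"{bucket_arn}/*"]

    iam_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": resources}
        ],
    }
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": resources,
                "Principal": {"AWS": [user_arn]},
            }
        ],
    }
    return iam_policy, bucket_policy


iam_policy, bucket_policy = backup_policies(
    "example.cluster1.backup",
    "arn:aws:iam::123456789012:user/caringo_backup",
)
print(json.dumps(iam_policy, indent=2))
print(json.dumps(bucket_policy, indent=2))
```

The printed JSON can be pasted into the IAM and S3 consoles in place of the edited examples.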


  5. Best practice for security: After implementing the S3 Backup feed in Swarm, write a script to rotate the S3 secret access key on a regular basis. The script must also update the key in the S3 Backup feed definition in Swarm, using the management API call given in Rotating the S3 Access Key, below.
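A rotation script mainly needs to get the ordering right: create the new key, update the feed, and only then retire the old key. The sketch below shows that ordering with three hypothetical callables (`create_key`, `update_feed`, `delete_key`) standing in for the IAM calls and for the Swarm management API call; none of them is a real API signature. It relies on IAM allowing a user to hold two access keys at once, so the old key stays valid until the feed has been updated.

```python
from typing import Callable, Tuple


def rotate_access_key(
    old_key_id: str,
    create_key: Callable[[], Tuple[str, str]],   # returns (key ID, secret key)
    update_feed: Callable[[str, str], None],     # pushes the new key to the feed
    delete_key: Callable[[str], None],           # removes a key by its ID
) -> str:
    """Rotate the key without leaving the feed holding a deleted credential."""
    new_id, new_secret = create_key()
    try:
        update_feed(new_id, new_secret)
    except Exception:
        # The feed still uses the old key, so discard the unused new one.
        delete_key(new_id)
        raise
    # The feed now uses the new key; the old one can be retired.
    delete_key(old_key_id)
    return new_id
```

If updating the feed fails, the old key is left untouched and the feed keeps working with it.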

...

ID (existing feeds)

Read-only; system-assigned identifier

Status (existing feeds)

Read-only; the current feed processing state. The state can be:

  • Active. Default state when operating normally.

  • Recovering. Temporarily paused due to volume recovery.

  • Paused. Paused by user request.

  • Blocked. Processing blocked due to a transient condition. 

  • Configuration error. Feed is unable to operate due to incorrect configuration.

  • Overlapping feeds. More than the limit of 8 feeds are defined for the same set of objects.

  • Closed. Feed is inactive and no longer operational.

Name

The name attached to this backup feed.

Scope

The scope filter selected for the backup feed. Backup includes objects within the scope indicated here. If the scope includes a context where Swarm object versioning is (or was) generating historical versions, those versions are backed up as well.

  • Entire source cluster (global) — To replicate all objects in the source cluster, leave the default selection of Entire source cluster (global).

  • Only objects in select domain(s) — To replicate the objects in one or more domains, select the 'Only objects in select domain(s)' option. In the text box that appears, enter one or more domains:

    • To replicate the objects within a specific domain, enter that domain.

    • To replicate the objects within multiple domains, enter those domains separated by commas and/or use pattern matching.

    • To exclude domains from replication, enter them.

The field value supports pattern matching with Python regular expression (RE) syntax, so one entry can match multiple domain names. The one exception is that the "{m,n}" repetition qualifier may not be used.

An example domain list value using RE is: .*\.example\.com 
This matches both of these domains: accounting.example.com, engineering.example.com.

  • Include objects without a domain — To replicate any unnamed objects that are not tenanted in any domain, enable the option.
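Since the domain field uses Python regular expression syntax, a pattern can be tried out locally before it is saved in the feed. This is a sketch only: the function name is illustrative, and it assumes that entries are split on commas and that each pattern must match the whole domain name, which may differ from how Swarm evaluates the field.

```python
import re


def matching_domains(field_value: str, domains):
    """Return the domains matched by a comma-separated list of names or patterns."""
    # The "{m,n}" repetition qualifier is not allowed in this field.
    if re.search(r"\{\d*,\d*\}", field_value):
        raise ValueError('the "{m,n}" repetition qualifier may not be used')

    patterns = [p.strip() for p in field_value.split(",") if p.strip()]
    compiled = [re.compile(p) for p in patterns]  # raises re.error if invalid
    return [d for d in domains if any(c.fullmatch(d) for c in compiled)]


candidates = ["accounting.example.com", "engineering.example.com", "example.org"]
print(matching_domains(r".*\.example\.com", candidates))
```

With the example pattern from the text, the two example.com subdomains are returned and example.org is not.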

Target S3 Provider

The configuration for the S3 bucket.

Caution

Although it is possible to specify another Swarm cluster (via Content Gateway S3) for the S3 backup, it is risky if there is any chance of it replicating back to the source cluster: both clusters can fill to capacity with backups of backups. Best practice is to use a regular Replication feed, which has the mechanisms needed for mirroring clusters safely.

Host

From the S3 configuration, the host name of the S3 service. An IP address cannot be used here because the host name itself becomes the Host header in the feed operation, which is required for communication to S3 services.

Important: Add the bucket name as the prefix to the host name (mybackup.s3.aws.com). This prefix must match the bucket name exactly, including case. This supports the new AWS bucket-in-host request style. If the bucket is not defined here, Swarm uses the legacy bucket-in-path (s3.aws.com/mybackup) request style. (v12.0)

  • Amazon AWS: Existing feeds are not required to change to this format immediately, but new ones should use it, because bucket-in-path will not be supported in the future.

  • Other S3 provider: Verify the provider supports the bucket-in-host request style, where the bucket is part of the FQDN; if not, use bucket-in-path.
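The difference between the two request styles comes down to where the bucket name appears. The sketch below uses the example names from the text (s3.aws.com and mybackup); the function name is illustrative and does not reflect how Swarm builds its requests internally.

```python
def s3_request_target(endpoint: str, bucket: str, bucket_in_host: bool = True):
    """Return (host, path_prefix) for the chosen request style."""
    if bucket_in_host:
        # Bucket-in-host: the bucket is part of the FQDN.
        return f"{bucket}.{endpoint}", "/"
    # Legacy bucket-in-path: the bucket is the first path segment.
    return endpoint, f"/{bucket}/"


print(s3_request_target("s3.aws.com", "mybackup"))
print(s3_request_target("s3.aws.com", "mybackup", bucket_in_host=False))
```

The first value of each pair is what would be entered in the Host field for that style.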

Port

The port to use for the S3 service. It defaults to 443 (for HTTPS), or to 80 (for HTTP) if Require trusted SSL is disabled, below. If the port is customized, the value no longer updates based on changes to the SSL setting.
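The port behavior described here can be summarized in a few lines. This sketch only restates the rule; the function and parameter names are illustrative, not Swarm settings.

```python
from typing import Optional


def effective_port(require_trusted_ssl: bool, custom_port: Optional[int] = None) -> int:
    """Return the port the feed would use under the rule described above."""
    if custom_port is not None:
        # A customized port no longer follows the SSL setting.
        return custom_port
    return 443 if require_trusted_ssl else 80


print(effective_port(True))          # default with trusted SSL required
print(effective_port(False))         # default with trusted SSL disabled
print(effective_port(False, 8443))   # customized port is kept as entered
```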

Region

From the S3 configuration, the destination S3 bucket’s region. 

Note: Changing this value triggers a restart of the feed.

Bucket

From the S3 configuration, the destination S3 bucket name. This bucket must be dedicated to one source cluster. Complete this field regardless of whether the Host includes the bucket name as a prefix.

Note: Changing this value triggers a restart of the feed.

Access key ID and secret key

From the S3 configuration, the S3 access key ID and S3 secret access key to use. (See S3 documentation: Understanding and Getting Your Security Credentials.)

Swarm protects the secret key as a secure field, and hides it. Updating the key does not trigger a restart of the feed, so keys may be updated as frequently as the security policies require.

SSL Server

For production usage, select Require trusted SSL.

Recommended: To keep bandwidth usage by the S3 Backup feed in check, select the option to use a Local Cluster Forward Proxy and configure one for that purpose. The Forward Proxy Host (hostname or IP address) and Port are required.

Threads

The default backup speed, 20 push threads per Swarm Storage node, is optimal for maintaining an existing S3 backup. In releases prior to v15.0, the default was 6 per volume.

For a faster initial backup, increase the threads temporarily, but monitor bandwidth and cluster performance, as boosting the speed stresses internet bandwidth.

...