
Choosing S3 for Disaster Recovery

In addition to on-premises Swarm storage, an organization may want to take advantage of public cloud services for off-premises disaster recovery (DR) storage. With S3 protocol support, Swarm provides a choice of many public cloud providers, including AWS S3, AWS S3 Glacier, and Wasabi.

Implementing an S3 backup feed from Swarm provides the security of knowing the backup is continuous, has minimal latency, and requires little intervention and monitoring. Using Swarm's feed mechanism for backup leverages numerous existing strengths:

  • The long-term iteration over objects in the cluster

  • A proven method for tracking work as it is performed

  • Support for TLS network encryption and forward proxies

Using the parallelism of the entire Swarm cluster makes the best use of network bandwidth, while sending the backups through an optional forward proxy allows implementing bandwidth throttling if needed.

Back up: S3 Backup is an integral part of the operating Swarm cluster. In the Swarm UI, create a new feed of type S3 Backup, and provide credentials and information about the network path to the service. After the feed is started, its progress can be monitored and warnings of blockages and particular object failures can be sent, as with any other feed. The S3 Backup feed honors the versioning settings in a cluster, as enabled, disabled, or suspended throughout the domains and buckets. While multiple S3 Backup feeds can be created, each one requires a dedicated target bucket.

Clean up: No action is needed to keep the backup current and trimmed. When Swarm versioning is disabled on buckets or domains, buckets or domains are deleted, or object lifepoints expire, the Swarm feeds mechanism processes the expired content as deleted, allowing the S3 Backup feed to clear it from the S3 bucket. Throughout content additions and deletions, the total number of objects in the S3 bucket always approximates twice the number of logical objects being backed up from the source cluster (because AWS functionality requires one object for the content and another for its metadata).
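This two-for-one accounting can be sketched as simple arithmetic (an illustrative helper, not part of any Swarm tooling):

```python
def expected_backup_object_count(logical_objects: int) -> int:
    """Estimate the object count in the S3 backup bucket.

    Swarm stores two S3 objects per logical source object:
    one for the content and one for its metadata, so the
    bucket total approximates twice the logical count.
    """
    return 2 * logical_objects

# A source cluster holding 1,000,000 logical objects yields
# roughly 2,000,000 objects in the backup bucket.
print(expected_backup_object_count(1_000_000))  # 2000000
```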

Restore: The Restore tool runs outside of Swarm, using a command-line interface for executing the data restoration tasks. Restore what is needed, either the entire cluster or only portions of it. Swarm supports bulk restores at the granularity of cluster, domain, or bucket, as well as more surgical restores of a few objects. Multiple copies can also be run to achieve a faster, parallel recovery. See the S3 Backup Restore Tool.

Important

Objects in the S3 backup bucket are wholly dedicated to DR for Swarm and are not for general use by owners of the account where the bucket resides. Swarm uses a very specific naming convention within the backup bucket in order to provide 100% fidelity for object restoration. No external processes other than Swarm should manipulate the content within this bucket.

Standard or Cold Storage?

Swarm 12 supports the AWS S3 storage classes for standard buckets and those using S3 Glacier and S3 Glacier Deep Archive. For this discussion, cold storage refers to S3 Glacier and S3 Glacier Deep Archive, and standard storage refers to the traditional S3 storage classes.

Refer to the documentation for the public cloud provider, and consider these points when choosing among the AWS S3 storage classes:

  • Cold storage offers the lowest monthly prices per byte stored compared to the standard storage classes.

  • Standard storage classes have low-latency retrieval times, which can allow a Swarm Restore to complete in a single run.

  • Cold storage has longer retrieval latency, as much as 12-48 hours for S3 Glacier Deep Archive, to pull content from archival storage. Depending upon how a restore is performed, the Swarm Restore tool may need to be run multiple times over several hours in order to complete a restoration.

  • Cold storage incurs additional charges for egress and API requests to access the backup, so it is best suited to low-touch use cases.

  • S3 Glacier Deep Archive rounds up small objects, so the overall footprint being charged may be larger because of Swarm's use of metadata objects.

Public storage pricing is competitive, and services such as Wasabi Hot Cloud Storage may compare very favorably with AWS cold storage, especially when considering egress and API charges.

Setting up the S3 Bucket

To implement an S3 backup feed, first complete a one-time setup of the S3 side: set up an account with an S3 cloud service provider, and then create an S3 bucket dedicated to backing up this cluster.

Note

Swarm must be granted access to the target S3 bucket and must provide login credentials as part of the S3 backup feed configuration. Neither the S3 Backup feed nor the S3 Backup Restore Tool administers the S3 credentials or creates any target S3 buckets.

While these steps are for AWS S3 (see also S3 Backup Feeds to Wasabi), other S3-based public cloud providers have a similar setup process:

  1. Service: Sign up for Amazon S3, if needed.

    1. Navigate to aws.amazon.com/s3 and select Get started with Amazon S3.

    2. Follow the on-screen instructions.

    3. AWS notifies by email when the account is active and ready to use.
      Note: S3 is accessed for the new bucket, but the separate IAM service is accessed for the new user.
  2. Bucket: Create a bucket dedicated to backing up the Swarm cluster.

    1. Sign in and open the S3 console: console.aws.amazon.com/s3

    2. Choose Create bucket. (See S3 documentation: Creating a Bucket.)

    3. On tab 1 - Name and region, make the initial entries:

      1. For Bucket name, enter a DNS-compliant name for the new bucket. This cannot be changed later, so choose well:

        1. The name must be unique across all existing bucket names in Amazon S3.

        2. The name must be a valid DNS name, containing lowercase letters and numbers (and internal periods, hyphens, underscores), between 3 and 64 characters. (See S3 documentation: Rules for Bucket Naming.)
          Tip: For easier identification, incorporate the name of the Swarm cluster that this bucket is dedicated to backing up.

      2. For Region, choose the one that is appropriate for business needs. (See S3 documentation: Regions and Endpoints.)

    4. On tab 2 - Configure options, take the defaults. (See S3 documentation: Creating a Bucket, step 4.)
      Best practice: Do not enable versioning or any other optional features, unless required for the organization.

    5. On tab 3 - Set permissions, take the default to select Block all public access; now the bucket owner account has full access.
      Best practice: Do not use the bucket owner account to provide Swarm's access to the bucket; instead, create a new, separate IAM user that holds the credentials to share with Swarm.

    6. Select Create, and record the fully qualified bucket name (such as "arn:aws:s3:::example.cluster1.backup") for use later, in policies.

    7. Record these values for configuring the S3 Backup feed in Swarm:

      • Bucket Name

      • Region

  3. User: Create a programmatic (non-human) user dedicated to Swarm access.

    1. On the Amazon S3 console, select the service IAM (Identity and Access Management) and click Users.

    2. Add a dedicated user, such as caringo_backup, to provide Programmatic access for Swarm.

    3. The IAM console generates an access key (an access key ID + secret access key), which must be recorded immediately.
      (See S3 documentation: Managing Access Keys for IAM Users and Understanding and Getting Your Security Credentials.)

      • This is the sole opportunity to view or download the secret access key, so save it in a secure place.

    4. Record the fully qualified user (such as "arn:aws:iam::123456789012:user/caringo_backup") for use later, in policies.

    5. Record these values for configuring the S3 Backup feed in Swarm:

      • Access Key ID

      • Secret Access Key

  4. Policies: Create policies on both the user and the bucket so the programmatic user has exclusive rights to the S3 bucket. Use the policy generators provided, or enter edited versions of the examples below.

    1. Create an IAM policy for this user, allowing it all S3 actions on the backup bucket, which needs to be specified as a fully qualified Resource (recorded above), starting with arn:aws:s3:::

      IAM Policy

      Code Block
      languagejson
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Action": "s3:*",
                  "Resource": "arn:aws:s3:::example.cluster1.backup"
              }
          ]
      }
    2. Create a matching bucket policy to grant access to the dedicated backup user, which needs to be specified as a fully qualified Principal, which is the User ARN (recorded above), starting with arn:aws:iam:: (See S3 documentation: Using Bucket Policies.)
      Using the Policy Generator, allow all S3 actions for the bucket, using the full ARN name:

      Bucket Policy

      Code Block
      languagejson
      {
        "Id": "Policy1560809845679",
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1560809828003",
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::example.cluster1.backup",
            "Principal": {
              "AWS": [
                "arn:aws:iam::123456789012:user/caringo_backup"
              ]
            }
          }
        ]
      }
  5. Best practice for security: After implementing the S3 Backup feed in Swarm, write a script to automate rotation of the S3 secret access key on a regular basis, including updating the S3 Backup feed definition in Swarm (using the management API call given in Rotating the S3 Access Key, below).

Configuring the S3 Backup Feed

The S3 Backup Feed option is available in Swarm 11 and higher, and it may be used immediately after upgrading Swarm Storage. (v11.0)

In addition to Swarm's other feed types, Search and Replication, a dedicated S3 Backup feed can be created. It resembles a Replication feed, but it requires an S3 bucket as the destination and has defaults appropriate for use with a cloud service.

  1. Navigate to the Feeds page.

  2. Select + Add at the top right.

  3. Select the S3 Backup feed type.

An S3 Backup feed has these parameters:

ID (existing feeds)

Read-only; system-assigned identifier.

Status (existing feeds)

Read-only; the current feed processing state. The state can be:

  • Active: Default state when operating normally.

  • Recovering: Temporarily paused due to volume recovery.

  • Paused: Paused by user request.

  • Blocked: Processing blocked due to a transient condition.

  • Configuration Error: Feed is unable to operate due to incorrect configuration.

  • Overlapping Feeds: More than the limit of 8 feeds are defined for the same set of objects.

  • Closed: Feed is inactive and no longer operational.

Name

The name attached to this backup feed.

Scope

The scope filter selected for the backup feed. Backup includes objects within the scope indicated here. If the scope includes a context where Swarm object versioning is (or was) generating historical versions, those versions are backed up as well.

  • Entire Source Cluster (Global): To replicate all objects in the source cluster, leave the default selection of Entire source cluster (global).

  • Only Objects in Select Domain(s): To replicate the objects in one or more domains, select the Only objects in select domain(s) option. In the text box that appears, enter one or more domains:

    • To replicate the objects within a specific domain, enter that domain.

    • To replicate the objects within multiple domains, enter those domains separated by commas and/or use pattern matching.

    • To exclude domains from replication, enter them.

The field value allows pattern matching with the Python regular expression (RE) syntax so multiple domain names can be matched. The exception to the RE matching is that the "{m,n}" repetitions qualifier may not be used.

An example domain list value using RE is: .*\.example\.com 
This matches both of these domains: accounting.example.com, engineering.example.com.

  • Include Objects without a Domain: To replicate any unnamed objects that are not tenanted in any domain, enable the option.
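Because the scope field uses Python RE syntax, a pattern can be previewed with Python's own re module before committing it to the feed. This sketch assumes a full-string match against each domain name, consistent with the example above (the helper is illustrative, not Swarm code):

```python
import re

def matching_domains(pattern: str, domains: list[str]) -> list[str]:
    """Preview which domain names a scope pattern selects.

    Assumes the pattern is applied as a full match against each
    domain name; the {m,n} repetition qualifier is not supported
    by Swarm and should not be used here either.
    """
    rx = re.compile(pattern)
    return [d for d in domains if rx.fullmatch(d)]

domains = ["accounting.example.com", "engineering.example.com", "www.other.org"]
print(matching_domains(r".*\.example\.com", domains))
# ['accounting.example.com', 'engineering.example.com']
```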

Target S3 Provider

The configuration for the S3 bucket.

Caution

Although it is possible to specify another Swarm cluster (via Content Gateway S3) for the S3 backup, it is risky if there is any chance of it replicating back to the source cluster: both clusters can fill to capacity with backups of backups. Best practice is to use a regular Replication feed, which has the mechanisms needed for mirroring clusters safely.

Host

From the S3 configuration, the host name of the S3 service.

An IP address cannot be used here because the host name itself becomes the Host header in the feed operation, which is required for communication to S3 services.

Important

Add the bucket name as the prefix to the host name (mybackup.s3.aws.com). This prefix must match the bucket name exactly, including case. This supports the new AWS bucket-in-host request style. If the bucket is not defined here, Swarm uses the legacy bucket-in-path (s3.aws.com/mybackup) request style. (v12.0)

  • Amazon AWS: Existing feeds are not required to change to this format immediately, but new ones should, as bucket-in-path will be unsupported in the future.

  • Other S3 Provider: Verify the provider supports the bucket-in-host request style, where the bucket is part of the FQDN; if not, use bucket-in-path.

Port

The port to use for the S3 service, which defaults to 443 (for HTTPS) or else 80 (HTTP) if Require trusted SSL, below, is disabled. If the port is customized, the value no longer updates based on changes to the SSL setting.

Region

From the S3 configuration, the destination S3 bucket's region.

Note: Changing this value triggers a restart of the feed.

Bucket

From the S3 configuration, the destination S3 bucket name. This bucket must be dedicated to one and only one source cluster. Complete this field regardless of whether the Host includes the bucket name as a prefix.

Note: Changing this value triggers a restart of the feed.

Access Key ID and Secret Key

From the S3 configuration, the S3 access key ID and S3 secret access key to use. (See S3 documentation: Understanding and Getting Your Security Credentials.)

Swarm protects the secret key as a secure field and hides it. Updating the key does not trigger a restart of the feed, so keys may be updated as frequently as security policies require.

SSL Server

For production usage, select Require trusted SSL.


Threads

The default backup speed (20 simultaneous push threads per Swarm Storage node) is optimal for maintaining an existing S3 backup. In releases prior to v15.0, it was 6 per volume.

For a faster initial backup, increase the threads temporarily, but monitor bandwidth and cluster performance, as boosting the speed stresses internet bandwidth.

Rotating the S3 Access Key

Best Practice

It is a DevOps best practice to routinely change cloud access credentials and to automate this S3 access key rotation for the S3 Backup feed.

  1. Through the public cloud provider, create a new S3 access key and grant the correct permissions for the target S3 bucket.

  2. Using Swarm's management API, update the access credentials for the existing S3 backup feed.

  3. Expire/remove the old S3 access key upon confirming successful feed operations with the new credentials.

The following command template demonstrates how to use the Swarm management API to update the access credentials for an existing S3 backup feed:

Code Block
curl -X PATCH --header 'Content-Type: application/json' -u <admin>:<password> -d '[       \
{"op": "replace", "path":"destination/accessKeyId", "value":"<newAccessKeyID>"},          \
{"op": "replace", "path":"destination/secretAccessKey", "value":"<newSecretAccessKey>"}]' \
'http://<nodeIP>/api/storage/s3backupfeeds/<s3feedid>'
  • <admin> — The Swarm administrative user name, which is usually admin.

  • <password> — The Swarm administrative password, required for all management API calls that perform actions.

  • <newAccessKeyID> — The new access key ID for the target S3 bucket.

  • <newSecretAccessKey> — The new secret access key for the target S3 bucket.

  • <nodeIP> — The IP address of any Swarm node in the cluster.

  • <s3feedid> — The small integer feed ID that is associated with the S3 Backup feed. It appears as the feed's ID field in the Swarm UI.
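A rotation script ultimately issues the PATCH shown above. One way to build and send it with Python's standard library, sketched here with placeholder values to substitute (only the body builder is exercised below; the request itself requires a live Swarm node):

```python
import json
import urllib.request

def rotation_patch_body(access_key_id: str, secret_access_key: str) -> bytes:
    """Build the JSON Patch body from the curl example above."""
    ops = [
        {"op": "replace", "path": "destination/accessKeyId",
         "value": access_key_id},
        {"op": "replace", "path": "destination/secretAccessKey",
         "value": secret_access_key},
    ]
    return json.dumps(ops).encode()

def rotate_feed_key(node_ip: str, feed_id: int, admin: str, password: str,
                    key_id: str, secret: str):
    """PATCH the feed definition via the Swarm management API (untested sketch)."""
    url = f"http://{node_ip}/api/storage/s3backupfeeds/{feed_id}"
    # Basic auth with the Swarm administrative credentials
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, admin, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
    req = urllib.request.Request(url, data=rotation_patch_body(key_id, secret),
                                 method="PATCH",
                                 headers={"Content-Type": "application/json"})
    return opener.open(req)
```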
