...
Choosing S3 for Disaster Recovery
...
Back up — S3 Backup is an integral part of your operating Swarm cluster. In the Swarm UI, you will create a new feed of type S3 Backup, giving credentials and information about the network path to the service. After the feed is started, you can monitor its progress and be warned of blockages and particular object failures, as with any other feed. The S3 Backup feed will honor the versioning settings in your cluster, as enabled, disabled, or suspended throughout the domains and buckets. While you can create multiple S3 Backup feeds, each one requires its own dedicated target bucket.
Clean up — No action on your part is needed to keep the backup current and trimmed. Whenever you disable Swarm versioning on buckets or domains, delete buckets or domains, or have object lifepoints expire, the Swarm feeds mechanism processes the expired content as deleted, allowing the S3 Backup feed to clear it from the S3 bucket. Throughout content additions and deletions, the total number of objects in your S3 bucket will always approximate twice the number of logical objects that you are backing up from the source cluster, because AWS functionality requires one object for the content and another for its metadata. (A quick way to check this count is sketched below.)
Restore — The Restore tool runs outside of Swarm, using a command-line interface to execute the data restoration tasks. You can restore what you need: either the entire cluster, or only portions of it. Swarm supports bulk restores at the granularity of cluster, domain, or bucket, as well as more surgical restores of a few objects. You can also run multiple copies of the tool to achieve a faster, parallel recovery. See the S3 Backup Restore Tool.
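As a sanity check on the clean-up behavior described above, you can compare the backup bucket's object count to the number of logical objects in the source cluster. A minimal sketch using the AWS CLI, assuming a backup bucket named example.cluster1.backup (the example used later in this guide):

```bash
# Summarize the backup bucket; "Total Objects" should be roughly twice
# the number of logical objects being backed up from the source cluster.
aws s3 ls s3://example.cluster1.backup --recursive --summarize
```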
Important: Objects in the S3 backup bucket are wholly dedicated to DR for Swarm and are not for general use by owners of the account where the bucket resides. Swarm uses a very specific naming convention within the backup bucket in order to provide 100% fidelity for object restoration. No external processes other than Swarm should manipulate the content within this bucket.
...
Refer to the documentation for your public cloud provider, and consider these points when choosing among the AWS S3 storage classes (a lifecycle-rule sketch follows this list):
Cold storage offers the lowest monthly prices per byte stored compared to the standard storage classes.
Standard storage classes have low-latency retrieval times, which can allow a Swarm Restore to complete in a single run.
Cold storage has longer retrieval latency, as much as 12-48 hours for S3 Glacier Deep Archive, to pull content from archival storage. Depending upon how a restore is performed, you may need to run the Swarm Restore tool multiple times over several hours in order to complete a restoration.
Cold storage incurs additional charges for egress and API requests to access your backup, so it is best suited to low-touch use cases.
S3 Glacier Deep Archive rounds up small objects, so the overall footprint being charged may be larger because of Swarm's use of metadata objects.
Public storage pricing is competitive, and you might find that services such as Wasabi Hot Cloud Storage compare very favorably with AWS cold storage, especially when you consider egress and API charges.
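If you settle on a cold storage class for the backup bucket, one way to apply it is a bucket lifecycle rule that transitions objects some time after upload. A minimal sketch using the AWS CLI, assuming the example bucket name from this guide; weigh the retrieval caveats above before adopting this:

```bash
# Transition backed-up objects to S3 Glacier 30 days after upload.
# The rule ID is an arbitrary label; adjust Days and StorageClass as needed.
aws s3api put-bucket-lifecycle-configuration \
  --bucket example.cluster1.backup \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "swarm-backup-to-glacier",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
    }]
  }'
```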
...
To implement an S3 backup feed, you will first complete a one-time setup of the S3 side: you will set up an account with an S3 cloud service provider and then create an S3 bucket dedicated to backing up this cluster only.
Note: You must grant Swarm access to the target S3 bucket and provide login credentials as part of the S3 backup feed configuration. Neither the S3 Backup feed nor the S3 Backup Restore Tool will administer your S3 credentials or create any target S3 buckets.
While these instruction steps are for AWS S3, they can be adapted for other S3 cloud services.
Service — If needed, sign up for Amazon S3.
Go to aws.amazon.com/s3 and choose Get started with Amazon S3.
Follow the on-screen instructions.
AWS will notify you by email when your account is active and ready to use.
Note that you will use the S3 console to create your new bucket but the separate IAM service to create your new user:
Bucket — Create a bucket that will be dedicated to backing up your Swarm cluster.
Sign in and open the S3 console: console.aws.amazon.com/s3
Choose Create bucket. (See S3 documentation: Creating a Bucket.)
On tab 1 - Name and region, make your initial entries:
For Bucket name, enter a DNS-compliant name for your new bucket. You will not be able to change it later, so choose well:
The name must be unique across all existing bucket names in Amazon S3.
The name must be a valid DNS name, containing only lowercase letters and numbers (and internal periods and hyphens), between 3 and 63 characters. (See S3 documentation: Rules for Bucket Naming.)
Tip: For easier identification, incorporate the name of the Swarm cluster that this bucket will be dedicated to backing up.
For Region, choose the one that is appropriate for your business needs. (See S3 documentation: Regions and Endpoints.)
On tab 2 - Configure options, take the defaults. (See S3 documentation: Creating a Bucket, step 4.)
Best practice: Do not enable versioning or any other optional features, unless it is required for your organization.
On tab 3 - Set permissions, take the default to select Block all public access; now only the bucket owner account has full access.
Best practice: Do not use the bucket owner account to provide Swarm's access to the bucket; instead, you will create a new, separate IAM user that will hold the credentials to share with Swarm.
Choose Create, and record the fully qualified bucket name (such as "arn:aws:s3:::example.cluster1.backup") for use later, in policies.
Record these values for configuring your S3 Backup feed in Swarm (a CLI alternative for creating the bucket is sketched after this step):
Bucket Name
Region
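If you prefer scripting the setup, the bucket can also be created with the AWS CLI. A minimal sketch, using the example bucket name and a sample region (adjust both for your environment):

```bash
# Create the dedicated backup bucket.
# Regions other than us-east-1 require the LocationConstraint element.
aws s3api create-bucket \
  --bucket example.cluster1.backup \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2
```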
User — Create a programmatic (non-human) user that will be dedicated to Swarm access.
On the Amazon S3 console, select the IAM service (Identity and Access Management) and click Users.
Add a dedicated user, such as caringo_backup, to provide Programmatic access for Swarm.
The IAM console generates an access key (an access key ID and a secret access key), which you must record immediately. (A CLI sketch for creating the user and its key follows this step.)
(See S3 documentation: Managing Access Keys for IAM Users and Understanding and Getting Your Security Credentials.)
This is your only opportunity to view or download the secret access key, so save it in a secure place.
Record the fully qualified user (such as "arn:aws:iam::123456789012:user/caringo_backup") for use later, in policies.
Record these values for configuring your S3 Backup feed in Swarm:
Access Key ID
Secret Access Key
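For scripted setups, the dedicated user and its access key can also be created with the AWS CLI. A minimal sketch, using the example user name from this guide:

```bash
# Create the dedicated programmatic user and generate its access key.
aws iam create-user --user-name caringo_backup
# The output of create-access-key is the only copy of the secret
# access key, so capture it and store it securely.
aws iam create-access-key --user-name caringo_backup
```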
Policies — Create policies on both the user and the bucket so that the programmatic user has exclusive rights to your S3 bucket. You may use the policy generators provided or enter edited versions of the examples below.
Create an IAM policy for this user, allowing it all S3 actions on the backup bucket, which you need to specify as a fully qualified Resource (which you recorded above), starting with arn:aws:s3:::
IAM policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::example.cluster1.backup"
    }
  ]
}
```
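If you prefer the AWS CLI to the console's policy editor, the policy can be attached directly to the user. A minimal sketch, assuming the JSON above is saved locally as iam-policy.json; the policy name SwarmS3BackupAccess is an arbitrary label:

```bash
# Attach the IAM policy above to the dedicated backup user.
aws iam put-user-policy \
  --user-name caringo_backup \
  --policy-name SwarmS3BackupAccess \
  --policy-document file://iam-policy.json
```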
Create a matching bucket policy to grant access to the dedicated backup user, which you need to specify as a fully qualified Principal: the user ARN (which you recorded above), starting with arn:aws:iam:: (See S3 documentation: Using Bucket Policies.)
Using the Policy Generator, be sure to allow all S3 actions for your bucket, using the full ARN name:

Bucket policy:

```json
{
  "Id": "Policy1560809845679",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1560809828003",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::example.cluster1.backup",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/caringo_backup"
        ]
      }
    }
  ]
}
```
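The bucket policy can likewise be applied with the AWS CLI. A minimal sketch, assuming the JSON above is saved locally as bucket-policy.json:

```bash
# Apply the bucket policy above to the dedicated backup bucket.
aws s3api put-bucket-policy \
  --bucket example.cluster1.backup \
  --policy file://bucket-policy.json
```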
Best practice for security: After you implement the S3 Backup feed in Swarm, write a script to automate rotation of the S3 secret access key on a regular basis, including updating it in the S3 Backup feed definition in Swarm (using the management API call given in Rotating the S3 Access Key, below).
Configuring the S3 Backup Feed
The S3 Backup Feed option is available in Swarm 11 and higher, and it may be used immediately after upgrading Swarm Storage. (v11.0)
In addition to Swarm's other feed types, Search and Replication, you can create a dedicated S3 Backup feed. It resembles a Replication feed, but it requires an S3 bucket as the destination and has defaults appropriate for use with a cloud service.
Go to the Feeds page.
Select + Add at the top right.
Choose the S3 Backup feed type:
An S3 Backup feed has these parameters:
Parameter | Description
---|---
ID (existing feeds) | Read-only; system-assigned identifier.
Status (existing feeds) | Read-only; the current feed processing state.
Name | The name you attach to this backup feed.
Scope | The scope filter that you select for your backup feed; the backup includes only objects within the scope you indicate here. If the scope includes a context where Swarm object versioning is (or was) generating historical versions, those versions are backed up as well. The field value allows pattern matching with Python regular expression (RE) syntax so that multiple domain names can be matched; the one exception is that the "{m,n}" repetitions qualifier may not be used. (A pattern-check sketch follows this table.)
Target S3 Provider | The configuration for your S3 bucket, detailed in the fields that follow.
Host | From your S3 configuration, the host name of the S3 service. You cannot use an IP address here, because the host name itself becomes the Host header in the feed operation, which is required for communication to S3 services. Important: Add your bucket name as the prefix to the host name (for example, example.cluster1.backup.s3.amazonaws.com, an illustrative value based on the bucket created above).
Port | The port to use for the S3 service: 443 (HTTPS) by default, or 80 (HTTP) if you disable Require trusted SSL, below. If you customize the port, the value no longer updates based on changes to the SSL setting.
Region | From your S3 configuration, the destination S3 bucket's region. Note: Changing this value triggers a restart of the feed.
Bucket | From your S3 configuration, the destination S3 bucket name. This bucket must be dedicated to one and only one source cluster. Complete this field regardless of whether your Host includes the bucket name as a prefix. Note: Changing this value triggers a restart of the feed.
Access key ID | From your S3 configuration, the S3 access key ID and S3 secret access key to use. (See S3 documentation: Understanding and Getting Your Security Credentials.) Swarm protects your secret key as a secure field and hides it. Updating the key does not trigger a restart of the feed, so you may update keys as frequently as your security policies require.
SSL Server | For production usage, select Require trusted SSL. Recommended: To keep bandwidth usage by the S3 Backup feed in check, select the option to use a Local Cluster Forward Proxy and configure one for that purpose. The Forward Proxy Host (hostname or IP address) and Port are required.
Threads | The default backup speed (6 simultaneous threads) is optimal for maintaining an existing S3 backup. For a faster initial backup, increase the threads temporarily, but monitor bandwidth and cluster performance, as boosting the speed stresses your internet bandwidth.
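Because the Scope field uses Python RE syntax, you can check a candidate pattern locally before saving the feed. A minimal sketch; the pattern and domain name are hypothetical, and whether Swarm anchors the match against the whole domain name is an assumption here:

```bash
# Test a candidate Scope pattern against a sample domain name.
python3 -c 'import re; print(bool(re.fullmatch(r".*\.example\.com", "accounting.example.com")))'
```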
Rotating the S3 Access Key
It is a DevOps best practice to routinely change your cloud access credentials and to automate this S3 access key rotation for your S3 Backup feed.
Through your public cloud provider, create a new S3 access key and grant the correct permissions for the target S3 bucket.
Using Swarm's management API, update the access credentials for your existing S3 backup feed.
Upon confirming successful feed operations with the new credentials, expire/remove the old S3 access key.
This update is made with a Swarm management API call. The following sketch shows the general shape of the command; confirm the exact endpoint path and request body against the Swarm management API reference:
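```bash
# Sketch: update the S3 Backup feed's credentials via the Swarm
# management API. The port (91), endpoint path, and JSON field names
# are assumptions; confirm them against the Swarm management API
# reference before use.
curl -i -u "<admin>:<password>" \
  -X PATCH \
  -H "Content-Type: application/json" \
  -d '{"accessKeyID": "<newAccessKeyID>", "secretAccessKey": "<newSecretAccessKey>"}' \
  "http://<nodeIP>:91/api/storage/feeds/<s3feedid>"
```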
<admin> — The Swarm administrative user name, which is usually admin.
<password> — The Swarm administrative password, required for all management API calls that perform actions.
<newAccessKeyID> — The new access key ID for the target S3 bucket.
<newSecretAccessKey> — The new secret access key for the target S3 bucket.
<nodeIP> — The IP address of any Swarm node in the cluster.
<s3feedid> — The small integer feed ID that is associated with the S3 Backup feed. It appears as the feed's ID field in the Swarm UI.