Overview

Swarm Hybrid Cloud provides the capability to copy objects from Swarm to a target, such as S3 or Azure cloud storage. Native cloud services and applications running on utility computing can then work with the data directly from the target cloud storage. Similarly, data can be pulled from S3 or Azure cloud storage back to Swarm.

Capabilities

Info

Each object in the focus dataset is copied to or from the cloud, not moved; the objects remain on the source, to be handled at the user's discretion. Each object remains in the Swarm namespace as the authoritative copy and stays searchable there after the copy. The object's data is processed in the cloud and can be repatriated to Swarm with Content Portal 7.7 and later, if desired.

Info

If a copy of more than 50,000 objects is failing, break the task up to run on a folder or on a subset of objects defined by a strict collection search.
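As one way to size such subsets, the following minimal sketch counts the objects under each candidate prefix so that each sub-job stays under the 50,000-object mark. It assumes the source bucket is reachable through an S3-compatible endpoint on the Content Gateway; the endpoint, bucket name, credentials, and prefixes shown are placeholders.

# Minimal sketch: count objects under candidate prefixes so each sub-job stays
# below 50,000 objects. Endpoint, bucket, keys, and prefixes are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.example.com",      # hypothetical Gateway S3 endpoint
    aws_access_key_id="SWARM_ACCESS_KEY",
    aws_secret_access_key="SWARM_SECRET_KEY",
)
paginator = s3.get_paginator("list_objects_v2")

def count_objects(bucket, prefix):
    # Sum KeyCount across result pages for the given prefix.
    return sum(page.get("KeyCount", 0)
               for page in paginator.paginate(Bucket=bucket, Prefix=prefix))

for prefix in ("images/", "logs/", "archive/"):       # example folder prefixes
    n = count_objects("source-bucket", prefix)
    flag = " -- split further" if n > 50000 else ""
    print(f"{prefix}: {n} objects{flag}")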

Future releases will provide additional capabilities.

Prerequisites

Usage

The Swarm Hybrid Cloud feature is accessible to clients via the Content UI. Clients select a specific dataset to copy to the cloud, which can be a collection, a bucket, or a folder within a bucket, and provide the remote bucket or container details, e.g., endpoint, access key, and secret key. Results for each object are written to a status file in the source bucket. The focus dataset is defined shortly after the job is triggered and is not redefined during execution. Use the generated dictionary and log files to review the job.
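Because the job form needs a working remote endpoint and key pair, it can help to verify them beforehand. The following is a minimal sketch for an S3 target, assuming boto3 is available; the endpoint, bucket name, and keys shown are placeholders.

# Minimal sketch: confirm the remote S3 endpoint, bucket, and credentials that
# will be entered in the hybrid cloud job form. All values are placeholders.
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

remote = boto3.client(
    "s3",
    endpoint_url="https://s3.us-east-1.amazonaws.com",  # remote endpoint from the form
    aws_access_key_id="REMOTE_ACCESS_KEY",
    aws_secret_access_key="REMOTE_SECRET_KEY",
)

try:
    remote.head_bucket(Bucket="target-bucket")   # succeeds only if reachable and authorized
    print("Remote bucket is reachable with these credentials.")
except (ClientError, EndpointConnectionError) as err:
    print(f"Check the endpoint, bucket name, or keys: {err}")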

Workflow

Content UI

Hybrid Cloud replicates the focus dataset, so the client needs two environments:

Copy all the data from the source path, or a subset of the source path defined by a focus dataset, to the destination. The copy is applied at the bucket level, and job creation is initiated from a Swarm bucket-scoped view.

Create a job at the bucket level to start copying the focus dataset to the target storage.

Whether or not the focus dataset is copied successfully, two or more files are created after the job submission:

Each file is important and provides information about the hybrid cloud job. The format of these files may change in future releases. The manifest and log files are overwritten if the same job name is reused from a previous run, so save the files or use a different job name if this is not desired.

Tip

If required, any or all of the support files can be renamed after the copying has started. Log file updates continue under the old name.

Replicating Data to or from the Remote Side

Refer to the following steps to replicate the focus dataset:

  1. Navigate to the Swarm UI bucket or collection to copy.

  2. Click Actions (three gears icon) and select either “Push to cloud” or “Pull from cloud” (formerly Copy to S3). Select S3 or Azure depending on the remote endpoint.


    A modal presents a form, with required fields marked with asterisks (*) as shown in the example below:

  3. Click Begin Copy. This button is enabled once all required text fields are filled.

Result Analysis

The push and pull operations generate support objects (manifest, dictionary object, log, and result). All objects use the given job name as a prefix, with a distinct suffix appended to each. The duration of the job depends on its size (the count of objects and the total number of bytes to be transferred). Download and open the latest copy of the status log to monitor the progress of the job.
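The support objects can also be listed and fetched programmatically. A minimal sketch follows, assuming the source bucket is reachable through an S3-compatible endpoint on the Content Gateway; the endpoint, bucket, job name, and keys are placeholders.

# Minimal sketch: list the support objects that share a job-name prefix and
# print the most recently modified one (typically the status log).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.example.com",   # hypothetical Gateway S3 endpoint
    aws_access_key_id="SWARM_ACCESS_KEY",
    aws_secret_access_key="SWARM_SECRET_KEY",
)

job_name = "nightly-push"                         # the job name entered in the form
resp = s3.list_objects_v2(Bucket="source-bucket", Prefix=job_name)
objects = resp.get("Contents", [])

for obj in objects:
    print(obj["Key"], obj["LastModified"], obj["Size"])

if objects:
    latest = max(objects, key=lambda o: o["LastModified"])
    body = s3.get_object(Bucket="source-bucket", Key=latest["Key"])["Body"].read()
    print(body.decode("utf-8", errors="replace"))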

Early Failure

Early failures are reported by the user interface as “Failed to submit” system errors.

For early failures, verify the following with the Gateway server setup:

Late Failures

Failures may occur after the job is successfully submitted.

A job has not completed processing until the result summary object, in JSON format, is generated; it contains the overall processing result. A failure here may point to an incorrectly formatted endpoint or to wrong or expired access keys. Note that a job can complete successfully even if individual objects fail; individual object results appear in the separate log file object. Late failures can be further debugged through the Gateway log.
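To check the overall result without opening the Content UI, the result summary object can be downloaded and pretty-printed. The sketch below makes no assumption about the summary's schema, and the object key shown is hypothetical; take the actual key from the bucket listing for the job.

# Minimal sketch: download the result summary object and pretty-print the JSON
# to inspect the overall processing result. Key, endpoint, bucket, and
# credentials are placeholders.
import json
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://gateway.example.com",   # hypothetical Gateway S3 endpoint
    aws_access_key_id="SWARM_ACCESS_KEY",
    aws_secret_access_key="SWARM_SECRET_KEY",
)

result_key = "nightly-push_result.json"           # hypothetical key; check the bucket listing
raw = s3.get_object(Bucket="source-bucket", Key=result_key)["Body"].read()
print(json.dumps(json.loads(raw), indent=2))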