Replication Feeds

Types of Replication

Which replication method to choose and how to configure it depend on whether you have a legacy Swarm implementation and on how you need to secure replication traffic over untrusted networks. (v10.0)

Secure Replication — Swarm Storage supports remote replication over a WAN, so that replication feeds can operate through Content Gateway. When you define a replication feed, you specify which replication mode to use: either the legacy bidirectional GET method of replication (which you may need for specific application compatibility or network requirements) or the recommended direct POST method, which offers better performance and flow management. With Swarm Storage 10.0 and later, you can implement TLS/SSL security as fits your implementation:

Upload a trusted certificate to Swarm
Replicate to an SSL offloader that services the target cluster
Replicate from a forward proxy on your source cluster

See Replicating Feeds over Untrusted Networks and Adding a Trusted Certificate to Swarm.

Replication Methods — Below are two replication methods available to you, along with the configuration variants of each that are supported. For best performance, choose direct POST replication, which can go through Gateway. GET replication is the legacy method, which may be needed for application compatibility or networking requirements.

Note

Using the legacy Bidirectional GET for remote replication requires that you populate the Storage configuration setting cluster.proxyIpList for any cluster using a reverse proxy. The setting is a comma-separated list of reverse proxy IP addresses or names, including ports in name:port format. If using Direct POST replication, this setting can be populated or left blank, as it has no effect.
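As an illustration of the value format described in the note, the following minimal sketch joins reverse proxy addresses into a comma-separated list of name:port entries. The helper name `format_proxy_ip_list`, its input shape, and the sample hosts are hypothetical and not part of Swarm; the sketch also assumes every entry carries an explicit port.

```python
def format_proxy_ip_list(proxies):
    """Join (host, port) pairs into a comma-separated name:port list.

    Hypothetical helper for illustration only; it is not a Swarm API.
    """
    entries = []
    for host, port in proxies:
        host = host.strip()
        # Commas and spaces would corrupt the comma-separated value.
        if not host or "," in host or " " in host:
            raise ValueError(f"invalid proxy host: {host!r}")
        if not 1 <= int(port) <= 65535:
            raise ValueError(f"invalid port for {host}: {port}")
        entries.append(f"{host}:{int(port)}")
    return ",".join(entries)


# Sample hosts are placeholders, not real proxies.
value = format_proxy_ip_list([("proxy1.example.com", 8080), ("192.0.2.10", 80)])
print(value)
```

The resulting string is what you would supply as the value of cluster.proxyIpList on a cluster that sits behind those reverse proxies.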

Adding a Replication Feed

To add a feed in the cluster, click the +Add button at the top right of the page and then select the Replication button.

When you define a replication feed, set the scope and select the type (Replication Mode). If you are using direct POST, also set the speed (number of concurrent Threads).

The following table describes the data entry fields in the dialog box.

ID (existing feeds)	Read-only; system-assigned identifier
Status (existing feeds)	Read-only; the current feed processing state.
Name	The name you attached to a feed.
Scope	The feed filters selected for the replication feed. An object is replicated only to the domain(s) indicated in this field. Gateway adminDomain: Never create the same domain in two clusters; always create it in the source cluster and then replicate it to the target. A Gateway must use an independent adminDomain, at least temporarily, if the Gateway is in front of the target cluster. (CLOUD-2785) If you choose to replicate specific domains, ensure that the source cluster’s adminDomain is included in the list of replicated domains. Entire source cluster (global) — To replicate all objects in the source cluster, leave the default selection of Entire source cluster (global). Only objects in select domain(s) — To replicate only the objects in one or more domains, select the Only objects in select domain(s) option. In the text box that appears, enter one or more domains: To replicate only the objects within a specific domain, enter that domain. To replicate only the objects within multiple domains, enter those domains separated by commas and/or use pattern matching. To exclude domains from replication, enter them. (v10.0) The field value allows pattern matching with the Python regular expression (RE) syntax so that multiple domain names can be matched. The exception to the RE matching is that the "{m,n}" repetitions qualifier may not be used. An example domain list value using RE is: `.*\.example\.com` This matches both of these domains: `accounting.example.com`, `engineering.example.com`. Include objects without a domain — To replicate any unnamed objects that are not tenanted in any domain, enable the option.
Target Remote Cluster Name	The configuration setting for your target cluster (for example, the `cluster.name` parameter in the .cfg file of the target cluster). Configure the Gateway setting `allowSwarmAdminIP` when using Gateway as the target. See Gateway Configuration.
Proxy or Host(s)	The IP address(es) or host name(s) of either: One or more nodes in the target cluster. A reverse proxy host that routes to the target cluster. To enter two or more node IP addresses, enter each address separated by a comma or spaces.
Port	Defaults to 80. Allows specifying a custom port for the remote cluster.
Replication Mode	Defaults to Direct POST. Choose replication via direct POST (recommended) or bidirectional GET. Switching modes does not require a feed restart. (v9.6) For best performance, choose direct POST replication, which can go through Gateway. GET replication is the legacy method, which may be needed for application compatibility or networking requirements.
Threads	Replication via direct POST only. The default replication speed (6 simultaneous threads) is best for same-sized clusters with minimal replication backlog. When processing backlog, the thread count is per source cluster volume. (v9.6) Reduce the threads to avoid overwhelming a smaller target cluster. For faster replication against a backlog, increase the threads temporarily and monitor bandwidth and cluster performance, because boosting the speed stresses both clusters.
SSL Server	Replication via direct POST only. Defaults to none. If you are replicating over an untrusted network, enable Require trusted SSL; Allow untrusted SSL is available but not intended for production systems. (v10.0)
Remote Admin Name/Password	Inherit from source cluster: Uncheck the enabled box only if the remote cluster user name is different from the source cluster name in the same realm. Then enter the User/Password credentials: the administrative user name and the administrative password of the target cluster.
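The pattern matching described for the Scope field can be tried out locally before you enter a value. The sketch below is only an approximation under stated assumptions: it treats each comma-separated entry as a Python regular expression that must match the whole domain name, which is a guess about how Swarm applies the pattern, and it does not model domain exclusion. The function name `domain_in_scope` is hypothetical.

```python
import re


def domain_in_scope(domain, scope_value):
    """Return True if the domain fully matches any comma-separated pattern.

    Assumption: whole-name matching; Swarm's actual matching rules may differ.
    """
    patterns = [p.strip() for p in scope_value.split(",") if p.strip()]
    return any(re.fullmatch(p, domain) is not None for p in patterns)


scope = r".*\.example\.com"
for name in ("accounting.example.com", "engineering.example.com", "example.org"):
    print(name, domain_in_scope(name, scope))
```

Splitting the field on commas is safe in this sketch only because the "{m,n}" qualifier, which itself contains a comma, is not allowed in the field.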

Propagate Deletes

The legacy option Propagate Deletes is deprecated; it existed to allow setting a target cluster to preserve all deleted objects. This need is now covered by Object Versioning; you can access historical versions of deleted objects to recover content that was deleted by mistake. You can also limit versioning to the target cluster that is serving as your archive, to minimize space usage. (v11.1)

If you have an existing feed that still has this option specified, note these restrictions and behaviors:

With Versioning — Always propagate deletes when using Object Versioning in your cluster.
Without Versioning — If this option is disabled, the target cluster retains deleted content with no verifiable indication that it was deleted.

Using Feed Actions

For an existing feed, clicking on it in the Feeds list opens its Feed Settings page, with the existing settings populated. The Actions (gear) icon menu at the top right supports multiple feed actions, appropriate to the type of feed:

Pause / Resume	You may occasionally wish to pause feed processing in order to perform system maintenance. For example, when upgrading an Elasticsearch cluster, pause the search feed before stopping the Elasticsearch service in the search cluster. After completing your system maintenance, return to the action menu and select the Resume action to resume feed processing.
Refresh	Objects are sent to the feed target in near real-time (NRT) as they are written or updated. Any objects that cannot be processed immediately are retried each HP cycle until they succeed, at which point they are marked as complete and are never resent. If a data loss failure occurs on your remote feed target and you cannot restore from backup, select the Refresh option from the feed action menu, which verifies and rehydrates all of the previously sent content to a remote cluster. This process takes some time, as it must revisit all objects in the cluster.
Delete	When you delete a feed, it frees source cluster resources. This process does not affect the objects previously pushed to the remote target. To delete a feed, select the Delete option from the feed action menu and confirm you intend to permanently delete the feed. The deleted feed is removed from the remaining cluster nodes within 60 seconds.
View feed table	Displays the SNMP Repository Dump for the selected node, for feed diagnostics (see below).

Troubleshooting Feeds

Feed diagnostics — To troubleshoot a blocked feed, double-click it to open its settings page, click the gear icon, and select View feed table, which displays the SNMP Repository Dump for the selected node. (v2.0)
Review the feedPluginState status to identify the blockage.
Example: feedPluginState blocked: Destination cluster onyx1 reports invalid request: Castor-System-Cluster value must refer to a remote cluster on RETRIEVE request
Idle feeds — A feed can appear to be idle with items still queued for processing. Feed status reporting is a best-effort snapshot, not a low-latency or guaranteed transaction mechanism, so plan accordingly.
Feed prioritization — Domain and bucket context objects are prioritized for all types of feeds; this improves usability when you initiate remote sites.
Retries for blocked feeds — Blocked feeds are retried every 20 minutes, but if you change the definition for a blocked feed, it triggers an immediate attempt with the new definition, which may clear the blockage. (v10.1)
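If you save the text of the feed table, a short script can pick out the blocked entries for review. The following sketch assumes the dump contains lines shaped like the feedPluginState example above (the name, a state word, then an optional colon and message); the real dump layout may differ, and the function name `find_blocked_feeds` is hypothetical.

```python
import re

# Assumed line shape: "feedPluginState <state>: <detail>"
STATE_LINE = re.compile(r"feedPluginState\s+(?P<state>\w+)(?::\s*(?P<detail>.*))?")


def find_blocked_feeds(dump_text):
    """Return the detail message of every blocked feedPluginState line."""
    details = []
    for line in dump_text.splitlines():
        match = STATE_LINE.search(line)
        if match and match.group("state") == "blocked":
            details.append(match.group("detail") or "")
    return details


sample = (
    "feedPluginState blocked: Destination cluster onyx1 reports invalid request: "
    "Castor-System-Cluster value must refer to a remote cluster on RETRIEVE request"
)
for detail in find_blocked_feeds(sample):
    print(detail)
```

The printed message is the part to read when identifying the blockage; after you correct the feed definition, the change triggers an immediate retry rather than waiting for the next 20-minute attempt.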