Swarm Storage 11.2 Release

New Features

Expanded SEND — SCSP SEND, an admin-only method that forces an object to be written immediately to another cluster, now works with any number of Swarm feeds of every type: replication, search, and S3 backup. The expanded functionality uses several new query arguments: two to specify which feed IDs or feed types to target, and one to control the timeout (if any) for the request to complete. See SCSP SEND. (SWAR-8441)

Elasticsearch — This release focuses on changes that make it easier to monitor and manage Elasticsearch, Swarm Search, and Swarm Metrics:

The Swarm Search RPM installation now checks whether firewalld is enabled and, if so, warns the administrator to check the firewall rules for ports 9200 and 9300, which Elasticsearch requires. (SWAR-8416)
Swarm dynamically updates DNS lookups after Elasticsearch nodes are restarted. (SWAR-8817)
The Swarm Metrics curator is now independent of HTTP_PROXY and related shell environment variables and so is less subject to disruption. (SWAR-8452)
The Swarm Metrics curator has improved defaults for its logging, increased to 10 logs and up to 10 MB. (SWAR-8401)
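After the Swarm Search installer warns about firewalld, one way to confirm the firewall rules is to test TCP reachability of ports 9200 and 9300 from another machine, such as a Swarm or Gateway host. The sketch below is a generic Python 3 check, not a Swarm tool, and the function names are hypothetical. A successful connection only shows that the port is open, not that Elasticsearch is healthy.

```python
import socket

# Ports the release note says Elasticsearch needs:
# 9200 (HTTP/REST clients) and 9300 (node-to-node transport).
ES_PORTS = (9200, 9300)


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_es_ports(host: str, ports=ES_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to whether it accepted a TCP connection from this machine."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for port, ok in check_es_ports(target).items():
        status = "reachable" if ok else "NOT reachable (check firewall rules)"
        print(f"{target}:{port} {status}")
```

Run it as, for example, python3 check_es_ports.py es-node-1, substituting one of your Elasticsearch hosts.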

This release also includes changes to help with Swarm management and performance:

Swarm now ships with the Prometheus Node Exporter enabled and configured to work by default, to simplify implementation and avoid rebooting. To disable the Node Exporter on a node, set metrics.enableNodeExporter=False in the node's configuration file; to disable it across the entire cluster, set metrics.nodeExporterFrequency to 0. (SWAR-8578)
Swarm's inter-process locking mechanism has been reworked, giving a small performance gain for larger clusters and reducing related WARNING-level log messages. (SWAR-8835)
Swarm has restored performance for clients who have not yet migrated from legacy authentication/authorization. (SWAR-8810)

Additional Changes

These items are other changes, including those that come from testing and user feedback.

OSS Versions — See Third-Party Components for 11.2 for the complete listing of packages and versions.
Fixed in 11.2.0
- The multipart write 202 response now includes Location headers of the resulting manifests that are analogous to the Location headers of a normal EC write. (SWAR-8886)
- Resolved an error in the assessment of licensed space usage that prevented a node from accepting writes. (SWAR-8869)
- Resolved an issue related to TCP window sizes that can cause socket disconnects, pauses, and hangs. (SWAR-8847)
- Resolved an issue that can lead to a node crash in large clusters. (SWAR-8832)
- Basic auth of the admin user for special administrative SCSP requests did not correctly handle a stored hashed admin password. (SWAR-8814)
- Infrequent WARNING messages, "Node/Volume entry not published due to lock contention (...); action will be retried," may appear in logs. (SWAR-8802)
- Resolved an issue causing rebooted Swarm nodes to allow client requests before mounting all volumes. (SWAR-8801)

Upgrade Impacts

Use the supported versions of Swarm components for the target version of Elasticsearch:

Elasticsearch	Swarm Storage	Gateway	SwarmFS	Notes
6.8.6	11.1 - 11.2	6.3	2.4	Recommended configuration.
5.6.12	10.0 - 11.2	6.0 - 6.3	2.4	Plan to migrate to Elasticsearch 6. Support for earlier versions is ending.
2.3.3	9.6 - 11.2	5.4	2.1
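To check a planned combination of component versions before an upgrade, the compatibility table above can be encoded as a small lookup. This is a minimal sketch: the names SUPPORTED, major_minor, and is_supported are hypothetical, ranges are treated as inclusive, and only major.minor is compared (patch levels are ignored, which is an assumption about how the table is meant to be read).

```python
# Rows transcribed from the compatibility table above. Ranges are inclusive and
# compare major.minor only; patch levels are ignored.
SUPPORTED = {
    "6.8.6": {"storage": ((11, 1), (11, 2)), "gateway": ((6, 3), (6, 3)), "swarmfs": (2, 4)},
    "5.6.12": {"storage": ((10, 0), (11, 2)), "gateway": ((6, 0), (6, 3)), "swarmfs": (2, 4)},
    "2.3.3": {"storage": ((9, 6), (11, 2)), "gateway": ((5, 4), (5, 4)), "swarmfs": (2, 1)},
}


def major_minor(version: str) -> tuple:
    """Turn '11.2.0' or '11.2' into (11, 2) so versions compare numerically."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(es: str, storage: str, gateway: str, swarmfs: str) -> bool:
    """Return True if the component versions fall within one row of the table."""
    row = SUPPORTED.get(es)
    if row is None:
        return False  # Elasticsearch version not listed at all
    s_lo, s_hi = row["storage"]
    g_lo, g_hi = row["gateway"]
    return (
        s_lo <= major_minor(storage) <= s_hi
        and g_lo <= major_minor(gateway) <= g_hi
        and major_minor(swarmfs) == row["swarmfs"]
    )
```

For example, is_supported("6.8.6", "11.2.0", "6.3", "2.4") returns True, while is_supported("6.8.6", "11.0", "6.3", "2.4") returns False because Storage 11.0 is below the 11.1 minimum for Elasticsearch 6.8.6.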

These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the currently running version:

Impacts for 11.2

Upgrading Elasticsearch — Elasticsearch 5.6.12/2.3.3 may be used with Storage 11 if the move to ES 6 cannot be performed immediately, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time as upgrading ES. Do not run an ES 5 Search or Metrics Curator against ES 6.
Rolling upgrade — During a rolling upgrade, running mixed Swarm versions across nodes may cause errors in the Swarm UI and in management API calls. Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)
Settings changes — These settings are new with this release:
- scsp.defaultFeedSendTimeout (default 30 seconds): a non-persisted, node-level setting that sets the timeout for a feed SEND request when the timeout=true query argument is provided. (SWAR-8441)
- chassis.name (default blank): a node-level setting that stores a user-defined chassis name. (SWAR-8823)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
- If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1: Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, use swarmctl to set scsp.forceLegacyNonce=false in the cluster, and change scsp.forceLegacyNonce=false in the node.cfg file.
- If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list: Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce is false in the cluster.
Use swarmctl to check or change settings

Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
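The SWAR-9020 decision depends only on the current and target versions, so it can be written down as a small planner that prints the steps and the swarmctl commands quoted above. This is a hypothetical helper, not part of the Swarm tools bundle: it only builds the command strings and does not run swarmctl, and it raises an error for any version pair the guidance does not cover.

```python
# Versions listed in the SWAR-9020 guidance, as (major, minor).
COVERED = {(11, 1), (11, 2), (11, 3), (12, 0), (12, 1)}

# swarmctl invocations quoted in the release notes.
CHECK_CMD = ["swarmctl", "-C", "scsp.forceLegacyNonce"]
SET_FALSE_CMD = ["swarmctl", "-C", "scsp.forceLegacyNonce", "-V", "False"]


def major_minor(version: str) -> tuple:
    """Turn '10.2.1' into (10, 2) so versions compare numerically."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def nonce_upgrade_plan(current: str, target: str) -> list:
    """Return the scsp.forceLegacyNonce steps for upgrading current -> target."""
    cur, tgt = major_minor(current), major_minor(target)
    if tgt not in COVERED:
        raise ValueError(f"target {target} is not covered by the SWAR-9020 guidance")
    if cur < (11, 1):
        # Upgrading from before 11.1: force the legacy nonce during the upgrade.
        return [
            "Before upgrading: set scsp.forceLegacyNonce=true in node.cfg",
            "After the cluster is fully up: " + " ".join(SET_FALSE_CMD),
            "Then: change scsp.forceLegacyNonce=false in node.cfg",
        ]
    if cur in COVERED:
        # Moving between listed versions: only verify the setting is already false.
        return [
            "Before upgrading: verify scsp.forceLegacyNonce=false in node.cfg",
            "Before upgrading: confirm the cluster value is false with " + " ".join(CHECK_CMD),
        ]
    raise ValueError(f"current version {current} is not covered by the SWAR-9020 guidance")


if __name__ == "__main__":
    for step in nonce_upgrade_plan("10.2.1", "11.2.0"):
        print(step)
```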

Impacts for 11.1

Upgrading Elasticsearch: Use Elasticsearch 5.6.12/2.3.3 with Storage 11.1 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.
Swarm Search and Metrics: This release includes new versions of the Swarm Search and Metrics RPMs. Both require Python 3 to be installed on the ES servers where they run.
- For Swarm Metrics on RHEL/CentOS 7.7, first install this dependency: yum install epel-release
Python 3: Install Python 3 if it is not automatically installed with RHEL/CentOS 7.
Propagate Deletes Removed: For Replication Feeds, the Propagate Deletes option is removed from the legacy Admin Console and the Management API (propagateDeletes, nodeletes fields). (SWAR-8609, SWAR-8615)
Swarm Configuration: Run the Storage Settings Checker before upgrading to this version, to identify configuration issues.
- The Storage Settings Checker now requires Python 3 to be installed. (SWAR-8742)
- crier.deadVolumeWall has been unpublished for reimplementation. (SWAR-8640)
S3 Backup Restore: The S3 Backup Restore Tool has been migrated to Python 3.6. If the tool is installed, uninstall it and install the new version. (SWAR-8703)
Upgrade Process: During the upgrade to 11.1, it may not be possible to monitor the cluster via the Swarm UI. Workaround: Use the legacy Admin Console (port 90) during upgrade. (SWAR-8716)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
- If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1: Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, use swarmctl to set scsp.forceLegacyNonce=false in the cluster, and change scsp.forceLegacyNonce=false in the node.cfg file.
- If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list: Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce is false in the cluster.

Use swarmctl to Check or Change Settings

Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.

Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.

For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.

Impacts for 11.0

Upgrading Elasticsearch: You may use Elasticsearch 2.3.3 with Storage 11.0 if you cannot move to 5.6 now, but plan the migration immediately (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release, and testing with Swarm 11 is being discontinued.
Propagate Deletes Deprecated: The option to disable Propagate Deletes on Replication Feeds is deprecated; use Object Versioning to preserve deleted content. Do not disable Propagate Deletes when versioning is enabled or when defining an S3 Backup. (SWAR-8609)
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
- Changed settings:
  - ec.segmentConsolidationFrequency (ecSegmentConsolidationFrequency in SNMP) has an improved default (10), which you must apply to your cluster when you upgrade. (SWAR-8483)
  - cluster.name is now required. Add it to the cluster.cfg file. (SWAR-8466).
  - metrics.nodeExporterFrequency (metricsExporterFrequency in SNMP) is now a persisted cluster setting. (SWAR-8467).
- Removed settings:
  - chassis.processes is allowed but is ignored.
- Numerous settings are now promoted to cluster-level (versus node-level) scope, so you can manage them via Settings > Cluster in the Swarm UI (SWAR-8457):
  - console.expiryErrInterval
  - console.expiryWarnInterval
  - console.indexErrorLevel
  - console.indexWarningLevel
  - console.port
  - console.reportStyleUrl
  - console.spaceErrorLevel
  - console.spaceWarnLevel
  - console.styleUrl
  - feeds.retry
  - feeds.statsReportInterval
  - health.parallelWriteTimeout
  - log.obscureUUIDs
  - metrics.enableNodeExporter
  - network.dnsDomain
  - network.dnsServers
  - network.icmpAcceptRedirects
  - network.igmpVersion
  - network.mtu
  - startup.certificates
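To review which of these now cluster-scoped settings a node's configuration file still sets, a short scan of the file is enough. This sketch assumes node.cfg holds one "setting = value" entry per line with "#" comments, which is our assumption about the format; the helper names are hypothetical, and the script only reports matches so you can decide whether to manage them via Settings > Cluster instead.

```python
# Settings promoted to cluster-level scope in Swarm 11.0 (SWAR-8457), copied from the list above.
CLUSTER_SCOPED = {
    "console.expiryErrInterval", "console.expiryWarnInterval",
    "console.indexErrorLevel", "console.indexWarningLevel",
    "console.port", "console.reportStyleUrl",
    "console.spaceErrorLevel", "console.spaceWarnLevel", "console.styleUrl",
    "feeds.retry", "feeds.statsReportInterval",
    "health.parallelWriteTimeout", "log.obscureUUIDs",
    "metrics.enableNodeExporter",
    "network.dnsDomain", "network.dnsServers", "network.icmpAcceptRedirects",
    "network.igmpVersion", "network.mtu",
    "startup.certificates",
}


def node_cfg_keys(text: str) -> list:
    """Return setting names in file order, ignoring blank lines and '#' comments."""
    keys = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing or whole-line comments
        if "=" in line:
            keys.append(line.split("=", 1)[0].strip())
    return keys


def find_cluster_scoped(text: str) -> list:
    """Return the node.cfg settings that are now managed at cluster scope."""
    return [key for key in node_cfg_keys(text) if key in CLUSTER_SCOPED]
```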

Impacts for 10.2

Upgrading Elasticsearch — You may continue to use Elasticsearch 2.3.3 with Storage 10.2 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. The upgrade to Elasticsearch 5.6 must be completed before upgrading to Gateway 6.0.
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues. Note these changes:
- ec.protectionLevel is now persisted. (SWAR-8231)
- index.ovMinNodes=3 is the new default for the overlay index, in support of Swarm 10's new architecture. To keep your overlay index operational, set this new value in your cluster, through the UI or by SNMP (overlayMinNodes). (SWAR-8278)
- metrics.enableNodeExporter can be set to True, which enables the Prometheus Node Exporter on that node. (SWAR-8408, SWAR-8578)
- metrics.nodeExporterFrequency, a new dynamic setting, sets how frequently to refresh Swarm-specific Prometheus metrics in Elasticsearch; it defaults to 0, which disables this export. (SWAR-8408).

Impacts for 10.1

Upgrading Elasticsearch — Continue to use Elasticsearch 2.3.3 with Storage 10.1 until able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. Complete the upgrade to Elasticsearch 5.6 before upgrading to Gateway 6.0.
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.
- metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)
IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note that "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)
Update MIBs on CSN — Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)

Impacts for 10.0

Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
- Changes for the new single-IP dense architecture:
  - network.ipAddress - multiple IP addresses now disallowed
  - chassis.processes - removed; multi-server configurations are no longer supported
  - ec.protectionLevel - new value "volume"
  - ec.subclusterLossTolerance - removed
- Changes for security (see next section)
  - security.administrators, security.operators - removed 'snmp' user
  - snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user
  - startup.certificates - new setting to hold any and all public keys
- New settings:
  - disk.atimeEnabled
  - health.parallelWriteTimeout
  - search.pathDelimiter
Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and set snmp.rwCommunity to that key's value. Nodes whose security.administrators setting contains only the snmp key do not boot. If you changed the default value of the snmp key in the security.operators setting, set snmp.roCommunity to that value, and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key and cannot be an authorized console operator name. (SWAR-8097)
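The SNMP change is a mechanical move of values between settings, sketched below on a Python dict keyed by setting name (an illustrative representation, not Swarm's configuration API; the function names are hypothetical). One design choice differs from the minimum the notes require: the sketch always carries the security.operators snmp value over to snmp.roCommunity, even if it was never changed, which is a conservative way to keep the existing read-only community string.

```python
def migrate_snmp_keys(settings: dict) -> dict:
    """Return a copy of `settings` with the 'snmp' keys moved as SWAR-8097 describes."""
    # Copy nested dicts so the caller's settings are left untouched.
    new = {k: dict(v) if isinstance(v, dict) else v for k, v in settings.items()}
    admins = new.get("security.administrators", {})
    if "snmp" in admins:
        # The read/write community string moves to snmp.rwCommunity.
        new["snmp.rwCommunity"] = admins.pop("snmp")
    operators = new.get("security.operators", {})
    if "snmp" in operators:
        # Carried over whether or not it was changed from the default.
        new["snmp.roCommunity"] = operators.pop("snmp")
    return new


def snmp_problems(settings: dict) -> list:
    """List the SWAR-8097 conditions that still need attention."""
    problems = []
    admins = settings.get("security.administrators", {})
    if set(admins) == {"snmp"}:
        problems.append("security.administrators holds only 'snmp': the node will not boot")
    elif "snmp" in admins:
        problems.append("move 'snmp' from security.administrators to snmp.rwCommunity")
    if "snmp" in settings.get("security.operators", {}):
        problems.append("'snmp' is reserved and cannot be a console operator name")
    return problems
```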
EC Protection
- Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)
- Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)
- Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)
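The node counts in the EC guidance follow directly from the encoding k:p. The sketch below, with hypothetical function names, computes the recommended minimum ("more than k+p", so at least k+p+1) and flags when a cluster has fewer than (k+p)/p nodes, the case the notes assign to ec.protectionLevel=volume. For 5:2 it gives 8 recommended nodes and a threshold of 3.5, so a 3-node cluster needs the volume level and a 4-node cluster does not.

```python
def parse_encoding(encoding: str) -> tuple:
    """Split a policy.ecEncoding value such as '5:2' into (k, p)."""
    k, p = (int(part) for part in encoding.split(":"))
    return k, p


def recommended_min_nodes(encoding: str) -> int:
    """'More than k+p' subclusters/nodes, i.e. at least k+p+1."""
    k, p = parse_encoding(encoding)
    return k + p + 1


def needs_volume_level(encoding: str, nodes: int) -> bool:
    """True when nodes < (k+p)/p, the case for ec.protectionLevel=volume."""
    k, p = parse_encoding(encoding)
    # For whole node counts, nodes < (k+p)/p is the same as nodes < ceil((k+p)/p).
    return nodes < -(-(k + p) // p)
```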
Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
Important: If you need to change any, do so before upgrading to Swarm 10.
- policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.
- EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.
- Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.
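The small-cluster rules for replicas can be checked before an upgrade with a few comparisons. In this sketch the function name is hypothetical, and treating the default replica count as "the minimum number needed" for the spare-machine best practice is our assumption (one replica per node); EC requirements are not included here.

```python
def small_cluster_issues(nodes: int, replicas_min: int, replicas_default: int) -> list:
    """Return warnings for a small cluster's node count and policy.replicas values."""
    issues = []
    if nodes < 3:
        issues.append(f"{nodes} nodes: do not use fewer than 3 nodes in production")
    if replicas_min > nodes:
        issues.append(f"policy.replicas min={replicas_min} exceeds {nodes} nodes")
    if replicas_default > nodes:
        issues.append(f"policy.replicas default={replicas_default} exceeds {nodes} nodes")
    elif replicas_default == nodes:
        # Best practice: keep one machine beyond the minimum needed, assumed here
        # to be the default replica count.
        issues.append("no spare node: one node down for maintenance breaks the replica constraint")
    return issues
```

For example, a 3-node cluster with min=2 and default=2 returns no warnings, while min=3 and default=3 returns only the spare-node warning.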
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration, but move to direct booting of Swarm with 3 or more.
Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they timeout in 4 days (crier.forgetOfflineInterval), which is expected.

Info

Multipath support is obsolete from Swarm 10 onward.

Watch Items and Known Issues

The following operational limitations and watch items exist in this release.

If a node mounts an encrypted volume that is missing the encryption key in the configuration, the node fails to mount all of the disks in the node. (SWAR-8762)
S3 Backup feeds do not yet back up logical objects larger than 5 GB. (SWAR-8554)
If downgrading from Swarm 11.0, CRITICAL errors may appear on the feeds. To stop the errors, edit the existing feed definition names via the Swarm UI or legacy Admin Console. (SWAR-8543)
When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)
If wiping the Elasticsearch cluster, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating the SwarmFS config information. (SWAR-8007)
If a bucket is deleted, any incomplete multipart uploads into that bucket leave their parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)
Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) cannot find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if such warnings are received, contact DataCore Support so the issue can be resolved. (SWAR-7556)
With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)

Note these installation issues:

The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator: yum reinstall elasticsearch-curator (SWAR-7439)
Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)

Upgrading Swarm

Proceed to How to Upgrade Swarm to upgrade Swarm 9 or higher.

Important

Contact DataCore Support for guidance if you need to migrate from Swarm 8.x or earlier.