Swarm Storage 11.0 Release

New Features

S3 Backup and Restore: In addition to on-premises Swarm storage and remote clusters, you can now take advantage of public cloud services for off-premises disaster recovery (DR) storage. Amazon S3 has the widest support in the industry, and Swarm Content Gateway already supports S3, so S3 is the first cloud destination from Swarm. By implementing an S3 backup feed from Swarm, you have the security of knowing backups are continuous, have minimal latency, and require little intervention and monitoring by you. Using Swarm's feed mechanism for backup leverages numerous existing strengths: its long-term iteration over objects in the cluster, proven method for tracking work as it is performed, and mechanisms for TLS connections and forward proxies. Having the parallelism of the entire cluster makes best use of your network bandwidth, while sending the backups through a forward proxy enables bandwidth throttling.

Faster Volume Mounting: Due to re-engineering of disk mounting and common disk operations, Swarm 11 has a 30% improvement in volume mount times over previous versions. (SWAR-7957)

Prometheus Node Exporter: To make Prometheus node exporter metric names globally unique and easier to identify, Swarm now prefixes them with 'caringo_swarm_' instead of 'metrics_'. (SWAR-8539) In addition, the setting metrics.nodeExporterFrequency is now a persisted cluster setting with MIB name metricsExporterFrequency. See Prometheus Node Exporter and Grafana. (SWAR-8467)
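Existing dashboards and alert rules that reference the old 'metrics_' prefix need their queries updated after the upgrade. The following is a minimal sketch of one way to rewrite query strings, assuming that only the prefix changed and the rest of each metric name is the same. The helper name migrate_query and the sample metric name are hypothetical and used only for illustration.

```python
import re

OLD_PREFIX = "metrics_"
NEW_PREFIX = "caringo_swarm_"

# Match the old prefix only where it starts an identifier, so that a name
# such as "node_metrics_x" (where "metrics_" is in the middle) is untouched.
_OLD_PREFIX_AT_START = re.compile(r"\b" + re.escape(OLD_PREFIX) + r"(?=\w)")


def migrate_query(expr: str) -> str:
    """Return expr with every leading 'metrics_' prefix swapped for 'caringo_swarm_'."""
    return _OLD_PREFIX_AT_START.sub(NEW_PREFIX, expr)


if __name__ == "__main__":
    # "metrics_example_total" is a made-up metric name, not a real Swarm metric.
    print(migrate_query("rate(metrics_example_total[5m])"))
```

Because this is a plain text substitution, it also rewrites label values or strings that happen to start with 'metrics_', so review the result before saving it to a dashboard.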

System Status on Console: On the System Menu accessed from the physical console of a Swarm node, the Diagnostics Menu has additional functionality for viewing system status. The new options include Systemd Unit Status, Systemd journal, and Top processes list. (SWAR-3412)

Improved Memory Management: Swarm 11 includes changes for better memory management in low memory situations. In Swarm 10, insufficient memory on a node for all volumes being managed causes Swarm to reboot; with these improvements, rebooting is less likely. Verify each node meets a minimum physical memory of 2 GB + (0.5 GB * number of volumes) for best results. More memory benefits Swarm's performance. (SWAR-8558)
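As a worked example of the sizing guideline above, the minimum can be computed per node from its volume count. This is only a sketch of the arithmetic; the function name is hypothetical, and the result is a floor, not a recommended size.

```python
def min_node_memory_gb(volume_count: int) -> float:
    """Minimum physical memory in GB: 2 GB + (0.5 GB * number of volumes)."""
    if volume_count < 0:
        raise ValueError("volume_count must not be negative")
    return 2.0 + 0.5 * volume_count


if __name__ == "__main__":
    for volumes in (4, 12, 24):
        print(f"{volumes} volumes: at least {min_node_memory_gb(volumes)} GB")
```

For instance, a node managing 12 volumes needs at least 2 + (0.5 * 12) = 8 GB.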

Container-Compatible: The architecture work of Swarm 10 continues with build-out of support for containerization, so Swarm storage nodes can now be managed in containers.

Large Cluster Performance: This release includes performance improvements for very large clusters, which also benefit clusters of all sizes. (11.0.1: SWAR-8616)

Additional Changes

These items are other changes and improvements including those that come from testing and user feedback.

OSS Versions

See Third-Party Components for 11.0 for the complete listing of packages and versions.

  • The Linux kernel is upgraded to 4.19.56, which mitigates the Linux SACK vulnerability. (SWAR-8534)

  • Linux firmware is upgraded to 1.179. (SWAR-8341)

  • Numerous network drivers are updated, including bnx2, bnx2x, ixgbe, and i40; see the complete listing for variants and versions. (SWAR-8341)

Fixed in 11.0.3

  • A kernel configuration issue prevented the discovery of ATA disks attached to an SAS controller. (SWAR-8663)

Fixed in 11.0.2

  • Improved: When Swarm completes a retire task, the announce-level message it generates now reports the overall duration and rate of the retire. (SWAR-8633)

  • The health processor did not always clear its memory of replicas on long-removed volumes, which caused periodic failed volume recoveries (FVRs). (SWAR-8639)

  • Swarm 11.0.0 showed an incorrect value (11.0.0.rc8) for its build revision. (SWAR-8627)

  • When recoveries of specific volumes are suspended by SNMP or API calls, those recoveries still appear to be running. (SWAR-8604)

  • The health processor state (healthProcessorState in SNMP) sometimes showed "idle" when health processing was paused for failed volume recoveries (FVRs). (SWAR-8601)

  • Retiring volumes are reported as available space even though they cannot be written to. (SWAR-7865)

  • Under some conditions, Swarm may start without mounting some of its volumes. (SWAR-8597)

Fixed in 11.0.0

  • The node console's system menu can be obscured by stray text from the boot process. (SWAR-8591)

  • A dmesg dump (on the Chassis Details page or the legacy Admin Console) may be missing some or all driver messages. (SWAR-8573)

  • Although the bucket existed, erroneous CRITICAL messages may report that "Bucket (uuid=...) in domain '...' has been deleted with orphan content." (SWAR-8560)

  • Too many replicas of context objects (buckets and domains) caused error messages about being unable to index objects. After upgrading, these messages stop once several HP cycles are able to complete. (SWAR-8555)

  • The OS in 10.2.1 cannot mount USB flash drives and so cannot read node.cfg files from them. (SWAR-8501)

  • Swarm now prevents and removes any overage caused by erroneous remote replication of EC streams via a replication feed, which can double the space usage. (SWAR-8439)

  • While a node is down for maintenance, erroneous CRITICAL errors may report that EC objects have insufficient protection. (SWAR-8421)

  • Swarm returns a 410 Gone response (instead of 412 Precondition Failed) for unrecoverable multipart upload requests. (SWAR-8343)

  • On getting new capacity, fuller clusters are slow to rebalance over the available volumes. (SWAR-8116)

Upgrade Impacts

These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the one you are currently running:

Impacts for 11.0

  • Upgrading Elasticsearch: You may use Elasticsearch 2.3.3 with Storage 11.0 if you cannot move to 5.6 now, but plan the migration immediately (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release, and testing of it with Swarm 11 is being discontinued.

  • Propagate Deletes Deprecated: The option to disable Propagate Deletes on Replication Feeds is deprecated; use Object Versioning to preserve deleted content. Do not disable Propagate Deletes when versioning is enabled or when defining an S3 Backup. (SWAR-8609)

  • Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.

    • Changed settings:

      • ec.segmentConsolidationFrequency (ecSegmentConsolidationFrequency in SNMP) has an improved default (10), which you must apply to your cluster when you upgrade. (SWAR-8483)

      • cluster.name is now required. Add it to the cluster.cfg file. (SWAR-8466)

      • metrics.nodeExporterFrequency (metricsExporterFrequency in SNMP) is now a persisted cluster setting. (SWAR-8467)

    • Removed settings:

      • chassis.processes is allowed but is ignored.

    • Numerous settings are now promoted to cluster-level (versus node-level) scope, so you can manage them via Settings > Cluster in the Swarm UI (SWAR-8457):

      • console.expiryErrInterval

      • console.expiryWarnInterval

      • console.indexErrorLevel

      • console.indexWarningLevel

      • console.port

      • console.reportStyleUrl

      • console.spaceErrorLevel

      • console.spaceWarnLevel

      • console.styleUrl

      • feeds.retry

      • feeds.statsReportInterval

      • health.parallelWriteTimeout

      • log.obscureUUIDs

      • metrics.enableNodeExporter

      • network.dnsDomain

      • network.dnsServers

      • network.icmpAcceptRedirects

      • network.igmpVersion

      • network.mtu

      • startup.certificates
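The changed and removed settings listed above can be spot-checked before an upgrade. The sketch below is not the Storage Settings Checker and does not replace it. It assumes the configuration file uses "name = value" lines with "#" comments, and it inspects only the file text, not settings persisted in the cluster. The function names and the sample cluster values are hypothetical.

```python
def parse_settings(text: str) -> dict:
    """Parse 'name = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if separator:
            settings[name.strip()] = value.strip()
    return settings


def check_for_swarm_11(settings: dict) -> list:
    """Return findings for the Swarm 11.0 setting changes named in these notes."""
    findings = []
    if not settings.get("cluster.name"):
        findings.append("cluster.name is required but is missing or empty")
    if "chassis.processes" in settings:
        findings.append("chassis.processes is allowed but ignored")
    frequency = settings.get("ec.segmentConsolidationFrequency")
    if frequency is not None and frequency != "10":
        findings.append(
            "ec.segmentConsolidationFrequency is %s; the improved default is 10"
            % frequency
        )
    return findings


if __name__ == "__main__":
    # Made-up file content for illustration only.
    sample = """
    # cluster.cfg (sample)
    chassis.processes = 4
    ec.segmentConsolidationFrequency = 100
    """
    for finding in check_for_swarm_11(parse_settings(sample)):
        print(finding)
```

An empty result means only that none of these three conditions was found in the file; run the Storage Settings Checker for the full set of configuration issues.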

For Swarm 9 impacts, see Swarm Storage 9 Releases.

Watch Items and Known Issues

The following operational limitations and watch items exist in this release.

  • When using ES 5.6, deprecation warnings can cause logs to consume excessive disk space. Workaround: To exclude the warnings, add 'logger.deprecation.level = error' to the top of the log4j2.properties file. (SWAR-8632)

  • Swarm 11.0.0 shows an incorrect value (11.0.0.rc8) for its build revision. (SWAR-8627)

  • Under some conditions, Swarm may start without mounting some of its volumes. If this happens, reboot the node. (SWAR-8597)

  • S3 Backup feeds do not back up logical objects greater than 5 GB. (SWAR-8554)

  • If you downgrade from Swarm 11.0, CRITICAL errors may appear on your feeds. To stop the errors, edit the existing feed definition names via the Swarm UI or legacy Admin Console. (SWAR-8543)

  • When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)

  • If the Elasticsearch cluster is wiped, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating your SwarmFS config information. (SWAR-8007)

  • If a bucket is deleted, any incomplete multipart upload into that bucket leaves the parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)

  • Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) cannot find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if you received such warnings, contact DataCore Support so the issue can be resolved. (SWAR-7556)

Note these installation issues:

  • The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator: yum reinstall elasticsearch-curator (SWAR-7439)

  • Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with the error "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)

Upgrading Swarm

To upgrade from Swarm 9 or higher, proceed to How to Upgrade Swarm.

Important

If you need to upgrade from Swarm 8.x or earlier, contact DataCore Support for guidance.

 

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.