New Features

Performance — This release of Swarm Storage enhances both memory management and cluster performance:

  • Swarm cluster startup sequencing has been optimized: volume mounting must now complete and the persistent settings must be processed before any needed recovery activities can commence. (SWAR-8911)

  • Swarm nodes shut down faster, allowing for quicker rebooting of Swarm clusters. (SWAR-8891)

  • Swarm nodes with limited physical memory can now respond better under high client loads. (SWAR-8870)

  • Swarm's memory management has been improved, which enables higher loads for client writes. (SWAR-8816)

Stability — This release also includes changes that improve Swarm stability and administration:

  • Better handling of newly added hotplug volumes results in clients receiving fewer 503 Service Unavailable responses. (SWAR-8887)

  • Health Processor (HP) cycles now cleanse all traces of removed volumes from the cluster, greatly reducing the chance that recovery is started erroneously for a volume that has already been recovered. (SWAR-8836)

  • Reworking of cluster operations has reduced spurious "Cannot contact node" announcements during maintenance rebooting of multiple nodes. (SWAR-8848)

  • When secure logging (security.secureLogging) is enabled, Swarm removes more sensitive information from AUDIT-level messages. (SWAR-8790)

Additional Changes

These items are other changes, including those that come from testing and user feedback.

  • OSS Versions — See Third-Party Components for 11.3 for the complete listing of packages and versions.

  • Fixed in 11.3.0

    • Drive light plug-in control is restored for hardware in mpt3sas enclosures, including Western Digital Ultrastar Serv60. (SWAR-8934)

    • Feed accounting no longer resets for some feed statistics, a problem that previously required a reboot to correct the statistic. (SWAR-8854)

Upgrade Impacts

Use the supported versions of Swarm components if running an older version of Elasticsearch:

  • Elasticsearch 6.8.6: Swarm Storage 11.1 - 11.3, Gateway 6.3, SwarmFS 2.4. Recommended configuration.

  • Elasticsearch 5.6.12: Swarm Storage 10.0 - 11.3, Gateway 6.0 - 6.3, SwarmFS 2.4. Plan to migrate to Elasticsearch 6. Support for earlier versions is ending.

  • Elasticsearch 2.3.3: Swarm Storage 9.6 - 11.3, Gateway 5.4, SwarmFS 2.1.

These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the version currently running:

Impacts for 11.3

  • Upgrading Elasticsearch — Use Elasticsearch 5.6.12/2.3.3 with Storage 11 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.

  • Rolling upgrade — During a rolling upgrade from a version older than 11.1, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI (and in management API calls). Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)

  • Settings changes — The setting health.parallelWriteTimeout, which was disabled by default, now defaults to 1 month. It sets when to time out an uncompleted multipart upload, which triggers cleanup of the unused parts. Do not disable it (by setting it to 0) if using SwarmFS. (SWAR-8902)

  • Encryption-at-rest — If upgrading from Swarm 11.0 or earlier and encryption-at-rest is used, contact DataCore Support to verify that a rollback to the prior version is possible, if needed. (SWAR-8941)

  • Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):

    • If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1: Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change it to scsp.forceLegacyNonce=false in the node.cfg file.

    • If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list: Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file and verify using swarmctl that scsp.forceLegacyNonce=false in the cluster.

    Use swarmctl to check or change settings

    Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.

    Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.

    For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
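
The version rule above can be sketched as a small shell helper. This is an illustrative sketch only: legacy_nonce_before_upgrade is a hypothetical function name, not part of Swarm or swarmctl, and it assumes the upgrade target is 11.1, 11.2, 11.3, 12.0, or 12.1. It prints the value that scsp.forceLegacyNonce should have in node.cfg before the upgrade starts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Swarm or swarmctl).
# Usage: legacy_nonce_before_upgrade CURRENT_VERSION
# Prints "true" or "false": the scsp.forceLegacyNonce value to have in
# node.cfg BEFORE upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1.
legacy_nonce_before_upgrade() {
    local version="$1" major minor
    major="${version%%.*}"
    case "$version" in
        *.*) minor="${version#*.}"; minor="${minor%%.*}" ;;
        *)   minor=0 ;;   # no minor component given, treat as .0
    esac
    # Versions prior to 11.1 need the legacy nonce during the upgrade.
    if [ "$major" -lt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -lt 1 ]; }; then
        echo "true"
    else
        echo "false"
    fi
}
```

For example, legacy_nonce_before_upgrade 10.2 prints true, and legacy_nonce_before_upgrade 11.2 prints false. In the first case, the value still has to be set back to false (with swarmctl and in node.cfg) once the upgraded cluster is fully up.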


Impacts for 11.2

  • Upgrading Elasticsearch: Elasticsearch 5.6.12/2.3.3 may be used with Storage 11 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued.

Important

Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.

  • Rolling Upgrade: During a rolling upgrade, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI (and in management API calls). Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)

  • Settings Changes - These settings are new with this release:

    • scsp.defaultFeedSendTimeout (default 30 seconds), a non-persisted node-level setting that sets the timeout on a feed SEND request if the timeout=true query argument is provided. (SWAR-8441)

    • chassis.name (default blank), a node-level setting that stores a user-defined chassis name. (SWAR-8823)

  • Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):

    • If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1: Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change it to scsp.forceLegacyNonce=false in the node.cfg file.

    • If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list: Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file and verify using swarmctl that scsp.forceLegacyNonce=false in the cluster.

Use swarmctl to Check or Change Settings

Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.

Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.

For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.

Impacts for 11.1

  • Upgrading Elasticsearch: Use Elasticsearch 5.6.12/2.3.3 with Storage 11.1 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.

  • Swarm Search and Metrics: This release includes new versions of Swarm Search and Metrics RPMs. Both require Python 3 to be installed on the ES servers they run on.

    • For Swarm Metrics on RHEL/CentOS 7.7, first install this dependency: yum install epel-release

  • Python 3: Install Python 3 if it is not automatically installed with RHEL/CentOS 7.

  • Propagate Deletes Removed: For Replication Feeds, the Propagate Deletes option is removed from the legacy Admin Console and the Management API (propagateDeletes, nodeletes fields). (SWAR-8609, SWAR-8615)

  • Swarm Configuration: Run the Storage Settings Checker before upgrading to this version, to identify configuration issues.

    • The Storage Settings Checker now requires Python 3 to be installed. (SWAR-8742) 

    • crier.deadVolumeWall has been unpublished for reimplementation. (SWAR-8640)

  • S3 Backup Restore: The S3 Backup Restore Tool has been migrated to Python 3.6. If the tool is installed, uninstall it and install the new version. (SWAR-8703) 

  • Upgrade Process: During the upgrade to 11.1, it may not be possible to monitor the cluster via the Swarm UI. Workaround: Use the legacy Admin Console (port 90) during upgrade. (SWAR-8716)

  • Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):

    • If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1: Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change it to scsp.forceLegacyNonce=false in the node.cfg file.

    • If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list: Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file and verify using swarmctl that scsp.forceLegacyNonce=false in the cluster.

Use swarmctl to Check or Change Settings

Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.

Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.

For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.

Impacts for 11.0

  • Upgrading Elasticsearch: You may use Elasticsearch 2.3.3 with Storage 11.0 if you cannot move to 5.6 now, but plan the migration immediately (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release, and testing with Swarm 11 is being discontinued.

  • Propagate Deletes Deprecated: The option to disable Propagate Deletes on Replication Feeds is deprecated; use Object Versioning to preserve deleted content. Do not disable Propagate Deletes when versioning is enabled or when defining an S3 Backup. (SWAR-8609)

  • Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.

    • Changed settings:

      • ec.segmentConsolidationFrequency (ecSegmentConsolidationFrequency in SNMP) has an improved default (10), which you must apply to your cluster when you upgrade. (SWAR-8483)

      • cluster.name is now required. Add it to the cluster.cfg file. (SWAR-8466).

      • metrics.nodeExporterFrequency (metricsExporterFrequency in SNMP) is now a persisted cluster setting. (SWAR-8467).

    • Removed settings:

      • chassis.processes is allowed but is ignored.

    • Numerous settings are now promoted to cluster-level (versus node-level) scope, so you can manage them via Settings > Cluster in the Swarm UI (SWAR-8457):

      • console.expiryErrInterval

      • console.expiryWarnInterval

      • console.indexErrorLevel

      • console.indexWarningLevel

      • console.port

      • console.reportStyleUrl

      • console.spaceErrorLevel

      • console.spaceWarnLevel

      • console.styleUrl

      • feeds.retry

      • feeds.statsReportInterval

      • health.parallelWriteTimeout

      • log.obscureUUIDs

      • metrics.enableNodeExporter

      • network.dnsDomain

      • network.dnsServers

      • network.icmpAcceptRedirects

      • network.igmpVersion

      • network.mtu

      • startup.certificates

Impacts for 10.2

  • Upgrading Elasticsearch — You may continue to use Elasticsearch 2.3.3 with Storage 10.2 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. The upgrade to Elasticsearch 5.6 must be completed before upgrading to Gateway 6.0.

  • Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues. Note these changes:

    • ec.protectionLevel is now persisted. (SWAR-8231)

    • index.ovMinNodes=3 is the new default for the overlay index, in support of Swarm 10's new architecture. To keep your overlay index operational, set this new value in your cluster, through the UI or by SNMP (overlayMinNodes). (SWAR-8278)

    • metrics.enableNodeExporter can be set to True, which enables the Prometheus Node Exporter on that node. (SWAR-8408, SWAR-8578)

    • metrics.nodeExporterFrequency, a new dynamic setting, sets how frequently to refresh Swarm-specific Prometheus metrics in Elasticsearch; it defaults to 0, which disables this export. (SWAR-8408).

Impacts for 10.1

  • Upgrading Elasticsearch — Continue to use Elasticsearch 2.3.3 with Storage 10.1 until able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. Complete the upgrade to Elasticsearch 5.6 before upgrading to Gateway 6.0.

  • Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.

    • metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)

  • IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note the "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)

  • Update MIBs on CSN — Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)

Impacts for 10.0

  • Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.

  • Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.

    • Changes for the new single-IP dense architecture:

      • network.ipAddress - multiple IP addresses now disallowed

      • chassis.processes - removed; multi-server configurations are no longer supported

      • ec.protectionLevel - new value "volume"

      • ec.subclusterLossTolerance - removed

    • Changes for security (see next section)

      • security.administrators, security.operators - removed 'snmp' user

      • snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user

      • startup.certificates - new setting to hold any and all public keys

    • New settings:

      • disk.atimeEnabled

      • health.parallelWriteTimeout

      • search.pathDelimiter

  • Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes that contain only the snmp key in the security.administrators setting do not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key, and it cannot be an authorized console operator name. (SWAR-8097)

  • EC Protection

    • Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)

    • Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)

    • Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)

  • Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 nodes in production.
    Important: If you need to change any of these settings, do so before upgrading to Swarm 10.

    • policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.

    • EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.

    • Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.

  • Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration, but move to direct booting of Swarm with 3 or more.

  • Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out in 4 days (crier.forgetOfflineInterval). This is expected.
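
The small-cluster guidance above (replica counts and EC encoding versus node count) can be sketched as shell checks. This is a rough sketch under stated assumptions: the function names replicas_fit and ec_fit are hypothetical, the node count and setting values are passed in by hand rather than read from a cluster, and the three result labels are illustrative wording, not Swarm output. The "minimal" band is an interpretation of the text above, not a documented Swarm state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nothing here queries a real Swarm cluster.

# replicas_fit NODES MIN DEFAULT
# Succeeds (exit 0) when the policy.replicas min and default values
# do not exceed the number of nodes.
replicas_fit() {
    local nodes="$1" min="$2" default="$3"
    [ "$min" -le "$nodes" ] && [ "$default" -le "$nodes" ]
}

# ec_fit ENCODING NODES
# ENCODING is a policy.ecEncoding value in k:p form, such as 5:2.
# Prints one of three illustrative labels:
#   recommended         more than k+p nodes
#   minimal             at least (k+p)/p nodes, but not more than k+p
#   needs-volume-level  fewer than (k+p)/p nodes (ec.protectionLevel=volume)
ec_fit() {
    local encoding="$1" nodes="$2" k p
    k="${encoding%%:*}"
    p="${encoding##*:}"
    if [ "$nodes" -gt $((k + p)) ]; then
        echo "recommended"
    elif [ $((nodes * p)) -ge $((k + p)) ]; then
        # Same as nodes >= (k+p)/p, written without fractions.
        echo "minimal"
    else
        echo "needs-volume-level"
    fi
}
```

With policy.ecEncoding=5:2, ec_fit 5:2 8 prints recommended, which matches the "at least 8" example above, while ec_fit 5:2 3 prints needs-volume-level. replicas_fit 3 2 2 succeeds for a 3-node cluster, and replicas_fit 3 4 2 fails because min exceeds the node count.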

Info

Multipath support is obsolete from Swarm 10 onward.

Watch Items and Known Issues

The following watch items are known:

  • Volumes newly formatted in Swarm 11.1, 11.2, or 11.3 to use encryption-at-rest cannot be downgraded to Swarm 11.0 or earlier without a special procedure to prevent data loss. Contact DataCore Support before any such downgrade with encrypted volumes. (SWAR-8941)

  • Infrequent WARNING messages, "Node/Volume entry not published due to lock contention (...); action will be retried," may appear in logs. Unless they are frequent, they may be ignored. (SWAR-8802)

  • If a node mounts an encrypted volume that is missing the encryption key in the configuration, the node fails to mount all disks in the node. (SWAR-8762)

  • S3 Backup feeds do not back up logical objects greater than 5 GB. (SWAR-8554)

  • When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)

  • With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)

These are standing operational limitations:

  • If downgrading from Swarm 11.0, CRITICAL errors may appear on the feeds. To stop the errors, edit the existing feed definition names via the Swarm UI or legacy Admin Console. (SWAR-8543)

  • If the Elasticsearch cluster is wiped, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating the SwarmFS config information. (SWAR-8007)

  • If a bucket is deleted, any incomplete multipart upload into that bucket leaves the parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)

  • Removing subcluster assignments in the CSN UI creates invalid config parameters preventing the unassigned nodes from booting. (SWAR-7675)

  • Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) cannot find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if such warnings are received, contact DataCore Support so the issue can be resolved. (SWAR-7556)

  • If a feed is subject to a prolonged outage, a node reboot may be required for it to resume progress after the outage is cleared. If progress does not resume after the reboot, contact DataCore Support. This has been resolved in 12.1.0. (SWAR-9062)

  • If Elasticsearch 6.8.6 blocks an index due to low disk space, issue the following command against each index (index_*, csmeter*, metrics*) that is in the read_only_allow_delete state. This is no longer needed after upgrading to Swarm 12 / Elasticsearch 7, which automatically unblocks indices when disk space frees up. (SWAR-8944)

    curl -i -XPUT "<ESSERVERIP>:9200/<INDEXNAME>/_settings" -d '{"index.blocks.read_only_allow_delete" : null}' -H "Content-Type: application/json"
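
When several indices are blocked, the command above can be repeated in a loop. A minimal sketch, assuming the blocked index names are already known and are passed in by hand; unblock_indices is a hypothetical function name, and the host and index names in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clears the read_only_allow_delete block on each
# named index by issuing the same curl request shown above.
# Usage: unblock_indices ES_SERVER_IP INDEX_NAME [INDEX_NAME ...]
unblock_indices() {
    local es_host="$1" index
    shift
    for index in "$@"; do
        curl -i -XPUT "${es_host}:9200/${index}/_settings" \
            -d '{"index.blocks.read_only_allow_delete" : null}' \
            -H "Content-Type: application/json"
    done
}
```

A call such as unblock_indices <ESSERVERIP> <INDEXNAME1> <INDEXNAME2> sends one request per index. The function only defines the loop; nothing is sent until it is called.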

Upgrading Swarm

Note these installation issues when upgrading Swarm:

  • The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator. (SWAR-7439)

    yum reinstall elasticsearch-curator

  • Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)

Proceed to How to Upgrade Swarm to upgrade Swarm 9 or higher.

Important

Contact DataCore Support for guidance if you need to migrate from Swarm 8.x or earlier.

