
New Features

  • Swarm 10 Performance — With Storage 10.1, performance for both writes and erasure-coded object reads is improved for Swarm 10's density-friendly single-IP architecture, the result of optimizations in how Swarm nodes write to volumes under the new design. (SWAR-8357)

  • Memory Handling — Swarm has improved memory handling, especially with bursts and high loads, and 503 Service Unavailable responses are less likely. (SWAR-8335)

  • Hardware Diagnostics — This release includes a preview of the Prometheus Node Exporter, for monitoring and diagnostics on the machines in the Swarm cluster. Prometheus is an open-source systems monitoring and alerting toolkit that lets you view system statistics even under failure conditions. Prometheus scrapes metrics from instrumented jobs and runs rules over this data to record aggregated time series or to generate alerts; Grafana and other API consumers can then visualize the collected data. The new setting metrics.enableNodeExporter enables Swarm to run the Prometheus node exporter on port 9100 (a verification sketch follows this list). As a preview, the settings and implementation are subject to change; for more about this preview, contact DataCore Support. (SWAR-8170)

  • Bulk Reformatting — Implementing encryption at rest requires retiring volumes and then reformatting and remounting them. Contact DataCore Support for a utility that streamlines this process. (SWAR-8088)

  • ES Cluster Configuration — The automation script for configuring Elasticsearch for Swarm generates the complete set of unique configuration files for each node in the Elasticsearch cluster. (SWAR-8028)
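
A minimal verification sketch for the node exporter preview above, assuming the setting is added to the standard Swarm node configuration file (shown here as node.cfg) and that curl is available on a machine that can reach the storage network; neither is the only supported method:

    # In node.cfg: enable the Prometheus node exporter preview (subject to change)
    metrics.enableNodeExporter = true

    # After the node reboots, confirm the exporter answers on port 9100:
    curl -s http://<storage-node-ip>:9100/metrics | head

Grafana or any Prometheus server can then scrape the same /metrics endpoint as a standard node exporter target.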

Additional Changes

These items are other changes and improvements including those from testing and user feedback.

  • OSS Updates — Storage 10.0 includes updates to third-party components. See Third-Party Components for 10.0 for the complete listing of packages and versions.

  • SCSP Errors — Several new error tokens and error improvements have been added for this release. See Error Response Headers.

  • Blocked Feeds

    • Swarm can now detect the disappearance of the Elasticsearch index associated with a feed and mark the feed as Blocked; such feeds may need to be deleted and recreated. (SWAR-6885)

    • Blocked feeds are retried every 20 minutes, but changing the definition for a blocked feed now triggers an immediate attempt with the new definition, which might clear the blockage. (SWAR-8232)

    • The handling and reporting of feeds blocked due to internal software issues is improved. (SWAR-8267)

  • Fixed

    • The correct behavior for the network.ntpControlKey setting was restored. (SWAR-8371)

    • After an upgrade to version 10.0, the Chassis Details in the Swarm UI did not update the node listings correctly for a period of time. (SWAR-8352)

    • When SwarmFS was not in use, clicking on the NFS settings link in the Storage UI resulted in an error. (SWAR-8350)

    • The correct behavior for the drive light toggle in the Swarm UI was restored. (SWAR-8336)

    • The SNMP MIB entries for the largest stream (volLargestStreamMB and volLargestStreamUUID) were not populated. (SWAR-8331)

    • Rapid updates of objects written with replicate=immediate could temporarily result in some replicas not being found. (SWAR-8249)

    • Deletes of unnamed objects did not update the Elasticsearch index, leaving a stale entry. (SWAR-8218)

    • Deletes not processed by a feed within two weeks were not propagated to the feed's destination (Elasticsearch or the replication target). (SWAR-7950)

Upgrade Impacts

These items are changes to the product function requiring operational or development changes for integrated applications.

Impacts for 10.1

  • Upgrading Elasticsearch — You may continue to use Elasticsearch 2.3.3 with Storage 10.1 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 will end in a future release. Before you upgrade to Gateway 6.0, however, you must complete the upgrade to Elasticsearch 5.6.

  • Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.

    • metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)

  • IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note that "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)

  • Update MIBs on CSN — Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script, as sketched below. (CSN-1872)
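
A minimal sketch of this MIB update, assuming the Swarm Support tools bundle has already been downloaded and unpacked on the CSN; the directory shown is hypothetical:

    # On the CSN, from the unpacked Swarm Support tools bundle (location is illustrative):
    cd /root/swarm-support-tools
    ./platform-update-mibs.sh    # refreshes the Swarm MIBs before the 10.x upgrade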

Impacts for 10.0

  • Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.

  • Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.

    • Changes for the new single-IP dense architecture:

      • network.ipAddress - multiple IP addresses now disallowed

      • chassis.processes - removed; multi-server configurations are no longer supported

      • ec.protectionLevel - new value "volume"

      • ec.subclusterLossTolerance - removed

    • Changes for security (see next section)

      • security.administrators, security.operators - removed 'snmp' user

      • snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user

      • startup.certificates - new setting to hold any and all public keys

    • New settings:

      • disk.atimeEnabled

      • health.parallelWriteTimeout

      • search.pathDelimiter

  • Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes whose security.administrators setting contains only the snmp key do not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key and cannot be used as an authorized console operator name. A configuration sketch follows this list. (SWAR-8097)

  • EC Protection

    • Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)

    • Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)

    • Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)

  • Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
    Important: If you need to change any, do so before upgrading to Swarm 10.

    • policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.

    • EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume (see the configuration sketch after this list).

    • Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.

  • Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. With two physical machines, use the "cluster in a box" configuration; with 3 or more, move to direct booting of Swarm.

  • Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out after 4 days (crier.forgetOfflineInterval); this is expected.
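
The following configuration sketch consolidates the SNMP security change and the small-cluster EC settings above. The passwords are placeholders and the exact value syntax (for example, for policy.replicas) is illustrative, so adapt these lines to your existing configuration:

    # Before (Swarm 9.x): the 'snmp' key is embedded in the administrators setting
    security.administrators = {'admin': 'adminpassword', 'snmp': 'snmppassword'}

    # After (Swarm 10.x): move the snmp value into the dedicated community settings
    security.administrators = {'admin': 'adminpassword'}
    snmp.rwCommunity = snmppassword
    snmp.roCommunity = snmpropassword    # only if the security.operators default was changed

    # Small cluster (3 nodes) that still needs EC writes to succeed
    policy.ecEncoding = 5:2              # k+p = 7 exceeds the node count, so node-level protection cannot be met
    ec.protectionLevel = volume          # allows EC writes with fewer than (k+p)/p nodes
    policy.replicas = min:2 default:2 max:4   # min and default must not exceed the number of nodes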

Info

Multipath support is obsolete from Swarm 10 onward.

For Swarm 9 impacts, see Swarm Storage 9 Releases.

Watch Items and Known Issues

The following operational limitations and watch items exist in this release.

  • During a rolling reboot of a small cluster, erroneous CRITICAL errors may appear on the console, claiming EC objects have insufficient protection. These errors may be disregarded. (SWAR-8421)

  • The rate at which nodes retire is slower in Swarm 10.x than in 9.6. (SWAR-8386)

  • When restarting a cluster of UEFI-booted (versus legacy BIOS) virtual machines, the chassis shut down but do not come back up. (SWAR-8054)

  • If you wipe your Elasticsearch cluster, the Storage UI will show no NFS config. Contact DataCore Support for help repopulating your SwarmFS config information. (SWAR-8007)

  • If you delete a bucket, any incomplete multipart upload into the bucket leaves the parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility, as sketched after this list (search the Support site for "s3cmd" for guidance). (SWAR-7690)

  • Dell DX hardware will have less chassis-level monitoring information available via SNMP. If this is a concern, contact DataCore Support. (SWAR-7606)

  • Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) couldn't find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if you received such warnings, contact DataCore Support so the issue can be resolved. (SWAR-7556)

  • With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)
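
For the incomplete multipart upload cleanup noted above, a minimal s3cmd sketch follows; the bucket name, object name, and upload ID are placeholders, and the subcommands available depend on your s3cmd version and configuration:

    # List in-progress multipart uploads left in the bucket
    s3cmd multipart s3://mybucket

    # Abort an unfinished upload, which removes its orphaned parts
    s3cmd abortmp s3://mybucket/myobject <upload-id>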

Upgrading from 9.x

Important

Do not begin the upgrade until you complete the following:

  1. Plan upgrade impacts — Review and plan for this release's upgrade impacts (above) and the impacts for each of the releases since the version you are running. For Swarm 9 impacts, see Swarm Storage 9 Releases.

  2. Finish volume retires — Do not start any elective volume retirements during the upgrade. Wait until the upgrade is complete before initiating any retires.

  3. Run checker script — Swarm 10 includes a migration checker script to run before upgrading from Swarm 9; it reports configuration setting issues and deprecations to be addressed. (SWAR-8230) See Storage Settings Checker.

If you need to upgrade from Swarm 8.x or earlier, contact DataCore Support for guidance.

  1. Download the correct bundle for your site. Swarm distributions bundle together the core components needed for implementation and later updates; the latest versions are available in the Downloads section on the DataCore Support Portal. 
    Two bundles are available:

    • Platform CSN 8.3 Full Install or Update (for CSN environments) — Flat structure for scripted install/update on a CSN (see CSN Upgrades).

    • Swarm 10 Software Bundle (Platform 9.x and custom environments) — Contains complete updates of all core components, organized hierarchically by component.

  2. Download the comprehensive PDF of Swarm Documentation matching the bundle distribution date, or use the online HTML version from the Documentation Archive.

  3. Choose the type of upgrade. Swarm supports rolling upgrades (a single cluster running mixed versions during the upgrade process) and requires no data conversion unless specifically noted for a particular release. Upgrades can be performed without scheduling an outage or bringing down the cluster. Restart the nodes one at a time with the new version and the cluster continues serving applications during the upgrade process.

    • Rolling upgrade: Reboot one node at a time and wait for its status to show as "OK" in the UI before rebooting the next node.

    • Alternative: Reboot the entire cluster at once after the software on all USB flash drives or the centralized configuration location has been updated.

  4. Choose whether to upgrade Elasticsearch 2.3.3 at this time. 

    • To upgrade to Elasticsearch 5.6 with an existing cluster, reindex Search data and migrate any Metrics data to be kept. See Migrating from Older Elasticsearch for details. (SWAR-7395) 

  5. Note these installation issues:

    • The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator: yum reinstall elasticsearch-curator (SWAR-7439)

    • Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)

    • During a rolling upgrade from 9.0.x–9.2.x, you may see intermittent "WriterMissingRemoteMD5 error token" errors from a client write operation through the Gateway or on writes with gencontentmd5 (or the equivalent). To prevent this, set autoRepOnWrite=0 during the upgrade and restore autoRepOnWrite=1 after it completes, as sketched after these steps. (SWAR-7756)

  6. Review the Application and Configuration Guidance.
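
A minimal sketch of the autoRepOnWrite workaround from step 5, written as configuration lines; whether you apply it in the node configuration or as a runtime cluster setting change depends on your environment, so treat the placement as illustrative:

    # Before starting the rolling upgrade from 9.0.x-9.2.x (SWAR-7756 workaround):
    autoRepOnWrite = 0

    # After every node is running the new version, restore the default:
    autoRepOnWrite = 1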

Note

Contact DataCore Support for new installs of Platform Server and for optional Swarm client components, such as SwarmFS Implementation, that have separate distributions.
