New Features
- Swarm 10 Performance — With Storage 10.1, performance for both writes and erasure-coded object reads is improved for Swarm 10's density-friendly single-IP architecture, the result of optimizations in how Swarm nodes write to volumes under the new design. (SWAR-8357)
- Memory Handling — Swarm has improved memory handling, especially with bursts and high loads, and 503 Service Unavailable responses are less likely. (SWAR-8335)
- Hardware Diagnostics — This release includes a preview of the Prometheus Node Exporter, for monitoring and diagnostics on the machines in your Swarm cluster. Prometheus is an open-source systems monitoring and alerting toolkit that lets you view what statistics are available for your system, even under failure conditions. Prometheus scrapes metrics from instrumented jobs, running rules over this data to record aggregated time series or to generate alerts. Grafana and other API consumers can let you visualize the collected data. The new setting metrics.enableNodeExporter enables Swarm to run the Prometheus node exporter on port 9100 (see the sketch after this list). As a preview, the settings and implementation are subject to change; for more about this preview, contact Support. (SWAR-8170)
- Bulk Reformatting — Retiring volumes in order to implement encryption at rest requires you to then reformat and remount the volumes. You can now contact Support for a utility to streamline this process. (SWAR-8088)
- ES Cluster Configuration — The automation script for configuring Elasticsearch for Swarm now generates the complete set of unique configuration files for each node in the Elasticsearch cluster. (SWAR-8028)
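Regarding the Hardware Diagnostics preview above: the node exporter simply serves plain-text metrics over HTTP, so you can verify it is responding before pointing a Prometheus server at it. Below is a minimal connectivity sketch, not part of the Swarm product; the host name swarm-node1 is a placeholder for a node in your cluster.

```python
# Minimal sanity check for the Prometheus node exporter preview.
# Assumes a Swarm node reachable as "swarm-node1" (placeholder) with
# metrics.enableNodeExporter enabled, serving metrics on port 9100.
from urllib.request import urlopen

EXPORTER_URL = "http://swarm-node1:9100/metrics"  # adjust for your cluster

with urlopen(EXPORTER_URL, timeout=10) as response:
    text = response.read().decode("utf-8", errors="replace")

# Show a few standard node_exporter series to confirm scraping would work.
for line in text.splitlines():
    if line.startswith(("node_cpu", "node_memory", "node_filesystem")):
        print(line)
```

A Prometheus server would normally scrape this endpoint on an interval; Grafana then visualizes the stored series.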
Additional Changes
These items are other changes and improvements including those that come from testing and user feedback.
- OSS Updates — Storage 10.0 includes updates to third-party components. See Third-Party Components for 10.0 for the complete listing of packages and versions.
- SCSP Errors — Several new error tokens and error improvements have been added for this release. See Error Response Headers.
- Blocked Feeds
- Swarm can now detect the disappearance of the Elasticsearch index associated with a feed and mark the feed as Blocked; such feeds may need to be deleted and recreated. (SWAR-6885)
- Blocked feeds are retried every 20 minutes, but changing the definition for a blocked feed now triggers an immediate attempt with the new definition, which might clear the blockage. (SWAR-8232)
- The handling and reporting of feeds that have become blocked due to internal software issues has been improved. (SWAR-8267)
- Fixed
- The correct behavior for the network.ntpControlKey setting was restored. (SWAR-8371)
- After an upgrade to version 10.0, the Chassis Details in the Swarm UI did not update the node listings correctly for a period of time. (SWAR-8352)
- When SwarmFS was not in use, clicking on the NFS settings link in the Storage UI resulted in an error. (SWAR-8350)
- The correct behavior for the drive light toggle in the Swarm UI was restored. (SWAR-8336)
- The SNMP MIB entries for the largest stream (volLargestStreamMB and volLargestStreamUUID) were not being populated. (SWAR-8331)
- Rapid updates of objects written with replicate=immediate might result in some replicas not being found temporarily. (SWAR-8249)
- Deletes of unnamed objects did not update the Elasticsearch index, leaving a stale entry. (SWAR-8218)
- Deletes that were not processed by a feed within two weeks were not propagated to the feed's destination (Elasticsearch or the replication target). (SWAR-7950)
Upgrade Impacts
These items are changes to the product function that may require operational or development changes for integrated applications.
Impacts for 10.1
- Upgrading Elasticsearch — You may continue to use Elasticsearch 2.3.3 with Storage 10.1 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 will end in a future release. Before you upgrade to Gateway 6.0, however, you must complete the upgrade to Elasticsearch 5.6.
- Configuration Settings — Be sure to run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.
- Node Exporter — The new setting metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)
- IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note that the "ghosts" of previously used IP addresses might appear in the Storage UI; these will resolve within 4 days. (SWAR-8351)
- Update MIBs on CSN — Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)
Impacts for 10.0
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changes for the new single-IP dense architecture:
- network.ipAddress - multiple IP addresses now disallowed
- chassis.processes - removed; multi-server configurations are no longer supported
- ec.protectionLevel - new value "volume"
- ec.subclusterLossTolerance - removed
Changes for security (see next section):
- security.administrators, security.operators - removed 'snmp' user
- snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user
- startup.certificates - new setting to hold any and all public keys
New settings:
- disk.atimeEnabled
- health.parallelWriteTimeout
- search.pathDelimiter
Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes that contain only the snmp key in the security.administrators setting will not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key, and it cannot be an authorized console operator name. (SWAR-8097)
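Purely as an illustration of the migration just described (the dict layout below is an assumption; reading and writing your actual configuration files is not shown), the key movement looks like this:

```python
# Illustrative sketch of the SNMP settings migration (SWAR-8097).
# Assumes settings are held as a dict of user->value maps (hypothetical).

def migrate_snmp_settings(settings: dict) -> dict:
    """Move the reserved 'snmp' user into the new community settings."""
    admins = settings.get("security.administrators", {})
    operators = settings.get("security.operators", {})

    # The old 'snmp' administrator value becomes the read-write community.
    if "snmp" in admins:
        settings["snmp.rwCommunity"] = admins.pop("snmp")

    # A customized 'snmp' operator value becomes the read-only community.
    if "snmp" in operators:
        settings["snmp.roCommunity"] = operators.pop("snmp")

    return settings

old = {
    "security.administrators": {"admin": "adminpw", "snmp": "rwsecret"},
    "security.operators": {"operator": "oppw", "snmp": "rosecret"},
}
print(migrate_snmp_settings(old))
```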
EC Protection
Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)
Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)
Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)
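To make the sizing arithmetic concrete, here is a back-of-the-envelope helper (an illustration of the rules quoted above, not Swarm's actual placement logic):

```python
def ec_sizing_hint(k: int, p: int, nodes: int) -> str:
    """Apply the k+p and (k+p)/p sizing rules for an EC encoding of k:p."""
    if nodes > k + p:
        return "more than k+p nodes: full node-level segment distribution"
    if nodes * p >= k + p:  # equivalent to nodes >= (k+p)/p
        return "node-level EC writes possible, but protection is constrained"
    return "fewer than (k+p)/p nodes: EC writes need ec.protectionLevel=volume"

# With policy.ecEncoding=5:2 (k=5, p=2), at least 8 nodes are needed for
# fully distributed node-level protection:
for n in (3, 4, 8):
    print(n, "nodes:", ec_sizing_hint(5, 2, n))
```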
Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
Important: If you need to change any of these settings, do so before upgrading to Swarm 10.
- policy.replicas: The min and default values for the number of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.
- EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.
- Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed, so that one machine can be down for maintenance without compromising the constraint. A quick validation sketch follows this list.
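As a small illustration (setting values are passed in directly; reading them from your configuration is left out), a pre-upgrade check of the replica constraint could look like this; combine it with the ec_sizing_hint helper above for the EC check:

```python
def check_small_cluster(nodes: int, replicas_min: int, replicas_default: int):
    """Flag policy.replicas values that violate the small-cluster rules above."""
    problems = []
    if nodes < 3:
        problems.append("fewer than 3 nodes is not supported in production")
    if max(replicas_min, replicas_default) > nodes:
        problems.append("policy.replicas min/default must not exceed node count")
    return problems

# A 3-node cluster may have only min=2 or min=3:
print(check_small_cluster(nodes=3, replicas_min=2, replicas_default=2))  # ok: []
print(check_small_cluster(nodes=3, replicas_min=4, replicas_default=4))  # flagged
```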
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration; once you have 3 or more physical machines, move to direct booting of Swarm.
Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out in 4 days (crier.forgetOfflineInterval), which is expected.
Info
Multipath support is obsolete from Swarm 10 onward.
For Swarm 9 impacts, see Swarm Storage 9 Releases.
Watch Items and Known Issues
The following operational limitations and watch items exist in this release.
- During a rolling reboot of a small cluster, erroneous CRITICAL errors may appear on the console, claiming that EC objects have insufficient protection. These errors may be disregarded. (SWAR-8421)
- The rate at which nodes retire is slower in Swarm 10.x than in 9.6. (SWAR-8386)
- When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)
- If you wipe your Elasticsearch cluster, the Storage UI will show no NFS config. Contact Support for help repopulating your SwarmFS config information. (SWAR-8007)
- If you delete a bucket, any incomplete multipart upload into that bucket will leave its parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance); see the sketch after this list. (SWAR-7690)
- Dell DX hardware will have less chassis-level monitoring information available via SNMP. If this is a concern, contact Support. (SWAR-7606)
- Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) couldn't find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if you received such warnings, contact Support so that your issue can be resolved. (SWAR-7556)
- With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)
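Regarding the orphaned multipart parts noted above (SWAR-7690), the following is a hedged sketch only: it assumes s3cmd is installed and configured against your Gateway's S3 endpoint, and the bucket, object path, and upload ID are placeholders. Consult the Support article for the authoritative procedure.

```python
# Sketch: finding and aborting incomplete multipart uploads with s3cmd.
# Placeholders: bucket name, object path, and upload ID must be your own.
import subprocess

BUCKET = "s3://mybucket"  # placeholder bucket

# List in-progress multipart uploads in the bucket.
subprocess.run(["s3cmd", "multipart", BUCKET], check=True)

# Abort one upload by object path and upload ID (as reported above).
subprocess.run(["s3cmd", "abortmp", f"{BUCKET}/bigfile.bin", "UPLOAD_ID"],
               check=True)
```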
Upgrading from 9.x
Important
Do not begin the upgrade until you complete the following:
- Plan upgrade impacts — Review and plan for this release's upgrade impacts (above) and the impacts for each of the releases since the version you are running. For Swarm 9 impacts, see Swarm Storage 9 Releases.
- Finish volume retires — Do not start any elective volume retirements during the upgrade. Wait until the upgrade is complete before initiating any retires.
- Run checker script — Swarm 10 includes a migration checker script to run before upgrading from Swarm 9; it reports configuration setting issues and deprecations that must be addressed. (SWAR-8230) See Storage Settings Checker.
If you need to upgrade from Swarm 8.x or earlier, contact Support for guidance.
- Download the correct bundle for your site. Swarm distributions bundle together the core components that you need for both implementation and later updates; the latest versions are available in the Downloads section on the DataCore Support Portal.
There are two bundles available:
- Platform CSN 8.3 Full Install or Update (for CSN environments) — Flat structure for scripted install/update on a CSN (see CSN Upgrades).
- Swarm 10 Software Bundle (Platform 9.x and custom environments) — Contains complete updates of all core components, organized hierarchically by component.
Note
Contact Support for new installs of Platform Server and for optional Swarm client components, such as SwarmFS Implementation, that have separate distributions.
- Download the comprehensive PDF of Swarm Documentation that matches your bundle distribution date, or use the online HTML version from the Documentation Archive.
- Choose your type of upgrade. Swarm supports rolling upgrades (a single cluster running mixed versions during the upgrade process) and requires no data conversion unless specifically noted for a particular release. This means you can upgrade without scheduling an outage or bringing down the cluster: restart your nodes one at a time with the new version, and the cluster continues serving applications during the upgrade process.
- Rolling upgrade: Reboot one node at a time and wait for its status to show as "OK" in the UI before rebooting the next node.
- Alternative: Reboot the entire cluster at once after the software on all USB flash drives or the centralized configuration location has been updated.
- Choose whether to upgrade Elasticsearch 2.3.3 at this time.
- To upgrade to Elasticsearch 5.6 with an existing cluster, you must reindex your Search data and migrate any Metrics data that you want to keep. See Migrating from Older Elasticsearch for details. (SWAR-7395)
- Note these installation issues:
- The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator: yum reinstall elasticsearch-curator (SWAR-7439)
- Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)
- During a rolling upgrade from 9.0.x–9.2.x, you may see intermittent "WriterMissingRemoteMD5 error token" errors from a client write operation through the Gateway or on writes with gencontentmd5 (or the equivalent). To prevent this, set autoRepOnWrite=0 during the upgrade and restore autoRepOnWrite=1 after it completes. (SWAR-7756)
- Review the Application and Configuration Guidance.