New Features
Swarm 10 Performance
The rate at which nodes retire is now improved over both version 10.1 and version 9.6 of Swarm Storage. (SWAR-8386)
Swarm has boosted the performance of erasure-coded range reads under high loads. (SWAR-8182)
Prometheus Node Exporter — The Prometheus Node Exporter preview has configuration enhancements.
The service is now enabled by default (metrics.enableNodeExporter=True), which makes basic hardware queries across nodes available without reboot. A new setting, metrics.nodeExporterFrequency, sets how frequently to refresh Swarm-specific metrics in Elasticsearch; it defaults to 0, which disables this export. (SWAR-8408)
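As a rough illustration of the "basic hardware queries" that the Node Exporter makes available, the sketch below reads a node's metrics page and parses it into a dictionary. It assumes the standard Prometheus Node Exporter endpoint (`/metrics` on port 9100, the port named in the 10.1 impacts); the host name and the function names `parse_metrics` and `fetch_node_metrics` are hypothetical, and the parser is a simplification that ignores optional timestamps.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; this sketch assumes
        # no optional timestamp follows it.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_node_metrics(host, port=9100, timeout=5):
    """Fetch and parse /metrics from one node (host is a hypothetical name)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_node_metrics("swarm-node-1.example.com")` would return whatever series that node exposes; which series appear depends on the exporter build and is not specified in these notes.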
Swarm Management
The new node-level Swarm configuration setting security.securePhysicalConsole allows locking out access to the console's System Menu commands. This security measure is for nodes located where they could be at risk for unauthorized viewing and tampering. (10.2.1: SWAR-5309)
To ease upgrades to Swarm 10, the cluster-wide setting ec.protectionLevel is now a persisted setting, so that it can be changed on demand via Swarm UI or SNMP. The setting no longer needs to be managed within and across config files, which required consistency and cluster restarts. (SWAR-8231)
For better management of multipart uploads, both castor-system-uploadid and castor-system-partnumber now allow query args to use either hyphens or underscores in the field name, as is supported for metadata headers such as content-type. (SWAR-8274)
Swarm now raises alerts on objects that have persistent feed-related failures, such as objects that cannot be indexed in Elasticsearch or be remotely replicated. To investigate the cause for such failures, examine the details in the logs. (SWAR-8383)
The versions query argument on listing queries now accepts versions=previous to limit results to only the past versions of an object. (SWAR-6847)
Swarm now accepts named objects whose path name relative to the bucket looks like a UUID (32-character hexadecimal). (SWAR-8199)
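The listing and multipart changes above can be pictured with a small client-side sketch. It builds a listing URL that carries versions=previous and shows one way a client might treat hyphenated and underscored field names as equivalent. The host, bucket, and domain values are hypothetical, the `format` and `domain` query arguments are assumptions about a typical listing request, and `normalize_field` illustrates the equivalence only; it is not Swarm's own implementation.

```python
from urllib.parse import quote, urlencode


def listing_url(host, bucket, domain, versions=None):
    """Build a bucket listing URL; host, bucket, and domain are hypothetical."""
    # "format" and "domain" are assumed query arguments for a listing request.
    params = {"format": "json", "domain": domain}
    if versions is not None:
        # versions=previous limits results to past versions of an object.
        params["versions"] = versions
    return f"http://{host}/{quote(bucket)}?{urlencode(params)}"


def normalize_field(name):
    """Map hyphenated and underscored spellings of a field to one form."""
    return name.replace("-", "_").lower()
```

For example, `listing_url("swarm.example.com", "photos", "example.com", versions="previous")` yields a URL ending in `versions=previous`, and `normalize_field` returns the same string for `castor-system-uploadid` and `castor_system_uploadid`.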
Additional Changes
These items are other changes and improvements including those that come from testing and user feedback.
OSS Versions — See Third-Party Components for 10.2.1 for the complete listing of packages and versions.
The Linux kernel is updated to 4.19.37 and the mpt3sas driver is updated to 26.100.00.00. (10.2.1: SWAR-8480)
Intel network drivers are updated, ixgbe to 5.5.5 and i40e to 2.7.29. (10.2.1: SWAR-8498)
Fixed
A 3-node cluster would not retire a volume efficiently if it contained objects that required 3 replicas. (10.2.1: SWAR-8482)
Changing the metrics.target host from an Elasticsearch 2.3.3 cluster to a 5.6.12 cluster did not trigger the needed update of the index schemas before new data was indexed. (SWAR-8426)
An SNMP shutdown request for a Swarm node instead caused it to be rebooted. (SWAR-8422)
Maintenance activities on the Elasticsearch cluster created erroneous reports of an index missing in the Swarm cluster. (SWAR-8413)
Swarm now installs the python requests package needed for the metrics migration script that is used during migration to Elasticsearch 5.6. (SWAR-8407)
An issue caused Swarm to erroneously report low memory. (SWAR-8399)
Swarm search queries would hang if the associated Search feed referred to invalid or unavailable Elasticsearch nodes. (SWAR-8200)
Upgrade Impacts
These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the one you are currently running:
Impacts for 10.2
Upgrading Elasticsearch — You may continue to use Elasticsearch 2.3.3 with Storage 10.2 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 will end in a future release. Before you upgrade to Gateway 6.0, however, you must complete the upgrade to Elasticsearch 5.6.
Configuration Settings — Be sure to run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues. Note these changes:
ec.protectionLevel is now persisted. (SWAR-8231)
index.ovMinNodes=3 is the new default for the overlay index, in support of Swarm 10's new architecture. To keep your overlay index operational, set this new value in your cluster, through the UI or by SNMP (overlayMinNodes). (SWAR-8278)
metrics.enableNodeExporter can be set to True, which enables the Prometheus Node Exporter on that node. (SWAR-8408, SWAR-8578)
metrics.nodeExporterFrequency, a new dynamic setting, sets how frequently to refresh Swarm-specific Prometheus metrics in Elasticsearch; it defaults to 0, which disables this export. (SWAR-8408)
Impacts for 10.1
Upgrading Elasticsearch — Continue to use Elasticsearch 2.3.3 with Storage 10.1 until able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. Complete the upgrade to Elasticsearch 5.6 before upgrading to Gateway 6.0.
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.
metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)
IP address update delay: When upgrading from Swarm 9 to the new architecture of Swarm 10, "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)
Update MIBs on CSN: Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)
Impacts for 10.0
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changes for the new single-IP dense architecture:
network.ipAddress - multiple IP addresses now disallowed
chassis.processes - removed; multi-server configurations are no longer supported
ec.protectionLevel - new value "volume"
ec.subclusterLossTolerance - removed
Changes for security (see next section)
security.administrators, security.operators - removed 'snmp' user
snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user
startup.certificates - new setting to hold any and all public keys
New settings:
disk.atimeEnabled
health.parallelWriteTimeout
search.pathDelimiter
Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes that contain only the snmp key in the security.administrators setting do not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key, and it cannot be an authorized console operator name. (SWAR-8097)
EC Protection
Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)
Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)
Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)
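The EC sizing arithmetic above can be worked through with a short sketch. It applies only the two rules stated in these notes: the recommended count is more than k+p subclusters/nodes, and the volume level is meant for clusters with fewer than (k+p)/p nodes. The function names are hypothetical and the code is a planning aid under those assumptions, not a description of how Swarm itself decides placement.

```python
def parse_encoding(encoding):
    """Split a k:p encoding string such as "5:2" into integers (k, p)."""
    k, p = (int(part) for part in encoding.split(":"))
    return k, p


def recommended_min_units(encoding):
    """Smallest whole count that is more than k+p subclusters/nodes."""
    k, p = parse_encoding(encoding)
    return k + p + 1


def needs_volume_level(nodes, encoding):
    """True when the cluster has fewer than (k+p)/p nodes."""
    k, p = parse_encoding(encoding)
    return nodes < (k + p) / p
```

With policy.ecEncoding=5:2, `recommended_min_units("5:2")` gives 8, matching the example above, and (k+p)/p is 3.5, so a 3-node cluster falls under the threshold where ec.protectionLevel=volume applies while a 4-node cluster does not.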
Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
Important: If you need to change any, do so before upgrading to Swarm 10.
policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.
EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.
Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.
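The replica rule for small clusters can be checked before an upgrade with a sketch like the one below. It encodes only what is stated above: the min and default replica counts must not exceed the number of nodes, and fewer than 3 nodes is not recommended for production. The function name and the message wording are hypothetical, and the replica counts are passed in as plain integers rather than parsed from a configuration file.

```python
def replica_problems(nodes, min_replicas, default_replicas):
    """Return a list of small-cluster issues; an empty list means none found."""
    problems = []
    if nodes < 3:
        problems.append("fewer than 3 nodes is not recommended for production")
    if min_replicas > nodes:
        problems.append(f"min replicas ({min_replicas}) exceeds node count ({nodes})")
    if default_replicas > nodes:
        problems.append(
            f"default replicas ({default_replicas}) exceeds node count ({nodes})"
        )
    return problems
```

For a 3-node cluster, `replica_problems(3, 2, 2)` returns an empty list, while `replica_problems(3, 4, 2)` reports that min exceeds the node count.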
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration, but move to direct booting of Swarm with 3 or more.
Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out in 4 days (crier.forgetOfflineInterval), which is expected.
Info
Multipath support is obsolete from Swarm 10 onward.
For Swarm 9 impacts, see Swarm Storage 9 Releases.
Watch Items and Known Issues
The following operational limitations and watch items exist in this release.
Under some conditions, Swarm might start without mounting some of its volumes. If this happens, reboot the node. (10.2.1: SWAR-8597)
The OS in 10.2.1 cannot mount USB flash drives and so cannot read node.cfg files from them. If you boot Swarm from a USB drive, contact DataCore Support for a corrected version. (10.2.1: SWAR-8501)
During a rolling reboot of a small cluster, erroneous CRITICAL errors may appear on the console, claiming that EC objects have insufficient protection. These errors may be disregarded. (SWAR-8421)
When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)
If you wipe your Elasticsearch cluster, the Storage UI will show no NFS config. Contact DataCore Support for help repopulating your SwarmFS config information. (SWAR-8007)
If you delete a bucket, any incomplete multipart upload into that bucket will leave its parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)
Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) couldn't find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if you received such warnings, contact DataCore Support so the issue can be resolved. (SWAR-7556)
With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)
Note these installation issues:
The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. (SWAR-7439) Workaround: Reinstall the curator:
yum reinstall elasticsearch-curator
Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)
Upgrading Swarm
To upgrade Swarm 9 or higher, proceed to How to Upgrade Swarm.
Important
If you need to upgrade from Swarm 8.x or earlier, contact DataCore Support for guidance.