New Features
Performance — This release of Swarm Storage enhances both memory management and cluster performance:
Swarm cluster startup sequencing has been optimized: volume mounting must now complete and persistent settings must be processed before any needed recovery activities begin. (SWAR-8911)
Swarm nodes shut down faster, allowing for quicker rebooting of Swarm clusters. (SWAR-8891)
Swarm nodes with limited physical memory can now respond better under high client loads. (SWAR-8870)
Swarm's memory management has been improved, which enables higher loads for client writes. (SWAR-8816)
Stability — This release also includes changes that improve Swarm stability and administration:
Better handling of newly added hotplug volumes results in clients receiving fewer 503 Service Unavailable responses. (SWAR-8887)
HP cycles now cleanse all traces of removed volumes from the cluster, greatly reducing the chance that recovery can be started erroneously for a volume already recovered. (SWAR-8836)
Reworked cluster operations reduce spurious "Cannot contact node" announcements during maintenance reboots of multiple nodes. (SWAR-8848)
When secure logging (security.secureLogging) is enabled, Swarm removes more sensitive information from AUDIT-level messages. (SWAR-8790)
Additional Changes
These items are other changes, including those that come from testing and user feedback.
OSS Versions — See Third-Party Components for 11.3 for the complete listing of packages and versions.
Fixed in 11.3.0
Drive light plug-in control is restored for hardware in mpt3sas enclosures, including Western Digital Ultrastar Serv60. (SWAR-8934)
For some feed statistics, feed accounting resets and requires a reboot to correct the statistic. (SWAR-8854)
Upgrade Impacts
Use the supported versions of Swarm components if running an older version of Elasticsearch:
| Elasticsearch | Swarm Storage | Gateway | SwarmFS | Notes |
| 6.8.6 | 11.1 - 11.3 | 6.3 | 2.4 | Recommended configuration. |
| 5.6.12 | 10.0 - 11.3 | 6.0 - 6.3 | 2.4 | Plan to migrate to Elasticsearch 6. |
| 2.3.3 | 9.6 - 11.3 | 5.4 | 2.1 | |
These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the version currently running:
Impacts for 11.3
Upgrading Elasticsearch: Use Elasticsearch 5.6.12/2.3.3 with Storage 11 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.
Rolling upgrade — During a rolling upgrade from a version older than 11.1, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI (and in management API calls). Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)
Settings changes: The setting health.parallelWriteTimeout, which was disabled by default, now defaults to 1 month. It sets when to time out an uncompleted multipart upload, which triggers cleanup of the unused parts. Do not disable it (0) if using SwarmFS. (SWAR-8902)
Encryption-at-rest: If upgrading from Swarm 11.0 or earlier and encryption-at-rest is used, contact DataCore Support to verify that a rollback to the prior version is possible, if needed. (SWAR-8941)
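As a quick pre-upgrade check, the sketch below reads a Swarm config file, such as node.cfg or cluster.cfg, and reports whether health.parallelWriteTimeout is disabled (0). It assumes settings appear as name=value lines, which is the form this page uses for node.cfg. It only inspects the file you pass, so a value persisted in the cluster still needs to be checked in the Swarm UI. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report health.parallelWriteTimeout from one config file.
# Assumes "name = value" or "name=value" lines; lines starting with # do not match.
check_parallel_write_timeout() {
  local cfg="$1"
  local value
  # Take the last assignment in the file and strip the key and surrounding spaces.
  value=$(grep -E '^[[:space:]]*health\.parallelWriteTimeout[[:space:]]*=' "$cfg" \
    | tail -n 1 | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//')
  if [ -z "$value" ]; then
    echo "unset (new default applies)"
  elif [ "$value" = "0" ]; then
    # 0 disables cleanup of uncompleted multipart uploads; not allowed with SwarmFS.
    echo "disabled: do not use 0 with SwarmFS"
    return 1
  else
    echo "set to $value"
  fi
}
```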
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:
Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change the node.cfg file to scsp.forceLegacyNonce=false.
If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:
Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
Use swarmctl to check or change settings
Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
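The version rules above can be expressed as a small shell function. This is a sketch, not a Swarm tool. The name nonce_procedure is hypothetical, the function compares only major.minor versions, it relies on sort -V from GNU coreutils, and it deliberately refuses versions newer than 12.1, because the notes do not cover them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the scsp.forceLegacyNonce procedure (SWAR-9020)
# from the Swarm Storage version currently running.
nonce_procedure() {
  # Compare major.minor only, so 11.1.0 is treated as 11.1.
  local current lowest highest
  current=$(printf '%s' "$1" | cut -d. -f1,2)
  lowest=$(printf '%s\n%s\n' "11.1" "$current" | sort -V | head -n 1)
  highest=$(printf '%s\n%s\n' "12.1" "$current" | sort -V | tail -n 1)
  if [ "$highest" != "12.1" ]; then
    echo "version not covered by the SWAR-9020 notes"
  elif [ "$lowest" = "11.1" ]; then
    # 11.1 sorts first (or ties), so the current version is 11.1 through 12.1.
    echo "verify scsp.forceLegacyNonce=false in node.cfg and with swarmctl"
  else
    echo "before: set scsp.forceLegacyNonce=true in node.cfg; after: set it to false with swarmctl and in node.cfg"
  fi
}
```

For example, a cluster on 10.2 gets the "before/after" procedure, while one on 11.2 only needs the verification step.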
Impacts for 11.2
Upgrading Elasticsearch: Elasticsearch 5.6.12/2.3.3 may be used with Storage 11 if the move to ES 6 cannot be performed immediately, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued.
Important
Always upgrade Swarm Search and Metrics at the same time as ES. Do not run an ES 5 Search or Metrics Curator against ES 6.
Rolling Upgrade: During a rolling upgrade, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI (and in management API calls). Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)
Settings Changes: These settings are new with this release:
scsp.defaultFeedSendTimeout (default 30 seconds) is a non-persisted node-level setting that sets the timeout on a feed SEND request if the timeout=true query argument is provided. (SWAR-8441)
chassis.name (default blank) is a node-level setting that stores a user-defined chassis name. (SWAR-8823)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:
Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change the node.cfg file to scsp.forceLegacyNonce=false.
If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:
Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
Use swarmctl to Check or Change Settings
Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
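You can script the node.cfg half of the pre-upgrade check for clusters already on 11.1 through 12.1. The minimal sketch below uses a hypothetical helper, node_cfg_nonce_is_false, and assumes name=value lines. The cluster value must still be checked with 'swarmctl -C scsp.forceLegacyNonce', as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the last scsp.forceLegacyNonce entry
# in the given node.cfg is false (case-insensitive); fail if missing or true.
node_cfg_nonce_is_false() {
  local cfg="$1"
  local value
  value=$(grep -E '^[[:space:]]*scsp\.forceLegacyNonce[[:space:]]*=' "$cfg" \
    | tail -n 1 | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//')
  # Lower-case the value so that "False" and "false" both pass.
  [ "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" = "false" ]
}
```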
Impacts for 11.1
Upgrading Elasticsearch: Use Elasticsearch 5.6.12/2.3.3 with Storage 11.1 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.
Swarm Search and Metrics: This release includes new versions of Swarm Search and Metrics RPMs. Both require Python 3 to be installed on the ES servers they run on.
For Swarm Metrics on RHEL/CentOS 7.7, first install this dependency:
yum install epel-release
Python 3: Install Python 3 if it is not automatically installed with RHEL/CentOS 7.
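The Search, Metrics, and Settings Checker packages need Python 3, so it can help to confirm that the interpreter is present before installing them. The helper below is a hypothetical sketch. It only checks the PATH and prints the interpreter's major.minor version. It does not install anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if python3 is not on PATH; otherwise print its version.
require_python3() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found; install Python 3 first" >&2
    return 1
  fi
  # Print major.minor, for example 3.6 on RHEL/CentOS 7.
  python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}
```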
Propagate Deletes Removed: For Replication Feeds, the Propagate Deletes option is removed from the legacy Admin Console and the Management API (propagateDeletes, nodeletes fields). (SWAR-8609, SWAR-8615)
Swarm Configuration: Run the Storage Settings Checker before upgrading to this version, to identify configuration issues.
The Storage Settings Checker now requires Python 3 to be installed. (SWAR-8742)
crier.deadVolumeWall has been unpublished for reimplementation. (SWAR-8640)
S3 Backup Restore: The S3 Backup Restore Tool has been migrated to Python 3.6. If the tool is installed, uninstall it and install the new version. (SWAR-8703)
Upgrade Process: During the upgrade to 11.1, it may not be possible to monitor the cluster via the Swarm UI. Workaround: Use the legacy Admin Console (port 90) during upgrade. (SWAR-8716)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:
Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change the node.cfg file to scsp.forceLegacyNonce=false.
If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:
Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
Use swarmctl to Check or Change Settings
Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
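For clusters upgraded from a version before 11.1, you can script the final node.cfg edit. This hypothetical sketch assumes name=value lines and keeps a .bak copy of the file. It does not touch the live cluster value, which still needs 'swarmctl -C scsp.forceLegacyNonce -V False'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "scsp.forceLegacyNonce=true" (or True) to false
# in the given node.cfg, preserving spacing; the original is kept as <file>.bak.
set_node_cfg_nonce_false() {
  local cfg="$1"
  sed -E -i.bak \
    's/^([[:space:]]*scsp\.forceLegacyNonce[[:space:]]*=[[:space:]]*)[Tt]rue[[:space:]]*$/\1false/' \
    "$cfg"
}
```

Run it only after the cluster is fully up on the new version, as the steps above require.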
Impacts for 11.0
Upgrading Elasticsearch: You may use Elasticsearch 2.3.3 with Storage 11.0 if you cannot move to 5.6 now, but plan the migration immediately (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release, and testing with Swarm 11 is being discontinued.
Propagate Deletes Deprecated: The option to disable Propagate Deletes on Replication Feeds is deprecated; use Object Versioning to preserve deleted content. Do not disable Propagate Deletes when versioning is enabled or when defining an S3 Backup. (SWAR-8609)
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changed settings:
ec.segmentConsolidationFrequency (ecSegmentConsolidationFrequency in SNMP) has an improved default (10), which you must apply to your cluster when you upgrade. (SWAR-8483)
cluster.name is now required. Add it to the cluster.cfg file. (SWAR-8466)
metrics.nodeExporterFrequency (metricsExporterFrequency in SNMP) is now a persisted cluster setting. (SWAR-8467)
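The sketch below flags two of these items in a config file. check_11_0_settings is a hypothetical helper that only reads the file passed in, assuming name=value lines. It does not see values persisted in the cluster through the UI or SNMP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report cluster.name presence and a non-default
# ec.segmentConsolidationFrequency in one config file (e.g. cluster.cfg).
check_11_0_settings() {
  local cfg="$1" rc=0 freq
  if grep -Eq '^[[:space:]]*cluster\.name[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
    echo "cluster.name: present"
  else
    echo "cluster.name: MISSING (required)"
    rc=1
  fi
  freq=$(grep -E '^[[:space:]]*ec\.segmentConsolidationFrequency[[:space:]]*=' "$cfg" \
    | tail -n 1 | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//')
  # Only mention the setting when the file overrides it with something other than 10.
  if [ -n "$freq" ] && [ "$freq" != "10" ]; then
    echo "ec.segmentConsolidationFrequency: $freq (new default is 10)"
  fi
  return "$rc"
}
```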
Removed settings:
chassis.processes is allowed but is ignored.
Numerous settings are now promoted to cluster-level (versus node-level) scope, so you can manage them via Settings > Cluster in the Swarm UI (SWAR-8457):
console.expiryErrInterval
console.expiryWarnInterval
console.indexErrorLevel
console.indexWarningLevel
console.port
console.reportStyleUrl
console.spaceErrorLevel
console.spaceWarnLevel
console.styleUrl
feeds.retry
feeds.statsReportInterval
health.parallelWriteTimeout
log.obscureUUIDs
metrics.enableNodeExporter
network.dnsDomain
network.dnsServers
network.icmpAcceptRedirects
network.igmpVersion
network.mtu
startup.certificates
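To see which of these now-cluster-level settings your existing config files assign, you could scan them with a short awk sketch like the one below. It is a hypothetical helper, not part of Swarm. You could then review those values under Settings > Cluster. It assumes name=value lines and prints each matching setting once, in file order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list settings promoted to cluster scope in 11.0 (SWAR-8457)
# that are assigned in the given config file.
list_promoted_settings() {
  local cfg="$1"
  awk -F'=' '
    BEGIN {
      list = "console.expiryErrInterval console.expiryWarnInterval console.indexErrorLevel"
      list = list " console.indexWarningLevel console.port console.reportStyleUrl"
      list = list " console.spaceErrorLevel console.spaceWarnLevel console.styleUrl"
      list = list " feeds.retry feeds.statsReportInterval health.parallelWriteTimeout"
      list = list " log.obscureUUIDs metrics.enableNodeExporter network.dnsDomain"
      list = list " network.dnsServers network.icmpAcceptRedirects network.igmpVersion"
      list = list " network.mtu startup.certificates"
      n = split(list, names, " ")
      for (i = 1; i <= n; i++) promoted[names[i]] = 1
    }
    NF > 1 {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around the key
      if ((key in promoted) && !(key in seen)) { seen[key] = 1; print key }
    }' "$cfg"
}
```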
Impacts for 10.2
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.2 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. The upgrade to Elasticsearch 5.6 must be completed before upgrading to Gateway 6.0.
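A version comparison can check the ordering requirement above (Elasticsearch 5.6 before Gateway 6.0). In this sketch, es_version queries the Elasticsearch root endpoint, whose JSON response includes version.number. The host argument is a placeholder, both helper names are hypothetical, and the comparison relies on sort -V from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version.number reported by an ES node.
# Usage: es_version <ESSERVERIP>
es_version() {
  curl -s "http://$1:9200/" \
    | sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the given ES version is 5.6 or newer.
es_ready_for_gateway6() {
  local es_version="$1"
  # If 5.6 sorts first (or ties), the given version is at least 5.6.
  [ "$(printf '%s\n%s\n' "5.6" "$es_version" | sort -V | head -n 1)" = "5.6" ]
}
```

For example: es_ready_for_gateway6 "$(es_version <ESSERVERIP>)" && echo "ES is ready for Gateway 6.0".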
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues. Note these changes:
ec.protectionLevel is now persisted. (SWAR-8231)
index.ovMinNodes=3 is the new default for the overlay index, in support of Swarm 10's new architecture. To keep your overlay index operational, set this new value in your cluster, through the UI or by SNMP (overlayMinNodes). (SWAR-8278)
metrics.enableNodeExporter can be set to True, which enables the Prometheus Node Exporter on that node. (SWAR-8408, SWAR-8578)
metrics.nodeExporterFrequency, a new dynamic setting, sets how frequently to refresh Swarm-specific Prometheus metrics in Elasticsearch; it defaults to 0, which disables this export. (SWAR-8408)
Impacts for 10.1
Upgrading Elasticsearch — Continue to use Elasticsearch 2.3.3 with Storage 10.1 until able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. Complete the upgrade to Elasticsearch 5.6 before upgrading to Gateway 6.0.
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.
metrics.enableNodeExporter=true enables Swarm to run the Prometheus node exporter on port 9100. (SWAR-8170)
IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note the "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)
Update MIBs on CSN: Before upgrading to Storage 10.x, the MIBs on the CSN must be updated. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)
Impacts for 10.0
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changes for the new single-IP dense architecture:
network.ipAddress - multiple IP addresses now disallowed
chassis.processes - removed; multi-server configurations are no longer supported
ec.protectionLevel - new value "volume"
ec.subclusterLossTolerance - removed
Changes for security (see next section)
security.administrators, security.operators - removed 'snmp' user
snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user
startup.certificates - new setting to hold any and all public keys
New settings:
disk.atimeEnabled
health.parallelWriteTimeout
search.pathDelimiter
Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes whose security.administrators setting contains only the snmp key do not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key, and it cannot be an authorized console operator name. (SWAR-8097)
EC Protection
Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)
Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel); for example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)
Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)
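The arithmetic behind these EC rules for policy.ecEncoding=k:p is sketched below. "More than k+p" nodes means at least k+p+1, which is 8 for 5:2. The function tests the volume-level threshold, nodes < (k+p)/p, as nodes*p < k+p so that it stays in integer math. The function name and output format are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for an encoding "k:p" and a node (or subcluster) count,
# print the recommended minimum and whether ec.protectionLevel=volume is needed.
ec_requirements() {
  local encoding="$1" nodes="$2"
  local k="${encoding%%:*}" p="${encoding##*:}"
  # More than k+p subclusters/nodes, i.e. at least k+p+1.
  echo "recommended_min=$((k + p + 1))"
  # nodes < (k+p)/p  is equivalent to  nodes*p < k+p  (avoids integer division).
  if [ $((nodes * p)) -lt $((k + p)) ]; then
    echo "needs_volume_level=yes"
  else
    echo "needs_volume_level=no"
  fi
}
```

With 5:2 on a 3-node cluster, 3*2 = 6 is less than 7, so EC writes would need ec.protectionLevel=volume.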
Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
Important: If you need to change any, do so before upgrading to Swarm 10.
policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.
EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.
Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows for one machine to be down for maintenance without compromising the constraint.
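A pre-upgrade check for the replica rule above might look like the hypothetical sketch below. It takes the node count and the policy.replicas min and default values as arguments rather than reading them from the cluster. It also flags clusters below the 3-node production minimum.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check policy.replicas min/default against the node count.
# Prints one line per problem and returns non-zero if any were found.
check_replicas() {
  local nodes="$1" min="$2" default="$3" rc=0
  if [ "$nodes" -lt 3 ]; then echo "fewer than 3 nodes: not for production"; rc=1; fi
  if [ "$min" -gt "$nodes" ]; then echo "min=$min exceeds $nodes nodes"; rc=1; fi
  if [ "$default" -gt "$nodes" ]; then echo "default=$default exceeds $nodes nodes"; rc=1; fi
  return "$rc"
}
```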
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration, but move to direct booting of Swarm with 3 or more.
Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out in 4 days (crier.forgetOfflineInterval), which is expected.
Info
Multipath support is obsolete from Swarm 10 onward.
Watch Items and Known Issues
The following watch items are known:
Volumes newly formatted in Swarm 11.1, 11.2, or 11.3 to use encryption-at-rest cannot be downgraded to Swarm 11.0 or earlier without a special procedure to prevent data loss. Contact DataCore Support before any such downgrade with encrypted volumes. (SWAR-8941)
Infrequent WARNING messages, "Node/Volume entry not published due to lock contention (...); action will be retried," may appear in logs. Unless they are frequent, they may be ignored. (SWAR-8802)
If a node mounts an encrypted volume that is missing the encryption key in the configuration, the node fails to mount all disks in the node. (SWAR-8762)
S3 Backup feeds do not back up logical objects greater than 5 GB. (SWAR-8554)
When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)
With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)
These are standing operational limitations:
If downgrading from Swarm 11.0, CRITICAL errors may appear on the feeds. To stop the errors, edit the existing feed definition names via the Swarm UI or legacy Admin Console. (SWAR-8543)
If the Elasticsearch cluster is wiped, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating the SwarmFS config information. (SWAR-8007)
If a bucket is deleted, any incomplete multipart upload into that bucket leaves the parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)
Removing subcluster assignments in the CSN UI creates invalid config parameters preventing the unassigned nodes from booting. (SWAR-7675)
Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) cannot find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if such warnings are received, contact DataCore Support so the issue can be resolved. (SWAR-7556)
If a feed is subject to a prolonged outage, a node reboot may be required for it to resume progress after the outage is cleared. If progress does not resume after the reboot, contact DataCore Support. This is resolved in 12.1.0. (SWAR-9062)
If Elasticsearch 6.8.6 blocks an index due to low disk space, issue the following command against each index (index_*, csmeter*, metrics*) in the read_only_allow_delete state. This is no longer an issue after upgrading to Swarm 12 / Elasticsearch 7, where indices are unblocked automatically when disk space frees up. (SWAR-8944)
curl -i -XPUT "<ESSERVERIP>:9200/<INDEXNAME>/_settings" -d '{"index.blocks.read_only_allow_delete" : null}' -H "Content-Type: application/json"
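The command above clears one index at a time. The sketch below loops over the three index patterns instead. It assumes Elasticsearch resolves the wildcard patterns in the _settings path; if it does not, substitute each index name. ES_HOST stands in for <ESSERVERIP>, and unblock_indices is a hypothetical helper. Set DRY_RUN=1 to print the commands without sending them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear index.blocks.read_only_allow_delete on each pattern.
# ES_HOST is a placeholder; DRY_RUN=1 prints the curl commands instead of running them.
ES_HOST="${ES_HOST:-<ESSERVERIP>}"
unblock_indices() {
  local pattern
  for pattern in 'index_*' 'csmeter*' 'metrics*'; do
    local cmd=(curl -i -XPUT "$ES_HOST:9200/$pattern/_settings"
      -d '{"index.blocks.read_only_allow_delete" : null}'
      -H "Content-Type: application/json")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```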
Upgrading Swarm
Note these installation issues when upgrading Swarm:
The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator. (SWAR-7439)
yum reinstall elasticsearch-curator
Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)
Proceed to How to Upgrade Swarm to upgrade Swarm 9 or higher.
Important
Contact DataCore Support for guidance if you need to migrate from Swarm 8.x or earlier.