New Features
Expanded SEND: SCSP SEND, an admin-only method that forces an object to be written immediately to another cluster, now works with any number of Swarm feeds of every type: replication, search, and S3 backup. The expanded functionality is controlled through several new query arguments. Two arguments specify which feed IDs or feed types to target, and one controls the timeout (if any) for the request to complete. See SCSP SEND. (SWAR-8441)
Elasticsearch — This release focuses on changes that make it easier to monitor and manage Elasticsearch, Swarm Search, and Swarm Metrics:
The Swarm Search RPM installation now checks whether firewalld is enabled. If it is, the installer warns administrators to check the firewall rules for ports 9200 and 9300, which Elasticsearch needs. (SWAR-8416)
Swarm dynamically updates DNS lookups after Elasticsearch nodes are restarted. (SWAR-8817)
The Swarm Metrics curator is now independent of HTTP_PROXY and related shell environment variables and so is less subject to disruption. (SWAR-8452)
The Swarm Metrics curator has improved defaults for its logging, increased to 10 logs and up to 10 MB. (SWAR-8401)
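The installer only warns about firewalld, so a quick TCP reachability test from another host can confirm that the firewall rules actually allow ports 9200 (Elasticsearch HTTP) and 9300 (Elasticsearch transport). The following is a minimal sketch that assumes a plain TCP connect is an adequate test. `port_reachable` and `check_es_ports` are hypothetical helpers and are not part of any Swarm tool.

```python
import socket

# Ports named in the Swarm Search RPM firewalld warning.
ES_PORTS = (9200, 9300)


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, filtered (timeout), or unresolvable host.
        return False


def check_es_ports(host: str, ports=ES_PORTS) -> dict:
    """Map each port to whether it is reachable from this machine."""
    return {port: port_reachable(host, port) for port in ports}
```

Run `check_es_ports("es-node-1")` (hypothetical host name) from a Swarm or Gateway host. The result is a dict mapping each port to True or False. A False value only means the connection failed, so check both the firewall and whether Elasticsearch is running.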
This release also includes changes to help with Swarm management and performance:
Swarm now ships with the Prometheus Node Exporter enabled and configured to work by default, which simplifies implementation and avoids a reboot. To disable the Node Exporter on a node, set metrics.enableNodeExporter=False in that node's configuration file. To disable it across the entire cluster, set metrics.nodeExporterFrequency to 0. (SWAR-8578)
Swarm's inter-process locking mechanism has been reworked. This gives a small performance gain for larger clusters and reduces related WARNING-level log messages. (SWAR-8835)
Swarm has restored performance for clients who have not yet migrated from legacy authentication/authorization. (SWAR-8810)
Additional Changes
These items are other changes, including those that come from testing and user feedback.
OSS Versions — See Third-Party Components for 11.2 for the complete listing of packages and versions.
Fixed in 11.2.0
The multipart write 202 response now includes Location headers of the resulting manifests that are analogous to the Location headers of a normal EC write. (SWAR-8886)
Resolved an error in the assessment of licensed space usage that prevented a node from accepting writes. (SWAR-8869)
Resolved an issue related to TCP window sizes that can cause socket disconnects, pauses, and hangs. (SWAR-8847)
Resolved an issue that can lead to a node crash in large clusters. (SWAR-8832)
Basic auth of the admin user for special administrative SCSP requests did not correctly handle a stored hashed admin password. (SWAR-8814)
Infrequent WARNING messages, "Node/Volume entry not published due to lock contention (...); action will be retried," may appear in logs. (SWAR-8802)
Resolved an issue causing rebooted Swarm nodes to allow client requests before mounting all volumes. (SWAR-8801)
Upgrade Impacts
Use the supported versions of Swarm components for the target version of Elasticsearch:
Elasticsearch | Swarm Storage | Gateway | SwarmFS | Notes |
Elasticsearch 6.8.6 | Swarm Storage 11.1 - 11.2 | Gateway 6.3 | SwarmFS 2.4 | Recommended configuration. |
Elasticsearch 5.6.12 | Swarm Storage 10.0 - 11.2 | Gateway 6.0 - 6.3 | SwarmFS 2.4 | Plan to migrate to Elasticsearch 6. |
Elasticsearch 2.3.3 | Swarm Storage 9.6 - 11.2 | Gateway 5.4 | SwarmFS 2.1 | |
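Before an upgrade, it can help to check a planned pairing against the compatibility table programmatically. The following is a minimal sketch. The dictionary transcribes only the Elasticsearch and Swarm Storage columns of the table, and `storage_supports` is a hypothetical helper. Versions are compared on major.minor only.

```python
# ES version -> (lowest, highest) supported Swarm Storage version, from the table.
SUPPORTED_STORAGE = {
    "6.8.6": ((11, 1), (11, 2)),
    "5.6.12": ((10, 0), (11, 2)),
    "2.3.3": ((9, 6), (11, 2)),
}


def parse_version(version: str) -> tuple:
    """Turn '11.2' or '11.2.0' into (11, 2), ignoring any patch component."""
    return tuple(int(part) for part in version.split(".")[:2])


def storage_supports(es_version: str, storage_version: str) -> bool:
    """True if the table lists storage_version as supported with es_version."""
    bounds = SUPPORTED_STORAGE.get(es_version)
    if bounds is None:
        return False  # this Elasticsearch version is not in the table
    low, high = bounds
    return low <= parse_version(storage_version) <= high
```

For example, `storage_supports("6.8.6", "11.0")` is False because ES 6.8.6 requires Storage 11.1 or later.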
These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for every version released after the currently running version, up to and including the target version:
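The rule above selects every "Impacts for" section newer than the running version, up to the target. The following is a minimal sketch of that selection. The version list mirrors the sections in these notes, and `impacts_to_review` is a hypothetical helper.

```python
# Versions that have an "Impacts for X" section in these release notes.
IMPACT_VERSIONS = ["10.0", "10.1", "10.2", "11.0", "11.1", "11.2"]


def _key(version: str) -> tuple:
    """Compare versions on major.minor, e.g. '10.2' -> (10, 2)."""
    return tuple(int(part) for part in version.split(".")[:2])


def impacts_to_review(current: str, target: str) -> list:
    """Impact sections newer than current, up to and including target."""
    cur, tgt = _key(current), _key(target)
    return [v for v in IMPACT_VERSIONS if cur < _key(v) <= tgt]
```

For example, upgrading from 10.1 to 11.2 means reviewing the 10.2, 11.0, 11.1, and 11.2 impacts.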
Impacts for 11.2
Upgrading Elasticsearch: Elasticsearch 5.6.12/2.3.3 may be used with Storage 11 if the move to ES 6 cannot be made immediately, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing of 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time as ES. Do not run an ES 5 Search or Metrics Curator against ES 6.
Rolling upgrade — During a rolling upgrade, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI (and in management API calls). Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)
Settings changes — These settings are new with this release:
scsp.defaultFeedSendTimeout (default 30 seconds): a non-persisted, node-level setting that sets the timeout on a feed SEND request when the timeout=true query argument is provided. (SWAR-8441)
chassis.name (default blank): a node-level setting that stores a user-defined chassis name. (SWAR-8823)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:
Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change the node.cfg file to scsp.forceLegacyNonce=false.
If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:
Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
Use swarmctl to check or change settings
Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
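The SWAR-9020 rules reduce to a decision based on the version being upgraded from. The following Python sketch encodes that decision and returns the steps as text. The function name `force_legacy_nonce_plan` and the step wording are hypothetical. Only the swarmctl invocation is taken from these notes, and versions are compared on major.minor only.

```python
# Target/source versions covered by the SWAR-9020 guidance.
NONCE_VERSIONS = {(11, 1), (11, 2), (11, 3), (12, 0), (12, 1)}


def _ver(version: str) -> tuple:
    return tuple(int(part) for part in version.split(".")[:2])


def force_legacy_nonce_plan(current: str, target: str) -> list:
    """Return the scsp.forceLegacyNonce steps for upgrading current -> target."""
    cur, tgt = _ver(current), _ver(target)
    if tgt not in NONCE_VERSIONS:
        raise ValueError("target version is not covered by the SWAR-9020 guidance")
    if cur < (11, 1):
        # Pre-11.1 source: force legacy nonces during the upgrade, then turn them off.
        return [
            "before upgrade: set scsp.forceLegacyNonce=true in node.cfg",
            "after cluster is fully up: swarmctl -C scsp.forceLegacyNonce -V False",
            "after cluster is fully up: set scsp.forceLegacyNonce=false in node.cfg",
        ]
    if cur in NONCE_VERSIONS and cur != tgt:
        # Upgrading within the list: only verify that the value is already false.
        return [
            "before upgrade: verify scsp.forceLegacyNonce=false in node.cfg",
            "before upgrade: check the cluster value with swarmctl -C scsp.forceLegacyNonce",
        ]
    raise ValueError("current version is not covered by the SWAR-9020 guidance")
```

For example, `force_legacy_nonce_plan("10.2", "11.2")` returns the three set-then-reset steps, while an 11.1 to 12.0 upgrade only needs the two verification steps.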
Impacts for 11.1
Upgrading Elasticsearch: Use Elasticsearch 5.6.12/2.3.3 with Storage 11.1 if moving to ES 6 immediately is not possible, but start the migration now (see Migrating from Older Elasticsearch). Support for ES 5.6.12/2.3.3 ends in a future release, and testing for 2.3.3 with Swarm 11 is discontinued. Important: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. Do not run an ES 5 Search or Metrics Curator against ES 6.
Swarm Search and Metrics: This release includes new versions of Swarm Search and Metrics RPMs. Both require Python 3 to be installed on the ES servers they run on.
For Swarm Metrics on RHEL/CentOS 7.7, first install this dependency:
yum install epel-release
Python 3: Install Python 3 if it is not automatically installed with RHEL/CentOS 7.
Propagate Deletes Removed: For Replication Feeds, the Propagate Deletes option is removed from the legacy Admin Console and the Management API (propagateDeletes, nodeletes fields). (SWAR-8609, SWAR-8615)
Swarm Configuration: Run the Storage Settings Checker before upgrading to this version, to identify configuration issues.
The Storage Settings Checker now requires Python 3 to be installed. (SWAR-8742)
crier.deadVolumeWall has been unpublished for reimplementation. (SWAR-8640)
S3 Backup Restore: The S3 Backup Restore Tool has been migrated to Python 3.6. If the tool is installed, uninstall it and install the new version. (SWAR-8703)
Upgrade Process: During the upgrade to 11.1, it may not be possible to monitor the cluster via the Swarm UI. Workaround: Use the legacy Admin Console (port 90) during upgrade. (SWAR-8716)
Differences in scsp.forceLegacyNonce configuration depending on the version being upgraded from (SWAR-9020):
If currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:
Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, when the cluster is fully up, set scsp.forceLegacyNonce=false using swarmctl and change the node.cfg file to scsp.forceLegacyNonce=false.
If currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:
Before upgrading, verify that scsp.forceLegacyNonce=false is in the node.cfg file, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
Use swarmctl to Check or Change Settings
Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.
Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.
For more details, see https://support.cloud.caringo.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
Impacts for 11.0
Upgrading Elasticsearch: You may use Elasticsearch 2.3.3 with Storage 11.0 if you cannot move to 5.6 now, but plan the migration immediately (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release, and testing with Swarm 11 is being discontinued.
Propagate Deletes Deprecated: The option to disable Propagate Deletes on Replication Feeds is deprecated; use Object Versioning to preserve deleted content. Do not disable Propagate Deletes when versioning is enabled or when defining an S3 Backup. (SWAR-8609)
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changed settings:
ec.segmentConsolidationFrequency (ecSegmentConsolidationFrequency in SNMP) has an improved default (10), which you must apply to your cluster when you upgrade. (SWAR-8483)
cluster.name is now required. Add it to the cluster.cfg file. (SWAR-8466)
metrics.nodeExporterFrequency (metricsExporterFrequency in SNMP) is now a persisted cluster setting. (SWAR-8467)
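Because cluster.name is now required, you can quickly pre-check a cluster.cfg file for the setting alongside the Storage Settings Checker. The following is a minimal sketch and is not the Settings Checker itself. It assumes the file uses one `setting = value` per line with `#` starting a comment. `parse_cfg` and `missing_settings` are hypothetical helpers.

```python
def parse_cfg(text: str) -> dict:
    """Parse 'setting = value' lines; '#' starts a comment (assumed file format).

    Limitation: a '#' inside a value (for example in a password) is treated as
    the start of a comment.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(text: str, required=("cluster.name",)) -> list:
    """Return the required settings that are absent or empty."""
    settings = parse_cfg(text)
    return [name for name in required if not settings.get(name)]
```

A commented-out line such as `# cluster.name = prod` is reported as missing, which matches how a commented setting is ignored.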
Removed settings:
chassis.processes is still allowed but is ignored.
Numerous settings are now promoted to cluster-level (versus node-level) scope, so you can manage them via Settings > Cluster in the Swarm UI (SWAR-8457):
console.expiryErrInterval
console.expiryWarnInterval
console.indexErrorLevel
console.indexWarningLevel
console.port
console.reportStyleUrl
console.spaceErrorLevel
console.spaceWarnLevel
console.styleUrl
feeds.retry
feeds.statsReportInterval
health.parallelWriteTimeout
log.obscureUUIDs
metrics.enableNodeExporter
network.dnsDomain
network.dnsServers
network.icmpAcceptRedirects
network.igmpVersion
network.mtu
startup.certificates
Impacts for 10.2
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.2 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. The upgrade to Elasticsearch 5.6 must be completed before upgrading to Gateway 6.0.
Configuration Settings: Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues. Note these changes:
ec.protectionLevel is now persisted. (SWAR-8231)
index.ovMinNodes=3 is the new default for the overlay index, in support of Swarm 10's new architecture. To keep your overlay index operational, set this new value in your cluster through the UI or by SNMP (overlayMinNodes). (SWAR-8278)
metrics.enableNodeExporter can be set to True, which enables the Prometheus Node Exporter on that node. (SWAR-8408, SWAR-8578)
metrics.nodeExporterFrequency, a new dynamic setting, sets how frequently to refresh Swarm-specific Prometheus metrics in Elasticsearch. It defaults to 0, which disables this export. (SWAR-8408)
Impacts for 10.1
Upgrading Elasticsearch — Continue to use Elasticsearch 2.3.3 with Storage 10.1 until able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release. Complete the upgrade to Elasticsearch 5.6 before upgrading to Gateway 6.0.
Configuration Settings — Run the Storage Settings Checker before any Swarm 10 upgrade to identify configuration issues.
metrics.enableNodeExporter=true enables Swarm to run the Prometheus Node Exporter on port 9100. (SWAR-8170)
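To confirm that the exporter is answering on port 9100, you can fetch its standard `/metrics` page and read a few samples. The following is a minimal sketch using only the Python standard library. `fetch_metrics` and `parse_samples` are hypothetical helpers. The parser is a simplification of the Prometheus text format that assumes sample lines carry no trailing timestamp.

```python
import urllib.request


def fetch_metrics(host: str, port: int = 9100, timeout: float = 5.0) -> str:
    """Download the exporter's text output from http://host:port/metrics."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict:
    """Map 'name{labels}' -> float for each sample line, skipping # HELP/# TYPE lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token (assumes no timestamp).
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

Usage: `parse_samples(fetch_metrics("swarm-node-1"))`, where the host name is hypothetical. An empty result or a connection error suggests that the exporter is disabled on that node or that port 9100 is blocked.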
IP address update delay — When upgrading from Swarm 9 to the new architecture of Swarm 10, note the "ghosts" of previously used IP addresses may appear in the Storage UI; these resolve within 4 days. (SWAR-8351)
Update MIBs on CSN: Before upgrading to Storage 10.x, update the MIBs on the CSN. From the Swarm Support tools bundle, run the platform-update-mibs.sh script. (CSN-1872)
Impacts for 10.0
Upgrading Elasticsearch: You may continue to use Elasticsearch 2.3.3 with Storage 10.0 until you are able to move to 5.6 (see Migrating from Older Elasticsearch). Support for ES 2.3.3 ends in a future release.
Configuration Settings: Run the Storage Settings Checker to identify these and other configuration issues.
Changes for the new single-IP dense architecture:
network.ipAddress - multiple IP addresses now disallowed
chassis.processes - removed; multi-server configurations are no longer supported
ec.protectionLevel - new value "volume"
ec.subclusterLossTolerance - removed
Changes for security (see next section):
security.administrators, security.operators - removed 'snmp' user
snmp.rwCommunity, snmp.roCommunity - new settings for 'snmp' user
startup.certificates - new setting to hold any and all public keys
New settings:
disk.atimeEnabled
health.parallelWriteTimeout
search.pathDelimiter
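The Required SNMP Security Change that follows amounts to moving the snmp entries out of the two security settings and into the new community settings. The following is a minimal sketch of that move. It assumes the settings are represented as Python dicts of user name to password. `migrate_snmp_keys` is hypothetical, and it treats any snmp entry in security.operators as customized, because the notes only require moving it when the default was changed.

```python
def migrate_snmp_keys(administrators: dict, operators: dict) -> tuple:
    """Return (new_settings, warnings) for the SWAR-8097 SNMP change.

    The input dicts are not modified.
    """
    admins = dict(administrators)  # copy of security.administrators
    ops = dict(operators)          # copy of security.operators
    settings, warnings = {}, []
    if "snmp" in admins:
        # The snmp value in security.administrators moves to snmp.rwCommunity.
        settings["snmp.rwCommunity"] = admins.pop("snmp")
    if "snmp" in ops:
        # Assumption: this entry was customized, so it moves to snmp.roCommunity.
        settings["snmp.roCommunity"] = ops.pop("snmp")
    if not admins:
        # Nodes whose security.administrators held only the snmp key do not boot.
        warnings.append("security.administrators has no entries besides snmp")
    settings["security.administrators"] = admins
    settings["security.operators"] = ops
    return settings, warnings
```

A non-empty warnings list means a real administrator entry must be added before the node is restarted.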
Required SNMP Security Change: Remove the snmp key from the security.administrators setting, and update snmp.rwCommunity with its value. Nodes whose security.administrators setting contains only the snmp key do not boot. If you changed the default value of the snmp key in the security.operators setting, update snmp.roCommunity with that value and then remove the snmp key from security.operators. In the security.operators setting, 'snmp' is a reserved key and cannot be an authorized console operator name. (SWAR-8097)
EC Protection
Best practice: Use ec.protectionLevel=node, which distributes segments across the cluster's physical/virtual machines. Do not use ec.protectionLevel=subcluster unless you already have subclusters defined and are sure the specified EC encoding is supported. A new level, ec.protectionLevel=volume, allows EC writes to succeed if you have a small cluster with fewer than (k+p)/p nodes. (Swarm always seeks the highest protection possible for EC segments, regardless of the level you set.)
Optimize hardware for EC by verifying there are more than k+p subclusters/nodes (as set by ec.protectionLevel). For example, with policy.ecEncoding=5:2, you need at least 8 subclusters/nodes. When Swarm cannot distribute EC segments adequately for protection, EC writes can fail despite ample free space. (SWAR-7985)
Setting ec.protectionLevel=subcluster without creating subclusters (defining node.subcluster across sets of nodes) causes a critical error and lowers the protection level to 'node'. (SWAR-8175)
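The two EC thresholds above are simple arithmetic on k and p. The following sketch computes both. The helper names are hypothetical, and the encoding string is parsed in the k:p form used by policy.ecEncoding.

```python
def parse_encoding(ec_encoding: str) -> tuple:
    """Split a policy.ecEncoding value such as '5:2' into (k, p)."""
    k, p = (int(part) for part in ec_encoding.split(":"))
    return k, p


def min_nodes_for_ec(ec_encoding: str) -> int:
    """Smallest node/subcluster count that is more than k+p."""
    k, p = parse_encoding(ec_encoding)
    return k + p + 1


def volume_level_applies(nodes: int, ec_encoding: str) -> bool:
    """True when nodes < (k+p)/p, the small-cluster case ec.protectionLevel=volume covers."""
    k, p = parse_encoding(ec_encoding)
    # Multiply both sides by p (> 0) to compare integers instead of floats.
    return nodes * p < k + p
```

With 5:2, `min_nodes_for_ec` gives 8, matching the example above. Because (k+p)/p = 7/2 = 3.5, the volume level is the relevant option for clusters of 3 or fewer nodes.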
Small Clusters: Verify the following settings if using 10 or fewer Swarm nodes. Do not use fewer than 3 in production.
Important: If you need to change any of these settings, do so before upgrading to Swarm 10.
policy.replicas: The min and default values for numbers of replicas to keep in your cluster must not exceed your number of nodes. For example, a 3-node cluster may have only min=2 or min=3.
EC Encoding and Protection: For EC encoding, verify you have enough nodes to support the cluster's encoding (policy.ecEncoding). For EC writes to succeed with fewer than (k+p)/p nodes, use the new level, ec.protectionLevel=volume.
Best Practice: Keep at least one physical machine in your cluster beyond the minimum number needed. This allows one machine to be down for maintenance without compromising the constraint.
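The small-cluster rules for replicas can be checked before an upgrade with a few comparisons. The following is a minimal sketch. `small_cluster_warnings` and its parameters are hypothetical. It also assumes that "the minimum number needed" in the best practice means the larger of the min and default replica counts.

```python
def small_cluster_warnings(nodes: int, replicas_min: int, replicas_default: int) -> list:
    """Return warnings for the small-cluster policy.replicas guidance."""
    warnings = []
    if nodes < 3:
        warnings.append("fewer than 3 nodes: not supported for production")
    for name, value in (("min", replicas_min), ("default", replicas_default)):
        if value > nodes:
            warnings.append(f"policy.replicas {name}={value} exceeds {nodes} nodes")
    # Assumed interpretation: keep one machine beyond the largest replica count,
    # so one machine can be down for maintenance.
    needed = max(replicas_min, replicas_default)
    if nodes < needed + 1:
        warnings.append("no spare machine beyond the replica count for maintenance")
    return warnings
```

A 3-node cluster with min=2 and default=2 produces no warnings. Raising default to 3 triggers only the spare-machine warning.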
Cluster in a Box: Swarm supports a "cluster in a box" configuration as long as that box is running a virtual machine host and Swarm instances are running in 3 or more VMs. Each VM boots separately and has its own IP address. Follow the recommendations for small clusters, substituting VMs for nodes. If you have two physical machines, use the "cluster in a box" configuration, but move to direct booting of Swarm with 3 or more.
Offline Node Status: Because Swarm 10's new architecture reduces the number of IP addresses in your storage cluster, you may see the old IPs and subclusters reporting as Offline nodes until they time out in 4 days (crier.forgetOfflineInterval). This is expected.
Info
Multipath support is obsolete from Swarm 10 onward.
Watch Items and Known Issues
The following operational limitations and watch items exist in this release.
If a node mounts an encrypted volume that is missing the encryption key in the configuration, the node fails to mount all of the disks in the node. (SWAR-8762)
S3 Backup feeds do not yet backup logical objects larger than 5 GB. (SWAR-8554)
If downgrading from Swarm 11.0, CRITICAL errors may appear on the feeds. To stop the errors, edit the existing feed definition names via the Swarm UI or legacy Admin Console. (SWAR-8543)
When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shut down but do not come back up. (SWAR-8054)
If wiping the Elasticsearch cluster, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating the SwarmFS config information. (SWAR-8007)
If a bucket is deleted, any incomplete multipart upload into that bucket leaves its parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)
Logs showed the error "FEEDS WARNING: calcFeedInfo(etag=xxx) cannot find domain xxx, which is needed for a domains-specific replication feed". The root cause is fixed; if such warnings are received, contact DataCore Support so the issue can be resolved. (SWAR-7556)
With multipath-enabled hardware, the Swarm console Disk Volume Menu may erroneously show too many disks, having multiplied the actual disks in use by the number of possible paths to them. (SWAR-7248)
Note these installation issues:
The elasticsearch-curator package may show an error during an upgrade, which is a known curator issue (SWAR-7439). Workaround: Reinstall the curator:
yum reinstall elasticsearch-curator
Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)
Upgrading Swarm
Proceed to How to Upgrade Swarm to upgrade Swarm 9 or higher.
Important
Contact DataCore Support for guidance if needing to migrate from Swarm 8.x or earlier.