Swarm Storage 12.0 Release

New Features

Performance Gains: Swarm 12 offers overall performance improvements, most pronounced for requests through Gateway and for small-object GETs.

  • For efficiency, Gateway can now redirect GET requests to volume processes when you enable the new scsp.enableVolumeRedirects setting. If you use SCSP and want to change your client application to take advantage of these redirects, contact DataCore Support. (SWAR-8758) (CLOUD-3205)

  • Inter-process communication within the cluster has been significantly streamlined, which boosts performance in dense clusters. (SWAR-8940)

  • Connection balancing between SCSP processes is improved, which helps performance. (SWAR-8933)

  • When there are surges in new SCSP connections, Swarm can now preemptively close new SCSP connections to protect the target node from crashing. (SWAR-8965)

  • Performance under high loads has been improved, including the handling of 503 and 404 responses and of file descriptor exhaustion. (SWAR-8971, SWAR-8969)

S3 Backup Improvements: Swarm 12 includes significant expansion of S3 Backup capabilities:

  • Backup to Glacier: To lower the cost of disaster recovery, Swarm S3 Backup feeds can now target buckets that use non-standard "cold" storage classes, AWS S3 Glacier and S3 Glacier Deep Archive.

Note

Glacier may be more cost-effective at scale due to rounding policies of Deep Archive. See . (SWAR-8923)

  • When the S3 Restore tool recovers data from backup buckets that use Glacier storage classes, it uses additional configuration settings to support retrieval from archives. Recovery from cold storage may need multiple runs to complete. See . (SWAR-8967)

  • For the AWS transition to virtual hosted-style URLs, Swarm S3 Backup now supports the bucket-in-host request style. In the S3 Backup feed definition, "host" and "bucket" are still entered separately. (SWAR-8917)

  • For greater efficiency, S3 Backup feeds now skip backing up objects that the health processor has queued to delete. (SWAR-8931)

  • S3 Backup feeds have better logging and error handling, and the S3 Restore tool has improved messaging. (SWAR-8905, SWAR-8962, SWAR-8960).
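The difference between the two S3 request styles can be sketched in Python. This is a minimal illustration of the general AWS URL convention, not Swarm's implementation; the helper names, endpoint, and bucket name below are hypothetical. It shows why "host" and "bucket" can still be entered separately in the feed definition: in a virtual hosted-style request the bucket simply becomes the leftmost label of the hostname.

```python
from urllib.parse import quote


def path_style_url(host: str, bucket: str, key: str) -> str:
    """Older path-style form: the bucket is the first path segment."""
    return f"https://{host}/{bucket}/{quote(key)}"


def virtual_hosted_url(host: str, bucket: str, key: str) -> str:
    """Virtual hosted-style (bucket-in-host) form: the bucket becomes a
    subdomain of the endpoint host, and the path carries only the key."""
    return f"https://{bucket}.{host}/{quote(key)}"


if __name__ == "__main__":
    host = "s3.us-east-1.amazonaws.com"  # example endpoint (assumption)
    bucket = "example-backup-bucket"     # hypothetical bucket name
    key = "photos/2020 cat.jpg"
    print(path_style_url(host, bucket, key))
    # https://s3.us-east-1.amazonaws.com/example-backup-bucket/photos/2020%20cat.jpg
    print(virtual_hosted_url(host, bucket, key))
    # https://example-backup-bucket.s3.us-east-1.amazonaws.com/photos/2020%20cat.jpg
```

Both forms address the same object; only the placement of the bucket name changes, which is why the two values can be combined at request time.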

Elasticsearch 7: Swarm 12.0 ships with and uses Elasticsearch 7.5.2, along with new versions of Swarm Search and Metrics RPMs. Upgrading requires no reindexing of your ES 6.8.6 data, so you can upgrade Elasticsearch in place, using the configuration script provided. (SWAR-8894, SWAR-8893).

  • Shard Control: The new Swarm setting search.numberOfShards allows adjusting the number of shards you want on new search indices as you scale your implementation (see ). The setting has no effect on existing indices; to change the shard count, create a new search feed or delete the existing ES index and Refresh the feed. See (SWAR-7276).

Feed Logging and Diagnostics - This release reworked logging to help you manage your feeds:

  • Replication feeds now have improved diagnostic logging for Gateway and proxy errors. (SWAR-8951, SWAR-8811)

  • Feeds that report "persistently failing" errors have better information to help with troubleshooting. (SWAR-8829)

  • Improved logging helps identify connection problems with Elasticsearch. (SWAR-8909)

  • Swarm now monitors for Elasticsearch indices that have been put into a read-only state because of insufficient disk space on one or more Elasticsearch nodes. (SWAR-8944)
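In Elasticsearch, this read-only state is the index.blocks.read_only_allow_delete block, which Elasticsearch applies when a node's disk usage crosses the flood-stage watermark (95% by default). The Python sketch below shows one way you might spot affected indices yourself, assuming you have already fetched the JSON returned by the Elasticsearch GET /_all/_settings endpoint. It is not how Swarm implements its monitoring, and the function and index names are hypothetical.

```python
from typing import Dict, List

# Shape of a GET /_all/_settings response, trimmed to the relevant keys.
# Index names here are hypothetical.
SAMPLE_RESPONSE = {
    "index-a": {"settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}},
    "index-b": {"settings": {"index": {"number_of_shards": "5"}}},
}


def read_only_indices(settings_response: Dict[str, dict]) -> List[str]:
    """Return the names of indices that carry the flood-stage write block."""
    blocked = []
    for name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Elasticsearch reports setting values as strings such as "true".
        if str(blocks.get("read_only_allow_delete", "false")).lower() == "true":
            blocked.append(name)
    return sorted(blocked)


print(read_only_indices(SAMPLE_RESPONSE))  # ['index-a']
```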

Networking and Booting - Multiple improvements to boot processes have made cluster starts faster, leaner, and sturdier:

  • DHCP lease management is improved, resulting in faster boot times. (SWAR-8867)

  • Boot times for VMs are faster because of better initialization of the kernel entropy pool. (SWAR-8926)

  • NTP handling is improved for cases where network.timeSource is not specified. (SWAR-8987)

  • Volumes using encryption-at-rest have better handling and future-proofing for upgrades. (SWAR-8941)

  • For clarity, network interface names are now reported as the native Linux kernel names instead of being renamed to legacy "eth*" names. These native NIC names are referenced by the System Console menu, SNMP, and Prometheus. (SWAR-8021)

  • The new setting snmp.enabled lets you disable SNMP cluster-wide and supports containerized deployments. (SWAR-8898)

Health Processing and Monitoring - Several enhancements support health processing and cluster administration:

  • Defragmentation, which releases trapped space, now stops when a volume is too full for it to proceed effectively. This does not affect the volume's ability to offload content. (SWAR-8787)

  • SwarmFS object uploads that stall “in progress” now time out, allowing their uploaded parts to be consolidated and cleaned up. (SWAR-7699)

  • To help anticipate problems with storage drives, the driveTable in SNMP has three new columns: drivePowerOnHours (the drive's power-on hours), driveTempC (the drive's temperature in Celsius), and driveCompromisedCount (the sum of five SMART values; a non-zero sum may indicate an impending drive failure). (SWAR-8734)

  • Numerous improvements aid in support, such as crash handling, crash reporting, and clearer dmesg dumps. (SWAR-8979, SWAR-8988, SWAR-8798).

  • The Support tool swarmctl, which you can download as part of the Swarm Support Tool bundle (swarm-support-tools.tgz), has expanded support for cluster capacity alerting, SMART dumps, and volume tests. (SWAR-8806, SWAR-8731, SWAR-8769). 

  • Overly frequent ERROR messages have been removed from the trimmer logs that Swarm generates. (SWAR-8840)
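A worked illustration of the driveCompromisedCount column, sketched in Python. The release notes state only that the column is the sum of five SMART values; which five attributes Swarm reads is not listed here, so the keys below are placeholders and the function names are hypothetical. The point is that a single non-zero counter is enough to make the whole column non-zero.

```python
from typing import Mapping


def drive_compromised_count(smart_values: Mapping[str, int]) -> int:
    """Sum five SMART counters, mirroring how driveCompromisedCount is described."""
    if len(smart_values) != 5:
        raise ValueError("expected exactly five SMART values")
    if any(v < 0 for v in smart_values.values()):
        raise ValueError("SMART counters cannot be negative")
    return sum(smart_values.values())


def needs_attention(smart_values: Mapping[str, int]) -> bool:
    # Any non-zero counter makes the sum non-zero, which may indicate an
    # impending drive failure.
    return drive_compromised_count(smart_values) > 0


# Placeholder keys: the release notes do not name the five SMART attributes.
healthy = {"smart_1": 0, "smart_2": 0, "smart_3": 0, "smart_4": 0, "smart_5": 0}
suspect = {"smart_1": 0, "smart_2": 3, "smart_3": 0, "smart_4": 1, "smart_5": 0}

print(drive_compromised_count(healthy), needs_attention(healthy))  # 0 False
print(drive_compromised_count(suspect), needs_attention(suspect))  # 4 True
```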

Additional Changes

These items are other changes, including those that come from testing and user feedback.

OSS Versions

See  for the complete listing of packages and versions for this release.

  • The Linux kernel is upgraded to v5.4.61 and firmware is upgraded to 1.190. (SWAR-8956)

  • Intel network drivers i40e and ixgbe are updated. (SWAR-8845)

  • Debian 10 ("Buster") updates are incorporated into this version. (SWAR-8788)

Fixed in 12.0

  • An issue related to memory corruption could result in spurious errors and false 404 Not Found responses in some cases. (12.0.1: SWAR-9077)

  • In Swarm 12.0, range read requests during high loads may return results of the correct length but with erroneous carriage return (CR) and line feed (LF) characters inserted at the beginning of the body. Contact DataCore Support if you experience this issue. (12.0.1: SWAR-9045)

  • A node rebooting into a cluster with a different IP address appeared as offline under its former IP address in the Swarm UI and the legacy Admin Console. (SWAR-8955)

  • Volume retires could become stuck because remaining objects needed lifepoint or other EC-related conversions. (SWAR-8945)

  • The legacy Admin Console now supports the deletion of multiple feeds at a time. (SWAR-8805)

  • Infrequent WARNING messages could appear in logs: "Node/Volume entry not published due to lock contention (...); action will be retried." (SWAR-8802)

Upgrade Impacts

Required

If you are running Elasticsearch 5.6.12 or 2.3.3, complete the migration to Swarm 11.3 and ES 6.8.6 before upgrading to Swarm 12. See , Upgrading from Unsupported Elasticsearch.

These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the one you are currently running:

Impacts for 12.0

  • Upgrading Elasticsearch: Once on Elasticsearch 6.8.6 and using the new index as primary (see ), proceed with your Swarm 12 upgrade to Elasticsearch 7. Reminder: Always upgrade Swarm Search and Metrics at the same time ES is upgraded. 

  • Rolling Upgrade: During a rolling upgrade from a version older than 11.1, the mixed state in Swarm versions among nodes may cause errors in the Swarm UI, swarmctl tool, and management API calls. Use the legacy Admin Console (port 90) to monitor the rolling upgrade. (SWAR-8716)

  • Settings Changes 

    • New: scsp.enableVolumeRedirects (for use with Content Gateway)

    • New: search.numberOfShards 

    • New: snmp.enabled

    • Changed: network.dnsDomain is no longer required when network.dnsServers is defined; name servers may be defined without a domain. (SWAR-3415)

  • Replicated Clusters: If you use replication feeds between remote clusters, upgrade and downgrade versions of Storage in those clusters at the same time. This guarantees any objects Swarm 12 converts from replication to erasure-coding protection using version 12.0+ mechanisms are handled properly. (SWAR-8957)

  • Encryption-at-Rest: If you are about to upgrade from Swarm 11.0 or earlier and you use encryption-at-rest, contact DataCore Support to verify that you can roll back smoothly to the prior version if needed. (SWAR-8941)

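  • Named NICs: If you have defined a custom list of included NIC names in the "castor_net" kernel argument, update it to use the native Linux kernel interface names, because legacy "eth*" names are no longer assigned. Example of an argument that needs updating: "castor_net=active-backup:eth0,eth1" (SWAR-8021)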

  • Upgrading with CSN NetBoot Protection: The streamlining of network interface handling in 12.0 can affect the upgrading of some CSN implementations. If you run NetBoot protection on a single-network CSN, all MAC addresses for the storage nodes must be included in the DHCP allow-list; if not, the Swarm 11 nodes can fail to get a DHCP network address from the CSN when upgrading to 12. Follow this one-time process if this occurs:

    1. Temporarily disable the network protection.

    2. Reboot the nodes (which assigns new IPs where needed, as available in your range).

    3. Add the new MAC addresses (which you can list from the System Menu) to the DHCP allow-list, and restart the DHCP service.

    4. Re-enable network protection, and boot any storage nodes that failed to restart.

  • Invalid Licenses: Swarm 12.0 no longer supports Dell OEM-style licenses. Contact DataCore Support for a new license. (SWAR-9036)

  • Chassis ID Limitation: Before upgrading storage nodes to 12.0.x, contact DataCore Support to verify that the nodes can join the network correctly. An issue with some chassis IDs prevents those nodes from completing boot. This is corrected in Swarm 12.1.0. (SWAR-9121)

  • Invalid or Expired Swarm Licenses: Swarm 12.0 does not boot if the configured license is invalid or expired. Use a valid license. This is corrected in Swarm 12.1.0. (SWAR-9050)

  • scsp.forceLegacyNonce: The required configuration differs depending on the version you are upgrading from. (SWAR-9020)

    • If you are currently running a Swarm Storage version prior to 11.1 and upgrading to 11.1, 11.2, 11.3, 12.0, or 12.1:

      Before upgrading, set scsp.forceLegacyNonce=true in the node.cfg file. After the upgrade, once the cluster is fully up, use swarmctl to set scsp.forceLegacyNonce=false in the cluster, and change the node.cfg file to scsp.forceLegacyNonce=false.

    • If you are currently running Swarm Storage version 11.1, 11.2, 11.3, 12.0, or 12.1 and upgrading to another version from that list:

      Before upgrading, verify that the node.cfg file contains scsp.forceLegacyNonce=false, and use swarmctl to verify that scsp.forceLegacyNonce=false in the cluster.
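The DHCP allow-list requirement in the Upgrading with CSN NetBoot Protection item above can be checked before the upgrade with a simple set difference. A minimal Python sketch, assuming you have already collected the storage node MAC addresses and the CSN's DHCP allow-list as text; how you export those lists is not covered here, and the function names and addresses are hypothetical.

```python
import re
from typing import Iterable, Set


def normalize_mac(mac: str) -> str:
    """Normalize common MAC spellings (aa:bb:..., AA-BB-..., aabb.ccdd.eeff)."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def missing_from_allow_list(node_macs: Iterable[str], allow_list: Iterable[str]) -> Set[str]:
    """Return node MACs that the DHCP allow-list does not cover."""
    allowed = {normalize_mac(m) for m in allow_list}
    return {normalize_mac(m) for m in node_macs} - allowed


# Hypothetical addresses: the third node MAC is absent from the allow-list.
nodes = ["00:50:56:AA:00:01", "00:50:56:aa:00:02", "0050.56aa.0003"]
allow = ["00-50-56-aa-00-01", "00:50:56:aa:00:02"]
print(sorted(missing_from_allow_list(nodes, allow)))  # ['00:50:56:aa:00:03']
```

Normalizing first matters because the same address is often written with different separators and letter case in different tools.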

Use swarmctl to Check or Change Settings

Use 'swarmctl -C scsp.forceLegacyNonce' to check the value of scsp.forceLegacyNonce.

Use 'swarmctl -C scsp.forceLegacyNonce -V False' to set the value to false.

For more details, see https://support.cloud.datacore.com/tools/Tech-Support-Scripts-Bundle-swarmctl.pdf.
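The two scsp.forceLegacyNonce upgrade paths above reduce to a small decision on the version you are upgrading from. The following Python sketch encodes that decision and returns a checklist that includes only the swarmctl invocations shown above; it does not run swarmctl or edit node.cfg, and the function names are hypothetical. Versions are compared on major and minor numbers only, so 12.0.1 is treated as 12.0 (an assumption of this sketch).

```python
from typing import List, Tuple

# Versions named in the scsp.forceLegacyNonce upgrade guidance.
NONCE_VERSIONS = {(11, 1), (11, 2), (11, 3), (12, 0), (12, 1)}


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a version string such as "12.0.1" to (12, 0)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def nonce_upgrade_steps(current: str, target: str) -> List[str]:
    """Return the scsp.forceLegacyNonce checklist for one upgrade path."""
    cur, tgt = major_minor(current), major_minor(target)
    if tgt not in NONCE_VERSIONS or cur == tgt:
        raise ValueError("upgrade path not covered by this guidance")
    if cur < (11, 1):
        return [
            "Before upgrading: set scsp.forceLegacyNonce=true in node.cfg",
            "After the cluster is fully up: swarmctl -C scsp.forceLegacyNonce -V False",
            "Then: set scsp.forceLegacyNonce=false in node.cfg",
        ]
    if cur in NONCE_VERSIONS:
        return [
            "Before upgrading: verify scsp.forceLegacyNonce=false in node.cfg",
            "Before upgrading: swarmctl -C scsp.forceLegacyNonce (expect false)",
        ]
    raise ValueError("upgrade path not covered by this guidance")


for step in nonce_upgrade_steps("10.2", "12.0"):
    print(step)
```

Paths the release notes do not describe raise an error instead of guessing, so they can be raised with DataCore Support.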

Watch Items and Known Issues

The following watch items are known:

  • When you use certificates with HAProxy, S3 Backup restoration to the cluster may be blocked if the certificate is not located where Swarm expects it. (SWAR-8996)

  • If a node mounts an encrypted volume whose encryption key is missing from the configuration, the node fails to mount all of its disks. (SWAR-8762)

  • S3 Backup feeds do not back up logical objects greater than 5 GB; those writes fail with a CRITICAL log message. (SWAR-8554)

  • When restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS), the chassis shuts down and does not come back up. (SWAR-8054)

These are standing operational limitations:

  • If you wipe your Elasticsearch cluster, the Storage UI shows no NFS config. Contact DataCore Support for help repopulating your SwarmFS config information. (SWAR-8007)

  • If you delete a bucket, any incomplete multipart upload into that bucket leaves the parts (unnamed streams) in the domain. To find and delete them, use the s3cmd utility (search the Support site for "s3cmd" for guidance). (SWAR-7690)

  • Removing subcluster assignments in the CSN UI creates invalid config parameters preventing unassigned nodes from booting. (SWAR-7675)

  • You may see false 404 Not Found and other SCSP errors during rolling reboot in versions 11.1 through 12.0.1. To mitigate this problem, set scsp.forceLegacyNonce=False in the cluster configuration. You need to remove this setting before upgrading to 12.1.0 or later. (SWAR-9020)

  • A feed SEND request for a replication feed that is changing its state to "blocked" during the request can potentially run indefinitely, rather than giving an error condition. This may impact the Remote Synchronous Write (RSW) feature used by the Gateway. This issue is addressed in Swarm 12.1.0. (SWAR-9019)

  • During a node reboot, such as a rolling reboot of the cluster, a newly booted node can temporarily return an empty result set for a listing query. (SWAR-9083)

  • If a feed is subject to a prolonged outage, a node reboot may be required for it to resume progress after the outage is cleared. If progress does not resume after the reboot, contact DataCore Support. This is resolved in Swarm 12.1.0. (SWAR-9062)

  • When editing and saving a search feed in the Swarm UI, you may see a red error box mentioning "respondsToLists". Use the Swarm console instead to edit this feed. (SWAR-9065)

Upgrading Swarm

Consider these installation issues when upgrading Swarm:

  • The ‘elasticsearch-curator’ package may show an error during an upgrade, which is a known curator issue. Workaround: Reinstall the curator: yum reinstall elasticsearch-curator (SWAR-7439)

  • Do not install the Swarm Search RPM before installing Java. If Gateway startup fails with "Caringo script plugin is missing from indexer nodes", uninstall and reinstall the Swarm Search RPM. (SWAR-7688)

Proceed to  to upgrade Swarm 9 or higher. Contact DataCore Support for guidance on migrating from Swarm 8.x or earlier.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.