Swarm Storage 15.0 Release

New Features

  • Optional Multicast: The cluster can be configured via the Storage Cluster Services (SCS 1.4) to not use multicast for inter-node communication. Changing between modes requires a full, non-rolling cluster reboot. (CUP-622)

  • EC Performance: Depending on loads, hardware, and EC configuration, customers should see a significant performance improvement for EC reads and for small-object reads and writes. Very large EC object writes may suffer slightly degraded performance. (SWAR-9486, SWAR-9543, and SWAR-9541)

  • EC Objects Conversion: The setting ec.convertToPolicy was added in Swarm 14.1 to convert EC objects to the current policy encoding at a rate based on ec.conversionPercentage. In Swarm 15.0, EC objects smaller than the minimum EC encoding size (based on policy) are converted to whole replicas. This conversion results in more efficient storage and access of these objects. (SWAR-7444)

Note

In Swarm 15.0, the conversion applies to both historical and current versions. After the conversion, it takes health.segLifepointUpdateInterval (default is one day) until the health processor (HP) deletes the original EC segments.

  • Push Threads on Replication and S3 Backup Feeds: The meaning of push threads in replication feed and S3 backup feed definitions has changed to push threads per Swarm Storage node. Formerly, it was push threads per volume. The new default is 20 (previously 6). Customers should consider making a compensating change in their replication and S3 backup feed definitions. This change allows better throttling of these feeds in situations where bandwidth is limited. (SWAR-8672)

  • Pausing a Misbehaving Replication/S3 Backup Feed: A new feature pauses a replication feed or S3 backup feed that is subject to high rates of disconnections. The default limit for the setting feeds.pauseDisconnectPerHourLimit is 1000. This feature is intended to limit wasted retrying and limit trapped space creation in replication feed target clusters. When a feed is paused for this reason, a CRITICAL log message is issued. The feed can be resumed when the underlying issue has been resolved. (SWAR-9442)

  • Measuring Disk Performance: A fio-based tool is added to measure disk performance, which is visible in the port 90 console and via the management API. (SWAR-9439)
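The push threads change above implies a simple conversion when reviewing existing feed definitions. The sketch below is a minimal illustration, not a Swarm tool: it estimates the per-node value that matches the aggregate push concurrency a node had under the old per-volume meaning. The function name and the volumes_per_node input are hypothetical, and the assumption that the old aggregate equals threads per volume multiplied by volumes per node is ours; only the two defaults (6 per volume, 20 per node) come from this release note.

```python
OLD_DEFAULT_PER_VOLUME = 6   # default before 15.0: push threads per volume
NEW_DEFAULT_PER_NODE = 20    # default in 15.0: push threads per storage node


def equivalent_per_node_threads(per_volume_threads: int, volumes_per_node: int) -> int:
    """Return the per-node thread count matching the old aggregate concurrency.

    Assumption (not from the release note): under the old meaning, a node
    pushed with per_volume_threads * volumes_per_node threads in total.
    """
    if per_volume_threads < 1 or volumes_per_node < 1:
        raise ValueError("both arguments must be positive integers")
    return per_volume_threads * volumes_per_node


def default_increases_concurrency(volumes_per_node: int) -> bool:
    """True if the new per-node default exceeds the old default's aggregate."""
    old_total = equivalent_per_node_threads(OLD_DEFAULT_PER_VOLUME, volumes_per_node)
    return NEW_DEFAULT_PER_NODE > old_total
```

Under these assumptions, a node with 12 volumes previously ran 72 push threads at the old default, so the new default of 20 lowers its concurrency unless the feed definition is raised; a node with 2 volumes previously ran 12, so the new default raises it.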

Additional Changes

Changes include versions and fixes coming from testing and user feedback:

OSS Versions

See https://caringo.atlassian.net/wiki/spaces/public/pages/3072983097 for the complete listing of packages and versions for this release.

Fixed in 15.0

  • Overlay Index Inflation: Fixed an issue where the overlay index would not be fully populated during post-boot inflation, resulting in transient false 404 responses. (SWAR-9463)

  • Overlay Index Monitoring: The overlay index-related stats are added to the node exporter and SNMP for monitoring the overlay index and its relation to performance. (SWAR-9465)

  • Updating the Search Feed is Not a Blocker: Updates to the search feed definition can again be made via the port 90 console or the Management API without being blocked. The blockage was a regression in 14.1 releases. (SWAR-9515)

  • Incomplete Retires: Fixed an issue in 14.1 where retires would stop on remaining context objects in clusters with fewer than 16 nodes and, in some cases, remaining EC segments. (SWAR-9437)

  • SCSP Processes: Improved resilience to unresponsive SCSP processes, reducing the operational impact on the cluster. (SWAR-9531)

  • Versioned Objects Reclaim: Fixed an issue in 15.0.1 with EC versioned objects where they could become erroneously unavailable. (SWAR-9578)

  • Node Instability: Addressed a cause of node instability in 15.0.1 which was introduced in 15.0.0 due to EC conversions and consolidations. (SWAR-9542)

Upgrade Impacts

Required

Complete the migration to Swarm 11.3 and ES 6.8.6 before upgrading to Swarm 15 if running older Elasticsearch (5.6.12 or 2.3.3). See here for upgrading from an unsupported Elasticsearch version.

These items are changes to the product function that may require operational or development changes for integrated applications. Address the upgrade impacts for each of the versions since the currently running version:

Impacts for 15.0

  • The default log level has been changed from 40 to 30, and customers are encouraged to run with that log level in most cases. Because the cluster’s persistent settings stream remembers the log level, the new value must be applied manually to each cluster. (SWAR-9592)

  • Several improvements in this release and 14.1 improve the space efficiency of Elasticsearch indices and the listing performance with Gateway 7.10 and later. Customers should review the following settings before creating new search feeds and consider creating a new search feed to re-index their clusters to take advantage of these capabilities.

    • The setting search.enableDelimiterPaths, introduced in 14.1, indexes more efficient path information in new search indices. Its default is now true, but customers may need to manually change this setting as the value is persisted in the cluster. For customers with existing search feeds (created prior to Swarm 14.1), search.enableDelimiterPaths being set displays a CRITICAL message after booting:
      "Setting search.enableDelimiterPaths=True, but the index <index name> doesn't support the 'paths' field. A new Elasticsearch index is required."
      This CRITICAL is advisory and does not impact the existing search feed. (SWAR-9495)

    • The setting search.enableCustomMetadataTyping, introduced in 14.1, helps to limit space usage by Elasticsearch in the common case where custom metadata typing is not used. Its default is now false, but customers may need to manually change this setting as the value is persisted in the cluster. (SWAR-9499)

    • Increased the field limit on new Elasticsearch indices to 2000. (SWAR-9414)

    • Customers should also review their search.numberOfShards setting before creating new search feeds.

  • For customers using object locking, introduced in 14.1:

    • Duplicated lifecycle policy RuleId values are now detected and will cause a 400 response for invalid updates to buckets with such rules. (SWAR-9407)

    • Lifecycle policy NamePrefix rules containing non-ASCII characters did not match the intended objects in Swarm 14.1. This issue is resolved in 15.0. (SWAR-9413)

    • Time zone-specific date formatting in lifecycle policies now supports time zone specifiers such as YYYY-MM-DDT00:00:00Z, YYYY-MM-DDT00:00:00.000Z, and YYYY-MM-DDT00:00:00. (SWAR-9408)

  • Split Lock Errors: Cluster nodes with newer CPUs may have required a kernel parameter to stop “split lock detection” warnings in their servers' dmesg log. With this upgrade the kernel parameter is no longer required. (SWAR-9459)

  • Log Messages: Log messages now include the IP address of the storage node emitting the message. (SWAR-9533)

  • Inaccessible Objects: Objects with the tilde (~) character in the path name written with Swarm v11.0.x or before may not be accessible with later releases. The object is still present, but not accessible by name, and requests return a 404 Not Found. Contact support to make these objects accessible. (SWAR-9430)
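The three timestamp shapes named in the lifecycle policy item above can be checked on the client side before a policy is submitted. The following is a minimal sketch using only the Python standard library; the helper name parse_lifecycle_date is hypothetical, and treating a timestamp without a specifier as UTC is an assumption of this sketch, not documented Swarm behavior.

```python
from datetime import datetime, timezone

# strptime patterns for the three shapes listed in the release note.
_LIFECYCLE_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",      # YYYY-MM-DDT00:00:00Z
    "%Y-%m-%dT%H:%M:%S.%fZ",   # YYYY-MM-DDT00:00:00.000Z
    "%Y-%m-%dT%H:%M:%S",       # YYYY-MM-DDT00:00:00
)


def parse_lifecycle_date(value: str) -> datetime:
    """Parse a lifecycle date string into a timezone-aware datetime.

    Assumption: all three shapes are interpreted as UTC.
    """
    for fmt in _LIFECYCLE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next accepted shape
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported lifecycle date format: {value!r}")
```

A string in any other shape, such as a bare date without a time component, raises ValueError, which gives a client the chance to reject it before the request is sent.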

Cumulative Impacts

Address all upgrade impacts for each version released since the version being upgraded from.

Watch Items and Known Issues

The following watch items are known:

  • Setting network.host (https://www.elastic.co/guide/en/elasticsearch/reference/7.16/important-settings.html#network.host) in elasticsearch.yml to "_site_" might not choose the right IP to allow master election if the server is multi-homed. Enter a specific IP for the node in elasticsearch.yml; the configuration script preserves it. (SWAR-9350) If this issue occurs, the fix is to:

    • systemctl stop elasticsearch on all ES nodes

    • remove all the contents of the path.data directory

    • change network.host: <IP of ES NIC in the Storage VLAN>

    • systemctl start elasticsearch
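The network.host change in the steps above can be scripted. The sketch below is one hedged way to do it with plain text handling rather than a YAML library, so that comments in the file are preserved; the function name set_network_host and the example address are hypothetical. It only rewrites the text of the file: stopping the service, clearing path.data, and restarting remain the manual steps listed above.

```python
import ipaddress
import re

# Matches an active (uncommented) network.host line; commented lines are left alone.
_ACTIVE_NETWORK_HOST = re.compile(r"^network\.host\s*:.*$", re.MULTILINE)


def set_network_host(yaml_text: str, ip_address: str) -> str:
    """Return yaml_text with network.host set to the given IP address."""
    ipaddress.ip_address(ip_address)  # raises ValueError if not a valid IP
    new_line = f"network.host: {ip_address}"
    if _ACTIVE_NETWORK_HOST.search(yaml_text):
        # Replace the first active setting in place.
        return _ACTIVE_NETWORK_HOST.sub(lambda _m: new_line, yaml_text, count=1)
    # No active setting found: append one, keeping the file newline-terminated.
    separator = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return f"{yaml_text}{separator}{new_line}\n"
```

For example, set_network_host(text, "10.0.0.5") applied to the contents of elasticsearch.yml yields text that can be written back to the file before Elasticsearch is started again.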

  • If Elasticsearch instances fail to start with JNA warnings in the Elasticsearch logs, verify that the “java.io.tmpdir” configured in “jvm.options” is writable by Elasticsearch. Change “java.io.tmpdir” to /var/log/elasticsearch as per desired security preferences. (SWAR-9347)

  • Swarm versions 10.0 onward are vulnerable to kernel issues manifested on some Intel CPUs. Symptoms include lowered performance, long mount times, and cluster instability. Swarm versions 14.1 and later provide a workaround for this issue; see https://caringo.atlassian.net/wiki/spaces/KB/pages/2973204604. (SWAR-9055)

These are standing operational limitations:

  • The Storage UI shows no NFS config if the Elasticsearch cluster is wiped. Contact DataCore Support for help in repopulating the SwarmFS config information. (SWAR-8007)

  • If a bucket is deleted, any incomplete multipart upload into that bucket leaves its parts (unnamed streams) in the domain. To find and delete those parts, use the s3cmd utility (search the Support site for "s3cmd" guidance). (SWAR-7690)

  • The chassis shuts down but does not come back up when restarting a cluster of virtual machines that are UEFI-booted (versus legacy BIOS). (SWAR-8054)

  • Invalid config parameters that prevent the unassigned nodes from booting are created if subcluster assignments are removed in the CSN UI. (SWAR-7675)

To upgrade Swarm 9 or higher, proceed to . For migration from Swarm 8.x or earlier, contact DataCore Support for guidance.

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.