New Features
Internode communication: Internode communication has been optimized for better load handling.
Emergency Defragmentation: A new setting, disk.enforceEmergencyDefrag, enables high-priority defragmentation without additional configuration.
RPC Improvement: Swarm now applies exponential back-off when RPCs fail, which helps nodes cope with heavy load.
SSD Trim Support: Storage nodes now support TRIM on SAS- and SATA-based SSDs. See also [DRAFT] Enhanced SSD Support with Automated TRIM Feature (SWAR-9883)
ES8 Support: Swarm now supports Elasticsearch (ES) 8 and later. (SWAR-9536)
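The RPC back-off described above is a general retry technique. The sketch below illustrates exponential back-off with full jitter on the client side; it is not Swarm's internal implementation. The function name call_with_backoff, the default delays, and treating ConnectionError as a failed RPC are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(rpc: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 0.1,
                      max_delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry rpc(), doubling the wait cap after each failure (hypothetical helper).

    Full jitter (a random wait between 0 and the current cap) spreads retries
    from many callers so they do not all hit a loaded node at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return rpc()
        except ConnectionError:  # assumption: this is what a failed RPC raises
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("not reached")  # the loop always returns or raises
```

The sleep parameter is injectable so the retry schedule can be tested without actually waiting.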
New Settings
node.maxRPCPollsLost: Introduced to resolve Swarm node reboots under load; the setting throttles on RPC status loss. (SWAR-10319)
disk.enforceEmergencyDefrag: Enables high-priority defragmentation without additional configuration. (SWAR-9276)
cip.multicastThreads: <Description of the setting> Choose the value x for this setting using the following formula:
2^(x+1) ≈ number of CPUs, where cip.multicastThreads = x
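Solving the formula above for x gives x ≈ log2(number of CPUs) - 1. The sketch below computes a candidate value from the local CPU count. Rounding to the nearest integer in log2 space and clamping x at 0 are assumptions; the release notes state only the approximate relationship. The helper name multicast_threads_for is hypothetical.

```python
import math
import os


def multicast_threads_for(cpu_count: int) -> int:
    """Return x such that 2**(x + 1) is approximately cpu_count (hypothetical helper).

    Assumptions: x is rounded to the nearest integer in log2 space and is
    never allowed to go below 0.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    x = round(math.log2(cpu_count)) - 1
    return max(x, 0)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    print(f"{cpus} CPUs -> cip.multicastThreads = {multicast_threads_for(cpus)}")
```

For example, a 16-CPU node gives x = 3, because 2^(3+1) = 16.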
New Headers
Castor-System-Delete-Marker-TmBorn: Returns the creation time (time born) of a delete marker.
Completion-Content-Length: Returns the size of the uploaded object when a multipart upload completes.
Castor-System-Indexing-DocId: Returns the expected Elasticsearch (ES) document ID for the object.
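Client code that consumes the new headers needs a case-insensitive lookup, because HTTP header names are case-insensitive. The sketch below shows one way to read them and to sanity-check Completion-Content-Length after a multipart upload. The helper names are hypothetical, and the check assumes the completed object's size equals the sum of its part sizes; the value format of Castor-System-Delete-Marker-TmBorn is not specified here, so it is returned as a raw string.

```python
from typing import Mapping, Optional, Sequence

# Header names as listed in these release notes.
DELETE_MARKER_TMBORN = "Castor-System-Delete-Marker-TmBorn"
COMPLETION_CONTENT_LENGTH = "Completion-Content-Length"
INDEXING_DOCID = "Castor-System-Indexing-DocId"


def header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header case-insensitively; return None if it is absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def completion_length_matches(headers: Mapping[str, str],
                              part_sizes: Sequence[int]) -> bool:
    """Compare Completion-Content-Length with the total size of the uploaded parts.

    Assumption: the completed object's size equals the sum of its part sizes.
    Returns False when the header is missing.
    """
    value = header(headers, COMPLETION_CONTENT_LENGTH)
    return value is not None and int(value) == sum(part_sizes)
```

With Python's http.client, for example, the response headers can be passed as dict(response.getheaders()).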
Fixed in 17.0
Default PDI Enablement: Per Domain Indexing (PDI) is enabled by default. (SWAR-10359)
Search Feed Deletion Improvement: Deleting a Search Feed created with search.perDomainIndex=True now removes all associated domain indices in Elasticsearch. Use the Storage UI to delete the feed, not the legacy port 90 console. (SWAR-10264)
Uninitialized / Used Disks: To prevent potential data loss when hot-plugging partitioned, used, or unformatted disks or volumes, DataCore recommends checking and formatting each disk before attaching it to a Swarm Storage Node. (SWAR-10206)
Memory Leak Fix: Fixed a memory leak that developed over time in clusters under load. (SWAR-10124)
Node Reboot During Heavy Load: Fixed an issue that caused Swarm Storage Nodes to reboot under heavy load due to SCSP process failures. (SWAR-10319)
Recovery Failure: Fixed an issue where volumes pending mount were added to the recovery suspend list, which prevented recovery. (SWAR-10061)
New API Schema Fields: Added the API schema fields usedMBytes, trappedMBytes, and usedSpaceMB. (SWAR-10400)
409 Error Due to Full Volumes: Fixed an issue where Swarm kept trying to use full volumes and, under heavy load, the retry occasionally failed with a 409 error. (SWAR-10249)
False 404s: Fixed false 404 errors that Swarm occasionally returned under heavy load. (SWAR-10120)
Port Overflow: Fixed a port overflow issue that occurred when a cluster remained up for an extended period, affecting new processes. (SWAR-10276)
Improved Health Report Handling: Health reports are now sent without dmesg output if dmesg becomes unresponsive. (SWAR-10185)
Watch Items and Known Issues
The following watch items are known:
If the configuration script that upgrades Elasticsearch 6.8.6 to Elasticsearch 7 does not complete, customizations to the path.data and network.host fields in /etc/elasticsearch/elasticsearch.yml are lost. This can happen if the Elasticsearch 7 rpm is not in the current directory and cannot be downloaded. Back up these customizations before running the upgrade script, and reapply them if they are lost. Upgrades starting from Elasticsearch 7.5.2 are not affected. (SWAR-9977)
Caution
Contact DataCore Support if you are still using Elasticsearch 6.8.6.
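The backup step from the Elasticsearch upgrade watch item can be scripted. This is a minimal sketch, assuming the default config path and that path.data and network.host are written in flat "key: value" form; nested YAML is not handled. The helper names backup_config and watched_settings are hypothetical.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path

ES_CONFIG = Path("/etc/elasticsearch/elasticsearch.yml")
# Fields the watch item says can be lost if the upgrade script does not complete.
WATCHED_KEYS = ("path.data", "network.host")


def backup_config(config: Path = ES_CONFIG) -> Path:
    """Copy the config to a timestamped .bak file next to it and return that path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config.with_name(f"{config.name}.{stamp}.bak")
    shutil.copy2(config, backup)  # copy2 also keeps modification time and permission bits
    return backup


def watched_settings(config_text: str) -> dict[str, str]:
    """Extract path.data and network.host values so they can be reapplied.

    Assumption: each key uses the flat "key: value" form on a single line.
    """
    found: dict[str, str] = {}
    for line in config_text.splitlines():
        stripped = line.strip()  # commented-out lines start with "#" and never match
        for key in WATCHED_KEYS:
            if stripped.startswith(f"{key}:"):
                value = stripped.split(":", 1)[1]
                found[key] = value.split(" #", 1)[0].strip()  # drop an inline comment
    return found
```

Run backup_config() before the upgrade script, and keep the output of watched_settings(ES_CONFIG.read_text()) so the values can be reapplied if the upgrade does not complete.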
When using search.perDomainIndex=True (under Support guidance), the number of supported domains is limited by the number of data nodes in the Elasticsearch cluster and by search.numberOfShards. For example, five data nodes support 5 x 600 = 3,000 shards. At search.numberOfShards=5, each domain requires 5 x 2 = 10 shards (primary and replica). Gateway also creates a daily csmeter index (1 shard x 2) retained for 100 days (retentionDays), plus a csmeterlock index. As a result, the maximum number of supported domains is about 275. Beyond that, new domain indices cannot be created, and domain listings return a 503 ReaderUnavailableIndex. The castor.log shows errors EFD19, EIP15, and EIP02.
For example, "Validation Failed: this action would add [6] shards, but the cluster currently has maximum [999]/[1000] normal open shards". Error reporting will be improved in a future release. (SWAR-10172)
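The domain limit in the perDomainIndex watch item is shard arithmetic, sketched below with the example's values: 600 shards per data node, 10 shards per domain, and 100 x 2 = 200 csmeter shards. The csmeterlock index's shard count is not stated, so the sketch assumes 2 (one primary, one replica). It yields an upper bound of 279 domains for five data nodes. The notes quote a lower figure, "about 275"; attributing the difference to other indices in the cluster is an interpretation, not a documented breakdown. The function name max_domains is hypothetical.

```python
def max_domains(data_nodes: int,
                shards_per_node: int = 600,
                number_of_shards: int = 5,
                retention_days: int = 100,
                csmeterlock_shards: int = 2) -> int:
    """Estimate how many per-domain indices fit in the cluster's shard budget.

    Defaults mirror the example in these notes; csmeterlock_shards=2 is an assumption.
    """
    budget = data_nodes * shards_per_node      # e.g. 5 x 600 = 3000
    per_domain = number_of_shards * 2          # primary + replica for each shard
    csmeter = retention_days * 1 * 2           # one daily index, 1 shard x 2 copies
    remaining = budget - csmeter - csmeterlock_shards
    return max(remaining // per_domain, 0)


print(max_domains(5))  # 279
```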
These are standing operational limitations:
The Storage UI shows no NFS config if the Elasticsearch cluster is wiped. Contact DataCore Support for help in repopulating the SwarmFS config information. (SWAR-8007)
If a bucket is deleted while it has incomplete multipart uploads, the uploaded parts (unnamed streams) remain in the domain. To find and delete those parts, use the s3cmd utility (search the Support site for "s3cmd" guidance). (SWAR-7690)
When restarting a cluster of UEFI-booted (rather than legacy BIOS) virtual machines, the chassis shuts down but does not come back up. (SWAR-8054)
Removing subcluster assignments in the CSN UI creates invalid config parameters that prevent the unassigned nodes from booting. (SWAR-7675)
Changes to a feed definition can take 1 minute or more to take effect throughout the cluster. (SWAR-10007)
Swarm does not yet support Elasticsearch HTTPS and authentication features, so keep Elasticsearch servers on the internal storage network only. (SWAR-9934)
To upgrade from Swarm 9 or later, see How to Upgrade Swarm. For migration from Swarm 8.x or earlier, contact DataCore Support for guidance.
Instructions for rpm v15.2 and above on CSN
If you are using rpm version 15.2 or later on the CSN, follow these steps:
Edit the /etc/caringo/netboot/netboot.cfg file on the CSN.
Verify that the kernelOptions parameter includes the new maximum ramdisk size:
kernelOptions = castor_net=active-backup: ramdisk_size=190000
Separate “active-backup:” and ramdisk_size=190000 with a space, as shown above.
Restart netboot.
service netboot restart
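The kernelOptions edit from the steps above can be checked before netboot is restarted. This is a minimal sketch, assuming kernelOptions appears as a single "key = value" line in netboot.cfg. Treating any ramdisk_size of at least 190000 as acceptable is an assumption (the notes specify 190000), and the function name check_kernel_options is hypothetical.

```python
from __future__ import annotations

import re

REQUIRED_RAMDISK_SIZE = 190000


def check_kernel_options(cfg_text: str) -> list[str]:
    """Return problems found in the kernelOptions line of netboot.cfg (empty if OK)."""
    match = re.search(r"^\s*kernelOptions\s*=\s*(.+)$", cfg_text, re.MULTILINE)
    if match is None:
        return ["kernelOptions line not found"]
    options = match.group(1)
    problems = []
    size = re.search(r"\bramdisk_size=(\d+)", options)
    if size is None:
        problems.append("ramdisk_size is missing")
    elif int(size.group(1)) < REQUIRED_RAMDISK_SIZE:  # assumption: larger is also fine
        problems.append(f"ramdisk_size={size.group(1)} is below {REQUIRED_RAMDISK_SIZE}")
    # The steps require a space between "active-backup:" and the next option.
    if re.search(r"active-backup:(?=\S)", options):
        problems.append('missing space after "active-backup:"')
    return problems
```

On the CSN, the check could be run as check_kernel_options(open("/etc/caringo/netboot/netboot.cfg").read()); an empty list means the line matches the expected form.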
Deprecation
The search.caseInsensitive setting is now deprecated and will be removed in a future release. (SWAR-10085)
The fields usedBytes, trappedBytes, and usedSpace are now deprecated and will be removed in a future release. (SWAR-10400)
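Client code that reads the deprecated fields will eventually need to switch to the fields added under SWAR-10400. The sketch below maps field names only. Pairing each deprecated field with a new field is inferred from the names and the shared ticket, not stated explicitly, and the names suggest the new fields report MB rather than bytes, so any code that consumes the values must account for the unit change. The helper names are hypothetical.

```python
from __future__ import annotations

# Deprecated field -> presumed replacement (pairing inferred from names, SWAR-10400).
REPLACEMENTS = {
    "usedBytes": "usedMBytes",
    "trappedBytes": "trappedMBytes",
    "usedSpace": "usedSpaceMB",
}


def deprecated_in(fields: list[str]) -> dict[str, str]:
    """Return {deprecated_field: replacement} for each deprecated name in fields."""
    return {name: REPLACEMENTS[name] for name in fields if name in REPLACEMENTS}


def migrate_field_names(fields: list[str]) -> list[str]:
    """Replace deprecated field names, keeping order; values are NOT converted."""
    return [REPLACEMENTS.get(name, name) for name in fields]
```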
OSS Versions
See [DRAFT] Third-Party Components for Storage 17.0 for the complete listing of packages and versions for this release.