The following is a collection of best practices and reminders for the various stages and areas of a Swarm implementation.

Top-Level Planning

Verify all network requirements are met
Configure the Switch/VLAN, such as IGMP Snooping and Spanning Tree
Decide IP assignments, both cluster and client-facing
Assign Multicast Group
Create a detailed diagram of the intended implementation. See Use Cases and Architectures
For cluster naming and domains, use IANA FQDN format (cluster.example.com), and align them with DNS
Follow conventions for SSL/TLS certificates
Plan Authentication/Authorization and the user store (LDAP, AD, PAM, Tokens). See Content Gateway Authentication
Decide how data is segmented across tenants, domains, and buckets. See Migrating from Traditional Storage
Define policies for client access and data flow; which clients are used to create and access data?
Choose an approach to integrate clients and applications; are there multiple access protocols to the same data (namespace)?
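
The naming and multicast items in the planning list can be checked before anything is deployed. The Python sketch below uses hypothetical helper names (`is_fqdn`, `is_multicast_group`). It applies generic RFC 1123 host-name rules and the IPv4 multicast range 224.0.0.0/4, and it does not reflect any Swarm-specific validation.

```python
import ipaddress
import re

# One DNS label: 1-63 characters of letters, digits, or hyphens,
# with no leading or trailing hyphen (generic RFC 1123 rules, not Swarm-specific).
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_fqdn(name: str) -> bool:
    """Return True if name looks like a fully qualified domain name, e.g. cluster.example.com."""
    name = name.rstrip(".")            # a trailing root dot is allowed
    if len(name) > 253:
        return False
    labels = name.split(".")
    if len(labels) < 2:                # a bare host name such as "cluster" is not an FQDN
        return False
    if labels[-1].isdigit():           # reject dotted-quad IP addresses such as "10.0.0.1"
        return False
    return all(_LABEL.fullmatch(label) for label in labels)


def is_multicast_group(address: str) -> bool:
    """Return True if address is an IPv4 multicast address (224.0.0.0/4)."""
    try:
        return ipaddress.IPv4Address(address).is_multicast
    except ipaddress.AddressValueError:  # not a valid IPv4 address at all
        return False
```

For example, `is_fqdn("cluster.example.com")` returns True, while a bare `cluster` or a dotted-quad address is rejected; `is_multicast_group("239.1.1.1")` returns True.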

Storage Cluster Best Practices

Itemize and account for performance requirements, if any
Plan for both maintenance (drive replacement, live upgrades) and disruption (server failure, drive failure) scenarios
- Verify protection scheme choices are aligned with available resources
Select both monitoring and notification approaches
Capture utilization trends to stay ahead of capacity planning (adding hardware and increasing the license level)
Create a default domain in the cluster with the same name as the cluster (this is the “catch all” for enforcing tenancy of objects in the cluster)
Verify that all domains in the cluster use IANA FQDN format, as this has ramifications for DNS, Gateway, S3, and SSL/TLS
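
Utilization trends become actionable once they are projected forward. As a minimal sketch, assuming periodic samples of the fraction of cluster capacity used (collected from whatever monitoring is in place), an ordinary least-squares line estimates when usage will cross a chosen threshold. The `Sample` type, the `days_until` function, and the threshold value are illustrative assumptions; real growth is often not linear, so treat the result as a rough planning date.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    day: float            # days since the first measurement
    used_fraction: float  # fraction of cluster capacity in use, 0.0-1.0


def days_until(samples: list[Sample], threshold: float) -> float | None:
    """Fit a least-squares line to the samples and return the day on which
    usage is projected to reach threshold, or None if usage is not growing."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(s.day for s in samples) / n
    mean_y = sum(s.used_fraction for s in samples) / n
    sxx = sum((s.day - mean_x) ** 2 for s in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((s.day - mean_x) * (s.used_fraction - mean_y) for s in samples)
    slope = sxy / sxx                  # growth in used fraction per day
    if slope <= 0:
        return None                    # flat or shrinking usage: no crossing ahead
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope
```

With samples of 50%, 55%, and 60% used at days 0, 30, and 60, usage grows by about 0.17% per day, and `days_until(samples, 0.80)` projects the 80% mark at about day 180, leaving roughly four months after the last sample to add hardware or raise the license level.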

Elasticsearch Best Practices

Plan to “scale out” Elasticsearch alongside Swarm Storage
For best performance and redundancy in production, start out with four ES servers
Do not give any Elasticsearch server more than 64 GB of physical memory (this affects the Java maximum heap size and performance)
To optimize listing and query performance, use SSD drives
Locate Elasticsearch servers on the same subnet as Storage Cluster (avoid routing to Swarm nodes)
Before performing upgrades, read the Swarm Storage Cluster Release Notes for any associated Elasticsearch changes that may be necessary
Always use the Elasticsearch packages bundled with the Swarm version deployed
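
The 64 GB ceiling is consistent with general Elasticsearch JVM guidance (not a Swarm-specific rule): give the heap no more than half of physical RAM, leaving the rest to the operating system's file-system cache, and keep it below the compressed ordinary object pointer (oops) threshold, which Elastic's documentation describes as safe at 26 GB on most systems. On a 64 GB server, half of RAM (32 GB) already exceeds that threshold, so adding RAM beyond 64 GB does not increase the usable heap. The sketch below encodes that arithmetic; the function names and the 26 GB default are assumptions, and the Elasticsearch configuration bundled with Swarm should be checked before changing any heap settings.

```python
def recommended_heap_gb(physical_ram_gb: float, oops_cap_gb: float = 26.0) -> float:
    """Return a JVM heap size in GB following general Elasticsearch guidance:
    at most half of physical RAM, and no more than the compressed-oops cap.
    The 26 GB default cap is an assumption taken from generic Elastic guidance."""
    if physical_ram_gb <= 0:
        raise ValueError("physical RAM must be positive")
    half = physical_ram_gb / 2         # leave the other half to the OS file-system cache
    return min(half, oops_cap_gb)


def jvm_options(heap_gb: float) -> list:
    """Render equal minimum and maximum heap flags, e.g. ['-Xms16g', '-Xmx16g']."""
    size = f"{int(heap_gb)}g"          # whole gigabytes, rounded down
    return [f"-Xms{size}", f"-Xmx{size}"]
```

For a 32 GB server this yields a 16 GB heap; for 64 GB and 128 GB servers alike it yields 26 GB, which is why memory beyond 64 GB brings little benefit to the heap.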

Gateway Best Practices

Gateway serves as a “scale out, lightweight reverse proxy” to object storage
Place multiple Gateway servers behind the load balancer
Perform SSL/TLS off-load at the load balancer layer
Verify Gateway servers have unfettered access to Storage Cluster and Elasticsearch nodes
For best performance, place Gateway servers on the Storage Cluster network
Verify that Gateway has access to LDAP/AD targets that are “network close” (as few hops as possible) and in good working order
Monitor the concurrent session count for capacity planning; heavy S3 request activity may require additional Elasticsearch resources
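
A quick way to confirm that a Gateway server can reach its dependencies is to attempt TCP connections from that server to each one. This sketch uses only the Python standard library; the helper names are hypothetical, and the ports you pass in are assumptions to confirm against your deployment (commonly port 80 for SCSP on Swarm storage nodes, 9200 for Elasticsearch, and 389 or 636 for LDAP/LDAPS).

```python
from __future__ import annotations

import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, DNS failure, and timeout
        return False


def check_targets(targets: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe each named (host, port) target and report which ones are reachable."""
    return {name: can_connect(host, port) for name, (host, port) in targets.items()}
```

For example, `check_targets({"es1": ("es1.example.com", 9200)})` (a hypothetical host) returns `{"es1": True}` when that Elasticsearch node accepts connections. A successful connection only shows reachability; it does not measure hop count or latency, so pair it with a tool such as `traceroute` for the “network close” check.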

SwarmFS Best Practices

SwarmFS is a stateless protocol translator from NFSv4 to Swarm (SCSP)
Run the latest Swarm version for the best performance
Plan scale-out according to the number of exports, since each export consumes memory, and to exports that exhibit heavy concurrent access activity
Verify SwarmFS deployment planning aligns with the authentication and authorization approach for Swarm Storage (Anonymous / Single User / Session Token)
Verify NFS clients can use NFSv4 (other NFS versions not supported)
For best behavior, clients should mount SwarmFS exports using a “timeo” setting of 9000
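
The `timeo` mount option is measured in tenths of a second, so 9000 corresponds to 900 seconds (15 minutes) before the client retries a request. As an illustration, the sketch below builds both a one-off `mount` command and an equivalent `/etc/fstab` line that force NFSv4. The function names, server name, export path, and mount point are hypothetical placeholders, and the command list is only constructed, not run.

```python
def swarmfs_mount_command(server: str, export: str, mount_point: str,
                          timeo: int = 9000) -> list:
    """Build an NFSv4 mount command for a SwarmFS export.

    timeo is in tenths of a second, so the default 9000 means 900 seconds."""
    return ["mount", "-t", "nfs4", "-o", f"timeo={timeo}",
            f"{server}:{export}", mount_point]


def swarmfs_fstab_line(server: str, export: str, mount_point: str,
                       timeo: int = 9000) -> str:
    """Build the equivalent /etc/fstab entry (dump and pass fields set to 0)."""
    return f"{server}:{export} {mount_point} nfs4 timeo={timeo} 0 0"
```

The list returned by `swarmfs_mount_command` can be passed to `subprocess.run` on a client with root privileges, while the fstab line makes the same options persistent across reboots.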

FileFly Best Practices

When configuring FileFly, use named servers (DNS, FQDN) rather than IP addresses so that future server migrations are easier
When installing the FileFly plugins, enable both header options (Include metadata HTTP headers and Include Content-Disposition) to allow full metadata capture, such as for creating Collections from FileFly data
Deploy FileFly using Gateway (aka CloudScaler) rather than Direct to Swarm if possible
- Gateway allows for authenticated access and data segmentation/policy protection of FileFly data
- Gateway also supports SSL/TLS encapsulation of data in transit
For Scrub tasks (which clean Swarm of data no longer associated with a FileFly source), verify that the grace period aligns with the overall backup policy
After performing any data migration tasks, always run a “DrTool from Source” task
- Running this task is necessary to ensure up-to-date recovery of stubs, which might be accidentally deleted
FileFly can be sensitive to network throughput, so keep the associated source and target systems as “close” to each other on the network as possible, and use the highest bandwidth available
Note the location of the FileFly logs, for troubleshooting
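
Two of the FileFly items above, named targets and a Scrub grace period aligned with backups, can be reviewed mechanically. The sketch below assumes the relevant values have been gathered into a hypothetical `FileFlyPlan` record (this is not a FileFly configuration format). It also assumes that alignment means the grace period is at least as long as the backup retention period, so that a stub restored from the oldest retained backup still finds its data in Swarm.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class FileFlyPlan:
    """Hypothetical summary of FileFly settings to review; not a FileFly file format."""
    target_host: str            # Gateway or Swarm endpoint configured in FileFly
    scrub_grace_days: int       # Scrub grace period before unreferenced Swarm data is removed
    backup_retention_days: int  # how long source-side backups (and their stubs) are kept


def review(plan: FileFlyPlan) -> list[str]:
    """Return warnings for settings that conflict with the guidance above."""
    warnings = []
    try:
        ipaddress.ip_address(plan.target_host)  # succeeds only for an IP literal
        warnings.append("target is an IP address; use a DNS name (FQDN) instead")
    except ValueError:
        pass  # not an IP literal, so it is treated as a name
    # Assumption: a stub restored from the oldest backup must still find its Swarm data.
    if plan.scrub_grace_days < plan.backup_retention_days:
        warnings.append("Scrub grace period is shorter than backup retention; "
                        "stubs restored from older backups may point at scrubbed data")
    return warnings
```

A plan such as `FileFlyPlan("gateway.example.com", 90, 30)` produces no warnings, while an IP-address target with a 14-day grace period against 30-day backup retention produces both.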