Swarm 15.3 VM Bundle Deployment

Introduction and Prerequisites

This document explains how to deploy a Swarm cluster using the OVF VM bundle package. It also summarizes the steps to deploy a Swarm cluster from scratch, indicating where to find more information about that type of deployment in DataCore’s documentation portal.

The global architecture and recommendations are the same for both types of installations.

Both the OVF bundle package and the standalone software are available on the DataCore downloads website.

Note

The process described in this document covers a standard and generic deployment of Swarm, focused on small installations and test environments for Proof-of-Concept/Proof-of-Value purposes.

As every single use case may be different, we recommend working with DataCore partners and DataCore Solutions Architects to address any specific configuration requirements or customization needed.

There are two main sections in this document:

  • Swarm deployment using the OVF VM bundle package.

  • Deploying Swarm from scratch.

This document is based on a traditional deployment of Swarm, where the management and access layer runs virtualized on one or more VMware ESXi hosts, while the storage nodes are physical x86-64 machines that hold the data. See the diagram below.

[Diagram: Swarm architecture (Swarm-diag.png)]

Swarm Components

The Swarm stack utilizes several components grouped in two different layers:

  • Storage Layer: Composed of the Swarm storage nodes, which hold the data and handle data protection.

  • Management and Access Layer: As the name implies, this layer provides both the administration of the Swarm cluster and access to the storage for users and client applications. No data storage or caching happens in this layer.

Below are the software components of the entire Swarm stack, their functions, and count recommendations for durability and availability purposes:

Swarm Storage Nodes

  • Swarm is a purpose-built, on-premises object storage solution. It runs on standard physical x86-64 servers, providing a single pool of resources, supporting billions of objects/files in the same cluster, and extending its capabilities to multiple sites (data replication).

  • Swarm will leverage all hardware resources the node (server where it runs) provides: CPU, RAM, network, and any direct-attached disk drives.

  • Minimum recommended storage nodes count: Four (4).

Platform Server - Swarm Cluster Services (SCS)

  • The SCS software provides Swarm cluster configuration and boot services as well as log aggregation and Swarm version management.

  • The SCS is not in the data path, but it does require access to the same layer 2 network as the Swarm storage nodes.

  • Minimum recommended SCS count: One (1).

Best Practice

Create a snapshot or clone of the VM once its configuration is completed. Only one SCS instance can be online at a time.

Elasticsearch

  • Provides listing and search capabilities based on object name and object metadata.

  • Minimum recommended Elasticsearch VM count for production environments: Three (3).

  • For functional Proof-of-Concepts, one (1) instance should suffice.

Content Gateway

  • The Content Gateway provides S3 and HTTP access, as well as a Content Portal (web interface) that users and administrators can leverage to create buckets, upload data, use collections to perform metadata-based searches, and more. Hence, the Content Gateway is in the data path.

  • The Content Gateway also enforces multitenancy features such as user authentication against LDAP, Active Directory, or Single Sign-On (SAML), permissions, quotas, and so on.

  • Minimum recommended Content Gateway count for production environments: Two (2).

Important

As the Content Gateway is in the data path, at least two instances should be up and running at all times. A load-balancing mechanism such as an HTTP load balancer is recommended to distribute requests across all the Content Gateway instances. Alternatively, DNS round-robin (DNS-RR) can be used.

  • For functional Proof-of-Concepts, one (1) instance should suffice.

Telemetry (Optional)

Optional

This section is optional.

  • Prometheus integration and Grafana dashboards.

  • Minimum recommended Telemetry count: Usually one (1), but there could be as many as needed.

Load Balancers (Optional)

Optional

This section is optional.

  • To balance the client load across all the Content Gateway instances, an HTTP load balancer in front of the Content Gateways can be leveraged. This load balancer can be a software solution such as HAProxy, NGINX, or others, or a hardware-based appliance.
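As an illustration, a minimal HAProxy configuration balancing two Content Gateways could look like the sketch below. The server names, IP placeholders, and port are assumptions; this is not a production-ready configuration.

```
frontend swarm_http
    bind *:80
    default_backend swarm_gateways

backend swarm_gateways
    balance roundrobin
    server gw1 <GW1_FRONTEND_IP>:80 check
    server gw2 <GW2_FRONTEND_IP>:80 check
```

The `check` keyword makes HAProxy probe each gateway periodically, so requests are only sent to instances that are up.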

Note

The DMZ network, load balancers, and public network items are outside the DataCore offering.

Networking Requirements and Recommendations

Swarm utilizes a dual networking configuration, with a Storage (Backend) network and a Service (Frontend) network. As per the diagram above, the Swarm storage nodes are connected only to the Backend network, while the management and access layer components have a presence in both (dual-homed). Hence, this Backend/storage network must be configured in VMware ESXi as well.

The Backend network can be just a VLAN in the existing switching environment. However, this VLAN/network has to be dedicated exclusively to Swarm and is usually isolated from the rest of the network environment. In any case, no system outside the Swarm stack should be connected to it.

The switch ports used by the Swarm storage nodes must be in access mode, as the Swarm nodes cannot tag VLAN traffic. Also, ‘port fast’ should be enabled to facilitate the PXE boot process (see below).

Best Practice

If multicast traffic is allowed in this Backend network, IGMP snooping must be disabled. Multicast is no longer required with Swarm 15, but enabling it still remains a best practice.

The Swarm storage nodes will PXE boot (boot over the network) from the SCS virtual machine, which holds the image of the operating system the nodes will use, as well as the cluster configuration. As part of the PXE boot process, the nodes request an IP address via DHCP. The SCS VM acts as that DHCP server in the storage/backend network; no other DHCP server may be present in the Backend network segment.

To maximize availability, network failover (active-backup) configurations are encouraged, for both the Swarm storage and the virtualized management and access layer.

Environment Prerequisites

The following table illustrates the requirements for a typical Swarm deployment.

VM                vCPU   RAM     System Disk   Data Disk
SCS               2      4 GB    50 GB         100 GB
Content Gateway   4      8 GB    50 GB         N/A
Swarm Search      4      24 GB   30 GB         450 GB
Swarm Telemetry   1      1 GB    40 GB         50 GB

Note

As each use case may vary, working with DataCore Partners and/or DataCore Solutions Architects to review these requirements is encouraged.

Required

A Swarm license key is required to finish the setup. Contact the DataCore Sales team.

Optionally, the end-user organization should generate a valid SSL certificate to enable HTTPS access.
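For Proof-of-Concept environments where a CA-issued certificate is not yet available, a self-signed certificate can be generated for testing. The CN below is a placeholder for the cluster FQDN; production deployments should use a certificate issued by the organization's CA.

```shell
# Generate a throwaway self-signed certificate for HTTPS testing only.
# The CN value is a placeholder; replace it with the cluster's FQDN.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout swarm-key.pem -out swarm-cert.pem \
  -days 365 -subj "/CN=swarm.example.com"
```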

Site Survey

To configure the Swarm cluster, the following information is required:

Swarm Cluster Name (FQDN):                 <CLUSTER_NAME>
DNS Server(s):                             <DNS_SERVER_1> <DNS_SERVER_2>
DNS Domain:                                <DNS_DOMAIN>
NTP Server(s):                             <NTP_SERVER_1> <NTP_SERVER_2>
Storage/Backend Network (VLAN) IP Range:   <BACKEND_NETMASK>
Service/Frontend Network (VLAN) IP Range:  <FRONTEND_NETMASK>
Service/Frontend Network (VLAN) Gateway:   <FRONTEND_GATEWAY>

IP Addresses

Component Name    Frontend net. IP Address   Backend net. IP Address
SCS               <SCS_FRONTEND_IP>          <SCS_BACKEND_IP>
Content Gateway   <GW_FRONTEND_IP>           <GW_BACKEND_IP>
Elasticsearch     Optional                   <ES_BACKEND_IP>
Swarm Telemetry   <TM_FRONTEND_IP>           <TM_BACKEND_IP>
Swarm Nodes       N/A                        Auto-assigned by the SCS VM

Swarm Deployment Using the VMware Bundle

The VM bundle is composed of OVF packages to be deployed on VMware ESXi 7. The operating system (CentOS 7.9) and the Swarm software are both pre-installed.

The pre-configured Backend network/VLAN range is 172.29.0.0/16, but it can be changed as desired.

Warning

Select the backend network carefully. It must not be a range that is already in use in the customer network environment. Unless you plan to deploy a large cluster, you should not use a /16 network; the industry practice is to restrict it to a /24.
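To put the warning in numbers, the usable host count per prefix length can be computed with shell arithmetic (a quick sketch):

```shell
# Usable host addresses for a given prefix length: 2^(32 - prefix) - 2
# (subtracting the network and broadcast addresses).
for prefix in 16 24; do
  hosts=$(( (1 << (32 - prefix)) - 2 ))
  echo "/${prefix}: ${hosts} usable addresses"
done
```

A /24 still leaves 254 addresses, which is ample headroom for a small cluster plus the SCS DHCP pool.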

The default credentials are:

  • SSH and console access: root - datacore

  • Web UIs: admin - datacore

These are the templates included in the VM bundle Swarm-15.3-ESX-7.0-U1-20231010:

  • SCS - PXE-boot the Swarm storage nodes, support tools

    • Template: SwarmClusterServices.ovf 

    • Associated disks: datacore-swarm-15.3.1-ESX-disk1.vmdk, datacore-swarm-15.3.1-ESX-disk2.vmdk

  • Swarmsearch (Elasticsearch) - Indexer and search engine

    • Template: SwarmSearch1.ovf

    • Associated disks: datacore-swarm-15.3.1-ESX-disk5.vmdk, datacore-swarm-15.3.1-ESX-disk6.vmdk

  • Content Gateway - S3 access, Content Portal

    • Template: SwarmContentGateway.ovf

    • Associated disks: datacore-swarm-15.3.1-ESX-disk7.vmdk

  • Telemetry (optional component) - Grafana dashboards

    • Template: SwarmTelemetry.ovf

    • Associated disks: datacore-swarm-15.3.1-ESX-disk3.vmdk, datacore-swarm-15.3.1-ESX-disk4.vmdk

The bundle also includes an OVF template that deploys all VMs as a vApp:

datacore-swarm-15.3.1-ESX.ovf

Important

As per VMware requirements, vCenter 7 with DRS enabled must be in place to deploy this vApp.

SCS

Preparation Steps

  1. Deploy the SCS VM (SwarmClusterServices.ovf) and its associated virtual disks (vmdk).

Note

The operating system (CentOS 7.9) and the Swarm software are pre-installed. The VM has two virtual interfaces, one for the backend network and another for the frontend network.

  2. Edit /etc/sysconfig/network-scripts/ifcfg-ens192 and change the IP configuration for the frontend network.

BOOTPROTO="static"
ONBOOT="yes"
IPADDR=<SCS_FRONTEND_IP>
NETMASK=<FRONTEND_NETMASK>
GATEWAY=<FRONTEND_GATEWAY>
DNS1=<DNS_SERVER_1>
DNS2=<DNS_SERVER_2>

  3. Edit /etc/sysconfig/network-scripts/ifcfg-ens224 and change the IP configuration for the backend network.

BOOTPROTO=static
ONBOOT=yes
IPADDR=<SCS_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>

  4. Apply the changes by running:

ifdown ens192; ifdown ens224

systemctl restart network

or simply reboot the VM to make sure it picks up the changes.

  5. Verify the network configuration with the command: ip a
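Beyond eyeballing the output of ip a, the edited files themselves can be double-checked. The helper below is a hedged sketch (the function name and demo file are illustrative, not part of the product):

```shell
# Hypothetical helper: read the IPADDR value back out of an ifcfg-style
# file to confirm the edits above took effect.
get_ipaddr() { sed -n 's/^IPADDR=//p' "$1"; }

# Demo against a temporary file mimicking ifcfg-ens192:
printf 'BOOTPROTO="static"\nONBOOT="yes"\nIPADDR=192.168.10.5\n' > /tmp/ifcfg-demo
get_ipaddr /tmp/ifcfg-demo
# → 192.168.10.5
```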

Offline Installation

Follow these steps for an offline installation (i.e., when no Internet access is available).

  1. Edit /etc/hosts; the first line should read:

{SCS_External_IP} www.datacore.com

  2. Set the time zone according to your local clock:

timedatectl set-timezone <timezone>

hwclock --systohc

Note

All available time zones can be listed with the command: timedatectl list-timezones

  3. Configure chrony (the NTP daemon) to connect to valid NTP servers: edit the file /etc/chrony.conf and add the proper IP addresses or names of those NTP servers.

server <NTP_SERVER_1> iburst
server <NTP_SERVER_2> iburst

allow <SCS_BACKEND_NETWORK/SCS_BACKEND_NETMASK>   # example: allow 172.20.0.0/24

  4. Restart the chrony daemon: systemctl restart chronyd

  5. Verify the clock is in sync with: chronyc tracking

SCS Configuration

Once the auxiliary services of SCS have been configured, the SCS setup can take place.

  1. Run the configurator wizard: scsctl init wizard -a

  2. Running step [1/36]: Set site name.

  3. Type the <CLUSTER_NAME>

  4. Missing setting: platform/admin.password

    • Update this setting as a default at group level

    • Click Enter and type the admin password for the cluster:

      • admin.password [type: str ***SECURE***] (Administrative user password)

      • Re-enter to confirm.

  5. Running step [3/36]: Select Swarm-internal interface.

  6. Specify the network interface that will be used for internal Swarm operations:

lo

ens192

> ens224

Select ens224, click Enter.

  7. Running step [6/36]: Define Swarm-internal network.

    1. The internal interface requires a static IP address to be defined on it.

    2. It looks like your internal interface is already configured with an IP address: x.x.x.x/x

    3. Do you wish to continue to use this address and netmask? [Y/N]: Reply Y and click Enter

The provisioning process will commence; it takes a few minutes to complete.

  8. Continue the configuration process by running: scsctl diagnostics config scan_missing

  • Missing setting: network_boot/network.dnsServers

    • Update this setting as a default at group level

    • Click Enter and type the IP addresses of the DNS servers below:

      • network.dnsServers [type: array[str]] (Required: DNS servers to be used):

        • <DNS_SERVER_1> <DNS_SERVER_2>

  • Missing setting: platform/network.dnsDomain

    • Update this setting as a default at group level

    • Click Enter and type the DNS domain used

      • network.dnsDomain [type: str] (Required: The DNS domain name that will be used.):

        • <DNS_DOMAIN>

In the next step, the Swarm image will be added and configured.

Run: scsctl repo component add -f /root/swarm-scs-storage-15.3.1.tgz

During this process, the feature “encryption at rest” (EAR) can be configured.

Optional

This is an optional feature that encrypts data as it is written to the disks. It usually comes at a 15-20% performance penalty, as the nodes need processing power to encrypt/decrypt data.

This guide assumes EAR will be configured. If it is not a requirement, select False on the next step when the wizard asks about disk.encryptNewVolumes configuration.

The configuration steps will also ask whether multicast traffic will be allowed. As it is a best practice to keep it enabled, this guide does so.

  • Missing setting: storage/disk.encryptNewVolumes

    • Update this setting as a component-wide default

    • Click Enter

    • disk.encryptNewVolumes [type: bool] (Whether to encrypt new Swarm volumes. Enabling encryptNewVolumes means that any newly-formatted Swarm volume will be encrypted)

      • True > Click Enter

  • Missing setting: storage/cip.multicastEnabled

    • Update this setting as a component-wide default

    • Click Enter

    • cip.multicastEnabled [type: bool] (Whether multicast should be used for communication within Swarm.)

      • True > Click Enter

Finally, the configuration wizard asks which drives will be used to store data. This guide assumes all drives will be used, as the server should be dedicated exclusively to Swarm.

  • Missing setting: storage/disk.volumes

    • Update this setting as a component-wide default

    • Click Enter

    • disk.volumes [type: str] (Required: Specifies the volume storage devices for Swarm to use)

      • all

At this stage, the Swarm image is added. The configuration wizard will ask about the cluster name and a description.

added: storage - 15.3.1 (15.3.1)

  • Enter a name for the group (FQDN format encouraged):

    • <CLUSTER_NAME>

  • Enter a description for the group (purpose, etc.). [OPTIONAL]:

    • Test cluster 1

Once the image is installed, run (again): scsctl diagnostics config scan_missing

Since EAR will be used, the configuration wizard will ask for a mnemonic name for the primary encryption key and the actual encryption key. If EAR is not a requirement, select “skip for now”.

  • Missing setting: storage/disk.encryptionKeyPrimary

    • Update this setting as a default at group level

    • Click Enter

    • disk.encryptionKeyPrimary [type: str ***SECURE***] The mnemonic name of the encryption key.

      • primary

  • Missing setting: storage/disk.encryptionKeys

  • Update this setting as a default at group level

  • Click Enter

  • Name (or Enter/Return to stop adding entries)

    • primary

  • Value:

    • supersecretencryptionkeypleasedonotdistributeit1234

  • Name (or Enter/Return to stop adding entries)

  • Click Enter to finish.

Note

The above key name and value are just examples.

SCS needs to know which IP range can be used to PXE boot the Swarm storage nodes in the backend network. To avoid collisions with other Swarm services, a number of IP addresses can be reserved at the beginning and end of the range so that SCS will not assign them to the nodes. To do this, run:

scsctl init dhcp --dhcp-reserve-lower=50 --dhcp-reserve-upper=10

adjusting the values to whatever makes sense in the backend network.

For example, in a /24 network, the above command uses .51 to .244 to PXE boot and assign IP addresses to the Swarm storage nodes.
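The resulting pool can be sanity-checked with simple shell arithmetic (a sketch assuming a /24 backend network with hosts .1 to .254):

```shell
# With --dhcp-reserve-lower=50 and --dhcp-reserve-upper=10 on a /24,
# the first and last host addresses handed out by SCS are:
reserve_lower=50
reserve_upper=10
first_host=$(( reserve_lower + 1 ))
last_host=$(( 254 - reserve_upper ))
echo "PXE pool: .${first_host} to .${last_host}"
# → PXE pool: .51 to .244
```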

If the physical servers have SSD/NVMe devices or smaller drives that Swarm should not use, they can be excluded by running:

scsctl storage config set -d "disk.minGB=4096"

As an example, the above command excludes any drive smaller than 4 TB (4096 GB).

Unzip and add the license key. This key should be a plain text file:

scsctl license add -f license.txt

It is recommended to enable Swarm node stats for the Telemetry VM (Prometheus/Grafana). To do this, run:

scsctl storage config set -d "metrics.enableNodeExporter=true"
scsctl storage config set -d "metrics.nodeExporterFrequency=120"

If the Swarm storage nodes use an Intel Skylake-based CPU or similar, run the following:

scsctl network_boot config set kernel.extraArgs=clocksource.max_cswd_read_retries=50 -d
systemctl restart swarm-platform

For more information, see Intel Skylake/Cascade Lake CPU Performance Issue.

Finally, create a backup of the SCS configuration. Run:

scsctl backup -o backup-config-<date>
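For example, the <date> placeholder can be filled with the current date so that backup archives sort chronologically (the name format here is an assumption, not a requirement of scsctl):

```shell
# Build a dated backup name; the actual backup is then taken with
# scsctl backup -o "$backup_name" on the SCS VM.
backup_name="backup-config-$(date +%Y-%m-%d)"
echo "$backup_name"
```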

At this point, SCS has been configured and it is ready to PXE boot Swarm storage nodes.

Elasticsearch

Important

Before doing this, deploy the Swarm Search VM template (SwarmSearch1.ovf).

Note

The operating system (CentOS 7.9) and the Swarm software are pre-installed. The VM has two virtual interfaces, one for the backend network and another for the frontend network. The latter is disconnected by default, as it is not strictly required.

The preconfigured IP address for the backend network is 172.29.1.20/16.

Below are the steps to change it, if required:

Important

Make sure that the first virtual network card of the VM is connected to the Backend network.

  1. Update the IP configuration in /etc/sysconfig/network-scripts/ifcfg-ens192.

  2. Remove the existing DNS, Gateway, and Prefix entries in that config file and include only:

IPADDR=<ES_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>

  3. Run:
    ifdown ens192
    systemctl restart network

  4. Edit /etc/elasticsearch/elasticsearch.yml and replace 172.29.1.20 with the IP address configured in the previous step in the following sections of the file:

network.host: <ES_BACKEND_IP>
discovery.seed_hosts: ["<ES_BACKEND_IP>"]
cluster.initial_master_nodes: ["<ES_BACKEND_IP>"]

  5. Restart the service: systemctl restart elasticsearch

  6. Verify it is up and running with: curl -XGET "http://<ES_BACKEND_IP>:9200/_cat/health?v". The reported status should be "green" or "yellow".

  7. Set the time zone according to your local clock:

timedatectl set-timezone <timezone>
hwclock --systohc

Note

All available time zones can be listed with the command: timedatectl list-timezones

  8. Point /etc/chrony.d/chrony.conf to the IP address of the SCS VM over the storage/private network, then restart the daemon:

server <SCS_BACKEND_IP> iburst

systemctl restart chronyd
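The Elasticsearch health check from the verification step above can be wrapped in a small parser, e.g. for scripting repeated checks. This is a hedged sketch: the function name is illustrative, and a live check would feed it the output of curl -s "http://<ES_BACKEND_IP>:9200/_cluster/health".

```shell
# Parse a /_cluster/health JSON response and report whether the cluster
# status is acceptable (green or yellow); fail otherwise.
check_es_health() {
  status=$(printf '%s' "$1" | grep -o '"status":"[a-z]*"' | cut -d '"' -f4)
  case "$status" in
    green|yellow) echo "healthy: $status"; return 0 ;;
    *)            echo "unhealthy: $status"; return 1 ;;
  esac
}

# Example with a canned response:
check_es_health '{"cluster_name":"swarm","status":"yellow","number_of_nodes":1}'
# → healthy: yellow
```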