Swarm 15.3 VM Bundle Deployment
Introduction and Prerequisites
This document explains how to deploy a Swarm cluster using the OVF VM bundle package, and also summarizes the steps to deploy a Swarm cluster from scratch, indicating where to find more information about that type of deployment in DataCore's documentation portal.
The global architecture and recommendations are the same for both types of installations.
Both the OVF bundle package and the standalone software are available on the DataCore downloads website.
Note
The process described in this document covers a standard and generic deployment of Swarm, focused on small installations and test environments for Proof-of-Concept/Proof-of-Value purposes.
As every use case is different, we recommend working with DataCore partners and DataCore Solutions Architects to address any specific configuration requirements or customization needed.
There are two main sections in this document:
Swarm deployment using the OVF VM bundle package.
Deploying Swarm from scratch.
The present document is based on a traditional deployment of Swarm, where the management and access layer runs virtualized on one or more VMware ESXi hosts, while the storage nodes are physical x86-64 machines that hold the data. See the diagram below.
Swarm Components
The Swarm stack utilizes several components, grouped into two layers:
Storage Layer: Composed of the Swarm storage nodes, which hold the data and take care of data protection.
Management and Access Layer: As the name implies, this layer provides both the administration of the Swarm cluster and access to the storage for users and client applications. No data storage or caching happens in this layer.
Below are the software components of the entire Swarm stack, their functions, and count recommendations for durability and availability purposes:
Swarm Storage Nodes
Swarm is a purpose-built, on-premises object storage solution. It runs on standard physical x86-64 servers providing a single pool of resources, supporting billions of objects/files in the same cluster and extending its capabilities to multiple sites (data replication).
Swarm leverages all hardware resources the node (the server where it runs) provides: CPU, RAM, network, and any direct-attached disk drives.
Minimum recommended storage nodes count: Four (4).
Platform Server - Swarm Cluster Services (SCS)
The SCS software provides Swarm cluster configuration and boot services as well as log aggregation and Swarm version management.
The SCS is not in the data path, but it does require access to the same layer 2 network as the Swarm storage nodes.
Minimum recommended SCS count: One (1).
Best Practice
Create a snapshot or clone of the VM once its configuration is completed. Only one SCS instance can be online at a time.
Elasticsearch
Provides listing and search capabilities based on object name and object metadata.
Minimum recommended Elasticsearch VM count for production environments: Three (3).
For functional Proof-of-Concepts, one (1) instance should suffice.
Content Gateway
The Content Gateway provides S3 and HTTP access, as well as a Content Portal (web interface) that users and administrators can leverage to create buckets, upload data, use collections to perform searches (based on metadata), and more. Hence, the Content Gateway is in the data path.
Content Gateway also enforces multitenancy features such as user authentication against LDAP, Active Directory, or single sign-on (SAML), as well as permissions, quotas, and so on.
Minimum recommended Content Gateway count for production environments: Two (2).
Important
As the Content Gateway is in the data path, at least two instances should be up and running at all times. A load-balancing mechanism such as an HTTP load balancer is recommended to distribute requests across all the Content Gateway instances. Alternatively, DNS round-robin (DNS-RR) can be used.
For functional Proof-of-Concepts, one (1) instance should suffice.
Telemetry (Optional)
Prometheus integration and Grafana dashboards.
Minimum recommended Telemetry count: Usually one (1), but more can be deployed as needed.
Load Balancers (Optional)
To balance the client load across all the Content Gateway instances, an HTTP load balancer can be placed in front of the Content Gateways. This load balancer can be a software solution such as HAProxy or NGINX, or a hardware-based appliance.
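For illustration, below is a minimal HAProxy sketch that balances HTTP traffic across two gateways. The backend name and the <GW1_FRONTEND_IP>/<GW2_FRONTEND_IP> placeholders are hypothetical (this guide's site survey uses a single <GW_FRONTEND_IP>), and a production configuration should be tuned per HAProxy's own documentation:
# /etc/haproxy/haproxy.cfg (fragment, illustrative sketch only)
frontend swarm_http
    bind *:80
    mode http
    default_backend content_gateways
backend content_gateways
    mode http
    balance roundrobin
    # health-checked gateway instances; IPs are hypothetical placeholders
    server gw1 <GW1_FRONTEND_IP>:80 check
    server gw2 <GW2_FRONTEND_IP>:80 check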
Networking Requirements and Recommendations
Swarm utilizes a dual networking configuration, with a Storage (Backend) network and a Service (Frontend) network. As per the diagram above, the Swarm storage nodes are only connected to the Backend network, while the management and access layer components have a presence in both (dual-homed). Hence, the Backend/storage network must be configured in VMware ESXi as well.
The Backend network can be just a VLAN in the existing switching environment. However, this VLAN/network has to be dedicated exclusively to Swarm and is usually isolated from the rest of the network environment. At any rate, no other system outside the Swarm stack should be connected to it.
The switch ports used by the Swarm storage nodes must be in access mode, as the Swarm nodes cannot tag VLAN traffic. Also, PortFast should be enabled to facilitate the PXE boot process (see below).
The Swarm storage nodes PXE boot (boot over the network) from the SCS virtual machine, which holds the image of the operating system the nodes will use, as well as the cluster configuration. As part of the PXE boot process, the nodes request an IP address via DHCP. The SCS VM acts as that DHCP server on the storage/backend network; no other DHCP server may be present on the Backend network segment.
To maximize availability, network failover (active-backup) configurations are encouraged, for both the Swarm storage and the virtualized management and access layer.
Environment Prerequisites
The following table illustrates the requirements for a typical Swarm deployment.
VM | vCPU | RAM | System Disk | Data Disk |
---|---|---|---|---|
SCS | 2 | 4 GB | 50 GB | 100 GB |
Content Gateway | 4 | 8 GB | 50 GB | N/A |
Swarm Search | 4 | 24 GB | 30 GB | 450 GB |
Swarm Telemetry | 1 | 1 GB | 40 GB | 50 GB |
Optionally, the end-user organization can generate a valid SSL certificate to enable HTTPS access.
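For reference, a private key and certificate signing request (CSR) can be generated with standard OpenSSL tooling; this is a minimal sketch in which the file names and subject are placeholder assumptions, and the organization's own CA process may differ:
openssl req -new -newkey rsa:2048 -nodes -keyout swarm.key -out swarm.csr -subj "/CN=<CLUSTER_NAME>"
The signed certificate and private key are later combined into a single PEM file for HAProxy (see Configuring an SSL Certificate below).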
Site Survey
To configure the Swarm cluster, the following information is required:
Swarm Cluster Name (FQDN) | <CLUSTER_NAME> |
---|---|
DNS Server(s) | <DNS_SERVER_1> <DNS_SERVER_2> |
DNS Domain | <DNS_DOMAIN> |
NTP Server(s) | <NTP_SERVER_1> <NTP_SERVER_2> |
Storage/Backend Network (VLAN) IP Range | <BACKEND_NETMASK> |
Service/Frontend Network (VLAN) IP Range | <FRONTEND_NETMASK> |
Service/Frontend Network (VLAN) Gateway | <FRONTEND_GATEWAY> |
IP Addresses | ||
---|---|---|
Component Name | Frontend net. IP Address | Backend net. IP Address |
SCS | <SCS_FRONTEND_IP> | <SCS_BACKEND_IP> |
Content Gateway | <GW_FRONTEND_IP> | <GW_BACKEND_IP> |
Elasticsearch | Optional | <ES_BACKEND_IP> |
Swarm Telemetry | <TM_FRONTEND_IP> | <TM_BACKEND_IP> |
Swarm Nodes | N/A | Auto-assigned by the SCS VM |
Swarm Deployment Using the VMware Bundle
The VM bundle is comprised of OVF packages to be deployed on VMware ESXi 7. The operating system and the Swarm software are pre-installed; the VMs are based on CentOS 7.9.
The pre-configured Backend network/VLAN range is 172.29.0.0/16, but it can be changed as desired.
The default credentials are:
SSH and console access: root - datacore
Web UIs: admin - datacore
These are the templates included in the VM bundle Swarm-15.3-ESX-7.0-U1-20231010:
SCS - PXE-boot the Swarm storage nodes, support tools
Template: SwarmClusterServices.ovf
Associated disks: datacore-swarm-15.3.1-ESX-disk1.vmdk, datacore-swarm-15.3.1-ESX-disk2.vmdk
Swarmsearch (Elasticsearch) - Indexer and search engine
Template: SwarmSearch1.ovf
Associated disks: datacore-swarm-15.3.1-ESX-disk5.vmdk, datacore-swarm-15.3.1-ESX-disk6.vmdk
Content Gateway - S3 access, Content Portal
Template: SwarmContentGateway.ovf
Associated disks: datacore-swarm-15.3.1-ESX-disk7.vmdk
Telemetry (optional component) - Grafana dashboards
Template: SwarmTelemetry.ovf
Associated disks: datacore-swarm-15.3.1-ESX-disk3.vmdk, datacore-swarm-15.3.1-ESX-disk4.vmdk
The bundle also includes an OVF template that deploys all VMs as a vApp:
datacore-swarm-15.3.1-ESX.ovf
SCS
Preparation Steps
Deploy the SCS VM (SwarmClusterServices.ovf) and its associated virtual disks (.vmdk).
Edit /etc/sysconfig/network-scripts/ifcfg-ens192, change the IP configuration information for the frontend network.
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=<SCS_FRONTEND_IP>
NETMASK=<FRONTEND_NETMASK>
GATEWAY=<FRONTEND_GATEWAY>
DNS1=<DNS_SERVER_1>
DNS2=<DNS_SERVER_2>
Edit /etc/sysconfig/network-scripts/ifcfg-ens224, change the IP configuration information for the backend network.
BOOTPROTO=static
ONBOOT=yes
IPADDR=<SCS_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>
Run:
ifdown ens192; ifdown ens224
systemctl restart network
or just reboot the VM to make sure it will pick up the changes.
The network configuration can be verified with the command: ip a
Offline Installation
For an offline installation (i.e., when no Internet access is available), edit /etc/hosts so that the first line reads:
{SCS_External_IP} www.datacore.com
Set the time zone according to your local clock.
timedatectl set-timezone <timezone>
hwclock --systohc
Configure chrony (NTP daemon) to connect to a valid NTP server.
Edit the file /etc/chrony.conf and add the proper IP addresses or names of those NTP servers.
server <NTP_SERVER_1> iburst
server <NTP_SERVER_2> iburst
allow <SCS_BACKEND_NETWORK/SCS_BACKEND_NETMASK> (example: allow 172.20.0.0/24)
Restart chrony daemon: systemctl restart chronyd
Verify the clock is in sync with: chronyc tracking
SCS Configuration
Once the auxiliary services of SCS have been configured, the SCS setup can take place.
Run the configurator wizard: scsctl init wizard -a
Running step [1/36]: Set site name.
Type the <CLUSTER_NAME>
Missing setting: platform/admin.password
Update this setting as a default at group level
Press Enter and type the admin password for the cluster:
admin.password [type: str ***SECURE***] (Administrative user password)
Re-enter to confirm.
Running step [3/36]: Select Swarm-internal interface.
Specify the network interface that will be used for internal Swarm operations:
lo
ens192
> ens224
Select ens224 and press Enter.
Running step [6/36]: Define Swarm-internal network.
The internal interface requires a static IP address to be defined on it.
It looks like your internal interface is already configured with an IP address: x.x.x.x/x
Do you wish to continue to use this address and netmask? [Y/N]: Reply Y and press Enter.
The provisioning process will commence; it takes a few minutes to complete.
Continue the configuration process by running: scsctl diagnostics config scan_missing
Missing setting: network_boot/network.dnsServers
Update this setting as a default at group level
Press Enter and type the IP addresses of the DNS servers below:
network.dnsServers [type: array[str]] (Required: DNS servers to be used):
<DNS_SERVER_1> <DNS_SERVER_2>
Missing setting: platform/network.dnsDomain
Update this setting as a default at group level
Press Enter and type the DNS domain used:
network.dnsDomain [type: str] (Required: The DNS domain name that will be used.):
<DNS_DOMAIN>
In the next step, the Swarm image will be added and configured.
Run: scsctl repo component add -f /root/swarm-scs-storage-15.3.1.tgz
During this process, the feature “encryption at rest” (EAR) can be configured.
This guide assumes EAR will be configured. If it is not a requirement, select False on the next step when the wizard asks about disk.encryptNewVolumes configuration.
The configuration steps also ask whether multicast traffic will be allowed. As it is a best practice to keep multicast enabled, this guide follows that.
Missing setting: storage/disk.encryptNewVolumes
Update this setting as a component-wide default
Press Enter
disk.encryptNewVolumes [type: bool] (Whether to encrypt new Swarm volumes. Enabling encryptNewVolumes means that any newly-formatted Swarm volume will be encrypted)
True > Press Enter
Missing setting: storage/cip.multicastEnabled
Update this setting as a component-wide default
Press Enter
cip.multicastEnabled [type: bool] (Whether multicast should be used for communication within Swarm.)
True > Press Enter
Finally, the configuration wizard asks which drives will be used to store data. This guide assumes “all” drives will be used, as the server should be dedicated exclusively to Swarm.
Missing setting: storage/disk.volumes
Update this setting as a component-wide default
Press Enter
disk.volumes [type: str] (Required: Specifies the volume storage devices for Swarm to use)
all
At this stage, the Swarm image is added. The configuration wizard will ask for the cluster name and a description.
added: storage - 15.3.1 (15.3.1)
Enter a name for the group (FQDN format encouraged):
<CLUSTER_NAME>
Enter a description for the group (purpose, etc.). [OPTIONAL]:
Test cluster 1
Once the image is installed, run (again): scsctl diagnostics config scan_missing
Since EAR will be used, the configuration wizard will ask for a mnemonic name for the primary encryption key and for the actual encryption key. If EAR is not a requirement, select “skip for now”.
Missing setting: storage/disk.encryptionKeyPrimary
Update this setting as a default at group level
Press Enter
disk.encryptionKeyPrimary [type: str ***SECURE***] The mnemonic name of the encryption key.
primary
Missing setting: storage/disk.encryptionKeys
Update this setting as a default at group level
Press Enter
Name (or Enter/Return to stop adding entries)
primary
Value:
supersecretencryptionkeypleasedonotdistributeit1234
Name (or Enter/Return to stop adding entries)
Press Enter to finish.
SCS needs to know which IP range can be used to PXE boot the Swarm storage nodes on the backend network. To avoid collisions with other Swarm services, a number of IP addresses can be reserved at the beginning and end of the range so that SCS will not assign them to the nodes. To do this, run:
scsctl init dhcp --dhcp-reserve-lower=50 --dhcp-reserve-upper=10
adjusting the values to whatever makes sense in the backend network.
For example, in a /24 network, the above command reserves the first 50 and the last 10 host addresses, so SCS will use .51 through .244 to PXE boot and assign IP addresses to the Swarm storage nodes.
If the physical servers have SSD/NVMe or other small drives that Swarm should not use, they can be excluded by running:
scsctl storage config set -d "disk.minGB=4096"
For example, the above command excludes any drive smaller than 4 TB.
Unzip and add the license key; the key should be a plain-text file:
scsctl license add -f license.txt
It is recommended to enable Swarm node stats for the Telemetry VM (Prometheus/Grafana). To do this, run:
scsctl storage config set -d "metrics.enableNodeExporter=true"
scsctl storage config set -d "metrics.nodeExporterFrequency=120"
If the Swarm storage nodes use an Intel Skylake-based CPU or similar, run the following:
scsctl network_boot config set kernel.extraArgs=clocksource.max_cswd_read_retries=50 -d
systemctl restart swarm-platform
For more information, see Intel Skylake/Cascade Lake CPU Performance Issue
Finally, create a backup of the SCS configuration. Run:
scsctl backup -o backup-config-<date>
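For example, the date can be filled in with standard shell substitution (a convenience, not required by the command):
scsctl backup -o backup-config-$(date +%Y%m%d)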
At this point, SCS has been configured and it is ready to PXE boot Swarm storage nodes.
Elasticsearch
The preconfigured IP address for the backend network is 172.29.1.20/16.
Below are the steps to change it, if it is required:
Update the IP configuration information on /etc/sysconfig/network-scripts/ifcfg-ens192
Remove the existing DNS, GATEWAY, and PREFIX entries in that config file and include just:
IPADDR=<ES_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>
Run:
ifdown ens192
systemctl restart network
Edit /etc/elasticsearch/elasticsearch.yml and replace 172.29.1.20 with the IP address configured in the previous step for this VM in the following sections of the file:
network.host: <ES_BACKEND_IP>
discovery.seed_hosts: ["<ES_BACKEND_IP>"]
cluster.initial_master_nodes: ["<ES_BACKEND_IP>"]
Restart the service: systemctl restart elasticsearch
Verify it is up and running with: curl -XGET "http://<ES_BACKEND_IP>:9200/_cat/health?v". The response should be "green" or "yellow".
Set the time zone according to your local clock:
timedatectl set-timezone <timezone>
hwclock --systohc
Point /etc/chrony.d/chrony.conf to the IP address of the SCS VM over the storage/private network.
server <SCS_BACKEND_IP> iburst
systemctl restart chronyd
Verify it has synchronized by running: chronyc tracking. Alternatively, any other NTP server the VM can reach can be used.
Edit the properties of the VM and check “connect” on the virtual interface connected to the frontend network.
With the above steps, only one Elasticsearch VM is provisioned. The status will appear as “yellow” the moment there is any data in Elasticsearch, as there is no redundancy. For Proof-of-Concept or Proof-of-Value scenarios this configuration is usually enough. However, for production environments the recommendation is to have at least three Elasticsearch VMs up and running, forming a cluster by themselves.
The steps to deploy a full Elasticsearch cluster are explained below.
Deploy the SwarmSearch1.ovf template two more times.
Stop the elasticsearch service and clear any existing node data:
systemctl stop elasticsearch
cd /var/lib/elasticsearch
rm -rf nodes
Update /etc/hostname in the two new VMs, e.g., “swarmsearch2” and “swarmsearch3”.
Update the static IP address for the backend adapter ens192.
Delete the existing file /etc/elasticsearch/elasticsearch.yml and run the install wizard:
/usr/share/caringo-elasticsearch-search/bin/configure_elasticsearch_with_swarm_search.py
Repeat these steps for every Elasticsearch VM.
For more information, see Configuring Elasticsearch
Once all the Elasticsearch VMs have been initialized, start the elasticsearch service on all of them:
systemctl start elasticsearch
Once the process has been completed in all VMs, check the health of the Elasticsearch cluster with:
oem-es-maintenance.sh -t <ES_BACKEND_IP>
Three (3) nodes should appear under “node.total” and the status should be “green”.
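As a cross-check, the standard Elasticsearch REST API can list the individual cluster members (assuming the default port 9200 used elsewhere in this guide); all three VMs should appear in the output:
curl -XGET "http://<ES_BACKEND_IP>:9200/_cat/nodes?v"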
Swarm Storage Nodes
Before starting the PXE boot process, enter the BIOS of each server that will be a Swarm storage node and verify:
The HBA/Disk controller is configured in pass-through mode. Essentially, this is a non-RAID configuration where all the disk drives are presented to the operating system individually. It is also called IT mode, HBA mode, pass thru, or non-RAID.
The network card port connected to the Backend VLAN/network must be enabled for PXE booting; no other port should be PXE-boot enabled. Moreover, no other port should be connected to any other network, with the exception of the dedicated port for out-of-band management (OOB, IPMI, BMC).
Once these items have been verified, the PXE boot process can begin.
Start with a single node, making sure it boots properly.
Continue with the rest. A successful Swarm storage node boot looks like this on the screen / IPMI console of the server:
The Swarm version, the IP address of the node, and “Storage Processes: RUNNING” should appear on the screen.
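If node stats were enabled during SCS configuration (metrics.enableNodeExporter=true, set earlier in this guide), a booted node can be spot-checked from the SCS VM. This is an informal verification, assuming the standard node-exporter port 9100 that the Telemetry configuration below also scrapes:
curl -s http://<SWARM_NODE1_IP>:9100/metrics | head -n 5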
Content Gateway
The final step in getting a functional Swarm cluster is bringing the Content Gateway up and running.
Deploy SwarmContentGateway.ovf. The IP addresses must be configured next.
Edit /etc/sysconfig/network-scripts/ifcfg-ens160, change the IP configuration information for the frontend network.
BOOTPROTO="static"
IPADDR=<GW_FRONTEND_IP>
NETMASK=<FRONTEND_NETMASK>
GATEWAY=<FRONTEND_GATEWAY>
DNS1=<DNS_SERVER_1>
DNS2=<DNS_SERVER_2>
Edit /etc/sysconfig/network-scripts/ifcfg-ens192, delete the PREFIX and the GATEWAY lines, and include the IP configuration information for the backend network.
IPADDR=<GW_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>
Run:
ifdown ens160; ifdown ens192
systemctl restart network
or just reboot the VM to make sure it will pick up the changes.
The network configuration can be verified with the command: ip a
Set the time zone according to your local clock.
timedatectl set-timezone <timezone>
hwclock --systohc
Configure chrony (NTP daemon) to connect to a valid NTP server.
Edit the file /etc/chrony.conf and add the proper IP addresses or names of those NTP servers.
server <NTP_SERVER_1> iburst
server <NTP_SERVER_2> iburst
Restart chrony daemon: systemctl restart chronyd
Verify the clock is in sync with: chronyc tracking
The Content Gateway configuration comes next.
Edit /etc/caringo/cloudgateway/gateway.cfg and set:
adminDomain = admin.<CLUSTER_NAME>
hosts = <SWARM_NODE1_IP> <SWARM_NODE2_IP> <SWARM_NODE3_IP> <SWARM_NODE4_IP>
indexerHosts = <ES_BACKEND_IP>
managementPassword = <CLUSTER_MGMT_PASSWORD>
Optionally, enable metering and quotas in the same file. For more information, see Content Metering and Setting Quotas.
[metering]
enabled = true
[quota]
enabled = true
Finally, run:
/opt/caringo/cloudgateway/bin/initgateway
systemctl enable cloudgateway
systemctl start cloudgateway
systemctl restart haproxy
systemctl status cloudgateway
With these steps completed, Content Gateway should be up and running.
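Before moving to the UI, a quick reachability check can be performed from any machine on the frontend network; this is an informal verification (any HTTP response indicates the gateway is listening):
curl -i http://<GW_FRONTEND_IP>/_admin/portal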
As the final step, configure the desired default protection scheme and connect Swarm to Elasticsearch.
Open a web browser and go to: http://<GW_FRONTEND_IP>:9091/_admin/storage.
Click Storage Management.
Click Cluster, then Feeds.
On the top right corner, click +Add and select Search Metadata feed.
In Server host(s) or IP(s), type the IP addresses of all the Elasticsearch VMs that are up and running, separated by a blank space: <ES_BACKEND_IP>
Click Save. The Swarm nodes are now connected to Elasticsearch; every time a new object/file is uploaded to the cluster, its metadata is also copied to Elasticsearch for search and listing purposes.
To finalize the setup, the default protection scheme should be set. Also, features like lifecycle policies and versioning can be enabled, if desired.
For more information about these features, see Object Versioning and Bucket Lifecycle Policy.
Versioning is required to enable “S3 object locking” (immutability).
For more information, see SCSP Object Locking.
Click Settings and select Cluster.
In the “Policy” section, change the protection scheme as desired; for example, with four (4) Swarm storage nodes:
policy.eCEncoding 4:2
policy.eCMinStreamSize 1Mb
policy.lifecycle enabled
policy.replicas min:3 max:16 default:3
policy.versioning allowed
Click Save on the top right corner.
Finally, test uploads and downloads using the provided Content Portal.
Open a web browser and go to: http://<GW_FRONTEND_IP>/_admin/portal
Click System Tenant in the upper right corner, click +Add, and provide a name for the new storage domain.
To create a bucket:
Click the domain that you just created.
Click +Add this time selecting “Bucket”. Provide a name such as “bucket1” or “test1”.
Click the bucket you just created and click +Add or drop files.
Select files of various sizes (from KBs to MBs) on the client machine and upload them.
Click the bucket name at the top.
Swarm utilizes FQDNs to identify which storage domain (endpoint) the client is connecting to. Hence, create DNS entries according to the Storage Domains used in the environment.
At this point Swarm is up and running and its basic functionality has been verified.
Create an S3 Key Pair (Optional)
To access the storage layer using the S3 protocol, an S3 key pair must be created.
It comprises the S3 access key and the S3 secret key.
Open a web browser and go to: http://<GW_FRONTEND_IP>/_admin/portal
Click the desired domain (endpoint), but not the admin one.
Click the cog/wheel in the top right corner and select Tokens. There will be an +Add button again in the top right corner.
Provide a description, an expiration date, and click the checkmark by “S3 secret key”.
Upon clicking Add, a green message will appear with all the information needed.
With this information and the name of the domain used, it is possible to create a connection to the Swarm repository over the S3 protocol.
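For example, the key pair can be tested with the AWS CLI, assuming it is installed on a client machine; the profile name and bucket are hypothetical, and <STORAGE_DOMAIN> must resolve to the Content Gateway (see the note about DNS entries above):
aws configure set aws_access_key_id <S3_ACCESS_KEY> --profile swarm
aws configure set aws_secret_access_key <S3_SECRET_KEY> --profile swarm
aws --profile swarm --endpoint-url http://<STORAGE_DOMAIN> s3 ls
aws --profile swarm --endpoint-url http://<STORAGE_DOMAIN> s3 cp ./test.txt s3://bucket1/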
Configuring an SSL Certificate (Optional)
By default, the Content Gateway VM template comes with HAProxy configured and a self-signed SSL certificate.
To use a valid certificate signed by a proper certificate authority, make sure the new certificate is X.509 in PEM format and that the same file contains both the CERTIFICATE section and the PRIVATE KEY section.
Copy the new .pem file to /etc/pki/tls/certs
Edit /etc/haproxy/haproxy.cfg and update the lines:
bind 0.0.0.0:443 ssl crt /etc/pki/tls/certs/<NEW_CERTIFICATE>.pem
bind 0.0.0.0:91 ssl crt /etc/pki/tls/certs/<NEW_CERTIFICATE>.pem
Finally, restart the HAProxy service:
systemctl restart haproxy
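If the certificate authority delivers the certificate and private key as separate files, they can be concatenated into the single PEM file HAProxy expects, then inspected with standard OpenSSL tooling; the input file names here are placeholder assumptions:
cat <NEW_CERTIFICATE>.crt <NEW_CERTIFICATE>.key > /etc/pki/tls/certs/<NEW_CERTIFICATE>.pem
openssl x509 -in /etc/pki/tls/certs/<NEW_CERTIFICATE>.pem -noout -subject -dates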
Alternatively, a helper script called GenerateSelfSignedCertificate.sh is provided in /root.
To use it, run GenerateSelfSignedCertificate.sh <storage_domain> and copy the generated .pem file as described above.
Central Logging (Optional)
It is recommended that the Content Gateway log all actions and its status to a central syslog server. The SCS can act as the central repository for logs.
To configure this, edit /etc/caringo/cloudgateway/logging.yaml and modify the following lines (only the relevant lines are shown; the indentation follows the shipped file):
Syslog:
  - name: audit_syslog
    host: <SCS_BACKEND_IP>
  - name: server_syslog
    host: <SCS_BACKEND_IP>
Loggers:
  # Global logging configuration
  Root:
    level: "${logLevel}"
    AppenderRef:
      - ref: file
      - ref: server_syslog
  Logger:
    # Audit logger
    - name: audit
      level: info
      additivity: false
      AppenderRef:
        - ref: audit
        - ref: audit_syslog
There is no need to restart the Content Gateway service; the new logging configuration is applied automatically after a few seconds.
Telemetry (Optional)
The Telemetry VM provides an all-in-one reference implementation of Prometheus, Alertmanager, and Grafana.
Preparation Steps
Deploy SwarmTelemetry.ovf. IP addresses must be configured next.
Edit /etc/sysconfig/network-scripts/ifcfg-ens33, change the IP configuration information for the frontend network.
BOOTPROTO="static"
IPADDR=<TM_FRONTEND_IP>
NETMASK=<FRONTEND_NETMASK>
GATEWAY=<FRONTEND_GATEWAY>
DNS1=<DNS_SERVER_1>
DNS2=<DNS_SERVER_2>
Edit /etc/sysconfig/network-scripts/ifcfg-ens160, delete the PREFIX and the GATEWAY lines, and include the IP configuration information for the backend network.
IPADDR=<TM_BACKEND_IP>
NETMASK=<BACKEND_NETMASK>
Run:
ifdown ens33; ifdown ens160
systemctl restart network
or just reboot the VM to make sure it will pick up the changes.
The network configuration can be verified with the command: ip a
Set the time zone according to your local clock.
timedatectl set-timezone <timezone>
hwclock --systohc
Configure chrony (NTP daemon) to connect to a valid NTP server.
Edit the file /etc/chrony.conf and add the proper IP addresses or names of those NTP servers.
server <NTP_SERVER_1> iburst
server <NTP_SERVER_2> iburst
Restart chrony daemon: systemctl restart chronyd
Verify the clock is in sync with: chronyc tracking
Prometheus Master Configuration
The next step is to configure Prometheus.
Edit /etc/prometheus/prometheus.yml to include the IP addresses of all the Swarm components to be monitored, uncommenting lines as needed. The job definitions belong under scrape_configs with the indentation shown:
  # THIS IS THE ELASTICSEARCH EXPORTER DEFINITION
  # IP ADDRESS SHOULD BE Telemetry loopback
  - job_name: 'elasticsearch'
    scrape_interval: 30s
    static_configs:
      - targets: ['127.0.0.1:9114']
    relabel_configs:
      - source_labels: [__address__]
        regex: "([^:]+):\\d+"
        target_label: instance
  # THIS IS THE CLOUD CONTENT GATEWAY JOB DEFINITION
  # IP ADDRESS SHOULD BE CLOUD GATEWAY STORAGE VLAN IP
  - job_name: 'swarmcontentgateway'
    static_configs:
      - targets: ['<GW_BACKEND_IP>:9100']
    relabel_configs:
      - source_labels: [__address__]
        regex: "([^:]+):\\d+"
        target_label: instance
  # THIS IS THE CLOUD GATEWAY NODE_EXPORTER JOB DEFINITION
  # IP ADDRESS SHOULD BE CLOUD GATEWAY STORAGE VLAN IP
  - job_name: 'gateway-nodeexporter'
    scrape_interval: 30s
    static_configs:
      - targets: ['<GW_BACKEND_IP>:9095']
    relabel_configs:
      - source_labels: [__address__]
        regex: "([^:]+):\\d+"
        target_label: instance
  # THIS IS THE SCS NODE_EXPORTER JOB DEFINITION
  # IP ADDRESS SHOULD BE SCS STORAGE VLAN IP
  - job_name: 'scs-nodeexporter'
    scrape_interval: 30s
    static_configs:
      - targets: ['<SCS_BACKEND_IP>:9100']
    relabel_configs:
      - source_labels: [__address__]
        regex: "([^:]+):\\d+"
        target_label: instance
  # THIS IS THE SWARM JOB DEFINITION
  # IP ADDRESS SHOULD BE STORAGE VLAN IP
  - job_name: 'swarm'
    scrape_interval: 30s
    static_configs:
      - targets: ['<SWARM_NODE1_IP>:9100','<SWARM_NODE2_IP>:9100','<SWARM_NODE3_IP>:9100','<SWARM_NODE4_IP>:9100']
    relabel_configs:
      - source_labels: [__address__]
        regex: "([^:]+):\\d+"
        target_label: instance
YAML (.yml) files are quite sensitive to spaces and indentation. The following command checks that there are no errors:
promtool check config /etc/prometheus/prometheus.yml
Elasticsearch Node Exporter
To gather statistics and status information about Elasticsearch, edit /usr/lib/systemd/system/elasticsearch_exporter.service, updating the IP address of the (first) Elasticsearch VM (replacing the pre-configured 172.29.1.20):
ExecStart = /usr/local/bin/elasticsearch_exporter --es.all --es.cluster_settings --es.indices --es.indices_settings --es.indices_mappings --es.shards --es.snapshots --es.uri http://<ES_BACKEND_IP>:9200 --es.timeout 20s --web.listen-address :9114
Then reload systemd:
systemctl daemon-reload
Enable and start the service:
systemctl enable elasticsearch_exporter
systemctl start elasticsearch_exporter
Once the Prometheus master configuration changes are complete, enable and start the Prometheus service:
systemctl enable prometheus
systemctl restart prometheus
To verify that Prometheus is up and running, open a web browser and go to:
http://<TM_FRONTEND_IP>:9090/targets
This page shows which targets Prometheus is currently collecting metrics for and whether they are reachable. It can also be reached by clicking Status and selecting “Targets”.
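The same information is available from the standard Prometheus HTTP API, which can be handy for scripted checks; for example, querying the up metric for all targets:
curl -s 'http://<TM_FRONTEND_IP>:9090/api/v1/query?query=up'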
Alertmanager Configuration
There are four (4) alerts defined in /etc/prometheus/alert.rules.yml
Service_down: Triggered if any Swarm storage node is down for more than 30 minutes.
Gateway_down: Triggered if the cloudgateway service is down for more than 2 minutes.
Elasticsearch_cluster_state: Triggered if the cluster state changes to "red" for 5 minutes.
Swarm_volume_missing: Triggered if the reported drive count decreases over a period of 10 minutes.
The /etc/prometheus/prometheus.yml contains a section that points to the alertmanager service on port 9093, as well as which alert.rules.yml file to use.
Modify the swarmUI template in /etc/prometheus/alertmanager/template/basic-email.tmpl; this is used in the HTML email template to show a button linking to the chosen URL.
Change the URL (shown here with an example IP address) to match the environment:
{{ define "__swarmuiURL" }}https://172.30.10.222:91/_admin/storage/{{ end }}
The configuration for where to send alerts is defined in the file:
/etc/prometheus/alertmanager/alertmanager.yml
By default, the route is disabled, as it requires manual input specific to each environment: values such as the SMTP server, username, and password (if applicable).
The above configuration file contains an example of configuring alerts to be sent via a Gmail account. Adjust the configuration for your own environment, possibly using an internal SMTP server.
Example configuration for a local SMTP relay in an enterprise environment:
- name: 'emailchannel'
  email_configs:
    - to: admin@acme.com
      from: swarmtelemetry@acme.com
      smarthost: smtp.acme.com:25
      require_tls: false
      send_resolved: true
Once the configuration is complete, restart Alertmanager:
systemctl restart alertmanager
To verify alertmanager.yml has the correct syntax, run:
amtool check-config /etc/prometheus/alertmanager/alertmanager.yml
It should give the following output:
Checking '/etc/prometheus/alertmanager/alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 2 receivers
 - 1 templates
  SUCCESS
Grafana Configuration
The password for the “admin” user can be changed in the configuration file /etc/grafana/grafana.ini; look for admin_password.
For more information, see Documentation | Grafana Labs.
To enable on boot and start the service type:
systemctl enable grafana-server
systemctl restart grafana-server
Grafana has all the Swarm dashboards pre-installed. Open a web browser and go to http://<TM_FRONTEND_IP>
The latest Swarm dashboards are available on the Grafana website.
Dashboard ID | Dashboard Name |
---|---|
16545 | DataCore Swarm AlertManager v15 |
16546 | DataCore Swarm Gateway v7 |
16547 | DataCore Swarm Node View |
16548 | DataCore Swarm System Monitoring v15 |
17057 | DataCore Swarm Search v7 |
19456 | DataCore Swarm Health Processor v1 |
Job Name (Optional)
In /etc/prometheus/prometheus.yml the job_name of the Content Gateway can be defined. This job_name will be displayed on the Content Gateway Grafana dashboard.
If the Content Gateway job_name is changed there are a couple of additional changes required:
Modify the gateway job name in /etc/prometheus/alertmanager/alertmanager.yml; it must match what appears in prometheus.yml:
routes:
  - match:
      job: <new_job_name>    # swarmcontentgateway by default
Modify the gateway job name in /etc/prometheus/alert.rules.yml:
- alert: gateway_down
  expr: up{job="<new_job_name>"} == 0    # swarmcontentgateway by default
DNS names can be used. In the absence of a DNS server, first modify the /etc/hosts file with the desired names for each Swarm storage node, then use those names in the configuration file. This is recommended in scenarios where the dashboards are publicly accessible.
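For example, /etc/hosts entries might look like the following; the host names are hypothetical:
<SWARM_NODE1_IP>   swarm-node1
<SWARM_NODE2_IP>   swarm-node2
<SWARM_NODE3_IP>   swarm-node3
<SWARM_NODE4_IP>   swarm-node4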
Prometheus Retention Time (Optional)
By default, the Prometheus configuration in Telemetry keeps metrics for 30 days. If there is a need to increase or decrease this retention, follow these steps:
Edit the /root/prometheus.service file.
Select your default retention time for the collected metrics.
Modify the --storage.tsdb.retention.time=30d flag to the new desired retention time.
Finally, commit the change:
cp /root/prometheus.service /usr/lib/systemd/system
systemctl daemon-reload
promtool check config /etc/prometheus/prometheus.yml
systemctl restart prometheus
Prometheus Security (Optional)
It may be desirable to restrict the Prometheus server to allow queries only from the local host, since the Grafana server runs on the same VM. This can be done by editing the /root/prometheus.service file and adding the flag --web.listen-address=127.0.0.1:9090
If Prometheus is bound only to localhost, the built-in Prometheus UI on port 9090 will not be accessible remotely.
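As with the retention-time change, the edited unit file must be committed, after which the binding can be verified with a standard socket listing (an informal check, not part of the original procedure):
cp /root/prometheus.service /usr/lib/systemd/system
systemctl daemon-reload
systemctl restart prometheus
ss -tlnp | grep 9090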
Swarm Deployment from Scratch
If you are working in an environment where you cannot deploy the standard VMs, whether on a different hypervisor or because of other technological challenges, you have the option of deploying Swarm from the installable components. This should be a last resort, but the option is available if required.
There are no templates for this type of deployment. A virtual machine must be created for each Swarm component; then the operating system (CentOS/RHEL 7) is installed, and finally the Swarm software is deployed and configured.
These steps are described in detail across different sections in the Swarm documentation. The following links provide a good starting point to deploy Swarm from the beginning.
Also, the VM bundle can be used as a guide to configure a new deployment from scratch.
Planning and Storage Nodes Prerequisites
Hardware Requirements for Storage
SCS
Run the Swarm Cluster Services (SCS) Initialization Wizard
Add the Swarm Storage Component
Finalize Swarm Configuration Settings
https://perifery.atlassian.net/wiki/spaces/public/pages/2917138525
https://perifery.atlassian.net/wiki/spaces/public/pages/2917138540
https://perifery.atlassian.net/wiki/spaces/public/pages/2917138553
Elasticsearch
https://perifery.atlassian.net/wiki/spaces/public/pages/2443809601
https://perifery.atlassian.net/wiki/spaces/public/pages/2443809661
https://perifery.atlassian.net/wiki/spaces/public/pages/2443809683
https://perifery.atlassian.net/wiki/spaces/public/pages/2920153159
https://perifery.atlassian.net/wiki/spaces/public/pages/2443814122
Content Gateway
https://perifery.atlassian.net/wiki/spaces/public/pages/2443810099
https://perifery.atlassian.net/wiki/spaces/public/pages/2443810147
https://perifery.atlassian.net/wiki/spaces/public/pages/2443810201
https://perifery.atlassian.net/wiki/spaces/public/pages/2443810287
Telemetry (Prometheus and Grafana)
https://perifery.atlassian.net/wiki/spaces/public/pages/2443812753
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.