Important
CentOS 7 reaches end of life (EOL) in June 2024; as a result, it is required to migrate from CentOS 7 to another distribution. DataCore has decided to standardize on the Rocky Linux distribution, starting with v8. This guide therefore describes the process of migrating all the components running on CentOS 7 to Rocky Linux 8 (RL8).
Info
This process is intended for Swarm platforms where SCS is the platform server. It is not intended for systems using CSN or systems not running CentOS 7. The process for the Elasticsearch and Gateway components can be used, but as it requires upgrading to the latest versions of each component first, this may not be achievable with older systems. Please contact DataCore support before running this process on any older clusters.
This process effectively calls for the backup of key configuration settings and restoration into a vanilla minimal Rocky Linux 8 installation, rather than an in-place migration. There are third-party tools that will allow in-place migrations, however, DataCore has not tested any of these processes and does not recommend the use of them as they may leave undesirable artifacts.
General Notes
In this document, CentOS 7 has been abbreviated to C7, and Rocky Linux 8 to RL8.
The guide below assumes that the components are all running as virtual machines. If you are running with physical servers, please contact DataCore Support for further advice.
These processes typically assume the reuse of the same IPs and hostnames for both Frontend (also known as Public or External) and Backend (also known as Swarm) networks. The frontend IPs and hostnames can be changed if desired, but please remember to change DNS and/or load-balancers to point to the new frontend IPs. Backend IPs can also be changed in all components EXCEPT Elasticsearch, but please pay attention to any specific notes in the individual process. These IPs must be in the existing backend subnet, but outside of the desired DHCP range.
These processes only copy across configuration files that are created by Swarm installation procedures. It is up to the user to ensure that the process is adapted to account for any additional packages or other server configurations that have been performed outside of the standard Swarm installation.
The processes require that the C7 instance MUST be upgraded to the same major.minor version of Swarm component that will be deployed. This is further described in each section.
The guide below assumes that either a minimal RL8 server has been installed, or that the respective RL8 OVA has been deployed from the DataCore VM bundle.
Prerequisites
A (temporary) frontend IP. This allows copying relevant configuration files and an easy transition from the C7 to the RL8 instance. This IP can be a temporary address, a DHCP-assigned address, or the desired final frontend IP.
rsync must be available on both the C7 and the RL8 instances (a quick check is shown after this list).
If deploying by installing a minimal RL8 instance, this MUST be internet-connected. There is no offline version of this guide at present. Therefore, it is necessary to deploy using the OVA method if internet connection is not possible.
It is recommended that a valid backup and/or VM snapshot are made prior to commencing any upgrades or changes.
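If rsync is not already present, it can be installed from the standard repositories. A minimal sketch, assuming internet or repo connectivity:
yum install -y rsync    # on the C7 instance
dnf install -y rsync    # on the RL8 instance
rsync --version         # verify it is available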
SCS Server
Gather Information
On the C7 instance, create a support bundle that will aid DataCore Support in case of an issue or a requirement to revert changes.
cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh -U ${SWARM_ADMIN_USER} -A ${SWARM_PASSWORD}
Note down the IPs, Site Name, and Administrator credentials; these must be entered EXACTLY the same during the init wizard phase.
Configuration | CentOS 7 | Rocky 8 | How to Find |
---|---|---|---|
Hostname | | | |
Frontend (External) IP | | | |
Backend (Swarm) IP | | | Either |
admin.userName | | | This process is only tested for systems where the default administrator user name is used. If you use a different name, please contact DataCore Support for guidance. |
admin.password | | | |
site_name | | | |
group name (cluster name) | | | scsctl storage group list |
dhcp_range | | | Either |
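Most of these values can be gathered on the existing C7 SCS server with the commands below; this is a minimal sketch using only commands referenced elsewhere in this guide plus standard OS tools. The site_name and dhcp_range should be taken from the records of the original SCS deployment.
hostnamectl status                              # Hostname
ip -4 addr show                                 # Frontend and Backend IPs
scsctl platform config show -d admin.userName   # admin.userName
scsctl platform config show -d admin.password   # admin.password
scsctl storage group list                       # group name (cluster name)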
Prepare the (Old) C7 SCS Server
Upgrade SCS to the same version as the packages to be installed on RL8 or the OVA images. It is assumed that you are currently at v1.7.1-el7.
If the existing SCS server is running a version earlier than 1.7.1, perform the upgrade to 1.7.1. Full instructions for the upgrade can be found here.
If static IP addresses were previously assigned before this migration, skip this step and proceed to the "Boot/reboot" step (see step 3).
Use case #1: Allow DHCP to assign IP addresses
When the cluster nodes are rebooted, DHCP will assign an IP address to each node. If the nodes have already been booted with manually assigned IPs, the assigned IP addresses will remain the same. If IP addresses were not manually assigned, but existing DHCP leases are still in effect, the currently assigned IP addresses will remain the same. If existing DHCP leases have expired (or this is the first boot of the storage cluster), DHCP will assign IP addresses from the allocated pool in no particular order. When this use case is selected, proceed to the "Boot/reboot" step.
Use case #2: Pre-assign an IP address to each storage node
In this case, an IP address must be manually assigned to each storage node before it is rebooted. The procedure for manual assignment is fully explained in Configuring Swarm for Static IPs with Swarm Cluster Services (SCS).
The summary of the manual assignment procedure is:
Get an instance list that provides a node ID to use in the next step.
scsctl storage instance list
Assign an IP address to each node.
scsctl storage config set -d -i {chassis ID} "network.ipV4Address={your.static.address.here}"
If the storage cluster was previously using static IP addresses defined in SCS 1.5 (or an earlier version), remove the customized node.cfg files for each storage node.
scsctl storage config file unset -d -i {chassis ID} node.cfg
Boot/reboot storage nodes.
Create a support bundle. If you run into an issue, please contact DataCore Support and include the support bundle.
cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh -U ${SWARM_ADMIN_USER} -A ${SWARM_PASSWORD}
Create a backup of the SCS system configuration and Swarm software repo.
mkdir /root/mig2rl8
scsctl backup create -o /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz
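As an optional sanity check, confirm that the backup archive was written and is a readable tar archive before continuing:
ls -lh /root/mig2rl8/
tar -tzf /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz | head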
Prepare the (new) RL8 SCS Server
Install a minimal instance of Rocky Linux 8 or deploy the SCS RL8 OVA.
Prepare the basic configuration:
Set the hostname:
hostnamectl set-hostname <HOSTNAME>
Set the timezone, if required:
timedatectl set-timezone <CONTINENT>/<CITY>
Configure the frontend (external) IP, if not configured during installation:
nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>
Do NOT configure the backend (Swarm) network at this time. This will be reconfigured during the SCS initialization.
Ensure that a second virtual disk of at least 100GB is installed and mounted to /var/log. This is automatically created and configured during the OVA deployment.
Ensure the OS is patched to the latest releases (assuming internet or repo connectivity):
dnf update
Configure the (new) RL8 SCS server
Important
The version of SCS installed on the RL8 server must match the major.minor version that is running on the old C7 SCS server, but be aware there is a specific variant for RL8 (-el8).
Verify that the Backend (Swarm) interface is disconnected at the virtual machine level. This allows the configuration below to proceed using the same IP as the old C7 SCS instance, without creating an IP conflict.
This is not necessary if you plan to use a different IP for the Backend (Swarm) network.
Copy the configuration files and the SCS backup to this server.
C7_SCS_FRONTEND_IP=<IP_of_existing_C7_SCS_server>
rsync -av -R --ignore-missing-args \
  ${C7_SCS_FRONTEND_IP}:/etc/hosts \
  ${C7_SCS_FRONTEND_IP}:/etc/chrony.conf \
  ${C7_SCS_FRONTEND_IP}:/root/mig2rl8 \
  ${C7_SCS_FRONTEND_IP}:/usr/local/bin/node_exporter \
  ${C7_SCS_FRONTEND_IP}:/usr/lib/systemd/system/node_exporter.service \
  /
Info
This procedure only copies the configuration and backup files. It does NOT copy old log files, home directories, user information, or customizations other than those configured during the initial SCS installation. To copy any additional files, please add them to the command.
This copies the /etc/hosts file - if you are not retaining the frontend IP of the existing C7 instance, this may need to be edited after the rsync command.
RL8 does not automatically enable chronyd, therefore this needs to be started manually. The configuration should have been copied from the old C7 server in the previous step.
systemctl enable chronyd && systemctl start chronyd
# And to check it is working correctly
chronyc sources
Disable SELinux.
Check if SELinux is enabled or disabled. In a default RL8 installation, it will be Enforcing. However, if it is already disabled, skip to step 4.
getenforce
Disable SELinux by editing the /etc/selinux/config file, commenting out the line SELINUX=enforcing or SELINUX=permissive, and adding the line SELINUX=disabled. Then reboot the server after saving the file.
vi /etc/selinux/config
...
#SELINUX=enforcing
SELINUX=disabled
...
reboot
Download the installation package from DataCore, and transfer the SCS installation package to /root.
cd ${PATH_TO_INSTALLER_PACKAGES}
rpm --import RPM-GPG-KEY
dnf install -y swarm-scs-VERSION.el8.x86_64.rpm
Caution
This MUST be the -el8 variant. Do NOT install a -el7 variant.
Initialize the SCS server containers and configurations with basic defaults.
scsctl init wizard -a
For Site_Name, use EXACTLY the same name as the old C7 SCS server, and as recorded at the beginning of this process.
For Swarm Administrator Password, use EXACTLY the same password as the old C7 SCS server, and as recorded at the beginning of this process.
For the Swarm network IP, you can either specify the same Swarm IP as the existing C7 SCS server or specify a new IP.
If using a new IP, this must still be in the same subnet and outside of the DHCP range, and you MUST make changes to the Swarm cluster detailed later in this guide.
DO NOT perform the "scsctl repo component add -f [storage bundle file name]" step, or the "scsctl diagnostics config scan_missing" step at this time. Proceed to restore the backup file.
Restore the SCS backup.
scsctl backup restore /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz
Initialize the DHCP server using the same parameters as the old C7 SCS instance and as recorded at the beginning of this process.
scsctl init dhcp --dhcp-reserve-lower ${EXISTING_DHCP_LOWER} --dhcp-reserve-upper ${EXISTING_DHCP_UPPER}
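For illustration only, a hypothetical example using placeholder bounds (substitute the values recorded earlier; these addresses are not defaults):
scsctl init dhcp --dhcp-reserve-lower 192.168.100.100 --dhcp-reserve-upper 192.168.100.199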
If required, re-enable SELinux (i.e. reverting step 3). This may take some time to complete.
Re-enable SELinux by editing the /etc/selinux/config file, removing the line SELINUX=disabled, and uncommenting the line SELINUX=enforcing or SELINUX=permissive. Save the file, then reboot the server.
vi /etc/selinux/config
...
SELINUX=enforcing
...
reboot
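After the reboot, confirm the resulting SELinux mode:
getenforce
sestatus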
If the system was deployed with a minimal installation, please install the Swarm support tools.
cd /root
curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
bash updateBundle.sh fresh
rm updateBundle.sh
If the system was deployed with the OVA, the support tools should already be installed, but please update with the command below:
cd /root/dist && ./updateBundle.sh
Migrate to the (New) RL8 SCS server
At this point, the C7 SCS instance is still the live server. This section finalizes the process by disconnecting the C7 instance and making the RL8 server the primary instance.
If you are re-using the same Swarm IP on the backend network, disconnect the backend (Swarm) interface at the VM level for the C7 SCS instance.
Note
At this time, any Swarm logging will be lost.
If, however, you are changing the Swarm IP on the backend network, it is necessary to redirect logging to the RL8 server.
/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP} -C log.host -V ${RL8_SWARM_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
At this time, Swarm logging should be directed to this server.
Verify the following:
You are receiving Swarm cluster logs.
tail -F /var/log/datacore/castor.log
You can see the storage nodes and can talk to them as expected.
/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
The settings have migrated correctly.
scsctl storage software list                     # check the current version is active
scsctl storage instance list                     # check the correct instances are shown
scsctl platform config show -d admin.userName    # check it is the same
scsctl platform config show -d admin.password    # check it is the same
Re-enable and restart the node_exporter, if present.
[[ -e /usr/lib/systemd/system/node_exporter.service ]] && { \
  systemctl enable node_exporter; systemctl start node_exporter; }
Info
If you are using a different Swarm backend IP, edit the prometheus.yml config file on your Telemetry server so that it scrapes the new IP (a sketch is shown below).
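A minimal sketch of that change, assuming a standard Prometheus setup on the Telemetry server scraping node_exporter on port 9100 (the file path and service name are assumptions and may differ in your deployment):
# On the Telemetry server
vi /etc/prometheus/prometheus.yml
#   - targets: ['<OLD_C7_SCS_SWARM_IP>:9100']    # old SCS backend IP
#   becomes
#   - targets: ['<NEW_RL8_SCS_SWARM_IP>:9100']   # new RL8 SCS backend IP
systemctl restart prometheus    # service name may differ in your deployment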
If the old C7 SCS backend (Swarm) IP has been retained, it is not strictly necessary to reboot the storage nodes, though it is recommended that you reboot one or all nodes at this time to check that they boot successfully after this migration.
If a new backend (Swarm) IP has been used, it is mandatory to reboot all the storage nodes, to propagate this new IP to the cluster configuration.
# To test one storage node
/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP} -Rchassis -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
# To perform a rolling reboot on the storage nodes
/root/dist/swarmrestart -d ${ANY_STORAGE_NODE_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
The new RL8 SCS server is now considered "live", so the C7 SCS server can be retired or powered off.
Assuming the frontend (public) IP is intended to be the same as the old C7 SCS server, this needs to be reconfigured accordingly, and the interface "bounced" for this to take effect.
nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.addresses ${SCS_FRONTEND_IP}/${FRONTEND_PREFIX}
nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2}
nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN}
nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.method manual
nmcli con mod ${SCS_FRONTEND_NIC_NAME} connection.autoconnect yes
nmcli con reload
nmcli con down ${SCS_FRONTEND_NIC_NAME} && nmcli con up ${SCS_FRONTEND_NIC_NAME}
Any SSH sessions will be terminated when bouncing the interface.
Optional: Upgrade to the Latest Storage Software
Add the new storage bundle to the repo.
scsctl repo component add -f [storage bundle file name]
Activate the new storage bundle version.
scsctl storage software list
scsctl storage software activate [storage software version]
Reboot the cluster.
# To perform a rolling reboot on the storage nodes
/root/dist/swarmrestart -d ${ANY_STORAGE_NODE_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
Gateway Server
This process aims to create a new gateway instance and to copy over the configuration files. Then, choose either option:
Option A - adding the (new) RL8 server to the DNS and load-balancer configuration.
Option B - transferring the (old) frontend (public) C7 Gateway IP to the (new) RL8 server.
In a production environment, there will likely be multiple gateway instances that need to be migrated. Therefore, repeat this procedure for each gateway then test each instance thoroughly before proceeding to the next gateway instance.
Gather Information
On the C7 instance, please make a support bundle that will aid DataCore Support in case of an issue or a requirement to revert changes.
cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh
Note the existing network configuration and plan any new IPs, if required.
Configuration | CentOS 7 | Rocky 8 | How to Find |
---|---|---|---|
Hostname | | | |
Frontend (External) IP | | | |
Backend (Swarm) IP | | | |
SCSP Port | | | |
S3 Port | | | |
Swarm UI Port | | | |
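If the ports are not already documented, they can usually be found in the gateway configuration on the existing C7 instance. A minimal sketch, assuming the standard configuration directory (the same one copied later in this process); the exact file layout may vary:
grep -Erin 'port' /etc/caringo/cloudgateway/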
Prepare the (new) RL8 Gateway Server
The version of Gateway installed on the RL8 server must match the major.minor version that is running on the old C7 Gateway server.
Install a minimal instance of Rocky Linux 8 or deploy the Gateway RL8 OVA.
Prepare the basic configuration:
Set the hostname:
hostnamectl set-hostname <HOSTNAME>
Set the timezone, if required:
timedatectl set-timezone <CONTINENT>/<CITY>
Configure the frontend (external) IP, if not configured during installation.
nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>
Do NOT configure the backend (Swarm) network at this time.
Verify the OS is patched to the latest releases (assuming internet or repo connectivity):
dnf update
If you have deployed using the Gateway OVA, the software packages are already installed so you can skip this step.
cd ${PATH_TO_INSTALLER_PACKAGES}
rpm --import RPM-GPG-KEY
dnf install -y caringo-gateway-VERSION.noarch.rpm
dnf install -y caringo-gateway-webui-VERSION.noarch.rpm
dnf install -y caringo-storage-webui-VERSION.noarch.rpm
If the system was deployed with a minimal installation, please install the Swarm support tools.
cd /root
curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
bash updateBundle.sh fresh
rm updateBundle.sh
If the system was deployed with the OVA, the support tools should already be installed, but please update the following:
cd /root/dist && ./updateBundle.sh
Prepare the (old) C7 Gateway Server
Upgrade Gateway to the same version as the packages to be installed on RL8 or the OVA images. For the purpose of this guide, it is assumed to be 8.0.2 + 7.10.0 + 3.5.0.
Full instructions for the upgrade can be found here, but are summarized below:
Download the installation package from DataCore, and transfer the GW installation package to /root.
Upgrade the software and perform the relevant upgrade steps.
cd ${PATH_TO_INSTALLER_PACKAGES}
yum install -y caringo-gateway-VERSION.noarch.rpm
yum install -y caringo-gateway-webui-VERSION.noarch.rpm
yum install -y caringo-storage-webui-VERSION.noarch.rpm
Create a backup of the GW user changes.
mkdir /root/mig2rl8
# Back up local (non-system) users, groups, and the matching password hashes
awk -F: '$3>=1000 && $3<65000 {print $0}' /etc/passwd > /root/mig2rl8/user1000
awk -F: '$3>=1000 {print $0}' /etc/group > /root/mig2rl8/group1000
for USER in $(awk -F: '{print $1}' /root/mig2rl8/user1000); do grep "^$USER:" /etc/shadow; done > /root/mig2rl8/shadow1000
Configure the (new) RL8 Gateway Server
Copy the configuration files and the GW backup to this server.
C7_GW_FRONTEND_IP=<IP_of_existing_C7_GW_server>
rsync -av -R --ignore-missing-args \
  ${C7_GW_FRONTEND_IP}:/etc/hosts \
  ${C7_GW_FRONTEND_IP}:/etc/chrony.conf \
  ${C7_GW_FRONTEND_IP}:/root/mig2rl8 \
  ${C7_GW_FRONTEND_IP}:/etc/caringo/cloudgateway \
  ${C7_GW_FRONTEND_IP}:/etc/sysconfig/cloudgateway \
  ${C7_GW_FRONTEND_IP}:/usr/local/bin/node_exporter \
  ${C7_GW_FRONTEND_IP}:/usr/lib/systemd/system/node_exporter.service \
  /
Note
This procedure only copies the configuration and backup files. It does NOT copy old log files, home directories, user information, or customizations other than those configured during the initial GW installation. Please add any additional files to the command if you wish to copy them.
This copies the /etc/hosts file - if you are not retaining the frontend IP of the existing C7 instance, this may need to be edited after the rsync command.
RL8 does not automatically enable chronyd, therefore this needs to be started manually. The configuration should have been copied from the old C7 server in the previous step.
systemctl enable chronyd && systemctl start chronyd
# And to check it is working correctly
chronyc sources
Recreate the frontend (public) firewall configurations.
firewall-cmd --zone=public --add-port=${SCSP_PORT}/tcp --permanent
firewall-cmd --zone=public --add-port=${S3_PORT}/tcp --permanent
firewall-cmd --zone=public --add-port=${SWARMUI_PORT}/tcp --permanent
firewall-cmd --zone=public --add-port=9095/tcp --permanent
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload
If the current C7 system is using HAproxy or another SSL offload engine, please verify that these ports are also added to the public zone on the firewall.
[Optional] Create a backend (Swarm) firewall configuration.
firewall-cmd --new-zone swarm --permanent
firewall-cmd --reload
firewall-cmd --zone=swarm --add-service=http --permanent
firewall-cmd --zone=swarm --add-service=ssh --permanent
firewall-cmd --zone=swarm --add-port=9095/tcp --permanent
firewall-cmd --zone=swarm --add-port=9100/tcp --permanent
firewall-cmd --reload
Append the PAM user configurations.
cat /root/mig2rl8/user1000 >> /etc/passwd
cat /root/mig2rl8/group1000 >> /etc/group
cat /root/mig2rl8/shadow1000 >> /etc/shadow
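As an optional check that the appended accounts resolve correctly, pick one of the migrated users (the variable below is a placeholder, not defined elsewhere in this guide):
id ${A_MIGRATED_GW_USER}
su - ${A_MIGRATED_GW_USER} -c whoami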
Migrate to the (new) RL8 Gateway Server
At this point, the C7 GW instance is still the live server. This section finalizes the process by either adding the new server to the DNS/load-balancer or disconnecting the C7 instance and making the RL8 server the primary instance.
Option A - Adding to DNS/Load-Balancer
The frontend (external) interface was already configured during the initial preparations, so there is nothing that needs to be done here.
Set up the networking for the Backend (Swarm) network. This must be a different IP from the current C7 instance, must be in the Swarm subnet, must be outside of the DHCP range and must be unique.
nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.addresses ${GW_BACKEND_IP}/${BACKEND_PREFIX}
nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.method manual
nmcli con mod ${GW_BACKEND_NIC_NAME} connection.autoconnect yes
nmcli con mod ${GW_BACKEND_NIC_NAME} connection.zone swarm
nmcli con reload
nmcli con down ${GW_BACKEND_NIC_NAME} && nmcli con up ${GW_BACKEND_NIC_NAME}
Proceed to test the gateway.
Option B - Replacing the Existing C7 Instance
Remove (or comment) the C7 gateway’s frontend (external) IP from the DNS and/or Load-Balancer configuration. This is not mandatory, but failure to do so may result in application-level issues trying to access an unresponsive gateway.
For the C7 Gateway instance, disconnect both the frontend and backend network ports, at the VM level.
Re-configure the frontend (external) interface to match the configuration of the old C7 gateway; the interface will need to be "bounced" for this to take effect.
nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.addresses ${GW_FRONTEND_IP}/${FRONTEND_PREFIX}
nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2}
nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN}
nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.method manual
nmcli con mod ${GW_FRONTEND_NIC_NAME} connection.autoconnect yes
nmcli con reload
nmcli con down ${GW_FRONTEND_NIC_NAME} && nmcli con up ${GW_FRONTEND_NIC_NAME}
Any SSH sessions will be terminated when bouncing the interface.
Set up the networking for the Backend (Swarm) network. This will be the same as the current C7 instance.
nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.addresses ${GW_BACKEND_IP}/${BACKEND_PREFIX}
nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.method manual
nmcli con mod ${GW_BACKEND_NIC_NAME} connection.autoconnect yes
nmcli con mod ${GW_BACKEND_NIC_NAME} connection.zone swarm
nmcli con reload
nmcli con down ${GW_BACKEND_NIC_NAME} && nmcli con up ${GW_BACKEND_NIC_NAME}
Testing
Verify that the storage nodes are present and talk to them as expected.
/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
curl -I http://${ANY_STORAGE_NODE_IP}
Restart the cloudgateway services, and validate they start correctly.
systemctl restart cloudgateway
systemctl status cloudgateway
Verify that the gateway is performing correctly.
# This should be unauthorized, but returning the relevant "Cloud Gateway" and "CAStor Cluster" header
curl -I http://${GW_FRONTEND_IP}:${SCSP_PORT}
# This should return a full set of header information
curl -I -u ${VALID_USER}:${VALID_PASSWORD} http://${GW_FRONTEND_IP}:${SCSP_PORT}
Check via a browser that you are able to access http://${GW_FRONTEND_IP}:${SCSP_PORT}/_admin/portal and login as normal.
Perform other tests, as applicable and able, to the gateway’s frontend IP.
It is recommended to reboot the server and verify that all services come up as expected following the reboot.
Finalization
Now, this gateway is ready to be made available to the applications again.
Add (or uncomment) the RL8 gateway’s frontend (external) IP into the DNS and/or Load-Balancer configuration.
Remove the C7 gateway's frontend (external) IP from the DNS and/or Load-Balancer configuration if necessary.
Power off the C7 gateway server.
Repeat the process for all other gateway servers.
Elasticsearch Server
There are two approaches to this upgrade:
Suspending a single node, transferring the (old) C7 Elasticsearch IP, configuration, and data to a (new) RL8 server, and then resuming with the RL8 server.
Adding a (new) RL8 Elasticsearch server to the Elasticsearch cluster, allowing data to balance across, and then retiring an (old) C7 Elasticsearch server.
In a production environment, there will likely be multiple Elasticsearch nodes that all need to be migrated. Therefore, repeat this procedure for each node, waiting for each node to complete before proceeding to the next Elasticsearch instance.
Gather Information
On the C7 instance, please make a support bundle that will aid DataCore Support in the event of an issue or a requirement to revert changes.
cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh
Note the existing network configuration and plan any new IPs, if required.
Configuration | CentOS 7 | Rocky 8 | How to Find |
---|---|---|---|
Hostname | | | |
Frontend (External) IP | | | |
Backend (Swarm) IP | | | |
ES Data location | | | |
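The ES data location can be read from the Elasticsearch configuration on the existing C7 node (the same file copied later in this process); for example:
grep -E '^path\.data' /etc/elasticsearch/elasticsearch.yml
df -h ${ES_DATA_PATH}    # confirm the size of the existing data disk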
Upgrade the (old) C7 Elasticsearch cluster
Upgrade the Elasticsearch cluster to the same version as the packages to be installed on RL8 or the OVA images. For the purpose of this guide, it is assumed to be 7.17.14.
Full instructions for the upgrade can be found in Upgrading Elasticsearch.
Prepare the (new) RL8 Elasticsearch server
The version of Elasticsearch installed on the RL8 server must match the major.minor version that is running on the old C7 Elasticsearch server.
Install a minimal instance of Rocky Linux 8 or deploy the Elasticsearch RL8 OVA.
Prepare the basic configuration:
Set the hostname:
hostnamectl set-hostname <HOSTNAME>
Set the timezone, if required:
timedatectl set-timezone <CONTINENT>/<CITY>
Configure the Frontend (External) IP, if not configured during installation:
nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>
Do NOT configure the backend (Swarm) network at this time.
Verify the OS is patched to the latest releases (assuming internet or repo connectivity):
dnf update
Verify that a second virtual disk is installed for Elasticsearch data. This disk must match or exceed the performance and capacity of the disks in the existing C7 Elasticsearch nodes, and must be mounted to the ES data location identified at the beginning of the process (defaults to /var/lib/elasticsearch). This is automatically created and configured during the OVA deployment.
If you have deployed using the Elasticsearch OVA, the software packages are already installed, so you can skip this step.
cd ${PATH_TO_INSTALLER_PACKAGES}
rpm --import RPM-GPG-KEY
rpm --import GPG-KEY-elasticsearch
dnf install -y elasticsearch-VERSION.rpm
dnf install -y caringo-elasticsearch-search-VERSION.noarch.rpm
If the system was deployed with a minimal installation, please install the Swarm support tools.
cd /root
curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
bash updateBundle.sh fresh
rm updateBundle.sh
If the system was deployed with the OVA, the support tools should already be installed, but please update the following:
cd /root/dist && ./updateBundle.sh
Option A - Transferring Data from C7 to (new) RL8 Instance
This process involves copying a potentially large amount of index data from the old C7 instance to the new RL8 instance. This may take some time to complete, which may be undesirable. In virtualized environments, it may be possible to move (or clone) the index disk from the old C7 instance to the RL8 instance. This may be substantially faster but may represent higher risk, especially for rollback. If this method is desirable, please refer to DataCore Support for further assistance.
Prepare the (old) C7 Elasticsearch Server
On many deployments, the frontend NIC has been disabled or disconnected at the VM layer. You need to enable or reconnect this interface to allow copying data to the RL8 server.
Discover the current master node. It is recommended to always migrate the master node last, so that there is only one re-election of the master.
curl http://${ES_BACKEND_IP}:9200/_cat/nodes?v
The master node appears with an “*” in the master column.
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
10.10.0.12 18           62          0   0.01    0.02    0.05     cdfhimrstw *      10.10.0.12   <=== this is the master
10.10.0.11 7            69          0   0.06    0.01    0.00     cdfhimrstw -      10.10.0.11
10.10.0.13 32           62          0   0.00    0.01    0.05     cdfhimrstw -      10.10.0.13
Verify that the Elasticsearch cluster is healthy before commencing any migrations. Do NOT proceed if the status is either YELLOW or RED.
curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty
Disable shard allocation and balancing on the Elasticsearch cluster.
curl -XPUT -H'Content-Type: application/json' http://${ES_BACKEND_IP}:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'
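Optionally, confirm the transient setting has been applied before stopping the node:
curl http://${ES_BACKEND_IP}:9200/_cluster/settings?pretty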
Stop the Elasticsearch service on this C7 node.
systemctl stop elasticsearch
Copy configuration and ES data to the (new) RL8 instance.
rsync -av -R --ignore-missing-args \
  /etc/hosts \
  /etc/chrony.conf \
  /etc/firewalld/services/elasticsearch.xml \
  /etc/elasticsearch/elasticsearch.yml \
  /etc/elasticsearch/jvm.options \
  /etc/security/limits.d/10-caringo-elasticsearch.conf \
  /etc/sysconfig/elasticsearch \
  /etc/systemd/system/elasticsearch.service.d/override.conf \
  ${ES_DATA_PATH} \
  root@${RL8_FRONTEND_IP}:/
Disconnect the backend (Swarm) network at the VM level.
Configure the (new) RL8 Elasticsearch Server
RL8 does not automatically enable chronyd, therefore start it manually. The configuration should have been copied from the old C7 server in the previous step.
systemctl enable chronyd && systemctl start chronyd
# And to check it is working correctly
chronyc sources
Create a zone for the Swarm network, and allow Elasticsearch and SSH traffic through this interface.
firewall-cmd --new-zone swarm --permanent
firewall-cmd --reload
firewall-cmd --zone=swarm --add-service=elasticsearch --permanent
firewall-cmd --zone=swarm --add-service=ssh --permanent
firewall-cmd --reload
Set up the networking for the Backend (Swarm) network. This must be the same as the IP on the current C7 instance.
nmcli con mod ${ES_BACKEND_NIC_NAME} ipv4.addresses ${ES_BACKEND_IP}/${BACKEND_PREFIX}
nmcli con mod ${ES_BACKEND_NIC_NAME} ipv4.method manual
nmcli con mod ${ES_BACKEND_NIC_NAME} connection.autoconnect yes
nmcli con mod ${ES_BACKEND_NIC_NAME} connection.zone swarm
nmcli con reload
nmcli con down ${ES_BACKEND_NIC_NAME} && nmcli con up ${ES_BACKEND_NIC_NAME}
Verify the storage nodes are present and talk to them as expected.
/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
curl -I http://${ANY_STORAGE_NODE_IP}
Migrate to the (new) RL8 Elasticsearch Server
The Frontend (External) interface was already configured during the initial preparations, so there is nothing that needs to be done here.
Start the Elasticsearch service and verify that it has started correctly.
systemctl start elasticsearch
systemctl status elasticsearch
Enable Elasticsearch service to start on system boot.
systemctl enable elasticsearch
Verify the Elasticsearch node is joined to the Elasticsearch cluster.
curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty
curl http://${ES_BACKEND_IP}:9200/_cat/nodes?v
Enable shard allocation and balancing on the Elasticsearch cluster.
curl -XPUT -H'Content-Type: application/json' http://${ES_BACKEND_IP}:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
Wait until the health status of Elasticsearch changes from YELLOW to GREEN.
curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty
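Alternatively, the health API can block until the cluster reaches GREEN (or the timeout expires), which avoids polling by hand:
curl "http://${ES_BACKEND_IP}:9200/_cluster/health?wait_for_status=green&timeout=300s&pretty"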
If required, disconnect the frontend (external) network on the (old) C7 instance at the VM level.
Re-configure the frontend (external) interface on the RL8 instance to match the previous configuration of the (old) C7 ES instance. The interface needs to be "bounced" for this to take effect.
nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.addresses ${ES_FRONTEND_IP}/${FRONTEND_PREFIX}
nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2}
nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN}
nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.method manual
nmcli con mod ${ES_FRONTEND_NIC_NAME} connection.autoconnect no
nmcli con reload
nmcli con down ${ES_FRONTEND_NIC_NAME}
Info
Any SSH sessions will be terminated when bouncing the interface. Please reconnect via the backend (Swarm) network.
It is recommended that the frontend interface is not left open. The configuration above takes care of this by setting auto-connect to "no" and bringing the interface down.
Repeat the above Elasticsearch procedure for all the nodes in the cluster.
Option B - Adding New RL8 Nodes to the Existing Cluster and Retiring Old C7 Nodes
<TBD>