
Important

CentOS 7 reaches EOL in June 2024; as a result, it is necessary to migrate from CentOS 7 to another distribution. DataCore has decided to standardize on the Rocky Linux distribution, starting with v8. This guide therefore describes the process of migrating all the components running on CentOS 7 to Rocky Linux 8 (RL8).

Info

This process is intended for Swarm platforms where SCS is the platform server. It is not intended for systems using CSN or for systems not running CentOS 7. The process for the Elasticsearch and Gateway components can be used, but as it requires upgrading to the latest versions of each component first, this may not be achievable with older systems. Please contact DataCore support before running this process on any older clusters.

This process effectively calls for backing up key configuration settings and restoring them into a vanilla minimal Rocky Linux 8 installation, rather than an in-place migration. There are third-party tools that allow in-place migrations; however, DataCore has not tested any of these processes and does not recommend their use, as they may leave undesirable artifacts.

General Notes

  • In this document, CentOS 7 has been abbreviated to C7, and Rocky Linux 8 to RL8.

  • The guide below assumes that the components are all running as virtual machines. If you are running with physical servers, please contact DataCore Support for further advice.

  • These processes typically assume the reuse of the same IPs and hostnames for both Frontend (also known as Public or External) and Backend (also known as Swarm) networks. The frontend IPs and hostnames can be changed if desired, but please remember to change DNS and/or load-balancers to point to the new frontend IPs. Backend IPs can also be changed in all components EXCEPT Elasticsearch, but please pay attention to any specific notes in the individual process. These IPs must be in the existing backend subnet, but outside of the desired DHCP range.

  • These processes only copy across configuration files that are created by Swarm installation procedures. It is up to the user to ensure that the process is adapted to account for any additional packages or other server configurations that have been performed outside of the standard Swarm installation.

  • The processes require that the C7 instance MUST be upgraded to the same major.minor version of Swarm component that will be deployed. This is further described in each section.

  • The guide below assumes that either a minimal RL8 server has been installed, or that the respective RL8 OVA has been deployed from the DataCore VM bundle.

Prerequisites

  • A (temporary) frontend IP on the new RL8 instance. This allows copying the relevant configuration files and an easy transition from the C7 instance to the RL8 instance. This IP can be temporary, DHCP-assigned, or the desired final frontend IP.

  • rsync must be available on both the C7 and the RL8 instances.

  • If deploying by installing a minimal RL8 instance, this MUST be internet-connected. There is no offline version of this guide at present; therefore, it is necessary to deploy using the OVA method if an internet connection is not possible.

  • It is recommended that a valid backup and/or VM snapshot are made prior to commencing any upgrades or changes.

SCS Server

Prepare the (Old) C7 SCS Server

Upgrade SCS to the same version as the packages to be installed on RL8 or the OVA images. For the purpose of this guide, it is assumed that you are currently at v1.7.1-el7.
If the existing SCS server is older than 1.7.1, perform the upgrade to 1.7.1. Full instructions for the upgrade can be found here.
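To confirm the currently installed SCS version before upgrading, you can query the RPM database. This is a minimal check, assuming the package name swarm-scs, which matches the RPM file names used later in this guide:

rpm -q swarm-scs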

Gather Information

On the C7 instance, please make a support bundle that will aid DataCore Support in case of an issue or a requirement to revert changes.

cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh -U ${SWARM_ADMIN_USER} -A ${SWARM_PASSWORD}

Note down the IPs, site name, and administrator credentials; these must be entered EXACTLY the same during the init wizard phase.

Record the following configuration items for both the CentOS 7 and Rocky 8 instances (the Rocky 8 server reuses the same values unless stated otherwise). How to find each value on the C7 instance:

  • Hostname: hostname

  • Frontend (External) IP: ip a

  • Backend (Swarm) IP: ip a, or scsctl platform config show -d logging.syslogHost

  • admin.userName: scsctl platform config show -d admin.userName
    Note: this process is only tested for systems where admin.userName=admin. If you use a different name, please contact DataCore Support for guidance.

  • admin.password: scsctl platform config show -d admin.password

  • site_name: scsctl platform group list | awk 'sub("global.platform.","",$1) {print $1}'

  • group name (cluster name): scsctl storage group list

  • dhcp_range: either history | grep 'scsctl init dhcp', or discovered from
    sed -n -e '/subnet /,/\}/p' /etc/dhcp/dhcpd.conf | grep '^\s*range'

Create a backup of the SCS system configuration and Swarm software repo.

mkdir /root/mig2rl8
scsctl backup create -o /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz
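
As an optional sanity check, confirm that the backup archive was written and is readable before continuing (the file name below matches the backup command above):

ls -lh /root/mig2rl8/
tar -tzf /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz | head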

Prepare the (new) RL8 SCS Server

  1. Install a minimal instance of Rocky Linux 8 or deploy the SCS RL8 OVA.

  2. Prepare the basic configuration:

    1. Set the hostname: hostnamectl set-hostname <HOSTNAME>

    2. Set the timezone, if required: timedatectl set-timezone <CONTINENT>/<CITY>

    3. Configure the frontend (external) IP, if not configured during installation:
      nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>

    4. Do NOT configure the backend (Swarm) network at this time. This will be reconfigured during the SCS initialization.

  3. Ensure that a second virtual disk of at least 100GB is installed and mounted to /var/log (for a minimal installation, see the sketch after this list). This is automatically created and configured during the OVA deployment.

  4. Ensure the OS is patched to the latest releases (assuming internet or repo connectivity): dnf update.
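
If you installed a minimal RL8 server, the /var/log disk from step 3 has to be prepared manually. The following is a minimal sketch that assumes the second disk is presented as /dev/sdb and is unformatted; adjust the device name for your environment:

    mkfs.xfs /dev/sdb                                     # format the second disk
    mkdir -p /mnt/newlog && mount /dev/sdb /mnt/newlog    # mount it temporarily
    rsync -a /var/log/ /mnt/newlog/                       # preserve any logs already written
    umount /mnt/newlog
    echo "/dev/sdb /var/log xfs defaults 0 0" >> /etc/fstab
    mount /var/log                                        # mount the disk per the new fstab entry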

Configure the (new) RL8 SCS server

Important

  • The version of SCS installed on the RL8 server must match the major.minor version that is running on the old C7 SCS server, but be aware there is a specific variant for RL8 (-el8).

  1. Copy the configuration files and the SCS backup to this server.

    C7_SCS_FRONTEND_IP=<IP_of_existing_C7_SCS_server>
    rsync -av -R --ignore-missing-args \
       ${C7_SCS_FRONTEND_IP}:/etc/hosts       \
       ${C7_SCS_FRONTEND_IP}:/etc/chrony.conf \
       ${C7_SCS_FRONTEND_IP}:/root/mig2rl8    \
       ${C7_SCS_FRONTEND_IP}:/usr/local/bin/node_exporter \
       ${C7_SCS_FRONTEND_IP}:/usr/lib/systemd/system/node_exporter.service \
       /

Info

  • This procedure only copies the configuration and backup files. It does NOT copy old log files, home directories, user information, or customizations other than those configured during the initial SCS installation. To copy any additional files, please add them to the command.

  • This copies the /etc/hosts file - if you are not retaining the frontend IP of the existing C7 instance, this may need to be edited after the rsync command.

  2. RL8 does not automatically enable chronyd, so it needs to be started manually. The configuration should have been copied from the old C7 server in the previous step.

    systemctl enable chronyd && systemctl start chronyd
    
    # And to check it is working correctly
    chronyc sources
  3. Disable SELinux.

    1. Check if SELinux is enabled or disabled. In a default RL8 installation, it will be Enforcing. However, if it is already disabled, skip to step 4.

      getenforce
    2. Disable SELinux by editing the /etc/selinux/config file, commenting out the line SELINUX=enforcing or SELINUX=permissive, and adding the line SELINUX=disabled. Then reboot the server after saving the file.

      vi /etc/selinux/config
      ...
      #SELINUX=enforcing
      SELINUX=disabled
      ...
      
      reboot
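      Alternatively, the same change can be made non-interactively with sed (a minimal sketch, assuming the stock single SELINUX= line in the file):

      sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
      grep '^SELINUX=' /etc/selinux/config    # confirm the change before rebooting
      reboot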
  4. Download the SCS installation package from DataCore and transfer it to /root.

    cd ${PATH_TO_INSTALLER_PACKAGES}
    rpm --import RPM-GPG-KEY
    dnf install -y swarm-scs-VERSION.el8.x86_64.rpm

Caution

This MUST be the -el8 variant. Do NOT install a -el7 variant.

  5. Initialize the SCS server containers and configurations with basic defaults.

    scsctl init wizard -a
    1. For Site_Name, use EXACTLY the same name as the old C7 SCS server, and as recorded at the beginning of this process.

    2. For Swarm Administrator Password, use EXACTLY the same password as the old C7 SCS server, and as recorded at the beginning of this process.

    3. For the Swarm network IP, you can either specify the same Swarm IP as the existing C7 SCS server or specify a new IP.
      ** If using a new IP, this must still be in the same subnet and outside of the DHCP range, and you MUST make changes to the Swarm cluster detailed later in this guide.

    4. DO NOT perform the “scsctl repo component add -f [storage bundle file name]” step or the “scsctl diagnostics config scan_missing” step at this time. Proceed to restore the backup file.

  6. Restore the SCS backup.

    scsctl backup restore /root/mig2rl8/scs_backup_full.YYYYMMDD.tgz
  7. Initialize the DHCP server using the same parameters as the old C7 SCS instance and as recorded at the beginning of this process.

    scsctl init dhcp --dhcp-reserve-lower ${EXISTING_DHCP_LOWER} --dhcp-reserve-upper ${EXISTING_DHCP_UPPER}
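    You can optionally confirm that the resulting DHCP range matches the old C7 server, using the same command as during information gathering (this assumes the dhcpd configuration lives at the same path as on the C7 server):

    sed -n -e '/subnet /,/\}/p' /etc/dhcp/dhcpd.conf | grep '^\s*range'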
  8. If required, re-enable SELinux (i.e. reverting step 3). This may take some time to complete.

    Re-enable SELinux by editing the /etc/selinux/config file, removing the line SELINUX=disabled, and un-commenting the line SELINUX=enforcing or SELINUX=permissive. Save the file, then reboot the server.

    vi /etc/selinux/config
    ...
    SELINUX=enforcing
    ...
    
    reboot
  9. If the system was deployed with a minimal installation, please install the Swarm support tools.

    cd /root
    curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
    bash updateBundle.sh fresh
    rm updateBundle.sh

    If the system was deployed with the OVA, the support tools should already be installed, but please update with the command below:

    cd /root/dist && ./updateBundle.sh

Migrate to the (New) RL8 SCS Server

At this point, the C7 SCS instance is still the live server. This section finalizes the migration by disconnecting the C7 instance and making the RL8 server the primary instance.

  1. If you are re-using the same Swarm IP on the backend network, disconnect the backend (Swarm) interface at the VM level for the C7 SCS instance.

Note

At this time, any Swarm logging will be lost.

If, however, you are changing the Swarm IP on the backend network, it is necessary to redirect logging to the RL8 server.

/root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP} -C log.host -V ${RL8_SWARM_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}

At this time, Swarm logging should be directed to this server.

  2. Verify the following:

    1. You are receiving Swarm cluster logs.

      tail -F /var/log/datacore/castor.log
    2. You can see the storage nodes and can talk to them as expected.

      /root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
    3. The settings have migrated correctly.

      scsctl storage software list                      # check the current version is active
      scsctl storage instance list                      # check the instances are showing correctly
      scsctl platform config show -d admin.userName     # check it is the same as before
      scsctl platform config show -d admin.password     # check it is the same as before
  3. Re-enable and restart the node_exporter, if present.

    [[ -e /usr/lib/systemd/system/node_exporter.service ]] && { systemctl enable node_exporter; systemctl start node_exporter; }
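
    If node_exporter was restored, a quick local check that it is serving metrics (assuming the default node_exporter port 9100, the same port opened in the firewall rules elsewhere in this guide):

    curl -s http://localhost:9100/metrics | head -n 5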

Info

If you are using a different Swarm backend IP, edit the prometheus.yml config file on your Telemetry server.

  4. If the old C7 SCS backend (Swarm) IP has been retained, it is not strictly necessary to reboot the storage nodes; however, it is recommended that you reboot one or all nodes at this time to check that they boot successfully after this migration.
    If a new backend (Swarm) IP has been used, it is mandatory to reboot all the storage nodes to propagate the new IP to the cluster configuration.

    # To test one storage node
    /root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP} -Rchassis -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
    
    # To perform a rolling reboot on the storage nodes
    /root/dist/swarmrestart -d ${ANY_STORAGE_NODE_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}
  5. The new RL8 SCS server is now considered “live”, so the C7 SCS server can be retired or powered off.

  6. Assuming the frontend (public) IP is intended to be the same as that of the old C7 SCS server, it needs to be reconfigured accordingly, and the interface “bounced” for the change to take effect.

    nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.addresses ${SCS_FRONTEND_IP}/${FRONTEND_PREFIX}
    nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
    nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2} 
    nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN} 
    nmcli con mod ${SCS_FRONTEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${SCS_FRONTEND_NIC_NAME} connection.autoconnect yes
    nmcli con reload
    nmcli con down ${SCS_FRONTEND_NIC_NAME} && nmcli con up ${SCS_FRONTEND_NIC_NAME}

Any SSH sessions will be terminated when bouncing the interface.

Optional: Upgrade to the Latest Storage Software

  1. Add the new storage bundle to the repo.

    scsctl repo component add -f [storage bundle file name]
  2. Activate the new storage bundle version.

    scsctl storage software list
    scsctl storage software activate [storage software version]
  3. Reboot the cluster.

    # To perform a rolling reboot on the storage nodes
    /root/dist/swarmrestart -d ${ANY_STORAGE_NODE_IP} -p ${SWARM_ADMIN_USER}:${SWARM_PASSWORD}

Gateway Server

This process creates a new gateway instance and copies over the configuration files. Then choose one of the following options:

  • Option A - adding the (new) RL8 server to the DNS and load-balancer configuration.

  • Option B - transferring the (old) frontend (public) C7 Gateway IP to the (new) RL8 server.

In a production environment, there will likely be multiple gateway instances that need to be migrated. Repeat this procedure for each gateway, testing each instance thoroughly before proceeding to the next gateway instance.

Gather Information

On the C7 instance, please make a support bundle that will aid DataCore Support in case of an issue or a requirement to revert changes.

cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh

Note the existing network configuration and plan any new IPs, if required.

Record the following configuration items for both the CentOS 7 and Rocky 8 instances (the Rocky 8 server reuses the same values unless stated otherwise). How to find each value on the C7 instance:

  • Hostname: hostname

  • Frontend (External) IP: ip a

  • Backend (Swarm) IP: ip a

  • SCSP Port: sed -n '/\[scsp\]/,/\[/p' /etc/caringo/cloudgateway/gateway.cfg | grep bindPort

  • S3 Port: sed -n '/\[s3\]/,/\[/p' /etc/caringo/cloudgateway/gateway.cfg | grep bindPort

  • Swarm UI Port: sed -n '/\[cluster_admin\]/,/\[/p' /etc/caringo/cloudgateway/gateway.cfg | grep bindPort
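
To list every bindPort entry in a single pass (without the section context shown by the commands above):

grep -n bindPort /etc/caringo/cloudgateway/gateway.cfg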

Prepare the (new) RL8 Gateway Server

The version of Gateway installed on the RL8 server must match the major.minor version that is running on the old C7 Gateway server.

  1. Install a minimal instance of Rocky Linux 8 or deploy the Gateway RL8 OVA.

  2. Prepare the basic configuration:

    1. Set the hostname: hostnamectl set-hostname <HOSTNAME>

    2. Set the timezone, if required: timedatectl set-timezone <CONTINENT>/<CITY>

    3. Configure the frontend (external) IP, if not configured during installation.
      nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>

    4. Do NOT configure the backend (Swarm) network at this time.

  3. Verify the OS is patched to the latest releases (assuming internet or repo connectivity): dnf update

  4. Install the Gateway software packages. If you have deployed using the Gateway OVA, the packages are already installed, so you can skip this step.

    cd ${PATH_TO_INSTALLER_PACKAGES}
    rpm --import RPM-GPG-KEY
    dnf install -y caringo-gateway-VERSION.noarch.rpm
    dnf install -y caringo-gateway-webui-VERSION.noarch.rpm
    dnf install -y caringo-storage-webui-VERSION.noarch.rpm
  5. If the system was deployed with a minimal installation, please install the Swarm support tools.

    cd /root
    curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
    bash updateBundle.sh fresh
    rm updateBundle.sh

    If the system was deployed with the OVA, the support tools should already be installed, but please update the following:

    cd /root/dist && ./updateBundle.sh

Prepare the (old) C7 Gateway Server

  1. Upgrade Gateway to the same version as the packages to be installed on RL8 or the OVA images. For the purpose of this guide, it is assumed to be 8.0.2 + 7.10.0 + 3.5.0.
    Full instructions for the upgrade can be found here, but summarized below:

    1. Download the installation package from DataCore, and transfer the GW installation package to /root.

    2. Upgrade the software and perform the relevant upgrade steps.

      cd ${PATH_TO_INSTALLER_PACKAGES}
      yum install -y caringo-gateway-VERSION.noarch.rpm
      yum install -y caringo-gateway-webui-VERSION.noarch.rpm
      yum install -y caringo-storage-webui-VERSION.noarch.rpm
  2. Create a backup of the GW user changes.

    mkdir /root/mig2rl8
    awk -F: '$3>=1000 && $3<65000 {print $0}' /etc/passwd > /root/mig2rl8/user1000
    awk -F: '$3>=1000 {print $0}' /etc/group > /root/mig2rl8/group1000
    for USER in $(awk -F: '{print $1}' /root/mig2rl8/user1000); do grep "^$USER:" /etc/shadow; done > /root/mig2rl8/shadow1000

Configure the (new) RL8 Gateway Server

  1. Copy the configuration files and the GW backup to this server.

    C7_GW_FRONTEND_IP=<IP_of_existing_C7_GW_server>
    rsync -av -R --ignore-missing-args \
       ${C7_GW_FRONTEND_IP}:/etc/hosts \
       ${C7_GW_FRONTEND_IP}:/etc/chrony.conf \
       ${C7_GW_FRONTEND_IP}:/root/mig2rl8 \
       ${C7_GW_FRONTEND_IP}:/etc/caringo/cloudgateway \
       ${C7_GW_FRONTEND_IP}:/etc/sysconfig/cloudgateway \
       ${C7_GW_FRONTEND_IP}:/usr/local/bin/node_exporter \
       ${C7_GW_FRONTEND_IP}:/usr/lib/systemd/system/node_exporter.service \
       /

Note

  • This procedure only copies the configuration and backup files. It does NOT copy old log files, home directories, user information, or customizations other than those configured during the initial GW installation. Please add any additional files to the command if you wish to copy them.

  • This copies the /etc/hosts file - if you are not retaining the frontend IP of the existing C7 instance, this may need to be edited after the rsync command.

  2. RL8 does not automatically enable chronyd, so start it manually. The configuration should have been copied from the old C7 server in the previous step.

    systemctl enable chronyd && systemctl start chronyd
    
    # And to check it is working correctly
    chronyc sources
  3. Recreate the frontend (public) firewall configurations.

    firewall-cmd --zone=public --add-port=${SCSP_PORT}/tcp --permanent
    firewall-cmd --zone=public --add-port=${S3_PORT}/tcp --permanent
    firewall-cmd --zone=public --add-port=${SWARMUI_PORT}/tcp --permanent
    firewall-cmd --zone=public --add-port=9095/tcp --permanent
    firewall-cmd --zone=public --add-port=9100/tcp --permanent
    firewall-cmd --reload

If the current C7 system is using HAproxy or another SSL offload engine, please verify that these ports are also added to the public zone on the firewall.
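
A quick way to verify the resulting public zone configuration:

    firewall-cmd --zone=public --list-ports
    firewall-cmd --zone=public --list-services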

  4. [Optional] Create a backend (Swarm) firewall configuration.

    firewall-cmd --new-zone swarm --permanent
    firewall-cmd --reload
    firewall-cmd --zone=swarm --add-service=http --permanent
    firewall-cmd --zone=swarm --add-service=ssh --permanent
    firewall-cmd --zone=swarm --add-port=9095/tcp --permanent
    firewall-cmd --zone=swarm --add-port=9100/tcp --permanent
    firewall-cmd --reload
  5. Append the PAM user configurations.

    cat /root/mig2rl8/user1000 >> /etc/passwd
    cat /root/mig2rl8/group1000 >> /etc/group
    cat /root/mig2rl8/shadow1000 >> /etc/shadow
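
    To confirm that the migrated accounts resolve correctly, you can look them up through NSS using the user list captured earlier:

    getent passwd $(awk -F: '{print $1}' /root/mig2rl8/user1000)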

Migrate to the (new) RL8 Gateway Server

At this point, the C7 GW instance is still the live server. This section finalizes the migration by either adding the new server to the DNS/load-balancer or disconnecting the C7 instance and making the RL8 server the primary instance.

Option A - Adding to DNS/Load-Balancer

The frontend (external) interface was already configured during the initial preparations, so there is nothing that needs to be done here.

  1. Set up the networking for the Backend (Swarm) network. This must be a different IP from the current C7 instance, must be in the Swarm subnet, must be outside of the DHCP range and must be unique.

    nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.addresses ${GW_BACKEND_IP}/${BACKEND_PREFIX}
    nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${GW_BACKEND_NIC_NAME} connection.autoconnect yes
    nmcli con mod ${GW_BACKEND_NIC_NAME} connection.zone swarm
    nmcli con reload
    nmcli con down ${GW_BACKEND_NIC_NAME} && nmcli con up ${GW_BACKEND_NIC_NAME}
  2. Proceed to test the gateway.

Option B - Replacing the Existing C7 Instance

  1. Remove (or comment) the C7 gateway’s frontend (external) IP from the DNS and/or Load-Balancer configuration. This is not mandatory, but failure to do so may result in application-level issues trying to access an unresponsive gateway.

  2. For the C7 Gateway instance, disconnect both the frontend and backend network ports, at the VM level.

  3. Re-configure the frontend (external) interface to match the configuration of the old C7 gateway, and the interface will need to be “bounced” for this to take effect.

    nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.addresses ${GW_FRONTEND_IP}/${FRONTEND_PREFIX}
    nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
    nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2} 
    nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN}
    nmcli con mod ${GW_FRONTEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${GW_FRONTEND_NIC_NAME} connection.autoconnect yes
    nmcli con reload
    nmcli con down ${GW_FRONTEND_NIC_NAME} && nmcli con up ${GW_FRONTEND_NIC_NAME}

Any SSH sessions will be terminated when bouncing the interface.

  4. Set up the networking for the backend (Swarm) network. This will be the same IP as on the current C7 instance.

    nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.addresses ${GW_BACKEND_IP}/${BACKEND_PREFIX}
    nmcli con mod ${GW_BACKEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${GW_BACKEND_NIC_NAME} connection.autoconnect yes
    nmcli con mod ${GW_BACKEND_NIC_NAME} connection.zone swarm
    nmcli con reload
    nmcli con down ${GW_BACKEND_NIC_NAME} && nmcli con up ${GW_BACKEND_NIC_NAME}

Testing

  1. Verify that the storage nodes are present and that you can talk to them as expected.

    /root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
    curl -I http://${ANY_STORAGE_NODE_IP}
  2. Restart the cloudgateway services, and validate they start correctly.

    systemctl restart cloudgateway
    systemctl status cloudgateway
  3. Verify that the gateway is performing correctly.

    # This should return an unauthorized response, but with the relevant "Cloud Gateway" and "CAStor Cluster" headers
    curl -I http://${GW_FRONTEND_IP}:${SCSP_PORT}
    
    # This should return a full set of header information              
    curl -I -u ${VALID_USER}:${VALID_PASSWORD} http://${GW_FRONTEND_IP}:${SCSP_PORT}
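
    # If the S3 protocol is enabled on this gateway, a similar unauthenticated check can be run against the
    # S3 port recorded earlier; expect an S3-style error response rather than a connection failure
    curl -I http://${GW_FRONTEND_IP}:${S3_PORT}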
  4. Check via a browser that you are able to access http://${GW_FRONTEND_IP}:${SCSP_PORT}/_admin/portal and log in as normal.

  5. Perform other tests, as applicable and able, to the gateway’s frontend IP.

  6. It is recommended to reboot the server and verify that all services come up as expected following the reboot.

Finalization

Now, this gateway is ready to be made available to the applications again.

  1. Add (or uncomment) the RL8 gateway’s frontend (external) IP into the DNS and/or Load-Balancer configuration.

  2. Remove the C7 gateway’s frontend (external) IP from the DNS and/or Load-Balancer configuration, if necessary.

  3. Power off the C7 gateway server.

Repeat the process for all other gateway servers.

Elasticsearch Server

There are two approaches to this upgrade:

  • Suspending a single node, transferring the (old) C7 Elasticsearch IP, configuration, and data to a (new) RL8 server, and then resuming with the RL8 server.

  • Adding a (new) RL8 Elasticsearch server to the Elasticsearch cluster, allowing data to balance across, and then retiring an (old) C7 Elasticsearch server.

In a production environment, there will likely be multiple Elasticsearch nodes that all need to be migrated. Therefore, repeat this procedure for each node, waiting for each node to complete before proceeding to the next Elasticsearch instance.

Gather Information

On the C7 instance, please make a support bundle that will aid DataCore Support in the event of an issue or a requirement to revert changes.

cd /root/dist && ./updateBundle.sh
cd /root && /root/dist/techsupport-bundle-grab.sh

Note the existing network configuration and plan any new IPs, if required.

Record the following configuration items for both the CentOS 7 and Rocky 8 instances (the Rocky 8 server reuses the same values unless stated otherwise). How to find each value on the C7 instance:

  • Hostname: hostname

  • Frontend (External) IP: ip a

  • Backend (Swarm) IP: ip a

  • ES Data location: grep path.data /etc/elasticsearch/elasticsearch.yml

Upgrade the (old) C7 Elasticsearch cluster

  1. Upgrade the Elasticsearch cluster to the same version as the packages to be installed on RL8 or the OVA images. For the purpose of this guide, it is assumed to be 7.17.14.
    Full instructions for the upgrade can be found here: Upgrading Elasticsearch

Prepare the (new) RL8 Elasticsearch server

The version of Elasticsearch installed on the RL8 server must match the major.minor version that is running on the old C7 Elasticsearch server.

  1. Install a minimal instance of Rocky Linux 8 or deploy the Elasticsearch RL8 OVA.

  2. Prepare the basic configuration:

    1. Set the hostname: hostnamectl set-hostname <HOSTNAME>

    2. Set the timezone, if required: timedatectl set-timezone <CONTINENT>/<CITY>

    3. Configure the Frontend (External) IP, if not configured during installation: nmcli con mod <NIC_NAME> <SETTING> <SETTING_VALUE>

    4. Do NOT configure the backend (Swarm) network at this time.

  3. Verify the OS is patched to the latest releases (assuming internet or repo connectivity): dnf update

  4. Verify that a second virtual disk is installed for Elasticsearch data. This disk must match or exceed the performance and capacity of the disks in the existing C7 Elasticsearch nodes, and must be mounted to the ES data location identified at the beginning of the process (defaults to /var/lib/elasticsearch). This is automatically created and configured during the OVA deployment.

  5. Install the Elasticsearch software packages. If you have deployed using the Elasticsearch OVA, the packages are already installed, so you can skip this step.

    cd ${PATH_TO_INSTALLER_PACKAGES}
    rpm --import RPM-GPG-KEY
    rpm --import GPG-KEY-elasticsearch
    dnf install -y elasticsearch-VERSION.rpm
    dnf install -y caringo-elasticsearch-search-VERSION.noarch.rpm
  6. If the system was deployed with a minimal installation, please install the Swarm support tools.

    cd /root
    curl -O https://support.cloud.datacore.com/tools/updateBundle.sh
    bash updateBundle.sh fresh
    rm updateBundle.sh

    If the system was deployed with the OVA, the support tools should already be installed, but please update the following:

    cd /root/dist && ./updateBundle.sh

Option A - Transferring Data from C7 to (new) RL8 Instance

This process involves copying a potentially large amount of index data from the old C7 instance to the new RL8 instance. This may take some time to complete, which may be undesirable. In virtualized environments, it may be possible to move (or clone) the index disk from the old C7 instance to the RL8 instance. This may be substantially faster but may represent higher risk, especially for rollback. If this method is desirable, please refer to DataCore Support for further assistance.
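
To gauge how long the data copy might take, you can first check the size of the index data on the C7 node, using the data path identified during information gathering:

du -sh ${ES_DATA_PATH}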

Prepare the (old) C7 Elasticsearch Server

  1. On many deployments, the frontend NIC has been disabled or disconnected at the VM layer. You need to enable or reconnect this interface to allow copying data to the RL8 server.

  2. Discover the current master node. It is recommended to always upgrade the master node as the last node, so there is only one re-election of the master.

    curl http://${ES_BACKEND_IP}:9200/_cat/nodes?v

    The master node appears with an “*” in the master column.

    ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
    10.10.0.12           18          62   0    0.01    0.02     0.05 cdfhimrstw *      10.10.0.12        <=== this is the master
    10.10.0.11            7          69   0    0.06    0.01     0.00 cdfhimrstw -      10.10.0.11
    10.10.0.13           32          62   0    0.00    0.01     0.05 cdfhimrstw -      10.10.0.13
  3. Verify that the Elasticsearch cluster is healthy before commencing any migrations. Do NOT proceed if the status is either YELLOW or RED.

    curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty
  4. Disable shard allocation and balancing on the Elasticsearch cluster.

    curl -XPUT -H'Content-Type: application/json' http://${ES_BACKEND_IP}:9200/_cluster/settings -d '{ "transient":{ "cluster.routing.allocation.enable": "none" } }'
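
    To confirm that the setting was applied, read back the transient cluster settings:

    curl http://${ES_BACKEND_IP}:9200/_cluster/settings?pretty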
  5. Stop the Elasticsearch service on this C7 node.

    systemctl stop elasticsearch
  6. Copy configuration and ES data to the (new) RL8 instance.

    rsync -av -R --ignore-missing-args \
        /etc/hosts \
        /etc/chrony.conf \
        /etc/firewalld/services/elasticsearch.xml \
        /etc/elasticsearch/elasticsearch.yml \
        /etc/elasticsearch/jvm.options \
        /etc/security/limits.d/10-caringo-elasticsearch.conf \
        /etc/sysconfig/elasticsearch \
        /etc/systemd/system/elasticsearch.service.d/override.conf \
        ${ES_DATA_PATH}  \
        root@${RL8_FRONTEND_IP}:/
  7. Disconnect the backend (Swarm) network at the VM level.

Configure the (new) RL8 Elasticsearch Server

  1. RL8 does not automatically enable chronyd, so start it manually. The configuration should have been copied from the old C7 server in the previous step.

    systemctl enable chronyd && systemctl start chronyd
    
    # And to check it is working correctly
    chronyc sources
  2. Create a zone for the Swarm network, and allow Elasticsearch and SSH traffic through this interface.

    firewall-cmd --new-zone swarm --permanent
    firewall-cmd --reload
    firewall-cmd --zone=swarm --add-service=elasticsearch --permanent
    firewall-cmd --zone=swarm --add-service=ssh --permanent
    firewall-cmd --reload
  3. Set up the networking for the Backend (Swarm) network. This must be the same as the IP on the current C7 instance.

    nmcli con mod ${ES_BACKEND_NIC_NAME} ipv4.addresses ${ES_BACKEND_IP}/${BACKEND_PREFIX}
    nmcli con mod ${ES_BACKEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${ES_BACKEND_NIC_NAME} connection.autoconnect yes
    nmcli con mod ${ES_BACKEND_NIC_NAME} connection.zone swarm
    nmcli con reload
    nmcli con down ${ES_BACKEND_NIC_NAME} && nmcli con up ${ES_BACKEND_NIC_NAME}
  4. Verify that the storage nodes are present and that you can talk to them as expected.

    /root/dist/swarmctl -d ${ANY_STORAGE_NODE_IP}
    curl -I http://${ANY_STORAGE_NODE_IP}

Migrate to the (new) RL8 Elasticsearch Server

The Frontend (External) interface was already configured during the initial preparations, so there is nothing that needs to be done here.

  1. Start the Elasticsearch service and verify that it has started correctly.

    systemctl start elasticsearch
    systemctl status elasticsearch
  2. Enable Elasticsearch service to start on system boot.

    systemctl enable elasticsearch
  3. Verify the Elasticsearch node is joined to the Elasticsearch cluster.

    curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty              
    curl http://${ES_BACKEND_IP}:9200/_cat/nodes?v
  4. Enable shard allocation and balancing on the Elasticsearch cluster.

    curl -XPUT -H'Content-Type: application/json' http://${ES_BACKEND_IP}:9200/_cluster/settings -d '{ "transient":{ "cluster.routing.allocation.enable": "all" } }'
  5. Wait until the health status of Elasticsearch changes from YELLOW to GREEN.

    curl http://${ES_BACKEND_IP}:9200/_cluster/health?pretty
  6. If required, disconnect the frontend (external) network on the (old) C7 instance at the VM level.
    Re-configure the frontend (external) interface on the RL8 instance to match the previous configuration of the (old) C7 ES instance. The interface needs to be “bounced” for this to take effect.

    nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.addresses ${ES_FRONTEND_IP}/${FRONTEND_PREFIX}
    nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.gateway ${GATEWAY}
    nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.dns ${DNS_SERVER_1},${DNS_SERVER_2} 
    nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.dns-search ${DNS_DOMAIN}
    nmcli con mod ${ES_FRONTEND_NIC_NAME} ipv4.method manual
    nmcli con mod ${ES_FRONTEND_NIC_NAME} connection.autoconnect no
    nmcli con reload
    nmcli con down ${ES_FRONTEND_NIC_NAME}

Info

  • Any SSH sessions will be terminated when bouncing the interface. Please reconnect via the backend (Swarm) network.

  • It is recommended that the frontend interface is not left open. The configuration above takes care of this by setting auto-connect to “no” and bringing the interface down.

Repeat the above Elasticsearch procedure for all the nodes in the cluster.

Option B - Adding New RL8 Nodes to the Existing Cluster and Retiring Old C7 Nodes

<TBD>
