Migrating from Older Elasticsearch

Newer versions of Elasticsearch include advances in speed, security, scalability, and hardware efficiency, and they support newer tool releases. All upgrades from Elasticsearch 2.3.3/5.6.12 to version 6.8.6 are legacy migrations: they require both a new, separate Elasticsearch cluster and a new Swarm Search feed to reindex content into the new ES cluster and format. Once on ES 6.8.6, upgrade in place to Elasticsearch 7 as part of the Swarm 12 upgrade.

Migration Process

Given the complexities of converting legacy ES data, the easiest path is to start fresh: provision a new ES cluster (machines or VMs meeting the requirements), install Elasticsearch, Swarm Search, and Swarm Metrics on it, and create a new search feed that targets this cluster. Swarm continues to support the existing primary feed to the legacy ES cluster while it builds the index data for the new feed, so searching remains available to users throughout. Once the new feed has completed indexing, make it primary and restart the Gateways; the migration is then complete. The following is an overview of the migration process to Elasticsearch 6:

Pre-Upgrade Checklist

Swarm Requirements

  1. Upgrade Swarm Storage: Complete the upgrade to the latest version of Swarm Storage. See How to Upgrade Swarm.

  2. Case-Sensitivity: If enabling case-insensitive searching in SCSP (search.caseInsensitive = 1), Content Gateway still allows S3 to perform the case-sensitive operations it needs.

  3. (optional) Enable atime: If implementing the Time of Last Access (atime) feature, enable it now so the new index populates the accessed field; enabling it later requires a full reindexing. The feature does affect performance, so review the impact discussion here: Time of Last Access - atime.

New ES Cluster

  1. Provision a new set of ES servers (machines or VMs) on which to install the new version of Elasticsearch. Do not attempt to upgrade the legacy ES servers: it is challenging to clean up old data and config files.

    • Contact DataCore Support for assistance if provisioning a new ES cluster is not possible.

  2. Verify every Elasticsearch node meets the hardware, network, and software requirements, including the latest RHEL/CentOS 7 and Java 8.
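    A quick way to spot-check two of these prerequisites on each node (a sketch only; the required versions are listed on the pages below) is:

      cat /etc/redhat-release   # expect the latest RHEL/CentOS 7 release
      java -version             # expect a Java 8 (1.8.x) runtime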

See Hardware Requirements for Elasticsearch

See Preparing the Search Cluster

Migrating to Elasticsearch 6

Follow these steps to migrate to a new Elasticsearch 6 cluster, from which Swarm 12 and Elasticsearch 7 can later be upgraded in place, retaining the same Search feed and index data. An existing Elasticsearch 2.3.3 or 5.6.12 cluster cannot be upgraded; a new cluster and a new Search feed must be created.

Important

Do not run different versions of Elasticsearch in the same ES cluster. Verify the new Elasticsearch configuration uses a different name for the new cluster; otherwise, the new ES servers join the old ES cluster.
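The cluster name lives in /etc/elasticsearch/elasticsearch.yml on each node (the configuration step below normally sets it); the value shown here is only an illustration, and it must differ from the legacy cluster's name:

  cluster.name: swarm-search6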

  1. Set up the new Elasticsearch.

    1. Obtain the Elasticsearch and Swarm Search RPMs from the downloaded Swarm bundle.

    2. On each ES server in the newly provisioned cluster, install and configure the Elasticsearch components.

      1. Install the RPMs from the bundle.

        yum install elasticsearch-VERSION.rpm
        yum install caringo-elasticsearch-search-VERSION.noarch.rpm
      2. Complete configuration of Elasticsearch and the environment. See Configuring Elasticsearch.
        If using a single-node ES cluster, set the number of replicas to zero to avoid warnings. See Scaling Elasticsearch.
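        If helpful, one way to apply this to indices that already exist is the standard Elasticsearch settings API (a sketch only; substitute the actual ES host, and prefer the method described in Scaling Elasticsearch):

          curl -XPUT "ES_HOST:9200/_all/_settings" -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0}}'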

      3. The configuration script starts the Elasticsearch service and enables it to start automatically.

      4. Verify the mlockall setting is true. If it is not, contact DataCore Support.

        curl -XGET "ES_HOST:9200/_nodes/process?pretty"

        If cURL requests do not return the expected response, verify the HTTP_PROXY and http_proxy environment variables are not set. A proxy applied to root users' commands, whether by an IT organization or by security policy, causes communication issues between ES nodes.

      5. Proceed to the next server.

    3. All ES servers are installed and started at this point. Use the Swarm UI or one of these methods to verify Elasticsearch is running (the status is yellow or green):

      curl -XGET ES_HOST:9200/_cluster/health
      systemctl status elasticsearch
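      To check just the status field from the health output (a quick sketch; substitute the actual ES host), the following should report "green" or "yellow":

        curl -s "ES_HOST:9200/_cluster/health?pretty" | grep status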

      If cURL requests do not return the expected response, verify the HTTP_PROXY and http_proxy environment variables are not set. A proxy applied to root users' commands, whether by an IT organization or by security policy, causes communication issues between ES nodes.

      Troubleshooting — Run the status command (systemctl status elasticsearch) and look at the log entries: /var/log/elasticsearch/CLUSTERNAME.log

  2. Create a search feed for the new ES cluster. Swarm allows creation of more than one Search feed, which supports transitioning between Elasticsearch clusters.

    1. Create a new search feed pointing to the new Elasticsearch cluster in the Swarm UI. See Managing Feeds.

      1. If errors are encountered during feed creation, verify the [storage cluster] managementPassword is set properly in the gateway.cfg file. If a change is needed, correct the value and restart the Gateway service.
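        As a rough illustration (the exact section name and service name are assumptions to verify against the Gateway installation), the setting and restart look like this:

          # in the storage cluster section of gateway.cfg
          managementPassword = <Swarm management password>

          systemctl restart cloudgateway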

    2. Continue using the existing primary feed for queries during the transition. The second feed is incomplete until it fully clears the backlog.

  3. Set up Swarm Metrics.

    1. Install Swarm Metrics on one server in the new Elasticsearch cluster (or another machine running RHEL/CentOS 7). See Installing Swarm Metrics

    2. (optional) Swarm Metrics includes a script to migrate the historical metrics and content metering data. Proceed with the following steps if preserving the historical chart data is required (such as for billing clients based on storage and bandwidth usage):

      1. Add a "whitelist" entry to the new ES server so it trusts the old ES server before running the script.

        1. Edit the config file: /etc/elasticsearch/elasticsearch.yml on the destination ES node.

        2. Add the whitelist line, using the old ES source node in place of the example:
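          Assuming the migration script uses Elasticsearch's remote-reindex mechanism, the entry is the standard reindex.remote.whitelist setting; the host below is only a placeholder for the actual old ES source node:

            reindex.remote.whitelist: "old-es-host.example.com:9200"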

        3. Restart Elasticsearch: systemctl restart elasticsearch

      2. Run the data migration script, specifying the source and destination clusters: 

        Troubleshooting options: 

        • By default, the script includes all metering data (client bandwidth and storage usage). To skip importing this data, add the flag -c.

        • To force reindexing of all imported data, add the flag --force-all.

      3. Allow an hour or more for the script to complete if there is a large amount of metrics data to convert (many nodes and several months of data).

      4. If connection or other problems occur and the screen reports errors, run the script again, repeating until it completes successfully.

      5. To see the past metrics, prime the curator by running it with the -n flag:

    3. Change the metrics.target from the old ES target to the new ES target. This reconfiguration pushes the new schema to the new ES cluster.

  4. Complete new feed and make primary

    1. On the Swarm UI's Reports > Feeds page, watch for indexing to complete, which is when the feed shows 0 "pending evaluation".

    2. When the second feed is caught up, set Swarm to use it: in the feed's command (gear) menu, select Make Primary.

  5. Install Gateway 7.0, Swarm UI, and Content UI on each Gateway server.

  6. Complete post-migration

    1. After verifying the new primary feed target is working, delete the original feed. Having two feeds is for temporary use only, because every feed incurs cluster activity, even when paused.

    2. As appropriate, decommission the old ES cluster to reclaim those resources.
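    One way to take the legacy nodes out of service before reclaiming the hardware (a sketch; run on each old ES node) is to stop and disable the service:

      systemctl stop elasticsearch
      systemctl disable elasticsearch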

When this migration is complete and the cluster is running Swarm 11.3, upgrade in place to Swarm 12, which includes Elasticsearch 7. See How to Upgrade Swarm.
