Using the Elasticsearch API to split large shards

This procedure should not be used if the search feed was created with search.perDomainIndex=True.

Before you split the index into more shards, be aware that this operation creates several duplicate shards, selects the best one, and deletes the rest. It may therefore consume 8x to 10x the size of the original index, at least temporarily, until the operation completes successfully.
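
To check for that much headroom before you start, a rough sketch like the following compares the total free disk reported by _cat/allocation with a multiple of the index size. The function name has_headroom and the default factor of 10 are illustrative assumptions. Summing free space across nodes is only an approximation, because the space must actually be available on the nodes that will hold the new shards.

```shell
#!/usr/bin/env bash
# has_headroom INDEX_BYTES [FACTOR]
# Hypothetical check: reads one disk.avail value (bytes) per line on stdin and
# succeeds if their sum is at least INDEX_BYTES * FACTOR (default 10, the
# upper end of the 8x-10x estimate). Blank lines count as 0.
has_headroom() {
  local index_bytes=$1 factor=${2:-10}
  awk -v need="$index_bytes" -v f="$factor" '
    { avail += $1 }
    END {
      required = need * f
      printf "available=%.0f required=%.0f\n", avail, required
      if (avail >= required) exit 0
      exit 1
    }'
}

# Usage (OLD_INDEX as set in step 2 of the instructions):
#   INDEX_BYTES=$(curl -s "http://elasticsearch:9200/_cat/indices/${OLD_INDEX}?bytes=b&h=store.size")
#   curl -s 'http://elasticsearch:9200/_cat/allocation?bytes=b&h=disk.avail' |
#     has_headroom "$INDEX_BYTES" 10
```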

Once the new index is created, all shards are assigned, and the deleted documents are merged away, the new index will be almost the same size as the old one, just with more shards.

An Elasticsearch index is divided into a set of shards – primary shards and replica (backup) shards. The shard count is configured when the index is created. Elastic recommends keeping shards no larger than about 50GB, so they are faster to update or to move between nodes when necessary. They also recommend that a node with a 32GB heap store no more than about 600 shards. There are typically hundreds of metrics- and csmeter- shards, but they are small, time-based indices and shouldn't have much effect on performance.
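
To turn the 50GB guideline into a shard count, a minimal sketch is an integer ceiling division of the index's primary store size by the per-shard target. The function name shards_needed is hypothetical and the 50GB default comes from the recommendation above; the usage comment assumes the primary size is read from _cat/indices in whole gigabytes.

```shell
#!/usr/bin/env bash
# shards_needed TOTAL_PRIMARY_GB [MAX_SHARD_GB]
# Hypothetical helper: smallest shard count that keeps each primary shard at
# or below MAX_SHARD_GB (default 50, per the guideline above).
# Integer inputs only.
shards_needed() {
  local total_gb=$1 max_gb=${2:-50}
  local n=$(( (total_gb + max_gb - 1) / max_gb ))   # ceiling division
  if (( n < 1 )); then n=1; fi                       # an index needs at least 1 shard
  echo "$n"
}

# Usage (OLD_INDEX as set in step 2 of the instructions):
#   PRI_GB=$(curl -s "http://elasticsearch:9200/_cat/indices/${OLD_INDEX}?bytes=gb&h=pri.store.size")
#   shards_needed "$PRI_GB"
```

For a split, round the result up to a multiple of the current shard count, since that is the value _split will accept.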

Swarm 12 and later let you set search.numberOfShards (default 5 before Swarm 16.0, and 10 in Swarm 16.0 or later) to a larger value, such as 20, if you know the search feed index will be very large (e.g. you will store a billion objects or a large amount of metadata).

Before Elasticsearch 6 you could not increase the number of shards after an index was created. Now you can use the _split API: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/indices-split-index.html. This is faster than creating a new search feed with the correct number of shards and waiting for it to populate. Note, however, that it requires downtime while you complete these steps and the new split index becomes ready. It also requires enough free Elasticsearch disk space for at least a copy of the current index.
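
Because _split only accepts a target shard count that is a multiple of the current one (see step 6), it can help to check the numbers first. The sketch below uses a hypothetical check_split_target function. Elasticsearch may impose further limits through index.number_of_routing_shards, so the linked _split documentation remains the authority.

```shell
#!/usr/bin/env bash
# check_split_target CURRENT_SHARDS TARGET_SHARDS
# Hypothetical pre-flight check for the rule in step 6: the target must be
# larger than, and a multiple of, the current primary shard count.
# It does NOT model index.number_of_routing_shards limits.
check_split_target() {
  local current=$1 target=$2
  if (( current < 1 )); then
    echo "current shard count ($current) must be positive" >&2
    return 1
  fi
  if (( target <= current )); then
    echo "target ($target) must be larger than current ($current)" >&2
    return 1
  fi
  if (( target % current != 0 )); then
    echo "target ($target) is not a multiple of current ($current)" >&2
    return 1
  fi
  echo "ok: split factor $(( target / current ))"
}

# Usage (OLD_INDEX as set in step 2 of the instructions):
#   CUR=$(curl -s "http://elasticsearch:9200/_cat/indices/${OLD_INDEX}?h=pri")
#   check_split_target "$CUR" 20
```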

Please let us know if you think you need to split your index. Also note that Elasticsearch 6 (EOL since 2022) has the extra requirement that index.number_of_routing_shards was set when the index was created before _split can be used (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-split-index.html). Instead, first upgrade to Elasticsearch 7, which has been supported since Swarm 12.
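
A quick way to confirm which major version you are running is to read version.number from the Elasticsearch root endpoint. The sed-based parser below is a sketch that assumes the usual JSON layout of that response (pretty-printed or not); es_major_version and es_ok_for_split are hypothetical names.

```shell
#!/usr/bin/env bash
# es_major_version: read the JSON from GET http://elasticsearch:9200/ on stdin
# and print the major part of version.number (e.g. 7 for "7.17.9").
# Assumes version.number is the only "number" key in the response.
es_major_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# es_ok_for_split: succeed only if the major version is 7 or later.
es_ok_for_split() {
  local major
  major=$(es_major_version)
  if [ -z "$major" ]; then
    echo "could not determine the Elasticsearch version" >&2
    return 2
  fi
  if (( major < 7 )); then
    echo "Elasticsearch $major: upgrade to Elasticsearch 7 before splitting" >&2
    return 1
  fi
  echo "Elasticsearch $major"
}

# Usage:
#   curl -s 'http://elasticsearch:9200/' | es_ok_for_split
```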

Instructions

  1. Find the search feed index name; this is the OLD_INDEX used in the commands below:

    curl 'http://elasticsearch:9200/_cat/indices?index=index_*'
    yellow open index_caringo71-cluster0            _27akdSrQyK0_uo76y3Ofw 5 1 12 0 126.1kb 126.1kb

  2. Set OLD_INDEX to that name. The 0 suffix is the Swarm search feed ID; it might be 1 or 2 in your environment!

    OLD_INDEX=index_caringo71-cluster0

  3. Find the name of the alias that Swarm and Gateway use to refer to the index, you'll use this later:

    curl "http://elasticsearch:9200/_aliases?index=${OLD_INDEX}&pretty"
    {
      "index_caringo71-cluster0" : {
        "aliases" : {
          "caringo71-cluster0" : { }
        }
      }
    }

    The alias name should be the index name without "index_":

    ALIAS_NAME=caringo71-cluster0

  4. Pause the Swarm search feed in the Storage UI, the legacy Swarm console, or via the API.

  5. Make the current index read-only:

    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_settings?pretty" --data-binary '
    {
      "settings": {
        "index.blocks.write": true
      }
    }
    '

  6. Split the index, e.g. from the default 5 shards (10 for feeds created with Swarm 16.0 or later) to 20. The request returns an error if the new shard count is not a multiple of the current one.

    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_split/${OLD_INDEX}_split20?pretty" --data-binary '
    {
      "settings": {
        "index.number_of_shards": 20
      }
    }
    '

  7. Elasticsearch should quickly become yellow again. Verify with:

    time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done

  8. Make sure that at least all the primary shards are STARTED (they should be, since the cluster is yellow):

    curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p

  9. Move the alias from the old index to the new split index:

    curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
    {
      "actions": [
        { "remove" : { "index" : "'"${OLD_INDEX}"'", "alias" : "'"${ALIAS_NAME}"'" } },
        { "add" : { "index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"${ALIAS_NAME}"'" } }
      ]
    }
    '

    Verify that the alias now points to the new “split” index:

    curl "http://elasticsearch:9200/_alias/${ALIAS_NAME}?pretty"

  10. Verify again that Elasticsearch is at least yellow and that all primary shards are STARTED:

    time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done

    curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p

  11. It is now safe to delete the old index:

    curl -fsS -XDELETE "http://elasticsearch:9200/${OLD_INDEX}"

  12. Now remove the read-only block (the setting was copied to the new index):

    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}_split20/_settings?pretty" --data-binary '
    {
      "settings": {
        "index.blocks.write": null
      }
    }
    '

  13. You can now resume the search feed (Resume in the Storage UI, or uncheck Pause in the legacy Swarm console). Do not Restart or Refresh it!

  14. Verify that Swarm indexing and listings work again:

    curl -i 'http://swarm/?domains&format=json'
    HTTP/1.1 200 OK
    Castor-System-Object-Count: 2
    ...

    curl -i -XPOST --post301 --location-trusted 'http://swarm/?domain=temptestsplit&createDomain'
    HTTP/1.1 201 Created

    curl -i 'http://swarm/?domains&format=json&name=temptestsplit'
    HTTP/1.1 200 OK
    ...
    {"last_modified": "2020-10-01T00:00:21.200000Z", "bytes": 0, "name": "temptestsplit", "hash": "4a49ac2fe229ca9b7a9e6b042e159f04", "written": "2020-10-01T00:00:21.200000Z", "accessed": "2020-10-01T00:00:21.200000Z"} 

  15. If you upgrade to Swarm 16.1.2 through 16.1.4, you must also add an alias with the index_ prefix to work around a regression (SWAR-10240):

    curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
    {
      "actions": [
        { "add" : { "index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"index_${ALIAS_NAME}"'" } }
      ]
    }
    '
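
The nested '"${...}"' quoting in steps 5, 9, 12 and 15 is easy to get wrong. One alternative, sketched below, is to build the request bodies with printf and pipe them to curl with --data-binary @-. The function names are hypothetical, and the sketch assumes index and alias names contain no double quotes or backslashes, since it does not JSON-escape them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that print the JSON bodies used in steps 5/12, 9 and 15.
# Names are inserted verbatim, without JSON escaping: this assumes index and
# alias names contain no double quotes or backslashes.

write_block_body() {   # VALUE: true (step 5) or null (step 12)
  printf '{"settings":{"index.blocks.write":%s}}\n' "$1"
}

alias_swap_body() {    # OLD_INDEX NEW_INDEX ALIAS_NAME (step 9)
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$1" "$3" "$2" "$3"
}

alias_add_body() {     # INDEX ALIAS (step 15)
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}\n' "$1" "$2"
}

# Usage: pipe a body to curl, which reads it from stdin with --data-binary @-
#   alias_swap_body "$OLD_INDEX" "${OLD_INDEX}_split20" "$ALIAS_NAME" |
#     curl -fsS -XPOST -H 'Content-type: application/json' \
#       'http://elasticsearch:9200/_aliases?pretty' --data-binary @-
```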

Yes, all this really needs to be scripted – too many error-prone quotes and curls.
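
As a starting point, the checks in steps 3, 8 and 10 can be wrapped in small shell functions. This is a sketch, not a tested tool: the ES variable, the function names, and the unbounded polling loop in wait_for_primaries are assumptions, and primaries_started relies on the default _cat/shards column order (index, shard, prirep, state, ...).

```shell
#!/usr/bin/env bash
ES=${ES:-http://elasticsearch:9200}   # assumed endpoint, as in the commands above

# alias_for_index INDEX: strip the "index_" prefix (the naming rule in step 3).
# Still confirm the real alias with the _aliases call from step 3.
alias_for_index() {
  printf '%s\n' "${1#index_}"
}

# primaries_started: read _cat/shards output on stdin and succeed only if at
# least one primary shard (prirep column = p) exists and all are STARTED.
primaries_started() {
  awk '$3 == "p" { n++; if ($4 != "STARTED") bad++ }
       END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# wait_for_primaries INDEX: poll every 5 seconds until all primaries are
# STARTED. Note: there is no timeout.
wait_for_primaries() {
  until curl -fsS "${ES}/_cat/shards?index=$1" | primaries_started; do
    sleep 5
  done
}

# Usage (OLD_INDEX as set in step 2 of the instructions):
#   ALIAS_NAME=$(alias_for_index "$OLD_INDEX")
#   wait_for_primaries "${OLD_INDEX}_split20"
```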


© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.