Info

Do not use this procedure if the search feed was created with search.perDomainIndex=True.

If search.perDomainIndex=True is set in your environment, follow this documentation instead:
https://perifery.atlassian.net/wiki/x/MgBV3

Note

Before splitting the index into more shards, be aware that this operation creates several duplicate shards, selects the best one, and deletes the rest. This will consume 2X the size of the original index you are splitting.

Once the new index is created, all shards are properly assigned, and the deleted documents are merged, the new index will be almost the same size as the old index, but with a higher number of shards.
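The space requirement above can be checked before starting. The following is a minimal sketch, not part of the original procedure: `split_headroom_ok` is a hypothetical helper, and it assumes the conservative reading that free space should cover 2X the index size. The byte counts are supplied by the caller; they could be read from `_cat/indices?bytes=b` and `_cat/allocation?bytes=b`, for example.

```shell
# Hypothetical helper (not from the original page): succeed only if the free
# space covers twice the index size. Both arguments are byte counts.
split_headroom_ok() {
  index_bytes=$1
  free_bytes=$2
  needed=$((index_bytes * 2))   # assumption: 2X the index size must be free
  [ "$free_bytes" -ge "$needed" ]
}

# Example with made-up numbers: a 300 GB index with 500 GB free.
if split_headroom_ok 300000000000 500000000000; then
  echo "enough headroom"
else
  echo "not enough headroom"
fi
```

With these sample numbers the check fails, because 600 GB would be needed.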

An Elasticsearch index is divided into a set of shards: primary shards and replica (backup) shards. The shard count is configured when the index is created. Elastic recommends that shards be no larger than about 50GB, which makes them faster to update or to move between nodes when necessary. Elastic also recommends that a node with a 32GB heap store no more than 600 shards. Although there are typically hundreds of metrics- and csmeter- shards, they should not have much effect on performance because those time-based indices are small.
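As a rough illustration of the 50GB guideline, the sketch below picks a target shard count for a split. `target_shards` is a hypothetical helper, the sizes are whole gigabytes of primary data supplied by the caller, and doubling is a simplification that keeps the result a multiple of the current count; other multiples may also be valid.

```shell
# Hypothetical helper (not from the original page).
# Usage: target_shards CURRENT_SHARDS INDEX_SIZE_GB [MAX_SHARD_GB]
# Doubles the shard count until each shard would hold at most MAX_SHARD_GB.
target_shards() {
  current=$1
  size_gb=$2
  max_gb=${3:-50}               # default taken from the 50GB guideline
  shards=$current
  while [ $((shards * max_gb)) -lt "$size_gb" ]; do
    shards=$((shards * 2))
  done
  echo "$shards"
}

# Example with a made-up size: a 900 GB index currently on 5 shards.
target_shards 5 900   # prints 20
```

An index that already meets the guideline is returned unchanged, e.g. `target_shards 5 100` prints 5.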

...

  1. Find the search feed index name. This will be the OLD_INDEX used in the commands below:

    curl 'http://elasticsearch:9200/_cat/indices?index=index_*'
    yellow open index_caringo71-cluster0            _27akdSrQyK0_uo76y3Ofw 5 1 12 0 126.1kb 126.1kb

  2. The 0 suffix is the Swarm search feed id; it might be 1 or 2 in your environment. Set OLD_INDEX accordingly:

    OLD_INDEX=index_caringo71-cluster0

  3. Find the name of the alias that Swarm and Gateway use to refer to the index. You will use it later.

    curl "http://elasticsearch:9200/_aliases?index=${OLD_INDEX}&pretty"
    {
      "index_caringo71-cluster0" : {
        "aliases" : {
          "caringo71-cluster0" : { }
        }
      }
    }

    The alias name should be the index name without "index_".

    ALIAS_NAME=caringo71-cluster0

  4. Pause the Swarm search feed in the Storage UI, the Swarm console, or the API.

  5. Make the current index read-only:

    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_settings?pretty" --data-binary '
    {
    "settings": {
    "index.blocks.write" : true
    }
    }
    '

  6. Split the index, e.g. from the default 5 shards to 20. Elasticsearch returns an error if the new shard count is not a multiple of the current number of shards.
    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_split/${OLD_INDEX}_split20?pretty" --data-binary '
    {
    "settings": {
    "index.number_of_shards": 20
    }
    }
    '

  7. Elasticsearch should quickly become yellow again. Verify with:

    time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done

  8. Make sure that at least all the primary shards are STARTED (they should be, since the cluster is yellow):

    curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p

  9. Change the alias to point from the old index to the new split index:

    curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
    {
    "actions": [
    { "remove" : { "index" : "'"${OLD_INDEX}"'", "alias" : "'"${ALIAS_NAME}"'" } },
    { "add" : { "index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"${ALIAS_NAME}"'" } }
    ]
    }
    '
    Verify the alias is correctly pointing to the new “split” index with:
    curl http://elasticsearch:9200/_aliases | grep "${ALIAS_NAME}"

  10. Verify again that Elasticsearch is at least yellow and all primary shards are STARTED:

    time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done

    curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p

  11. It's now safe to delete the old index:

    curl -fsS -XDELETE "http://elasticsearch:9200/${OLD_INDEX}"

  12. Now undo the read-only setting (the setting was copied to the new index):

    curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}_split20/_settings?pretty" --data-binary '
    {
    "settings": {
    "index.blocks.write" : null
    }
    }
    '

  13. You can now Resume the search feed (Storage UI) or uncheck Pause (legacy Swarm console). Do not Restart/Refresh!

  14. Verify that Swarm indexing and listings are working again:

    curl -i 'http://swarm/?domains&format=json'
    HTTP/1.1 200 OK
    Castor-System-Object-Count: 2
    ...

    curl -i -XPOST --post301 --location-trusted 'http://swarm/?domain=temptestsplit&createDomain'
    HTTP/1.1 201 Created

    curl -i 'http://swarm/?domains&format=json&name=temptestsplit'
    HTTP/1.1 200 OK
    ...
    {"last_modified": "2020-10-01T00:00:21.200000Z", "bytes": 0, "name": "temptestsplit", "hash": "4a49ac2fe229ca9b7a9e6b042e159f04", "written": "2020-10-01T00:00:21.200000Z", "accessed": "2020-10-01T00:00:21.200000Z"} 

  15. If you upgrade to Swarm 16.1.2 - 16.1.4, an additional alias with the index_ prefix must be added to work around a regression (SWAR-10240):

    curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
    {
    "actions": [
    { "add" : { "index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"index_${ALIAS_NAME}"'" } }
    ]
    }
    '
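The primary-shard checks in steps 8 and 10 rely on reading the `grep -w p` output by eye. A small filter can turn that into a pass/fail result. This is a sketch only: `primaries_started` is a hypothetical helper, it assumes the default `_cat/shards` column order (index, shard, prirep, state, ...), and the sample lines below are made up for illustration.

```shell
# Hypothetical helper (not from the original page): read _cat/shards output on
# stdin and succeed only if at least one primary is listed and every primary
# is in the STARTED state. Assumes columns: index shard prirep state ...
primaries_started() {
  awk '$3 == "p" { seen = 1; if ($4 != "STARTED") bad = 1 }
       END { exit (seen && !bad) ? 0 : 1 }'
}

# Example with made-up sample lines: one started primary, one unassigned replica.
printf '%s\n' \
  'index_x_split20 0 p STARTED 3 10kb 10.0.0.1 node1' \
  'index_x_split20 0 r UNASSIGNED' |
  primaries_started && echo "all primaries started"
```

Against a live cluster the same filter could be fed from the command in step 8, e.g. `curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | primaries_started`.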

Info

Yes, all this really needs to be scripted – too many error-prone quotes and curls.
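As a starting point for such a script, here is a minimal sketch that covers only the Elasticsearch calls from steps 5, 6, 7, 9, 11 and 12. The names `es`, `split_index`, `ES_URL` and `DRY_RUN` are hypothetical and not part of any Swarm tooling. It defaults to a dry run that only prints each request, it assumes index names contain no characters that need JSON escaping, and it leaves pausing and resuming the feed (steps 4 and 13) and the shard checks to the operator.

```shell
# Sketch only: hypothetical names, dry run by default.
ES_URL=${ES_URL:-http://elasticsearch:9200}
DRY_RUN=${DRY_RUN:-1}

# es METHOD PATH [JSON_BODY]: print the request (dry run) or send it with curl.
es() {
  method=$1 path=$2 body=${3:-}
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s %s%s%s\n' "$method" "$ES_URL" "$path" "${body:+ $body}"
    return 0
  fi
  if [ -n "$body" ]; then
    curl -fsS -X "$method" -H 'Content-Type: application/json' \
      "$ES_URL$path" --data-binary "$body"
  else
    curl -fsS -X "$method" "$ES_URL$path"
  fi
}

# split_index OLD_INDEX TARGET_SHARDS: stops at the first failing call.
split_index() {
  old=$1 shards=$2
  new="${old}_split${shards}"
  alias_name=${old#index_}      # alias is the index name without "index_"
  es PUT "/$old/_settings" '{"settings":{"index.blocks.write":true}}' &&
  es PUT "/$old/_split/$new" "{\"settings\":{\"index.number_of_shards\":$shards}}" &&
  # Single wait here; the manual procedure retries this call in a loop.
  es GET "/_cluster/health?wait_for_status=yellow" &&
  es POST "/_aliases" "{\"actions\":[{\"remove\":{\"index\":\"$old\",\"alias\":\"$alias_name\"}},{\"add\":{\"index\":\"$new\",\"alias\":\"$alias_name\"}}]}" &&
  es DELETE "/$old" &&
  es PUT "/$new/_settings" '{"settings":{"index.blocks.write":null}}'
}

# Dry run: prints the six requests without contacting Elasticsearch.
split_index index_caringo71-cluster0 20
```

Review the printed requests first, and set `DRY_RUN=0` only after the search feed is paused and the output matches the manual steps.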

...