
Info
This procedure should not be used if the search feed was created with `search.perDomainIndex=True`.

Note

Before splitting the index into more shards, be aware that the operation creates several duplicate shards, selects the best one, and deletes the rest. It can therefore temporarily consume 8x to 10x the size of the original index until the operation completes successfully.

Once the new index is created, all shards are assigned, and the deleted documents are merged, the new index will be almost the same size as the old index, but with a higher number of shards.
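
As a rough pre-check, the 8x to 10x figure can be turned into a byte estimate before starting. The sketch below is only arithmetic on that figure. The function name `split_headroom` is made up for illustration, and it assumes you pass the current on-disk size of the index in bytes (for example, as reported by `_cat/indices` with `bytes=b`).

```shell
# Hypothetical helper: estimate the temporary disk space a split may need,
# using the 8x-10x range quoted in the note above.
# Usage: split_headroom <index_size_in_bytes>
# Prints "<low> <high>" in bytes.
split_headroom() {
  index_bytes=$1
  echo "$(( index_bytes * 8 )) $(( index_bytes * 10 ))"
}
```

Compare the higher number against the free space across the Elasticsearch data nodes. How that space is distributed between nodes is not covered by this estimate.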

An Elasticsearch index is divided into a set of shards: primary shards and replica (backup) shards. The shard count is configured when the index is created. Elastic recommends that shards be no larger than about 50GB, which keeps them faster to update and to move between nodes when necessary. Elastic also recommends that a node with a 32GB heap store no more than 600 shards. Although there are typically hundreds of metrics- and csmeter- shards, they should have little effect on performance because they are small and belong to time-based indices.
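
To make the 50GB guideline concrete, the following sketch computes the smallest shard count that keeps each primary shard at or below a size cap. The function name `min_shards_for_size` is hypothetical, and it assumes the size you pass is the expected primary data size in whole GB and that documents spread evenly across shards.

```shell
# Hypothetical helper: smallest shard count that keeps each primary shard
# at or below a size cap (default 50 GB, per the guidance above).
# Usage: min_shards_for_size <index_size_gb> [max_shard_gb]
min_shards_for_size() {
  index_gb=$1
  max_shard_gb=${2:-50}
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (index_gb + max_shard_gb - 1) / max_shard_gb ))
}
```

For example, `min_shards_for_size 900` gives the minimum for an index expected to reach 900GB. Treat the result as a lower bound, not a recommendation.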

Swarm 12+ lets you set `search.numberOfShards` (the default is 5 before Swarm 16.0 and 10 for 16.0 and later) to a larger value such as 20 if you know the search feed index will be very large (e.g. you will be storing a billion objects or a large amount of metadata).

...

Find the search feed index name; this is the OLD_INDEX used in the commands below:
curl 'http://elasticsearch:9200/_cat/indices?index=index_*'
yellow open index_caringo71-cluster0 _27akdSrQyK0_uo76y3Ofw 5 1 12 0 126.1kb 126.1kb
The 0 suffix is the Swarm search feed id; it might be 1 or 2 in your environment.
OLD_INDEX=index_caringo71-cluster0
Find the name of the alias that Swarm and Gateway use to refer to the index; you'll use it later:
curl "http://elasticsearch:9200/_aliases?index=${OLD_INDEX}&pretty"
{
"index_caringo71-cluster0" : {
"aliases" : {
"caringo71-cluster0" : { }
}
}
}
The alias name should be the index name without "index_".
ALIAS_NAME=caringo71-cluster0
Pause the Swarm search feed in the Storage UI, the Swarm console, or the API.
Make the current index read-only:
curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_settings?pretty" --data-binary '
{
"settings": {
"index.blocks.write" : true
}
}
'
Split the index, e.g. from the default 5 shards to 20. The request returns an error if the new shard count is not a multiple of the current number of shards.
curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}/_split/${OLD_INDEX}_split20?pretty" --data-binary '
{
"settings": {
"index.number_of_shards": 20
}
}
'
Elasticsearch should quickly become yellow again; verify with:
time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done
Make sure that at least all the primary shards are STARTED (they should be, since the cluster is yellow):
curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p
Change the alias to point from the old index to the new split index:
curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
{
"actions": [
{ "remove" : { "index" : "'"${OLD_INDEX}"'", "alias" : "'"${ALIAS_NAME}"'" } },
{ "add" : { "index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"${ALIAS_NAME}"'" } }
]
}
'
Verify the alias is correctly pointing to the new “split” index with:
curl http://elasticsearch:9200/_aliases | grep "${ALIAS_NAME}"
Verify again that Elasticsearch is at least yellow and all primary shards are STARTED:
time while ! curl -fsS 'http://elasticsearch:9200/_cluster/health?wait_for_status=yellow&pretty' ; do sleep 5 ; done

curl -fsS "http://elasticsearch:9200/_cat/shards?index=${OLD_INDEX}_split20" | grep -w p
It's now safe to delete the old index:
curl -fsS -XDELETE "http://elasticsearch:9200/${OLD_INDEX}"
Now undo the read-only setting (the setting was copied to the new index):
curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/${OLD_INDEX}_split20/_settings?pretty" --data-binary '
{
"settings": {
"index.blocks.write" : null
}
}
'
You can now Resume (Storage UI) the search feed or uncheck Pause (legacy Swarm console). Do not Restart/Refresh!
Verify Swarm indexing and listings are again working:
curl -i 'http://swarm/?domains&format=json'
HTTP/1.1 200 OK
Castor-System-Object-Count: 2
...
curl -i -XPOST --post301 --location-trusted 'http://swarm/?domain=temptestsplit&createDomain'
HTTP/1.1 201 Created
curl -i 'http://swarm/?domains&format=json&name=temptestsplit'
HTTP/1.1 200 OK
...
{"last_modified": "2020-10-01T00:00:21.200000Z", "bytes": 0, "name": "temptestsplit", "hash": "4a49ac2fe229ca9b7a9e6b042e159f04", "written": "2020-10-01T00:00:21.200000Z", "accessed": "2020-10-01T00:00:21.200000Z"}
If you upgrade to Swarm 16.1.2 through 16.1.4, an additional alias with the index_ prefix must be added to work around a regression (SWAR-10240):
curl -XPOST -H 'Content-type: application/json' 'http://elasticsearch:9200/_aliases?pretty' --data-binary '
{
"actions": [
{ "add" : {
"index" : "'"${OLD_INDEX}_split20"'", "alias" : "'"index_${ALIAS_NAME}"'" }
}
]
}
'
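
Two rules in the procedure above are easy to get wrong by hand: the new shard count must be a multiple of the current one, and the alias name is the index name without "index_". A minimal sketch of both checks follows. The function names are hypothetical, and Elasticsearch may enforce limits on the split target beyond the multiple rule stated here.

```shell
# Hypothetical helpers for the split procedure; names are illustrative.

# Succeeds only if <target> is a larger multiple of <current>, the rule
# stated in the split step. Elasticsearch may enforce further limits.
# Usage: valid_split_target <current_shards> <target_shards>
valid_split_target() {
  current=$1
  target=$2
  [ "$current" -gt 0 ] && [ "$target" -gt "$current" ] \
    && [ $(( target % current )) -eq 0 ]
}

# Derives the alias name by stripping a leading "index_" prefix.
# Usage: alias_for_index <index_name>
alias_for_index() {
  echo "${1#index_}"
}
```

For example, `valid_split_target 5 20 && echo ok` passes, while a target of 12 fails the check. `ALIAS_NAME=$(alias_for_index "$OLD_INDEX")` should match the alias returned by the `_aliases` lookup; if it does not, use the value Elasticsearch reports.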

Info
Yes, all this really needs to be scripted – too many error-prone quotes and curls.
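
One possible starting point for such a script is sketched below. It chains the Elasticsearch calls from the procedure in the same order and stops at the first failure. `ES_URL`, `DRY_RUN`, and every function name are assumptions made for illustration, not part of Swarm or Elasticsearch. It defaults to a dry run that only prints the requests. Pausing and resuming the search feed, the Swarm listing checks, and the SWAR-10240 alias are left as manual steps.

```shell
# Sketch only: chains the Elasticsearch calls from the procedure above.
# ES_URL, DRY_RUN and all function names are assumptions for illustration.
ES_URL=${ES_URL:-http://elasticsearch:9200}
DRY_RUN=${DRY_RUN:-1}   # default: print the requests instead of sending them

# Sends one request, or prints it in dry-run mode.
# Usage: es <METHOD> <path> [json-body]
es() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$1 ${ES_URL}$2${3:+ $3}"
  else
    curl -fsS -X "$1" -H 'Content-type: application/json' \
      "${ES_URL}$2" ${3:+--data-binary "$3"}
  fi
}

# Waits for yellow, then requires every primary shard of <index> to be STARTED.
wait_primaries() {
  [ "$DRY_RUN" = 1 ] && { echo "WAIT yellow + primaries STARTED on $1"; return 0; }
  until es GET '/_cluster/health?wait_for_status=yellow' >/dev/null; do sleep 5; done
  # _cat/shards default columns: index shard prirep state ...
  es GET "/_cat/shards/$1" |
    awk '$3 == "p" { seen = 1; if ($4 != "STARTED") bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# Checks that <alias> is attached to <index>.
alias_points_to() {
  [ "$DRY_RUN" = 1 ] && { echo "CHECK alias $1 -> $2"; return 0; }
  es GET "/_alias/$1" | grep -q "\"$2\""
}

# Usage: split_index <old_index> <alias> <target_shards>
split_index() {
  old=$1 alias_name=$2 shards=$3
  new="${old}_split${shards}"
  remove="{\"remove\":{\"index\":\"${old}\",\"alias\":\"${alias_name}\"}}"
  add="{\"add\":{\"index\":\"${new}\",\"alias\":\"${alias_name}\"}}"
  es PUT "/${old}/_settings" '{"settings":{"index.blocks.write":true}}' &&
  es PUT "/${old}/_split/${new}" \
     "{\"settings\":{\"index.number_of_shards\":${shards}}}" &&
  wait_primaries "$new" &&
  es POST /_aliases "{\"actions\":[${remove},${add}]}" &&
  alias_points_to "$alias_name" "$new" &&
  wait_primaries "$new" &&
  es DELETE "/${old}" &&
  es PUT "/${new}/_settings" '{"settings":{"index.blocks.write":null}}'
}
```

After sourcing the file, `split_index "$OLD_INDEX" "$ALIAS_NAME" 20` prints the planned requests. Review that output first; only a run with `DRY_RUN=0` sends anything to Elasticsearch, including the request that deletes the old index.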

...
