Notes from Elasticsearch 7 Webinar

These are notes for the 2020-12-15 Elasticsearch 7 Webinar.

We started with Elasticsearch back in its 0.x releases. Fortunately the project thrived and has proven very popular. Many of our customers are using it independently, unrelated to Swarm.

https://twitter.com/kimchy/status/1331273199364038663
"Searchable snapshots is one of Elasticsearch biggest features that came out in the last few years."
--Shay Banon, creator of Elasticsearch

https://www.elastic.co/blog/introducing-elasticsearch-searchable-snapshots

Search years of data in a snap(shot).
Find a needle in years of haystacks with searchable snapshots on low-cost object stores like S3 in 7.10. Now in beta.

While we've had support for backing up data to low-cost object stores for a long time, with searchable snapshots you can now use them as an active part of storing and searching your data.


Before getting into that new feature, let's get the (hopefully) boring upgrade part out of the way.
If you're already on Swarm 11.3 with Elasticsearch 6 it's just:

Upgrading Elasticsearch 6 => 7
Upgrading Elasticsearch

Btw I'm using the Caringo demo containers because it's easy and I can run a
realistic environment all on my laptop. I just tweaked them a bit to use
systemd so I can demo the simple in-place upgrade:

Running the caringo:demo containers

Verify it's at Elasticsearch 6

curl http://elasticsearch:9200
curl 'http://elasticsearch:9200/_cat/nodes?v'

Copy new rpms into the containers

docker cp /tmp/es7 caringo42_elasticsearch_1:/
docker cp /tmp/es7 caringo42_elasticsearchleader_1:/

Upgrade the non-master nodes first.

# /usr/share/elasticsearch/bin/elasticsearch-plugin remove repository-s3
# cd /es7
# yum install -y ./caringo-elasticsearch-search-7.0.0-1.noarch.rpm
# /usr/share/caringo-elasticsearch-search/bin/configure_elasticsearch_with_swarm_search.py --upgrade
# /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
# echo '-Des.allow_insecure_settings=true' >> /etc/elasticsearch/jvm.options
# systemctl restart elasticsearch
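The echo into jvm.options above appends a new line every time it runs, so repeating the upgrade steps leaves duplicates in the file. A minimal sketch of an idempotent append follows; `ensure_line` is a hypothetical helper name, not part of the Caringo RPM, and the demo writes to a scratch file rather than the real /etc/elasticsearch/jvm.options.

```shell
# ensure_line FILE LINE: append LINE to FILE only if no identical line exists.
# (Hypothetical helper for illustration; not shipped with any package.)
ensure_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demo against a scratch file instead of /etc/elasticsearch/jvm.options
opts=$(mktemp)
ensure_line "$opts" '-Des.allow_insecure_settings=true'
ensure_line "$opts" '-Des.allow_insecure_settings=true'   # second call is a no-op

# Count how many times the flag ended up in the file
grep -c -- 'allow_insecure_settings' "$opts"
rm -f "$opts"
```

On a real node the call would be ensure_line /etc/elasticsearch/jvm.options '-Des.allow_insecure_settings=true', run as root before the restart.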

Verify it's at Elasticsearch 7

curl http://elasticsearch:9200
curl 'http://elasticsearch:9200/_cat/nodes?v'
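If you'd rather script the version check than eyeball the curl output, something like the following could work. It is only a sketch: `es_major` is a made-up helper name, it pulls the "number" field out with sed instead of a real JSON parser, and the sample body is a trimmed stand-in for what the root endpoint returns.

```shell
# es_major: read the root-endpoint JSON on stdin and print the major version.
# (Hypothetical helper; uses sed on the "number" field, no JSON parser.)
es_major() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Offline example with a trimmed response body
sample='{"version":{"number":"7.5.2"}}'
printf '%s\n' "$sample" | es_major
# prints: 7

# Against a live node (not run here):
#   curl -s http://elasticsearch:9200 | es_major
```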

Note that Swarm supports only 7.5.2 at this time due to a minor compatibility issue. Support for 7.10 should come soon in 2021. It will be another simple in-place upgrade.


Let's start where we left off in the last demo, after uploading a few image and video objects tagged with metadata.

s3cmd ls s3://animals

Use our SCSP API to search by metadata

curl -u caringoadmin:password 'http://backup42/animals?format=json&x-animal-meta=cat&fields=name,content-type'

Do not rely on direct elasticsearch queries like this because our schema can change, but fyi:

curl 'elasticsearch:9200/_cat/indices'
curl 'elasticsearch:9200/index_caringo42-cluster0/_search?q=x_animal_meta:cat&pretty'


Elasticsearch, like most databases today, can make snapshots to and restore from object storage.

Snapshot and Restore Search Data
More info:
https://medium.com/swlh/elasticsearch7-backup-snapshot-restore-aws-s3-54a581c75589

s3cmd mb s3://essnapshots

Every elasticsearch node MUST be able to contact the domain endpoint!

Every node must have -Des.allow_insecure_settings=true in /etc/elasticsearch/jvm.options to allow the keys below to be specified.

curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/_snapshot/myRepo" -d '{
  "type": "s3",
  "settings": {
    "bucket": "essnapshots",
    "region": null,
    "endpoint": "http://backup42/",
    "protocol": "http",
    "base_path": "myswarmcluster",
    "access_key": "34d62db21e8fc6d1a3dc449ede70c446",
    "secret_key": "secret"
  }
}'

curl -XPUT "http://elasticsearch:9200/_snapshot/myRepo/snapshot_20201215?wait_for_completion=true&pretty"
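The snapshot name above is just the demo date typed by hand. For a recurring snapshot, the URL could be built from a date stamp instead. This is a sketch under assumptions: `snapshot_url` is a hypothetical helper, and the host and repository name are the demo values from these notes.

```shell
# snapshot_url REPO STAMP: print the create-snapshot URL for a dated snapshot.
# (Hypothetical helper; host and query string copied from the demo command.)
snapshot_url() {
  printf 'http://elasticsearch:9200/_snapshot/%s/snapshot_%s?wait_for_completion=true&pretty\n' "$1" "$2"
}

# Reproduces the URL used in the demo
snapshot_url myRepo 20201215

# With today's date, against a live cluster (not run here):
#   curl -XPUT "$(snapshot_url myRepo "$(date +%Y%m%d)")"
```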

This is how you would restore, but never restore back onto the same active Elasticsearch cluster: the restored index keeps the original index alias, so the alias would then point at both the live index and the restored copy.

curl -i -XPOST -H 'Content-type: application/json' \
"http://elasticsearch:9200/_snapshot/myRepo/snapshot_20201213095133/_restore?pretty" \
-d '{
"rename_pattern": "(.+)",
"rename_replacement": "restored_$1"
}'

But if you do, you can clear that alias with:
curl -i -XDELETE http://elasticsearch:9200/restored_index_caringo42-cluster0/_alias/caringo42-cluster0
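To see why the restored index in the DELETE above is called restored_index_caringo42-cluster0, it can help to run the rename rule by hand. The sketch below uses sed as a rough stand-in for the regex substitution in the restore request; `rename_restored` is a hypothetical helper, and sed writes the capture group as \1 where the restore body writes $1.

```shell
# Rough sed equivalent of rename_pattern "(.+)" with
# rename_replacement "restored_$1" (sed spells the group \1).
rename_restored() {
  printf '%s\n' "$1" | sed -E 's/^(.+)$/restored_\1/'
}

rename_restored index_caringo42-cluster0
# prints: restored_index_caringo42-cluster0
```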


Containers can be thought of as tiny virtual machines, but really they are just Linux processes that the OS isolates and manages. Kubernetes is a container orchestration system for automating application deployment, scaling and management.

There are dozens of ways to run Kubernetes. Probably the easiest way to run it on your Windows or macOS laptop is to install Docker Desktop (https://www.docker.com/products/docker-desktop), which can also run a Kubernetes node. To set up a real k8s cluster on a set of Linux servers, try k3sup (https://github.com/alexellis/k3sup), which installs the simple and slim k3s Kubernetes distribution from Rancher, which was recently acquired by SUSE.

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

A Kubernetes "operator" is an application-specific controller that extends k8s to manage instances of a complex application, such as Elasticsearch.

The Elastic Cloud on Kubernetes (ECK) operator "lets you operate elasticsearch as you would any other Kubernetes resource. No need to configure endless Kubernetes pods, services, and secrets. ECK embeds years of knowledge into everyday Elasticsearch operations – from scaling up to version upgrades."
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-overview.html
https://devopscon.io/blog/native-elasticsearch-on-kubernetes-simple-with-eck/

kubectl apply -f https://download.elastic.co/downloads/eck/1.3.1/all-in-one.yaml

kubectl -n elastic-system logs -f statefulset.apps/elastic-operator &

Delete all elasticsearch k8s resources (can take a while):

kubectl get namespaces --no-headers -o custom-columns=:metadata.name \
  | xargs -n1 kubectl delete elastic --all -n
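Since that pipeline deletes resources in every namespace, a dry run can be worth doing first. The sketch below shows what xargs would execute by putting echo in front of the real command; the `namespaces` function is a stand-in that prints two example namespace names rather than querying a cluster.

```shell
# Stand-in for the `kubectl get namespaces ...` output (example names only).
namespaces() { printf '%s\n' default elastic-system; }

# Prefix the real command with `echo` so nothing is actually deleted;
# xargs -n1 runs the command once per namespace, appending the name after -n.
namespaces | xargs -n1 echo kubectl delete elastic --all -n
# prints:
#   kubectl delete elastic --all -n default
#   kubectl delete elastic --all -n elastic-system
```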

Now create a new Elasticsearch 7.10 cluster. Note this is totally independent of the 7.5.2 cluster whose snapshot was written to Swarm at http://backup42.

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.10.1
  nodeSets:
  - name: default
    count: 2
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
      xpack.security.enabled: false
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-s3
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms1g -Xmx1g -Des.allow_insecure_settings=true
          resources:
            requests:
              memory: 2Gi
EOF

kubectl get pods

kubectl logs -f quickstart-es-default-0

kubectl port-forward service/quickstart-es-http 9200 &

curl http://localhost:9200
curl http://localhost:9200/_cat/plugins

Sanity-check that this is a real Elasticsearch cluster:

curl -XPOST http://localhost:9200/jamtest1/_doc --data-binary '{"foo":"bar"}' -H Content-type:application/json
curl "http://localhost:9200/jamtest1/_search?pretty"


The searchable snapshots feature is in beta and at least for now requires a temporary license:

curl -XPOST "http://localhost:9200/_license/start_trial?acknowledge=true"

Remember, every Elasticsearch node MUST be able to contact the endpoint. Getting this wrong is an easy demo mistake.

curl -XPUT -H 'Content-type: application/json' "http://localhost:9200/_snapshot/myRepo" \
-d '{
  "type": "s3",
  "settings": {
    "bucket": "essnapshots",
    "region": null,
    "endpoint": "http://backup42/",
    "protocol": "http",
    "base_path": "myswarmcluster",
    "access_key": "34d62db21e8fc6d1a3dc449ede70c446",
    "secret_key": "secret"
  }
}'

curl "http://localhost:9200/_snapshot/myRepo/_all"
curl "http://localhost:9200/_snapshot/myRepo/snapshot_20201215/_status"

This is the new ES7 feature -- mounting a snapshot (as opposed to restoring it).

https://www.elastic.co/guide/en/elasticsearch/reference/7.x/searchable-snapshots-api-mount-snapshot.html

curl -XPOST -H Content-type:application/json \
"http://localhost:9200/_snapshot/myRepo/snapshot_20201215/_mount?wait_for_completion=true" \
--data-binary '{
  "index": "index_caringo42-cluster0",
  "renamed_index": "offline-caringo42-cluster0",
  "index_settings": {
    "index.number_of_replicas": 0
  },
  "ignored_index_settings": [ "index.refresh_interval" ]
}'

And finally, show that this is a fully functioning read-only copy of the original elasticsearch index.

curl "http://localhost:9200/offline-caringo42-cluster0/_search?pretty&q=x_animal_meta:cat"

© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.