Notes from Elasticsearch 7 Webinar
These are notes for the 2020-12-15 Elasticsearch 7 Webinar.
We started with Elasticsearch back in its 0.x releases. Fortunately the project thrived and has proven very popular. Many of our customers are using it independently, unrelated to Swarm.
https://twitter.com/kimchy/status/1331273199364038663
"Searchable snapshots is one of Elasticsearch biggest features that came out in the last few years."
--Shay Banon, creator of Elasticsearch
https://www.elastic.co/blog/introducing-elasticsearch-searchable-snapshots
Search years of data in a snap(shot).
Find a needle in years of haystacks with searchable snapshots on low-cost object stores like S3 in 7.10. Now in beta. While we've had support for backing up data to low-cost object stores for a long time, with searchable snapshots you can now use them as an active part of storing and searching your data.
Before getting into that new feature, let's get the (hopefully) boring upgrade part out of the way.
Upgrading Elasticsearch 6 => 7
If you're already on Swarm 11.3 with Elasticsearch 6, it's just a few steps:
Btw I'm using the Caringo Demo Containers because they're easy and let me run a realistic environment entirely on my laptop, tweaked a bit to use systemd so I can demo the simple in-place upgrade:
Running the caringo:demo containers
Verify it's at Elasticsearch 6
curl http://elasticsearch:9200
curl http://elasticsearch:9200/_cat/nodes?v
Copy new rpms into the containers
docker cp /tmp/es7 caringo42_elasticsearch_1:/
docker cp /tmp/es7 caringo42_elasticsearchleader_1:/
Start upgrading non-master nodes first.
# /usr/share/elasticsearch/bin/elasticsearch-plugin remove repository-s3
# cd /es7
# yum install -y ./caringo-elasticsearch-search-7.0.0-1.noarch.rpm
# /usr/share/caringo-elasticsearch-search/bin/configure_elasticsearch_with_swarm_search.py --upgrade
# /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
# echo '-Des.allow_insecure_settings=true' >> /etc/elasticsearch/jvm.options
# systemctl restart elasticsearch
Verify it's at Elasticsearch 7
curl http://elasticsearch:9200
curl http://elasticsearch:9200/_cat/nodes?v
Note that Swarm supports only 7.5.2 at this time due to a minor compatibility issue. Support for 7.10 should come soon in 2021. It will be another simple in-place upgrade.
Let's start where we left off in the last demo, after uploading a few image and video objects tagged with metadata.
s3cmd ls s3://animals
Use our SCSP api to search by metadata
curl -u caringoadmin:password 'http://backup42/animals?format=json&x-animal-meta=cat&fields=name,content-type'
Do not rely on direct Elasticsearch queries like this because our schema can change, but fyi:
curl 'elasticsearch:9200/_cat/indices'
curl 'elasticsearch:9200/index_caringo42-cluster0/_search?q=x_animal_meta:cat&pretty'
Elasticsearch, like most databases today, can make snapshots to and restore from object storage.
Snapshot and Restore Search Data
More info:
https://medium.com/swlh/elasticsearch7-backup-snapshot-restore-aws-s3-54a581c75589
s3cmd mb s3://essnapshots
Every Elasticsearch node MUST be able to contact the domain endpoint! Every node must also have -Des.allow_insecure_settings=true in /etc/elasticsearch/jvm.options to allow the access and secret keys below to be specified.
curl -XPUT -H 'Content-type: application/json' "http://elasticsearch:9200/_snapshot/myRepo" -d '{
  "type": "s3",
  "settings": {
    "bucket": "essnapshots",
    "region": null,
    "endpoint": "http://backup42/",
    "protocol": "http",
    "base_path": "myswarmcluster",
    "access_key": "34d62db21e8fc6d1a3dc449ede70c446",
    "secret_key": "secret"
  }
}'
curl -XPUT "http://elasticsearch:9200/_snapshot/myRepo/snapshot_20201215?wait_for_completion=true&pretty"
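A quick sanity check I'd add here (not shown in the webinar): list the repository's snapshots and confirm the new one reports a SUCCESS state. The repository and snapshot names match the commands above.

```shell
# List all snapshots in the repository and pull out the name and state
# fields; a healthy, completed snapshot reports "state" : "SUCCESS".
curl -s "http://elasticsearch:9200/_snapshot/myRepo/_all?pretty" \
  | grep -E '"(snapshot|state)"'
```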
This is how you would restore, but never restore back onto the same active elasticsearch cluster because the index alias is preserved.
curl -i -XPOST -H 'Content-type: application/json' \
  "http://elasticsearch:9200/_snapshot/myRepo/snapshot_20201213095133/_restore?pretty" \
  -d '{
    "rename_pattern": "(.+)",
    "rename_replacement": "restored_$1"
  }'
But if you do, you can clear that alias with:
curl -i -XDELETE http://elasticsearch:9200/restored_index_caringo42-cluster0/_alias/caringo42-cluster0
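To see which aliases exist before and after a restore (my addition, not from the webinar; the restored_* index name matches the rename_replacement pattern above):

```shell
# List every alias; a restored index keeps the original cluster alias,
# which is why restoring onto the same live cluster causes conflicts.
curl -s "http://elasticsearch:9200/_cat/aliases?v"

# Check whether any restored_* index still carries an alias:
curl -s "http://elasticsearch:9200/_cat/aliases" \
  | grep restored_ || echo "no restored_* aliases"
```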
Containers can be thought of as tiny virtual machines but really they are just linux processes that the OS isolates and manages. Kubernetes is a container orchestration system for automating application deployment, scaling and management.
There are dozens of ways to run Kubernetes. Probably the easiest way to run it on your Windows or macOS laptop is to install Docker Desktop (https://www.docker.com/products/docker-desktop), which can also run a Kubernetes node. To set up a real k8s cluster on a set of Linux servers, try k3sup (https://github.com/alexellis/k3sup), which installs the simple and slim k3s Kubernetes distribution from Rancher, which was recently acquired by SUSE Linux.
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
A Kubernetes "operator" is an application-specific controller that extends k8s to manage instances of a complex application, such as Elasticsearch.
The Elastic Cloud on Kubernetes (ECK) operator "lets you operate elasticsearch as you would any other Kubernetes resource. No need to configure endless Kubernetes pods, services, and secrets. ECK embeds years of knowledge into everyday Elasticsearch operations – from scaling up to version upgrades."
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-overview.html
https://devopscon.io/blog/native-elasticsearch-on-kubernetes-simple-with-eck/
kubectl apply -f https://download.elastic.co/downloads/eck/1.3.1/all-in-one.yaml
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator &
Delete all elasticsearch k8s resources (can take a while):
kubectl get namespaces --no-headers -o custom-columns=:metadata.name \
  | xargs -n1 kubectl delete elastic --all -n
Now create a new Elasticsearch 7.10 cluster. Note this is totally independent of the 7.5.2 cluster from which a snapshot was made to Swarm at http://backup42.
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.10.1
  nodeSets:
  - name: default
    count: 2
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
      xpack.security.enabled: false
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-s3
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms1g -Xmx1g -Des.allow_insecure_settings=true
          resources:
            requests:
              memory: 2Gi
EOF
kubectl get pods
kubectl logs -f quickstart-es-default-0
kubectl port-forward service/quickstart-es-http 9200 &
curl http://localhost:9200
curl http://localhost:9200/_cat/plugins
Sanity-test that this is a real Elasticsearch cluster:
curl -XPOST http://localhost:9200/jamtest1/_doc --data-binary '{"foo":"bar"}' -H Content-type:application/json
curl "http://localhost:9200/jamtest1/_search?pretty"
The searchable snapshots feature is in beta and at least for now requires a temporary license:
curl -XPOST "http://localhost:9200/_license/start_trial?acknowledge=true"
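To confirm the trial actually activated (my addition, not from the webinar):

```shell
# Verify the license: after start_trial the cluster should report an
# active trial license, which unlocks the searchable snapshots beta.
curl -s "http://localhost:9200/_license?pretty" \
  | grep -E '"(status|type)"'
```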
Remember, every Elasticsearch node MUST be able to contact the endpoint (the source of the demo mistakes).
curl -XPUT -H 'Content-type: application/json' "http://localhost:9200/_snapshot/myRepo" \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "essnapshots",
      "region": null,
      "endpoint": "http://backup42/",
      "protocol": "http",
      "base_path": "myswarmcluster",
      "access_key": "34d62db21e8fc6d1a3dc449ede70c446",
      "secret_key": "secret"
    }
  }'
curl "http://localhost:9200/_snapshot/myRepo/_all"
curl "http://localhost:9200/_snapshot/myRepo/snapshot_20201215/_status"
This is the new ES7 feature -- mounting a snapshot (as opposed to restoring it).
curl -XPOST -H Content-type:application/json \
  "http://localhost:9200/_snapshot/myRepo/snapshot_20201215/_mount?wait_for_completion=true" \
  --data-binary '{
    "index": "index_caringo42-cluster0",
    "renamed_index": "offline-caringo42-cluster0",
    "index_settings": {
      "index.number_of_replicas": 0
    },
    "ignored_index_settings": [ "index.refresh_interval" ]
  }'
And finally, show that this is a fully functioning read-only copy of the original elasticsearch index.
curl "http://localhost:9200/offline-caringo42-cluster0/_search?pretty&q=x_animal_meta:cat"
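One more check I'd add here (not shown in the webinar): the mounted index is backed by the snapshot and is read-only, so a write against it should be rejected rather than indexed.

```shell
# Attempting to index a document into the mounted snapshot index should
# fail with an HTTP error, since searchable-snapshot indices are read-only.
curl -i -XPOST -H 'Content-type: application/json' \
  "http://localhost:9200/offline-caringo42-cluster0/_doc" \
  --data-binary '{"foo":"bar"}'
```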
© DataCore Software Corporation. · https://www.datacore.com · All rights reserved.