Overview

This guide explains how to set up regular backups of your Elasticsearch index without causing downtime. It uses Elasticsearch's snapshot and restore functionality to create backups efficiently. The main example stores snapshots on a shared file system.

Prerequisites

A running Elasticsearch cluster.
A shared file system accessible by all Elasticsearch nodes.
Access to the cluster via curl or a similar HTTP client.
(Optional) Elasticsearch Curator for automating snapshots.

Step-by-Step Guide

Create a Snapshot Repository

You need to create a snapshot repository where the snapshots will be stored. This can be a shared file system or an S3 bucket.
Using a Shared File System:
First, specify the shared repository location in the elasticsearch.yml file on every master and data node, then restart each node so that the setting takes effect:

Code Block
path.repo: "/mount/backups/my_backup"

Then, create the repository using the following command:

Code Block
curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}'

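Using a DataCore Swarm S3 bucket:
When the target repository is a bucket on another Swarm cluster, create the snapshot repository with the s3 type as follows. Depending on the Elasticsearch version, this type may require the repository-s3 plugin, and newer versions expect the access and secret keys in the Elasticsearch keystore rather than in the request body, so check the documentation for your version: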

Code Block

curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_s3_backup" -H 'Content-Type: application/json' -d'
{
  "type": "s3",
  "settings": {
    "bucket": "my-elasticsearch-backup-bucket",
    "endpoint": "https://datacore-swarm.example.com",
    "access_key": "your_access_key",
    "secret_key": "your_secret_key",
    "protocol": "https"
  }
}'

...

Verify the Repository
After creating the repository, retrieve its definition to confirm that it was registered with the expected settings:
Code Block
curl -X GET "http://<es_node_ip>:9200/_snapshot/my_backup"
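
The GET request above only returns the repository definition. Elasticsearch also provides a _verify endpoint that asks the nodes to check that they can access the repository. A minimal sketch follows; verify_repo is a hypothetical helper name, and the URL and repository name are placeholders for your own values.

```shell
# Ask the cluster nodes to confirm that they can access the repository.
# verify_repo is an illustrative helper name, not part of Elasticsearch.
verify_repo() {
  es_url=$1   # e.g. http://<es_node_ip>:9200
  repo=$2     # e.g. my_backup
  curl -sS -X POST "${es_url}/_snapshot/${repo}/_verify"
}

# Example call:
#   verify_repo "http://<es_node_ip>:9200" my_backup
```

A successful response lists the nodes that verified the repository; an error response usually points to a node that cannot read or write the shared location.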

Create a Snapshot

Once the repository is set up and verified, create a snapshot of your index. Replace index_swarm.datacore.com.com0 with your index name.

Code Block

curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_$(date +%Y%m%d%H%M)" -H 'Content-Type: application/json' -d'
{
  "indices": "index_swarm.datacore.com.com0",
  "ignore_unavailable": true,
  "include_global_state": false
}'
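
By default the create request returns as soon as the snapshot has been started. When a script needs the final result, the wait_for_completion=true query parameter makes the call block until the snapshot finishes. The sketch below wraps this in two hypothetical helpers, snapshot_name and create_snapshot; the argument values are placeholders.

```shell
# Build a timestamped snapshot name, for example snapshot_202406270200.
# Snapshot names must be lowercase, which a numeric timestamp satisfies.
snapshot_name() {
  printf 'snapshot_%s\n' "$(date +%Y%m%d%H%M)"
}

# Create a snapshot of one index and wait until Elasticsearch reports
# the final result. Helper names and arguments are illustrative.
create_snapshot() {
  es_url=$1   # e.g. http://<es_node_ip>:9200
  repo=$2     # e.g. my_backup
  index=$3    # e.g. index_swarm.datacore.com.com0
  curl -sS -X PUT \
    "${es_url}/_snapshot/${repo}/$(snapshot_name)?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "{\"indices\": \"${index}\", \"ignore_unavailable\": true, \"include_global_state\": false}"
}

# Example call:
#   create_snapshot "http://<es_node_ip>:9200" my_backup index_swarm.datacore.com.com0
```

Waiting for completion keeps the HTTP connection open for the duration of the snapshot, so for very large indices it may be preferable to start the snapshot and poll its status instead.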

Automate Snapshot Creation

To automate the creation of snapshots, you can use cron jobs on Linux or scheduled tasks on Windows.

Example using a cron job (runs daily at 2 AM). The whole entry must be on a single line, and each % is escaped as \% because cron treats an unescaped % as a line break:

Code Block

0 2 * * * curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_$(date +\%Y\%m\%d\%H\%M)" -H 'Content-Type: application/json' -d '{"indices": "index_swarm.datacore.com.com0", "ignore_unavailable": true, "include_global_state": false}'
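
A cron job that only fires the request does not notice when a snapshot fails. One possible approach is to call the API with ?wait_for_completion=true from a small script and inspect the response. The sketch below assumes the response is piped in on stdin; check_snapshot_response is a hypothetical helper name.

```shell
# Succeeds only if a create-snapshot response read from stdin reports
# "state": "SUCCESS". Otherwise the response is printed to stderr and
# the function returns a non-zero status that a cron wrapper can act on.
check_snapshot_response() {
  response=$(cat)
  if printf '%s' "$response" |
      grep -Eq '"state"[[:space:]]*:[[:space:]]*"SUCCESS"'; then
    return 0
  fi
  printf 'snapshot did not succeed: %s\n' "$response" >&2
  return 1
}

# Example use inside a script started by cron:
#   curl -sS -X PUT "<snapshot URL>?wait_for_completion=true" ... | check_snapshot_response
```

Because the function writes failures to stderr and returns a non-zero status, cron's own mail notification or a log collector can pick them up without further parsing.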

Monitor Snapshots
Regularly check the status of your snapshots to ensure they are completing successfully:
Code Block
curl -X GET "http://<es_node_ip>:9200/_snapshot/my_backup/_all/_status"
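
For a quick overview of every snapshot in a repository, the _cat/snapshots API prints one line per snapshot. The following sketch filters that listing down to snapshots whose status is not SUCCESS; the helper names unsuccessful_snapshots and report_unsuccessful are hypothetical.

```shell
# Read "id status" lines (the output of _cat/snapshots with h=id,status)
# from stdin and print the ids whose status is not SUCCESS.
# Snapshots that are still IN_PROGRESS are reported as well.
unsuccessful_snapshots() {
  awk 'NF >= 2 && $2 != "SUCCESS" { print $1 }'
}

# Fetch the listing for one repository and report problem snapshots.
report_unsuccessful() {
  es_url=$1   # e.g. http://<es_node_ip>:9200
  repo=$2     # e.g. my_backup
  curl -sS "${es_url}/_cat/snapshots/${repo}?h=id,status" | unsuccessful_snapshots
}

# Example call:
#   report_unsuccessful "http://<es_node_ip>:9200" my_backup
```

An empty result means every listed snapshot completed successfully.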

Restoring a Snapshot (if needed)
If you need to restore a snapshot, you can do so with the following command. An open index with the same name blocks the restore, so close or delete the existing index first:

Code Block

curl -X POST "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_<snapshot_date>/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "index_swarm.datacore.com.com0",
  "ignore_unavailable": true,
  "include_global_state": false
}'

For the S3 bucket:

Code Block

curl -X POST "http://<es_node_ip>:9200/_snapshot/my_s3_backup/snapshot_<snapshot_date>/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "index_swarm.datacore.com.com0",
  "ignore_unavailable": true,
  "include_global_state": false
}'

Setting "include_global_state": false restores only the data stored in the specified index.
To restore the cluster-wide state as well, including index templates and persistent cluster settings, set "include_global_state": true.
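
When the live index has to stay online, the restore API can write the data to a differently named index using the rename_pattern and rename_replacement parameters. The sketch below is one way to do that; restore_renamed is a hypothetical helper and the restored_ prefix is an arbitrary choice.

```shell
# Restore one index from a snapshot under a new name (restored_<index>)
# so that the existing index does not have to be closed or deleted.
# The helper name and the "restored_" prefix are illustrative.
restore_renamed() {
  es_url=$1     # e.g. http://<es_node_ip>:9200
  repo=$2       # e.g. my_backup
  snapshot=$3   # e.g. snapshot_<snapshot_date>
  index=$4      # e.g. index_swarm.datacore.com.com0
  curl -sS -X POST "${es_url}/_snapshot/${repo}/${snapshot}/_restore" \
    -H 'Content-Type: application/json' \
    -d "{
  \"indices\": \"${index}\",
  \"ignore_unavailable\": true,
  \"include_global_state\": false,
  \"rename_pattern\": \"(.+)\",
  \"rename_replacement\": \"restored_\$1\"
}"
}
```

The pattern (.+) captures the full index name and the replacement refers to it as $1, so index_swarm.datacore.com.com0 would be restored as restored_index_swarm.datacore.com.com0.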

Automating with Elasticsearch Curator

Elasticsearch Curator simplifies managing indices and snapshots. Here’s how to set it up:

Install Curator
Code Block
pip install elasticsearch-curator

Create a Curator Configuration File (curator.yml)

Code Block

client:
  hosts:
    - 127.0.0.1
  port: 9200
logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Create a Curator Action File (snapshot.yml)

Code Block

actions:
  1:
    action: snapshot
    description: "Snapshot selected indices"
    options:
      repository: my_backup
      name: snapshot-%Y%m%d%H%M
      ignore_unavailable: false
      include_global_state: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: index_swarm.datacore.com.com0

Create a Cron Job to Run Curator

Code Block
0 2 * * * curator --config /path/to/curator.yml /path/to/snapshot.yml
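
Before scheduling the job, it can help to run the action file once with Curator's --dry-run flag, which logs what would be done without making changes. A small sketch, assuming the same file paths as the cron entry above; run_curator is a hypothetical wrapper name.

```shell
# Run Curator either for real or in dry-run mode.
# In dry-run mode Curator only logs the actions it would perform.
run_curator() {
  mode=$1      # "dry-run" or anything else for a real run
  config=$2    # e.g. /path/to/curator.yml
  actions=$3   # e.g. /path/to/snapshot.yml
  if [ "$mode" = "dry-run" ]; then
    curator --config "$config" --dry-run "$actions"
  else
    curator --config "$config" "$actions"
  fi
}

# Example call:
#   run_curator dry-run /path/to/curator.yml /path/to/snapshot.yml
```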

Best Practices

Test Snapshots: Regularly restore snapshots to a test cluster to ensure data integrity.
Monitor Resources: Monitor cluster resources during snapshot operations to ensure they do not impact performance.
Automate Alerts: Set up alerts to notify you if a snapshot operation fails.
Retention Policy: Implement a retention policy to manage storage, deleting older snapshots to save space.
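
As an illustration of the retention policy item, the sketch below deletes snapshots that finished more than a given number of days ago. It relies on the end_epoch column of the _cat/snapshots API and on the delete snapshot API; the helper names and the choice of retention period are assumptions, not part of the original procedure.

```shell
# Read "id end_epoch" lines from stdin and print the ids of snapshots
# that ended more than max_age_days before now_epoch (seconds).
# Lines with an end_epoch of 0 (still running) are skipped.
expired_snapshots() {
  max_age_days=$1
  now_epoch=$2
  awk -v days="$max_age_days" -v now="$now_epoch" \
    '$2 ~ /^[0-9]+$/ && $2 > 0 && (now - $2) > days * 86400 { print $1 }'
}

# Delete every expired snapshot in one repository.
prune_snapshots() {
  es_url=$1   # e.g. http://<es_node_ip>:9200
  repo=$2     # e.g. my_backup
  keep_days=$3
  curl -sS "${es_url}/_cat/snapshots/${repo}?h=id,end_epoch" |
    expired_snapshots "$keep_days" "$(date +%s)" |
    while read -r id; do
      curl -sS -X DELETE "${es_url}/_snapshot/${repo}/${id}"
    done
}

# Example call (keeps the last 30 days):
#   prune_snapshots "http://<es_node_ip>:9200" my_backup 30
```

Running expired_snapshots on its own first, without the delete loop, shows which snapshots would be removed.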

...

Version	Old Version 12	New Version 13
Changes made by	Milton Suen	Milton Suen
Saved on	Jun 27, 2024	Jun 27, 2024
