Overview
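This guide explains how to set up regular backups of an Elasticsearch index without downtime. It uses Elasticsearch's snapshot and restore feature: snapshots are taken while the cluster keeps serving requests, and they are incremental, so each new snapshot only copies data that is not already stored in the repository. The main examples use a shared file system repository; an S3-compatible (DataCore Swarm) repository is also shown.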
Prerequisites
A running Elasticsearch cluster.
A shared file system mounted at the same path on all master and data nodes.
Access to the cluster via curl or a similar HTTP client.
(Optional) Elasticsearch Curator for automating snapshots.
Step-by-Step Guide
Create a Snapshot Repository
First, create a snapshot repository where the snapshots will be stored. This can be a shared file system, Amazon S3, HDFS, etc. For this example, we'll use a shared file system.
curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d' { "type": "fs", "settings": { "location": "/mount/backups/my_backup" } }'
If the snapshots should instead be stored in an S3-compatible bucket on a DataCore Swarm cluster, create the repository with type "s3":
curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_s3_backup" -H 'Content-Type: application/json' -d' { "type": "s3", "settings": { "bucket": "my-elasticsearch-backup-bucket", "endpoint": "https://datacore-swarm.example.com", "access_key": "your_access_key", "secret_key": "your_secret_key", "protocol": "https" } }'
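For the shared file system repository, make sure the location path is mounted, readable, and writable by the user Elasticsearch runs as on every master and data node. For the S3 repository, depending on your Elasticsearch version, the repository-s3 plugin may need to be installed, and newer versions expect the endpoint and credentials as client settings (s3.client.default.endpoint in elasticsearch.yml, and s3.client.default.access_key / s3.client.default.secret_key in the Elasticsearch keystore) rather than in the repository body.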
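NOTE: A shared file system location must also be registered in elasticsearch.yml on every master and data node through the path.repo setting, before the repository is created. path.repo is a static setting, so each node must be restarted after it is added:
path.repo: "/mount/backups/my_backup"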
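Because every master and data node needs the same writable mount, a quick per-node check before registering the repository can catch permission or mount problems early. The following is a minimal sketch, assuming a POSIX shell on each node; check_repo_path is a hypothetical helper, not part of Elasticsearch, and it tests the permissions of whichever user runs it, so run it as the user Elasticsearch runs as.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): confirm that a snapshot
# repository directory exists and is writable by the user running the check.
# Run it on every master and data node, as the user Elasticsearch runs as.
check_repo_path() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  # Create and delete a small probe file to prove the directory is writable.
  probe="$dir/.es_repo_probe_$$"
  if ! touch "$probe" 2>/dev/null; then
    echo "not writable: $dir" >&2
    return 1
  fi
  rm -f "$probe"
  echo "ok: $dir"
}
```

For example, check_repo_path /mount/backups/my_backup should print "ok: /mount/backups/my_backup" on each node; anything else points to a mount or permission problem on that node.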
Verify the Repository
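After creating the repository, confirm that it is registered and check its settings:
curl -X GET "http://<es_node_ip>:9200/_snapshot/my_backup"
This only returns the stored configuration. To check that every node can actually access the repository location, run the verify API:
curl -X POST "http://<es_node_ip>:9200/_snapshot/my_backup/_verify"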
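The verify API (POST _snapshot/my_backup/_verify) responds with a "nodes" object that has one entry, with a "name" field, per node that could access the repository, and returns an HTTP error if verification fails. A minimal sketch for scripting that check, assuming ES_URL and REPO point at your cluster (both are assumptions, as are the hypothetical function names):

```shell
#!/bin/sh
# Sketch: ES_URL and REPO are assumptions; point them at your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"

# Ask Elasticsearch to check that each node can access the repository.
# -f makes curl exit non-zero if the verification request fails.
verify_repo() {
  curl -fsS -X POST "$ES_URL/_snapshot/$REPO/_verify"
}

# Count the node entries ("name" fields) in a compact verify response on stdin.
count_verified_nodes() {
  grep -o '"name":"[^"]*"' | wc -l | tr -d ' '
}
```

Running verify_repo | count_verified_nodes prints the number of nodes that verified the repository; compare it with the number of master and data nodes you expect.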
Create a Snapshot
Once the repository is set up and verified, create a snapshot of your index, replacing index_swarm.datacore.com.com0 with your index name. The index stays available for indexing and searches while the snapshot runs.
curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_$(date +%Y%m%d%H%M)" -H 'Content-Type: application/json' -d' { "indices": "index_swarm.datacore.com.com0", "ignore_unavailable": true, "include_global_state": false }'
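By default the snapshot request returns as soon as the snapshot starts. Adding ?wait_for_completion=true makes it return only when the snapshot has finished, and the response then includes the snapshot's final "state". The sketch below assumes a compact (non-pretty) JSON response and uses sed so that jq is not required; ES_URL, REPO, INDEX and the function names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: ES_URL, REPO and INDEX defaults are assumptions; adjust them.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"
INDEX="${INDEX:-index_swarm.datacore.com.com0}"

# Take a snapshot and block until it finishes (wait_for_completion=true),
# so the response body contains the final snapshot state.
take_snapshot() {
  curl -fsS -X PUT \
    "$ES_URL/_snapshot/$REPO/snapshot_$(date +%Y%m%d%H%M)?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "{\"indices\": \"$INDEX\", \"ignore_unavailable\": true, \"include_global_state\": false}"
}

# Print the "state" field (SUCCESS, PARTIAL, FAILED, ...) from a compact JSON
# response read on stdin. A plain sed match is used so jq is not required.
snapshot_state() {
  sed -n 's/.*"state":"\([A-Z_]*\)".*/\1/p'
}
```

After exporting ES_URL and INDEX for your cluster, take_snapshot | snapshot_state should print SUCCESS for a healthy snapshot.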
Automate Snapshot Creation
To automate the creation of snapshots, you can use cron jobs on Linux or scheduled tasks on Windows.
Example using a cron job (runs daily at 2 AM). In a crontab entry each % must be escaped as \%, because cron treats an unescaped % as a newline:
0 2 * * * curl -X PUT "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_$(date +\%Y\%m\%d\%H\%M)" -H 'Content-Type: application/json' -d' { "indices": "index_swarm.datacore.com.com0", "ignore_unavailable": true, "include_global_state": false }'
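An alternative to a long crontab line is a small wrapper script: it needs no \% escaping, can log a timestamped result, and exits non-zero when the snapshot does not succeed, which gives an alerting hook. This is a minimal sketch; the file path /usr/local/bin/es_snapshot.sh, the ES_URL, REPO and INDEX defaults, and the "run" argument convention are all assumptions.

```shell
#!/bin/sh
# Hypothetical cron wrapper, for example saved as /usr/local/bin/es_snapshot.sh.
# ES_URL, REPO and INDEX are assumptions; override them in the environment.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"
INDEX="${INDEX:-index_swarm.datacore.com.com0}"

# Prefix each message with a UTC timestamp so the cron log is easy to scan.
log() {
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"
}

# Take the snapshot, wait for it to finish, and return non-zero unless the
# reported state is SUCCESS (curl -f alone only catches HTTP errors).
run_snapshot() {
  name="snapshot_$(date +%Y%m%d%H%M)"
  state=$(curl -fsS -X PUT \
    "$ES_URL/_snapshot/$REPO/$name?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "{\"indices\": \"$INDEX\", \"ignore_unavailable\": true, \"include_global_state\": false}" \
    | sed -n 's/.*"state":"\([A-Z_]*\)".*/\1/p')
  if [ "$state" = "SUCCESS" ]; then
    log "OK $name"
  else
    log "FAILED $name state=${state:-unknown}"
    return 1
  fi
}

# Act only when called as "es_snapshot.sh run", so the file is safe to source.
if [ "${1:-}" = "run" ]; then
  run_snapshot || exit 1
fi
```

The matching crontab entry could then be: 0 2 * * * /usr/local/bin/es_snapshot.sh run >> /var/log/es_snapshot.log 2>&1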
Monitor Snapshots
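Regularly check that your snapshots are completing successfully. The following lists every snapshot in the repository with its status (for example SUCCESS, IN_PROGRESS, PARTIAL, or FAILED):
curl -X GET "http://<es_node_ip>:9200/_cat/snapshots/my_backup?v"
For shard-level detail on one snapshot, query its status directly:
curl -X GET "http://<es_node_ip>:9200/_snapshot/my_backup/<snapshot_name>/_status"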
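The _cat/snapshots endpoint prints one line per snapshot, and its h parameter selects columns, so h=id,status yields just the snapshot name and its status. A minimal sketch that turns this into a list of snapshots needing attention, assuming ES_URL and REPO as shown (both assumptions) and hypothetical function names:

```shell
#!/bin/sh
# Sketch: ES_URL and REPO defaults are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"

# Fetch "id status" pairs, one snapshot per line.
list_snapshots() {
  curl -fsS "$ES_URL/_cat/snapshots/$REPO?h=id,status"
}

# Read "id status" lines on stdin and print the ids of snapshots that neither
# succeeded nor are still running (for example FAILED or PARTIAL).
problem_snapshots() {
  awk 'NF >= 2 && $2 != "SUCCESS" && $2 != "IN_PROGRESS" { print $1 }'
}
```

Empty output from list_snapshots | problem_snapshots means every finished snapshot succeeded; any printed name can feed an alert.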
Restoring a Snapshot (if needed)
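If you need to restore a snapshot, use the following command, replacing <snapshot_date> with the timestamp in the snapshot name. Elasticsearch cannot restore over an existing open index with the same name, so close or delete that index first, or restore it under a new name using the rename_pattern and rename_replacement options: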
curl -X POST "http://<es_node_ip>:9200/_snapshot/my_backup/snapshot_<snapshot_date>/_restore" -H 'Content-Type: application/json' -d' { "indices": "index_swarm.datacore.com.com0", "ignore_unavailable": true, "include_global_state": false }'
NOTE: With "include_global_state": false, only the data stored in the listed indices is restored. To also restore cluster-wide state, such as index templates and persistent cluster settings, set "include_global_state": true. Global state can only be restored from a snapshot that was itself taken with "include_global_state": true; the snapshot commands in this guide set it to false.
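Because an open index cannot be restored over, one low-risk option is to restore a copy alongside the live index and inspect it before switching over. The sketch below uses the restore API's rename_pattern and rename_replacement options to add a restored_ prefix; the prefix, ES_URL, REPO and the function names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: ES_URL, REPO and the restored_ prefix are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"

# Build a restore body that restores the given index under the name
# restored_<original name>, so the live index is left untouched.
restore_body() {
  printf '{"indices": "%s", "ignore_unavailable": true, "include_global_state": false, "rename_pattern": "(.+)", "rename_replacement": "restored_$1"}\n' "$1"
}

# Restore one index from a snapshot under the new name and wait until done.
restore_as_copy() {
  snapshot="$1"
  index="$2"
  curl -fsS -X POST "$ES_URL/_snapshot/$REPO/$snapshot/_restore?wait_for_completion=true" \
    -H 'Content-Type: application/json' \
    -d "$(restore_body "$index")"
}
```

For example, restore_as_copy snapshot_202401020200 index_swarm.datacore.com.com0 would create restored_index_swarm.datacore.com.com0 and leave the original index in place.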
Automating with Elasticsearch Curator
Elasticsearch Curator simplifies managing indices and snapshots. Here’s how to set it up:
Install Curator
pip install elasticsearch-curator
Create a Curator Configuration File (curator.yml)
client:
  hosts:
    - 127.0.0.1
  port: 9200
logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
Create a Curator Action File (snapshot.yml), replacing the filter value with your index name or prefix:
actions:
  1:
    action: snapshot
    description: "Snapshot selected indices"
    options:
      repository: my_backup
      name: snapshot-%Y%m%d%H%M
      ignore_unavailable: false
      include_global_state: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: index_swarm.datacore.com.com0
Create a Cron Job to Run Curator
0 2 * * * curator --config /path/to/curator.yml /path/to/snapshot.yml
Best Practices
Test Snapshots: Regularly restore snapshots to a test cluster to ensure data integrity.
Monitor Resources: Monitor cluster resources during snapshot operations to ensure they do not impact performance.
Automate Alerts: Set up alerts to notify you if a snapshot operation fails.
Retention Policy: Implement a retention policy to manage storage, deleting older snapshots to save space.
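The retention practice above can be sketched as a count-based policy: keep the newest N snapshots and delete the rest. This assumes every snapshot in the repository is named snapshot_YYYYmmddHHMM (as in the curl examples), so names sort oldest-first as plain strings; snapshots named differently, such as Curator's snapshot-... names, would need a different rule. ES_URL, REPO and the function names are assumptions.

```shell
#!/bin/sh
# Sketch of a count-based retention policy. ES_URL, REPO and the
# snapshot_YYYYmmddHHMM naming scheme are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="${REPO:-my_backup}"

# Read snapshot names on stdin and print every name except the newest $1.
# Timestamped names of the same format sort oldest-first as plain strings.
snapshots_to_delete() {
  keep="$1"
  sort | awk -v keep="$keep" '
    NF { ids[++n] = $1 }
    END { for (i = 1; i <= n - keep; i++) print ids[i] }'
}

# Delete one snapshot. Elasticsearch only removes files that no other
# snapshot still references, so the remaining snapshots stay restorable.
delete_snapshot() {
  curl -fsS -X DELETE "$ES_URL/_snapshot/$REPO/$1"
}
```

To keep the 14 newest snapshots: curl -fsS "$ES_URL/_cat/snapshots/$REPO?h=id" | snapshots_to_delete 14 | while read -r id; do delete_snapshot "$id"; done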
By following these steps, you can enable regular backups of your Elasticsearch index without causing downtime, ensuring your data is safe and recoverable.