1 Scripted Configuration
2 Customization
- 2.1 Elasticsearch Config File
- 2.2 Systemd (RHEL/CentOS)
- 2.3 Environment Settings
- 2.4 JVM Options
- 2.5 Log Setup

Elasticsearch requires configuration and settings file changes to be made consistently across the Elasticsearch cluster.

Scripted Configuration

The provided configuration script automates in-place Elasticsearch upgrades as well as the essential configuration that Elasticsearch requires for use with Swarm.

The script handles the following:

- Upgrading Elasticsearch in place (using the same index) if it detects that a supported version (6.8.6+) is already installed and configured
- Editing /etc/elasticsearch/elasticsearch.yml (except for changing the path.data variable to use a different data directory)
- Editing /etc/elasticsearch/log4j2.properties
- Editing /usr/lib/systemd/system/elasticsearch.service
- Editing /etc/sysconfig/elasticsearch
- Creating the override file for Systemd: /etc/systemd/system/elasticsearch.service.d/override.conf

Bulk Usage

This method is most efficient for a large number of nodes and/or when there are manual configurations to apply to elasticsearch.yml (see Customization below).

1. Run the configuration script provided in /usr/share/caringo-elasticsearch-search/bin/ on the first Elasticsearch node. The script prompts for the needed values as it progresses:
   /usr/share/caringo-elasticsearch-search/bin/configure_elasticsearch_with_swarm_search.py --esversion 8.15.1
2. The script generates custom configuration files for each of the nodes in the Elasticsearch cluster (v10.x):
   - The current node's file is /etc/elasticsearch/elasticsearch.yml
   - The other nodes' files (if any) are /etc/elasticsearch/elasticsearch.yml.<node-name-or-ip>
3. Follow the Customization details (below) to update the YAML files further, such as to change Elasticsearch's path.data (data directory).
   1. Update log files to match the data path or other customizations.
   2. Update the rollingfile appender to delete rotated log archives to prevent running out of space.
4. Complete these steps for all remaining nodes:
   1. Copy the appropriate file to /tmp/elasticsearch.yml.<node-name-or-ip> on the next Elasticsearch node.
   2. With the YAML file in place, run the configuration script with the -c argument so that it uses the existing file:
      configure_elasticsearch_with_swarm_search.py --esversion 8.15.1 -c /tmp/elasticsearch.yml.<node-name-or-ip>
   3. Repeat for each node in the cluster.
5. Resume the installation to turn on the service: Installing Elasticsearch or Migrating from Older Elasticsearch
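The per-node copy-and-run loop described above can be scripted. The Python sketch below only prints the commands for review and does not run them. It assumes root SSH access to every node, that each `elasticsearch.yml.<node-name-or-ip>` suffix is a reachable host name or IP, and that the script is installed at the same path on every node. The helper names (`per_node_files`, `bulk_commands`) and the list of ignored suffixes are hypothetical.

```python
import os
import shlex

CONF_DIR = "/etc/elasticsearch"
SCRIPT = "/usr/share/caringo-elasticsearch-search/bin/configure_elasticsearch_with_swarm_search.py"
PREFIX = "elasticsearch.yml."
# Suffixes assumed to be package/backup leftovers rather than node names.
NOT_NODES = {"rpmnew", "rpmsave", "bak", "orig"}


def per_node_files(filenames):
    """Map each node name/IP to its generated elasticsearch.yml.<node> file."""
    nodes = {}
    for name in sorted(filenames):
        suffix = name[len(PREFIX):] if name.startswith(PREFIX) else ""
        if suffix and suffix not in NOT_NODES:
            nodes[suffix] = name
    return nodes


def bulk_commands(filenames, es_version="8.15.1", ssh_user="root"):
    """Return (copy, configure) shell command strings for each remaining node."""
    commands = []
    for node, fname in per_node_files(filenames).items():
        target = f"{ssh_user}@{node}"
        remote_path = f"/tmp/{fname}"
        copy_cmd = ["scp", os.path.join(CONF_DIR, fname), f"{target}:{remote_path}"]
        # -t allocates a terminal so the script can still prompt interactively.
        run_cmd = ["ssh", "-t", target, SCRIPT, "--esversion", es_version, "-c", remote_path]
        commands.append((shlex.join(copy_cmd), shlex.join(run_cmd)))
    return commands


if __name__ == "__main__" and os.path.isdir(CONF_DIR):
    for copy_cmd, run_cmd in bulk_commands(os.listdir(CONF_DIR)):
        print(copy_cmd)
        print(run_cmd)
```

Printing instead of executing keeps an operator in the loop. The configuration script still prompts for values on each node.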

Non-Bulk Usage

Info

This method still requires running the configuration script on each node, but the generated elasticsearch.yml files are not copied between nodes.

1. Run the configuration script provided in /usr/share/caringo-elasticsearch-search/bin/ on the first Elasticsearch node. The script prompts for the needed values as it progresses:
   configure_elasticsearch_with_swarm_search.py --esversion 8.15.1
2. The script generates a custom /etc/elasticsearch/elasticsearch.yml configuration file for the current node, as well as files for the other nodes, which can be ignored (v10.x).
3. Follow the Customization details below to update the YAML file further, such as to change Elasticsearch's path.data (data directory).
   1. Update log files to match the data path or other customizations.
   2. Update the rollingfile appender to delete rotated log archives to prevent running out of space.
4. Run the script the same way on each remaining ES node, answering the prompts consistently and reapplying any manual configurations.
5. Resume the installation to turn on the service: Installing Elasticsearch or Migrating from Older Elasticsearch

Note

In step 4, answer the prompts for the cluster name and the list of nodes identically on every node.
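Because the cluster name and node list must match on every node, it can help to compare the resulting files before starting the service. The sketch below is a hypothetical helper and is not part of the Swarm tooling. It reads only single-line, top-level `key: value` entries without a YAML library, so multi-line lists are not compared. It also assumes the three keys in `SHARED_KEYS` are the ones that must agree; adjust them for ES 6 keys such as `discovery.zen.ping.unicast.hosts`.

```python
SHARED_KEYS = ("cluster.name", "discovery.seed_hosts", "cluster.initial_master_nodes")


def top_level_settings(yaml_text):
    """Parse single-line top-level 'key: value' entries; skip comments and indented lines."""
    settings = {}
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def mismatched_settings(node_files, keys=SHARED_KEYS):
    """Return {key: {node: value}} for every key whose value differs between nodes."""
    parsed = {node: top_level_settings(text) for node, text in node_files.items()}
    mismatches = {}
    for key in keys:
        # Remove all whitespace so '["es0", "es1"]' and '["es0","es1"]' compare equal.
        values = {node: "".join(s.get(key, "").split()) for node, s in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches
```

Pass a mapping of node name to file contents, for example `{"es0": open(path0).read(), "es1": open(path1).read()}`. An empty result means the compared keys agree.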

Customization

The paths given are relative to the Elasticsearch installation directory, which is assumed to be the working directory.

Caution

Errors in adding and completing these settings can prevent the Elasticsearch service from working properly.
Adjust all references to Elasticsearch’s path.data location below to reflect the new location if the path.data location is customized from the default.

Elasticsearch Config File

Version Differences

The Elasticsearch configuration settings have changed with each major release. See Elasticsearch Configuration Differences to track how these configuration settings have changed since Elasticsearch 2.3.3.

Edit the Elasticsearch config file: /etc/elasticsearch/elasticsearch.yml

action.auto_create_index: "+csmeter,+_nfsconnector,.watches,.triggered_watches,.watcher-history-*"	Disables automatic index creation except for csmeter indices, Swarm NFS connector indices, and the Elasticsearch Watcher indices (`.watches`, `.triggered_watches`, `.watcher-history-*`). (v10.1)
cluster.name: <ES_cluster_name>	Give the Elasticsearch cluster a unique name, unrelated to the Swarm cluster name. Do not use periods in the name. Important: If a legacy ES cluster is operating, this name must differ from its `cluster.name` to prevent the two clusters from merging.
node.name: <ES_node_name>	Optional: Elasticsearch supplies a node name if one is not set. Do not use periods in the name.
network.host: _site_	Assign a specific hostname or IP address, which requires clients to access the ES server using that address. Update `/etc/hosts` if using a hostname. Defaults to the special value, `_site_`.
cluster.initial_master_nodes	(ES 7+) For first-time bootstrapping of a production ES cluster. Set to an array or comma-delimited list of the hostnames of the master-eligible ES nodes whose votes should be counted in the very first election. This setting should be removed after the cluster has formed: https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#initial_master_nodes
discovery.zen.minimum_master_nodes: 3	(ES 6 only) Set to (number of master-eligible nodes / 2, rounded down) + 1. Prevents split-brain scenarios by requiring this many master-eligible nodes to be online before a new master can be elected.
discovery.seed_hosts	(ES 7+) Enables auto-clustering of ES nodes across hosts. Set to an array or comma-delimited list of the addresses of all master-eligible nodes in the cluster.
discovery.zen.ping.unicast.hosts: ["es0", "es1"]	(ES 6 only) Set to the list of node names/IPs in the cluster, making sure all ES servers are included. Multicast is disabled by default.
gateway.expected_data_nodes: 4	Add and set to the number of data nodes in the ES cluster. Recovery of local shards starts as soon as this number of nodes have joined the cluster. It falls back to the `recover_after_data_nodes` value after 5 minutes. This example is for a 4-node cluster. Before ES 8 this configuration was named `gateway.expected_nodes`.
gateway.recover_after_data_nodes: 2	Set to the minimum number of ES data nodes that must be started before the cluster begins recovering its shards: 1 if there are 1 or 2 nodes in total; 2 if there are 3 or 4; the total minus 2 if there are 5 to 7; and the total minus 3 if there are 8 or more. Before ES 8 this configuration was named `gateway.recover_after_nodes`.
bootstrap.memory_lock: true	Locks the process memory at startup so that Elasticsearch does not swap (swapping leads to poor performance). Verify that enough system memory is available for all processes running on the server. The RPM installer adds these entries (marked `# Custom for Caringo Swarm`) to `/etc/security/limits.d/10-caringo-elasticsearch.conf` to allow the `elasticsearch` user to lock memory (disabling swapping) and to raise its limits on open file descriptors and processes: `elasticsearch soft nofile 65536`, `elasticsearch hard nofile 65536`, `elasticsearch soft nproc 4096`, `elasticsearch hard nproc 4096`, `elasticsearch soft memlock unlimited`, `elasticsearch hard memlock unlimited`.
path.data: <path_to_data_directory>	By default, path.data is `/var/lib/elasticsearch`, and the directory is created with the needed ownership. A separate, dedicated partition of ample size can be used instead by making the `elasticsearch` user the owner of that directory: `chown -R elasticsearch:elasticsearch <path_to_data_directory>`. Then either set path.data to that directory or make a symlink to it from the default location: `ln -s <path_to_data_directory> /var/lib/elasticsearch`
thread_pool.write.queue_size	The size of the queue used for bulk indexing. This variable was called `threadpool.bulk.queue_size` in earlier Elasticsearch versions.
node.attr.rack	Optional: Tells Elasticsearch not to assign a replica shard to a node in the same "rack" as the node holding its primary shard. For example, this allows a 6-node cluster running with 2 nodes on each of 3 ESXi hosts to survive one of the ESXi hosts going down: the cluster health is yellow, not red. Set to a rack name or ESXi host identifier, such as `esxi3` on the Elasticsearch node(s) running on the third virtual machine host. This also requires setting `cluster.routing.allocation.awareness.attributes: rack` on all ES nodes. Both settings should already be in elasticsearch.yml, commented out. Ideally, set this right after initial configuration, when first starting Elasticsearch. To add it to an existing deployment, all nodes must be restarted before shards are reallocated. To do this without downtime, first turn off shard allocation, then restart the nodes one by one, waiting for each to appear in `GET /_cat/nodes` before moving to the next. When done, re-enable shard allocation. Health is yellow during this time; as an example, this process takes an hour for a 9-node cluster (20 x 30 GB shards) to return to green. Monitor shard allocation with `GET _cluster/allocation/explain`. See also https://www.elastic.co/guide/en/elasticsearch/reference/7.5/allocation-awareness.html and Rolling Restart of Elasticsearch.
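The two sizing rules in the table above can be checked with a short Python sketch. It assumes every node counts toward both totals (that is, all nodes are master-eligible data nodes), which the table does not state. The function names are hypothetical. For the 4-node example it reproduces the values shown: `minimum_master_nodes(4)` is 3 and `recover_after_data_nodes(4)` is 2.

```python
def minimum_master_nodes(master_eligible):
    """ES 6 discovery.zen.minimum_master_nodes: (master-eligible nodes / 2, rounded down) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def recover_after_data_nodes(total_nodes):
    """gateway.recover_after_data_nodes following the table's guidance."""
    if total_nodes < 1:
        raise ValueError("need at least one data node")
    if total_nodes <= 2:
        return 1
    if total_nodes <= 4:
        return 2
    if total_nodes <= 7:
        return total_nodes - 2
    return total_nodes - 3


def gateway_settings(total_nodes):
    """Render the two gateway lines for elasticsearch.yml (ES 8 setting names)."""
    return (
        f"gateway.expected_data_nodes: {total_nodes}\n"
        f"gateway.recover_after_data_nodes: {recover_after_data_nodes(total_nodes)}\n"
    )


if __name__ == "__main__":
    print(gateway_settings(4), end="")
```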

Systemd (RHEL/CentOS)

Create a systemd override file for the Elasticsearch service to set the LimitMEMLOCK property to be unlimited.

1. Create the override file:
   /etc/systemd/system/elasticsearch.service.d/override.conf
2. Add this content:
   [Service]
   LimitMEMLOCK=infinity
   # Uncomment below line if elasticsearch fails to start with the JNA warning:
   # [WARN ][o.e.b.Natives ] unable to load JNA native support library
   # You will need to manually make the "tmp" directory and chown it to elasticsearch
   # Environment=ES_TMPDIR=/usr/share/elasticsearch/tmp
3. Load the override file; otherwise, the setting does not take effect until the next reboot:
   sudo systemctl daemon-reload
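After the reload, the effective limit can be checked with `systemctl show elasticsearch --property=LimitMEMLOCK`. The Python sketch below wraps that check. It assumes the unit is named `elasticsearch` and that systemd reports an unlimited value as `infinity`, which is its usual rendering. The helper names are hypothetical.

```python
import shutil
import subprocess


def parse_systemctl_show(output):
    """Turn 'Key=Value' lines from `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def memlock_is_unlimited(output):
    """True when systemd reports LimitMEMLOCK=infinity for the unit."""
    return parse_systemctl_show(output).get("LimitMEMLOCK") == "infinity"


if __name__ == "__main__" and shutil.which("systemctl"):
    result = subprocess.run(
        ["systemctl", "show", "elasticsearch", "--property=LimitMEMLOCK"],
        capture_output=True,
        text=True,
    )
    print("LimitMEMLOCK unlimited:", memlock_is_unlimited(result.stdout))
```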

Environment Settings

Edit the environment settings: /etc/sysconfig/elasticsearch

`MAX_OPEN_FILES`	Set to `65536`
`MAX_LOCKED_MEMORY`	Set to `unlimited` (prevents swapping)
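When these values are set by a script rather than by hand, the edit should be idempotent. The following is a minimal Python sketch. It assumes the file uses plain `KEY=VALUE` lines and uncomments commented defaults such as `#MAX_OPEN_FILES=...` in place. `set_sysconfig` is a hypothetical helper that returns the new text instead of writing the file.

```python
import re

# Matches 'KEY=' with an optional leading '#', capturing the comment marker and key.
ASSIGNMENT = re.compile(r"^\s*(#)?\s*([A-Za-z_][A-Za-z0-9_]*)=")


def set_sysconfig(text, updates):
    """Set KEY=VALUE pairs in a sysconfig-style file and return the new text."""
    out, written = [], set()
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        key = match.group(2) if match else None
        if key in updates:
            if key not in written:
                out.append(f"{key}={updates[key]}")  # first occurrence, uncommented
                written.add(key)
                continue
            if match.group(1) is None:
                continue  # drop a later active assignment that would override ours
        out.append(line)
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in written)
    return "\n".join(out) + "\n"
```

For example, `set_sysconfig(text, {"MAX_OPEN_FILES": "65536", "MAX_LOCKED_MEMORY": "unlimited"})` applies both rows of the table.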

JVM Options

Edit the JVM settings to manage memory and space usage: /etc/elasticsearch/jvm.options

`-Xms`	Set to half the available memory, but not more than 31 GB.
`-Xmx`	Set to half the available memory, but not more than 31 GB.
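The heap rule can be expressed as a small calculation. This sketch makes three assumptions: "available memory" means total physical memory, the result is rounded down to whole gigabytes, and `-Xms` and `-Xmx` get the same value, as the table specifies. `heap_options` is a hypothetical name.

```python
GIB = 1024 ** 3
MAX_HEAP_GB = 31  # upper bound from the table


def heap_options(total_memory_bytes):
    """Return -Xms/-Xmx lines: half of memory in whole GB, capped at 31 GB."""
    size_gb = min(total_memory_bytes // 2 // GIB, MAX_HEAP_GB)
    if size_gb < 1:
        raise ValueError("less than 2 GB of memory; size the heap manually")
    # Both options get the same value, matching the table.
    return [f"-Xms{size_gb}g", f"-Xmx{size_gb}g"]
```

On Linux, `heap_options(os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES"))` gives the values for the current host.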

GC logs (optional) - Elasticsearch enables GC logs by default. These are configured in jvm.options and output to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space. Disable these logs until needed to troubleshoot memory leaks. Comment out these lines to disable them:

#8:-Xloggc:/var/log/elasticsearch/gc.log
#8:-XX:+UseGCLogFileRotation
#8:-XX:NumberOfGCLogFiles=32
#8:-XX:GCLogFileSize=64m
#9:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
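Commenting out these lines can also be scripted. The following is a minimal Python sketch that prefixes `#` to active lines carrying the GC logging flags listed above. The flag pattern is an assumption based on that list. The function returns new text rather than editing `/etc/elasticsearch/jvm.options` in place.

```python
import re

# GC logging flags from the list above (JDK 8 style and unified -Xlog:gc style).
GC_LOG_FLAGS = re.compile(
    r"-Xloggc:|-XX:\+UseGCLogFileRotation|-XX:NumberOfGCLogFiles=|-XX:GCLogFileSize=|-Xlog:gc"
)


def disable_gc_logging(jvm_options):
    """Comment out active GC logging lines; leave every other line unchanged."""
    out = []
    for line in jvm_options.splitlines():
        stripped = line.lstrip()
        if stripped and not stripped.startswith("#") and GC_LOG_FLAGS.search(stripped):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice gives the same result, because already-commented lines are skipped.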

Log Setup

Adjust the configuration file: /etc/elasticsearch/log4j2.properties to customize the logging format and behavior.

The default log location already has the needed ownership. To move the log directory, choose a separate, dedicated partition of ample size and make the elasticsearch user the owner of that directory:

chown -R elasticsearch:elasticsearch <path_to_log_directory>

Deprecation log

The deprecation log records the use of deprecated functionality, to help prepare for future migrations. Adjust the log size and log file count for the deprecation log:

Update to these values

appender.deprecation_rolling.policies.size.size = 2097152
appender.deprecation_rolling.strategy.max = 25
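With these values the deprecation log has a fixed upper bound. The calculation below assumes that the rolling strategy keeps at most `strategy.max` archives in addition to the active file. It also ignores archive compression, which would make the real figure smaller. Under those assumptions, the worst case is:

```latex
\[
2\,097\,152~\text{B} \times (25 + 1) = 54\,525\,952~\text{B} = 52~\text{MiB}
\]
```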

Deprecation logging is enabled at the WARN level by default, which is the level at which all deprecation messages are emitted. To avoid large warning logs, change the log level to ERROR:

Change level

logger.deprecation.level = error
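The three log4j2 changes above can be applied in one pass. The following is a minimal Python sketch that assumes `log4j2.properties` uses one `key = value` pair per line with `#` comments. `set_properties` is a hypothetical helper. It replaces existing keys (dropping duplicate assignments), appends missing ones, and returns the new text.

```python
DEPRECATION_SETTINGS = {
    "appender.deprecation_rolling.policies.size.size": "2097152",
    "appender.deprecation_rolling.strategy.max": "25",
    "logger.deprecation.level": "error",
}


def set_properties(text, updates):
    """Replace or append 'key = value' lines in a log4j2.properties-style file."""
    out, written = [], set()
    for line in text.splitlines():
        stripped = line.strip()
        key = None
        if stripped and not stripped.startswith(("#", "!")) and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
        if key in updates:
            if key not in written:
                out.append(f"{key} = {updates[key]}")
                written.add(key)
            continue  # skip later duplicates so only one value remains
        out.append(line)
    out.extend(f"{k} = {v}" for k, v in updates.items() if k not in written)
    return "\n".join(out) + "\n"
```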
