Deploy ES Cluster App

Overview

This document describes the principles, design, and concrete steps for building an Elasticsearch cluster on Kato. The official Elasticsearch cluster application is already available in the application market, so Kato users can pull and install it with one click. The cluster mechanism differs between major versions of Elasticsearch; this article uses v6.2.3.

Introduction to Elasticsearch and Cluster Principles

Introduction to Elasticsearch

Elasticsearch is a highly scalable open source full-text search and analysis engine. It allows you to store, search and analyze large amounts of data quickly and in near real-time. It is usually used as the underlying engine/technology to provide support for applications with complex search functions and requirements.

Guide reading:

Cluster Principle

Elasticsearch uses a custom discovery implementation called “Zen Discovery” for node-to-node clustering and master election.

Based on this, there are two necessary options that need to be set:

  • discovery.zen.ping.unicast.hosts

This option specifies the addresses of the seed nodes used to join the cluster. Three formats are accepted:

- 192.168.1.10:9300 uses the IP:PORT format directly. Multiple nodes are separated by commas.
- 192.168.1.10 specifies only the IP address; the port is taken from `transport.profiles.default.port` or, if that is not set, from `transport.tcp.port`. Multiple nodes are separated by commas.
- seeds.mydomain.com specifies a resolvable domain name. If the domain name resolves to multiple IPs, each of them is tried.
  • discovery.zen.minimum_master_nodes

This option tells every master-eligible node the minimum number of master-eligible nodes that must be visible before a master can be elected. It prevents a cluster split brain from causing data loss. A reasonable value is:

(master_eligible_nodes / 2) + 1
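
As a worked example of the formula: with three master-eligible nodes the result is 2, so a single isolated node cannot elect itself master. The helper below is only a minimal sketch; `min_master_nodes` is a hypothetical function name, not part of Elasticsearch or Kato.

```shell
#!/bin/bash
# Hypothetical helper: compute (master_eligible_nodes / 2) + 1.
# Bash arithmetic performs integer division, so 3 / 2 evaluates to 1.
min_master_nodes() {
    local eligible=$1
    echo $(( eligible / 2 + 1 ))
}

min_master_nodes 3   # prints 2
min_master_nodes 5   # prints 3
```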

Guide Reading:

Kato’s Approach to Building an Elasticsearch Cluster

The goal is to scale out multiple instances within a single Elasticsearch application and have them form a cluster automatically. Note that the IP of an Elasticsearch instance running in a container environment may change.

Given the Elasticsearch cluster principle described above, at least one of the following two conditions must be met, so that the information can be written into the configuration file before the instance starts:

  • The IP addresses of all instances under the current application can be obtained dynamically.
  • The resolvable domain names of all instances under the current application can be obtained dynamically.

After comparison, the first of the two conditions is easier to achieve.

In the Kato environment, the IP of every instance of a stateful application can be obtained through DNS resolution:

sh-4.2# env | grep SERVICE_NAME
SERVICE_NAME=gr4f7bd5
sh-4.2# nslookup ${SERVICE_NAME}
Server: 10.10.20.12
Address: 10.10.20.12#53

Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.119.163
Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.171.182
Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.9.155

It is worth noting that the result obtained by this command is dynamic.

Even if only one instance is deployed at first and the application is scaled out later, running the above command while the second instance starts returns the IP addresses of both instances.
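
The extraction of instance IPs from this output can be sketched as follows. `extract_instance_ips` is a hypothetical helper name, and the sketch assumes the `Address: <ip>` line format shown above; other nslookup implementations may print a different layout.

```shell
#!/bin/bash
# Hypothetical helper: read nslookup output on stdin and print one instance IP per line.
# The first "Address" line belongs to the DNS server itself, so it is dropped.
extract_instance_ips() {
    grep '^Address' | sed '1d' | awk '{print $2}'
}

# Sample input copied from the nslookup output shown above.
sample_output='Server: 10.10.20.12
Address: 10.10.20.12#53

Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.119.163
Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.171.182
Name: gr4f7bd5.157b2015f1c74b219f38849f7857d382.svc.cluster.local
Address: 192.168.9.155'

# Prints the three instance IPs, one per line.
printf '%s\n' "$sample_output" | extract_instance_ips
```

Piping the result through `wc -l` gives the current instance count, which is what decides whether the cluster configuration needs to be enabled.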

Since the IP addresses of all instances can be obtained dynamically, the following method can be used to build a dynamically scalable ES cluster application on Kato.

Actual Production Process and Code Analysis

Sample repository address

Users can fork the above repository directly to obtain the source code of the cluster version of the ES application.

Please read the comments in the code carefully.

Dockerfile

# Use the official ES 6.2.3 image as the base image
FROM elastic/elasticsearch:6.2.3
MAINTAINER gdevs <gdevs@gridworkz.com>
# Install the nslookup command (the base image is CentOS-based)
RUN yum makecache fast && \
    yum install bind-utils -y && \
    yum clean all && \
    rm -rf /var/cache/yum
# Copy the startup script from the source code into the image
COPY docker-entrypoint.sh /
# Copy the base configuration file from the source code into the image
COPY elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml
# Expose the HTTP (9200) and transport (9300) ports
EXPOSE 9200 9300
# Persistent data mount
VOLUME ["/usr/share/elasticsearch/data"]
# Start command: the custom entrypoint runs first, then executes the CMD arguments
ENTRYPOINT ["/docker-entrypoint.sh"]
CMD ["/usr/local/bin/docker-entrypoint.sh","eswrapper"]

Points to note in Dockerfile:

  1. The way to install the nslookup command differs between distributions. For example, on Ubuntu/Debian it is apt-get install dnsutils.

  2. docker-entrypoint.sh needs executable permission. This example has no authorization step because the permission was already set locally before the file was uploaded.
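
If the permission has not been set yet, it can be granted locally before building the image. The following is a minimal sketch; `ensure_executable` is a hypothetical helper name introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: add the executable bit to a script only if it is missing.
ensure_executable() {
    local script=$1
    [ -x "$script" ] || chmod +x "$script"
}
```

Running `ensure_executable docker-entrypoint.sh` from the directory that contains the Dockerfile would cover the case described in point 2.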

docker-entrypoint.sh

#!/bin/bash
[[ $DEBUG ]] && set -x 
# Modify the configuration file, set specific items through environment variables
sed -i -e "s/POD_IP/${POD_IP:-'0.0.0.0'}/g" \
       -e "s/HOSTNAME/${HOSTNAME}.${HOSTNAME%-*}.${TENANT_ID}.svc.cluster.local./g" /usr/share/elasticsearch/config/elasticsearch.yml

sleep 10
# Dynamically obtain the number of current instances to determine whether to enable the cluster configuration
CURRENT_POD_NUM=$(nslookup ${SERVICE_NAME} | grep Address | sed '1d' | awk '{print $2}' | wc -l)
[[ $DEBUG ]] && echo $(nslookup ${SERVICE_NAME})> ./logfile
# If the number of instances in the current application is greater than 1, enable the configuration to join the cluster
if [[ $CURRENT_POD_NUM -gt 1 ]];then
    # Append a placeholder line for the seed host list to the end of the configuration file
    sed -i '$a\discovery.zen.ping.unicast.hosts' /usr/share/elasticsearch/config/elasticsearch.yml
    # Get the IP addresses of all current instances
    ip=$(nslookup ${SERVICE_NAME} | grep Address | sed '1d' | awk '{print $2}')
    # Join the IPs with commas to match the ES configuration file format
    ips=$(echo $ip | tr ' ' ',')
    [[ $DEBUG ]] && echo ${ip} >> ./logfile
    # Write all IPs to the configuration file
    sed -i "s/discovery.zen.ping.unicast.hosts*/discovery.zen.ping.unicast.hosts: [${ips}]/g" /usr/share/elasticsearch/config/elasticsearch.yml
fi
    
[[ $PAUSE ]] && sleep $PAUSE
# Execute the startup script specified by ES
exec "$@"
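
The formatting step in the script, which turns the list of instance IPs into a configuration line, can be tried in isolation. The sketch below uses a hypothetical helper name, `render_unicast_hosts`, and the sample IPs from the nslookup output earlier in this article.

```shell
#!/bin/bash
# Hypothetical helper: turn a whitespace-separated IP list into the
# discovery.zen.ping.unicast.hosts line that is written to elasticsearch.yml.
render_unicast_hosts() {
    local ips
    # The unquoted expansion collapses newlines into single spaces; tr swaps them for commas.
    ips=$(echo $1 | tr ' ' ',')
    echo "discovery.zen.ping.unicast.hosts: [${ips}]"
}

render_unicast_hosts "192.168.119.163
192.168.171.182
192.168.9.155"
# prints: discovery.zen.ping.unicast.hosts: [192.168.119.163,192.168.171.182,192.168.9.155]
```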

Summary

Using DNS resolution to dynamically obtain the IP addresses of all instances of the current application is a best practice for building a cluster. This case is representative of most applications that must explicitly declare the connection addresses of all cluster nodes in a configuration file, and users can build their own cluster applications of this type based on it.