[ElasticSearch Operations Know-How from the Basics] Chapter 5. Building a Cluster

eunsol Jo · September 15, 2021

5.1 The elasticsearch.yml configuration file

5.1.1 Cluster section

# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#

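Settings that apply to the whole cluster. When forming a cluster, every node must use the same cluster name.
Changing the cluster name requires restarting every node in the cluster.
cluster.name
- The cluster name (default: elasticsearch)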

5.1.2 Node section

# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#

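Settings that apply only to this node.
node.name
- The node name.
- Must be unique within the cluster. → Setting it to ${HOSTNAME} (the host name) avoids duplicates.
- Cannot be changed while the node is running; a change requires a node restart.
- If left commented out, a random string is assigned automatically (6.x and earlier; from 7.0 the default is the host name).
node.attr.rack
- A user-defined rack value, used to spread shards across racks in the manner of an HA setup.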

5.1.3 Path section

# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /path/to/data
#
# Path to log files:
#
path.logs: /path/to/logs
#

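Settings for where data and logs are stored.
These values must be set; do not comment them out. (If they are not set, the application does not start.)
path.data
- Path where the node stores its documents, i.e. where indexed documents are written as segment files.
- Multiple paths can be set. (path1, path2)
  - Advantage: with several disks, the data can be spread across them.
  - Drawback: if one disk fails, there is no way to tell which documents are affected.
path.logs
- Path where logs are stored.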

5.1.4 Memory section

# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#

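Settings that manage the memory allocated to the Elasticsearch process.
bootstrap.memory_lock
- Controls whether the process may use swap memory (true locks its memory in RAM so it is never swapped out).
- Not using swap is recommended.
  - This protects performance, but the node can fail with an Out Of Memory error.
  - In most cases this is not a problem, but if the JVM heap is more than half of system memory, Out Of Memory becomes more likely.
  - To use it, the OS file /etc/security/limits.conf must also be modified.
  - When the process is started by systemd, an additional setting is required.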

            $ sudo vi /etc/security/limits.conf
            elasticsearch soft memlock unlimited
            elasticsearch hard memlock unlimited
            # {account name} -> for yum or rpm installs the default account is elasticsearch

            $ sudo mkdir /etc/systemd/system/elasticsearch.service.d
            $ sudo vi /etc/systemd/system/elasticsearch.service.d/override.conf
            [Service]
            LimitMEMLOCK=infinity
            $ sudo systemctl daemon-reload

	⇒ Without the additional settings above, the process does not start when bootstrap.memory_lock: true is set.
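
Since a missing memlock entry keeps the node from starting, it can be worth checking the limits file before turning the setting on. The following is a minimal sketch in Python; `has_unlimited_memlock` is a hypothetical helper written for this note, it only reads text in the limits.conf format shown above, and it does not look at the systemd override.

```python
def has_unlimited_memlock(conf_text: str, user: str = "elasticsearch") -> bool:
    """Return True if both soft and hard memlock are 'unlimited' for `user`."""
    found = set()
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        # limits.conf entries have four fields: <domain> <type> <item> <value>
        if len(fields) != 4:
            continue
        domain, limit_type, item, value = fields
        if domain != user or item != "memlock" or value != "unlimited":
            continue
        if limit_type == "-":
            # "-" sets the soft and the hard limit in one entry
            found.update({"soft", "hard"})
        else:
            found.add(limit_type)
    return {"soft", "hard"} <= found


sample = """
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
"""
print(has_unlimited_memlock(sample))
```

The sketch only recognizes the literal value `unlimited`, which is the value used in this post.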

5.1.5 Network section

# ---------------------------------- Network -----------------------------------
#
# By default Elasticsearch is only accessible on localhost. Set a different
# address here to expose this node on the network:
#
#network.host: 192.168.0.1
#
# By default Elasticsearch listens for HTTP traffic on the first free port it
# finds starting at 9200. Set a specific HTTP port here:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#

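Settings for the IP used to talk to the outside and for communication between nodes.
network.host: sets the IP address.
- A single field that sets the two settings below at once. They can also be set separately.
- They are usually set separately.
  
  WHY?
  
  ⇒ If bind_host is set to a specific IP, the node cannot be reached through localhost (127.0.0.1), not even from the node itself.
  
  ⇒ When Elasticsearch is managed through scripts or automation, using localhost is efficient.
  
  ⇒ 0.0.0.0 is recommended for bind_host → both the node IP and localhost can be used.
  
  ⇒ However, with two or more nodes, if every node publishes the same address (0.0.0.0), the nodes cannot communicate with each other.
  
  ⇒ [Conclusion] Client requests = node IP + localhost, inter-node communication = node IP (the solution is covered in 5.3)
- network.bind_host: the address the node listens on for incoming client requests
- network.publish_host: the address the node advertises to other nodes for inter-node communication
http.port: sets the port.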
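
The localhost point above can be tried with plain TCP sockets, without Elasticsearch. This is a minimal sketch under stated assumptions: `reachable_via_localhost` is a hypothetical helper, and it binds an ordinary socket to a free port chosen by the OS rather than to 9200.

```python
import socket


def reachable_via_localhost(bind_address: str) -> bool:
    """Bind a listening socket to `bind_address` and try to reach it via 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((bind_address, 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(1.0)
        try:
            client.connect(("127.0.0.1", port))
            return True
        except OSError:
            return False
        finally:
            client.close()
    finally:
        server.close()


# 0.0.0.0 listens on every interface, loopback included
print(reachable_via_localhost("0.0.0.0"))
```

Passing the machine's own non-loopback IP instead of 0.0.0.0 is the case the text warns about: the listener is then bound to that one address only.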

5.1.6 Discovery section

# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.seed_hosts: ["host1", "host2"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
#cluster.initial_master_nodes: ["node-1", "node-2"]
#
# For more information, consult the discovery and cluster formation module documentation.
#

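Settings for clustering nodes together.
discovery.zen.ping.unicast.hosts (discovery.seed_hosts in 7.x, as in the file above)
- Information about the other nodes to cluster with.
- An array listing one or more nodes.
discovery.zen.minimum_master_nodes
- Sets the minimum number of master-eligible nodes. (default: 1)
- Essential for preventing split brain.
- Removed as of 7.x.
  - When a node with node.master: true is added, the cluster adjusts the minimum_master_nodes value on its own.
  - The user only sets cluster.initial_master_nodes:, the master-eligible nodes used to bootstrap the cluster the first time.

⇒ A starting node checks whether Elasticsearch is running at the IPs listed in hosts → receives that cluster's information and checks the number of master nodes. If the count is at least the minimum, the node joins the cluster successfully.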

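*Split brain:

data nodes (3) + master nodes (3)

Suppose communication between the master nodes splits them into (2) / (1) while the minimum number of master nodes is 1.

Both halves then satisfy the condition, so each can elect its own master → setting the minimum to the majority, 2, prevents this.

Majority: (total master-eligible nodes / 2) + 1, using integer division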
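
The majority rule and the (2) / (1) split above can be checked with a few lines of Python. This is a minimal sketch; the function names are made up for illustration, and a partition is modeled only by how many master-eligible nodes it can see.

```python
def majority(master_eligible: int) -> int:
    """Quorum from the text: (master-eligible nodes // 2) + 1."""
    return master_eligible // 2 + 1


def partitions_able_to_elect(partition_sizes, minimum_master_nodes):
    """Count network partitions that still see enough master-eligible nodes."""
    return sum(1 for size in partition_sizes if size >= minimum_master_nodes)


masters = 3
split = [2, 1]  # the (2) / (1) split from the example above

# minimum of 1: both sides qualify, which is the split-brain case
print(partitions_able_to_elect(split, 1))
# minimum set to the majority: only the side with 2 nodes qualifies
print(partitions_able_to_elect(split, majority(masters)))
```

With three master-eligible nodes the majority is 2, so at most one side of any split can ever elect a master.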

5.1.7 Gateway section

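Settings for cluster recovery.
gateway.recover_after_nodes
- When every node in the cluster is restarted, sets the minimum number of healthy nodes required before recovery starts.
  - Version upgrade or failure of all nodes → every node is restarted (Full Cluster Restart)
  - The restarted nodes join the cluster one after another
  - Once clustering starts, the index data in the cluster is recovered
- The minimum can also be set per node role
  - gateway.recover_after_master_nodes
  - gateway.recover_after_data_nodes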

5.1.8 Various section

# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

action.destructive_requires_name
- Prevents indices in the cluster from being deleted with _all or a wildcard expression.
- Protects against accidental deletion by a user.

5.1.9 Defining node roles

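Master node / node.master: true
Data node / node.data: true
Ingest node / node.ingest: true
Coordinating node = all of the roles above set to false
- Role: user request → nodes → merge the responses → user
- Keeps one of the data nodes from also acting as the coordinating node and taking on extra load
- Data node queue + coordinating node queue → heavy heap usage → Out Of Memory
- Separation matters even more for clusters that use the aggregation API frequently

⇒ A node can hold several roles at once.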
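
How the three flags combine can be summarized in a small Python sketch. `describe_roles` is a hypothetical helper, not an Elasticsearch API; it mirrors the boolean node.master / node.data / node.ingest settings listed above.

```python
def describe_roles(master: bool, data: bool, ingest: bool) -> list:
    """Map the three boolean role settings to the roles a node ends up with."""
    flags = (("master", master), ("data", data), ("ingest", ingest))
    roles = [name for name, enabled in flags if enabled]
    # With every role switched off, the node only coordinates requests
    return roles or ["coordinating-only"]


print(describe_roles(master=True, data=True, ingest=False))
print(describe_roles(master=False, data=False, ingest=False))
```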

5.2 The jvm.options configuration file

Elasticsearch runs on Java → JVM settings such as heap memory and GC are required
They matter for application performance!
jvm.options


    ################################################################
    ##
    ## JVM configuration
    ##
    ################################################################
    ##
    ## WARNING: DO NOT EDIT THIS FILE. If you want to override the
    ## JVM options in this file, or set any additional options, you
    ## should create one or more files in the jvm.options.d
    ## directory containing your adjustments.
    ##
    ## See https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html
    ## for more information.
    ##
    ################################################################

    ################################################################
    ## IMPORTANT: JVM heap size
    ################################################################
    ##
    ## The heap size is automatically configured by Elasticsearch
    ## based on the available memory in your system and the roles
    ## each node is configured to fulfill. If specifying heap is
    ## required, it should be done through a file in jvm.options.d,
    ## and the min and max should be set to the same value. For
    ## example, to set the heap to 4 GB, create a new file in the
    ## jvm.options.d directory containing these lines:
    ##
    ## -Xms4g
    ## -Xmx4g
    ##
    ## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
    ## for more information
    ##
    ################################################################

    ################################################################
    ## Expert settings
    ################################################################
    ##
    ## All settings below here are considered expert settings. Do
    ## not adjust them unless you understand what you are doing. Do
    ## not edit them in this file; instead, create a new file in the
    ## jvm.options.d directory containing your adjustments.
    ##
    ################################################################
    ## GC configuration
    8-13:-XX:+UseConcMarkSweepGC
    8-13:-XX:CMSInitiatingOccupancyFraction=75
    8-13:-XX:+UseCMSInitiatingOccupancyOnly

    ## G1GC Configuration
    # NOTE: G1 GC is only supported on JDK version 10 or later
    # to use G1GC, uncomment the next two lines and update the version on the
    # following three lines to your version of the JDK
    # 10-13:-XX:-UseConcMarkSweepGC
    # 10-13:-XX:-UseCMSInitiatingOccupancyOnly
    14-:-XX:+UseG1GC

    ## JVM temporary directory
    -Djava.io.tmpdir=${ES_TMPDIR}

    ## heap dumps

    # generate a heap dump when an allocation from the Java heap fails; heap dumps
    # are created in the working directory of the JVM unless an alternative path is
    # specified
    -XX:+HeapDumpOnOutOfMemoryError

    # specify an alternative path for heap dumps; ensure the directory exists and
    # has sufficient space
    -XX:HeapDumpPath=data

    # specify an alternative path for JVM fatal error logs
    -XX:ErrorFile=logs/hs_err_pid%p.log

    ## JDK 8 GC logging
    8:-XX:+PrintGCDetails
    8:-XX:+PrintGCDateStamps
    8:-XX:+PrintTenuringDistribution
    8:-XX:+PrintGCApplicationStoppedTime
    8:-Xloggc:logs/gc.log
    8:-XX:+UseGCLogFileRotation
    8:-XX:NumberOfGCLogFiles=32
    8:-XX:GCLogFileSize=64m

    # JDK 9+ GC logging
    9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

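-Xms1g (minimum) / -Xmx1g (maximum)
- Set the minimum and maximum heap size
- If the two values differ, the JVM reserves the minimum first and grows the heap later → performance drops → setting both to the same value is recommended
- Recommendations
  - Do not set it above 32 GB
    
    Heap memory = the space where the JVM stores the data it works on
    
    Object = a piece of data stored in heap memory
    
    OOP (Ordinary Object Pointer) = the in-memory address used to reach an object → a struct
    
    Depending on the system architecture, an OOP addresses a 32-bit (up to 4 GB) or 64-bit (up to 16 EB) address space
    
    Actual performance: 32-bit >> 64-bit
    
    Why? 64-bit pointers need more computation and more memory.
    
    ⇒ To avoid this slowdown, Compressed OOPs let a 32-bit pointer address more than 4 GB
    
    Native OOP vs Compressed OOP
    
    Pointer value 1 → native: address 1 → compressed: address 8
    
    Pointer value 2 → native: address 2 → compressed: address 16
    
    8 times as much address space can be represented
    
    The limit grows from 4 GB to 32 GB; above 32 GB the JVM can no longer use Compressed OOPs and falls back to 64-bit pointers
  - Set it to about half of total memory
    
    Elasticsearch does frequent I/O (which determines performance) → it pays to make use of the page cache
    - Page cache
      
      The OS keeps file data in memory to reduce I/O
      
      It uses memory that applications leave unused → so the application's own memory use has to be kept down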
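
Both recommendations come down to simple arithmetic, sketched below in Python. The helper names are hypothetical, and the 31 GB cap is an assumption chosen to stay a little under the 32 GB limit, since the exact cutoff depends on the JVM.

```python
GIB = 1024 ** 3


def addressable_bytes(pointer_bits: int, alignment_bytes: int = 1) -> int:
    """Bytes reachable when each pointer value names one `alignment_bytes` slot."""
    return (2 ** pointer_bits) * alignment_bytes


def suggested_heap_gib(total_ram_gib: float, cap_gib: float = 31.0) -> float:
    """Half of RAM for the heap, capped below the Compressed OOP limit (assumed cap)."""
    return min(total_ram_gib / 2, cap_gib)


print(addressable_bytes(32) // GIB)     # native 32-bit pointers: 4
print(addressable_bytes(32, 8) // GIB)  # 8-byte steps per pointer value: 32
print(suggested_heap_gib(16))           # 16 GB machine: 8.0
print(suggested_heap_gib(128))          # 128 GB machine: capped at 31.0
```

The remaining half of RAM is what the OS can use for the page cache. A value chosen this way would go into -Xms and -Xmx, both set to the same size.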
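8-13:-XX:+UseConcMarkSweepGC
- Use the CMS GC algorithm (recommended)
8-13:-XX:CMSInitiatingOccupancyFraction=75
- With CMS GC, start an old GC once old-generation usage reaches X%
- Value in this file: 75%
- Too low and old GC runs too often; too high and each run takes longer. The value needs to be chosen with care.
- An old GC can cause a stop-the-world pause during which the process cannot respond (details in chapter 7)
8-13:-XX:+UseCMSInitiatingOccupancyOnly
- The old GC trigger is controlled only by CMSInitiatingOccupancyFraction. (GC statistics are not consulted)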
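
The threshold can be pictured with a simplified Python model. This is only a sketch of the rule as described above, assuming UseCMSInitiatingOccupancyOnly is on so that no GC statistics are involved; `should_start_old_gc` is a hypothetical name, and the real collector is more involved than a single comparison.

```python
def should_start_old_gc(old_gen_used_mb: float,
                        old_gen_capacity_mb: float,
                        initiating_fraction: int = 75) -> bool:
    """Simplified model: start an old GC once occupancy passes the fraction."""
    occupancy_percent = 100.0 * old_gen_used_mb / old_gen_capacity_mb
    return occupancy_percent >= initiating_fraction


print(should_start_old_gc(700, 1000))  # 70% occupied, below the 75% threshold
print(should_start_old_gc(800, 1000))  # 80% occupied, above the 75% threshold
```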
## G1GC Configuration
- Use G1 GC instead of CMS GC. This is likely to cause various issues, so it needs extensive testing.

5.3 Using the cluster

Indexing data
Verifying that the service keeps working when a cluster node fails
