Organizing EFK, Day 3 - Fluentd Configuration 2

놀고 싶은데, 왜 다들 공부하는거야 · April 4, 2025

Config File Syntax

Fluentd assumes its configuration file is encoded in UTF-8 or ASCII. The file is built from the following directives.
1. source: determines the input sources.
2. match: determines the output destinations.
3. filter: determines the event processing pipelines.
4. system: sets system-wide configuration.
5. label: groups outputs and filters for internal routing within the configuration file.
6. worker: limits the enclosed plugins to specific workers.
7. @include: includes other files.

1. `source`

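The source directive selects and configures the input plugins you want. Fluentd's standard input plugins include http and forward. http provides an HTTP endpoint that accepts incoming HTTP messages, whereas forward provides a TCP endpoint that accepts TCP packets. Both can be enabled at the same time, as in the following example.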

# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# http://<ip>:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

Every source directive must include a @type parameter, which specifies the input plugin to use.
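
As a further illustration of how @type selects the plugin, the sketch below swaps in the tail input plugin. The log path, the pos_file location, and the tag are hypothetical values chosen for this example, and each line of the file is assumed to be a JSON object.

```conf
# Hypothetical example: follow a JSON-formatted application log.
<source>
  @type tail
  # Hypothetical file to follow.
  path /var/log/myapp/access.log
  # Hypothetical location where Fluentd records the last read position.
  pos_file /var/log/fluent/myapp.access.pos
  # Tag assigned to every event emitted by this source.
  tag myapp.access
  <parse>
    # Assumes one JSON object per line.
    @type json
  </parse>
</source>
```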

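A source submits events to the Fluentd routing engine. An event consists of three entities: tag, time, and record. The tag is a string separated by dots (.), for example myapp.access, and serves as the directions for Fluentd's internal routing engine. The time field is specified by the input plugin and must be in Unix time format. The record is a JSON object that carries the actual log data.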

Note that tags are also used in various contexts depending on the output destination, so it is recommended to stick to lowercase letters, digits, and underscores (_), e.g. `^[a-z0-9_]+$`.

2. `match`

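The match directive looks for events whose tags match its pattern and sends them on to other systems. Because they deliver events elsewhere, the plugins used here are called output plugins. Fluentd's standard output plugins include file and forward. Consider the following example.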

# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# http://<ip>:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

# Match events tagged with "myapp.access" and
# store them to /var/log/fluent/access.%Y-%m-%d
# Of course, you can control how you partition your data
# with the time_slice_format option.
<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

A match directive takes the form <match pattern> and must contain a @type parameter. Only events whose tag matches the pattern are sent to that output destination. In the example above, myapp.access is the pattern, so only events tagged myapp.access match. The @type parameter names the output plugin to use.
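
To show the other standard output plugin mentioned earlier, here is a minimal sketch that forwards matching events to another Fluentd node instead of writing them to a file. The address 192.0.2.10 is a placeholder assumed for this example, not something from the original setup.

```conf
<match myapp.access>
  @type forward
  <server>
    # Placeholder address of a downstream Fluentd node (assumption).
    host 192.0.2.10
    port 24224
  </server>
</match>
```

The receiving node would need a forward source listening on the same port, like the one in the source example.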

3. `filter`

The filter directive has the same syntax as match, but filters can be chained to form a processing pipeline. Events flow in the following order.

Input -> filter-1 -> ... -> filter-N -> Output

Let's add the record_transformer filter.

# http://this.host:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>

<filter myapp.access>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

This configuration can be visualized as follows.

|----http----|    |----record_transformer----|     |--------file-------|  
|Tag: null   |----|Match: myapp.access       |-----|Match: myapp.access|  
|------------|    |--------------------------|     |-------------------|

When the source receives the event {"event": "data"}, it first passes through the record_transformer filter. The filter adds a host_param field to the event, so on a host named webserver1 the filtered event becomes {"event": "data", "host_param": "webserver1"} and then goes to the output plugin.
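
The chaining described earlier (filter-1 through filter-N) can be sketched by placing a second filter with the same pattern in front of record_transformer. This is only an illustration: the grep condition on the event field is an assumption made up for this example.

```conf
# filter-1: keep only events whose "event" field matches /data/
# (hypothetical condition for illustration).
<filter myapp.access>
  @type grep
  <regexp>
    key event
    pattern /data/
  </regexp>
</filter>

# filter-2: applied only to events that passed filter-1.
<filter myapp.access>
  @type record_transformer
  <record>
    host_param "#{Socket.gethostname}"
  </record>
</filter>
```

Filters with matching patterns are applied in the order they appear, so events dropped by grep never reach record_transformer or the output.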

4. `system`

The system directive sets system-wide configuration for Fluentd. The following settings are available.
1. log_level
2. suppress_repeated_stacktrace
3. emit_error_log_interval
4. suppress_config_dump
5. without_source
6. process_name

It can be used as follows.

<system>
  # equal to -qq option
  log_level error
  # equal to --without-source option
  without_source
  # ...
</system>
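
A slightly fuller sketch using some of the other settings from the list above. The values, including the process name, are arbitrary choices for illustration.

```conf
<system>
  log_level warn
  # Hypothetical name; it is applied to the supervisor and worker process names.
  process_name fluentd_myapp
  # Avoid logging the same stack trace over and over.
  suppress_repeated_stacktrace true
  # Do not dump the parsed configuration to the log at startup.
  suppress_config_dump true
</system>
```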

5. `label`

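The label directive groups filters and outputs, which makes internal routing easier to manage and reduces the complexity of tag handling. The @label parameter is a built-in plugin parameter, so the @ prefix is required.

Consider the following configuration.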

<source>
  @type forward
</source>

<source>
  @type tail
  @label @SYSTEM
</source>

<filter access.**>
  @type record_transformer
  <record>
    # ...
  </record>
</filter>
<match **>
  @type elasticsearch
  # ...
</match>

<label @SYSTEM>
  <filter var.log.middleware.**>
    @type grep
    # ...
  </filter>
  <match **>
    @type s3
    # ...
  </match>
</label>

This configuration can be visualized as follows.

|----forward----|    |----record_transformer----|     |--------elasticsearch-------|  
|Tag: null      |----|Match: access.**          |-----|Match: **                   |  
|---------------|    |--------------------------|     |----------------------------|     

|-----tail-----|    |-------------grep-------------|     |--------s3-------|
|Tag: null     |----|Match: var.log.middleware.**  |-----|Match: **        |
|Label: @SYSTEM|    |                              |     |                 |
|--------------|    |------------------------------|     |-----------------|

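Events arriving through the @type forward source pass through the record_transformer filter and are routed to the elasticsearch output. Events arriving through the @type tail source, on the other hand, carry the @SYSTEM label, so they pass through the grep filter inside that label and are delivered to the s3 output destination.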

Fluentd provides the following built-in labels.
1. @ERROR label: used for error records emitted through a plugin's emit_error_event API. If <label @ERROR> is set, events are routed through the pipeline defined in that label when a related error occurs, for example when the buffer is full or a record is invalid.
2. @ROOT label: used for getting the root router through a plugin's event_emitter_router API. It was introduced in v1.14.0 to assign a label back to the default route. For example, event records that timed out while being handled by the concat filter can be sent to the default route.
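
As a minimal sketch of the @ERROR label, the block below collects error events into a separate file. The output path is a hypothetical choice for this example.

```conf
<label @ERROR>
  <match **>
    @type file
    # Hypothetical path for storing error events.
    path /var/log/fluent/error
  </match>
</label>
```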

6. `worker`

Fluentd can run multiple workers, and the worker directive limits plugins to run only on specific workers. For background on what a worker is, see https://docs.fluentd.org/deployment/multi-process-workers

By default, one Fluentd instance launches one supervisor and one worker. A worker contains the input/filter/output plugins.

The multi-process workers feature launches multiple workers, each running in its own separate process.

                        |---> worker0(forward -> grep -> elasticsearch)
Supervisor ---> socket  |---> worker1(forward -> grep -> elasticsearch)
                        |---> worker2(forward -> grep -> elasticsearch)

The <worker N> or <worker N-M> directive specifies which workers the enclosed plugins run on. For example, to select workers 0 through 2 (0, 1, 2) out of 4 workers, write <worker 0-2>.

Consider the following example.

<system>
  workers 4
</system>

<source>
  @type sample
  tag test.allworkers
  sample {"message": "Run with all workers."}
</source>

<worker 0>
  <source>
    @type sample
    tag test.oneworker
    sample {"message": "Run with only worker-0."}
  </source>
</worker>

<worker 0-1>
  <source>
    @type sample
    tag test.someworkers
    sample {"message": "Run with worker-0 and worker-1."}
  </source>
</worker>

<filter test.**>
  @type record_transformer
  <record>
    worker_id "#{worker_id}"
  </record>
</filter>

<match test.**>
  @type stdout
</match>

There are three <source> blocks. The test.allworkers source is not enclosed in a <worker> directive, so it runs on all 4 workers configured by workers 4 in the system directive.

The test.oneworker source is assigned to worker 0, so it runs only on worker 0.

The test.someworkers source is inside <worker 0-1>, so it runs on worker 0 and worker 1.

The log output confirms this.

... test.allworkers: {"message":"Run with all workers.","worker_id":"0"}
... test.allworkers: {"message":"Run with all workers.","worker_id":"1"}
... test.allworkers: {"message":"Run with all workers.","worker_id":"2"}
... test.allworkers: {"message":"Run with all workers.","worker_id":"3"}
... test.oneworker: {"message":"Run with only worker-0.","worker_id":"0"}
... test.someworkers: {"message":"Run with worker-0 and worker-1.","worker_id":"0"}
... test.someworkers: {"message":"Run with worker-0 and worker-1.","worker_id":"1"}

7. `@include`

The @include directive imports other configuration files.

# Include config files in the ./config.d directory
@include config.d/*.conf

The @include directive supports regular file paths, glob patterns, and the http URL convention.

# absolute path
@include /path/to/config.conf

# if using a relative path, the directive will use
# the dirname of this config file to expand the path
@include extra.conf

# glob match pattern
@include config.d/*.conf

# http
@include http://example.com/fluent.conf

With a glob pattern, files are expanded in alphabetical order. For example, if a.conf and b.conf exist, a.conf is included and applied first. If that order is not what you want, you have to @include each file individually.
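
For instance, to apply b.conf before a.conf, the glob can be replaced with explicit includes. The file names simply reuse the a.conf and b.conf example above.

```conf
# Explicit includes are expanded in the order they are written,
# so b.conf is applied before a.conf here.
@include config.d/b.conf
@include config.d/a.conf
```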

@include can also be used to share the same parameters across sections. For example, if /path/to/out_buf_params.conf contains the following settings, they can be pulled in with @include.

# /path/to/out_buf_params.conf
flush_interval    5s
total_limit_size  100m
chunk_limit_size  1m

These parameters can then be included inside match sections.

# config file
<match pattern>
  @type forward
  # ...
  <buffer>
    @type file
    path /path/to/buffer/forward
    @include /path/to/out_buf_params.conf
  </buffer>
</match>

<match pattern>
  @type elasticsearch
  # ...
  <buffer>
    @type file
    path /path/to/buffer/es
    @include /path/to/out_buf_params.conf
  </buffer>
</match>

Matching pattern

As described earlier, tag-based routing only delivers an event when its tag matches the pattern. A pattern can match a tag exactly, but wildcard patterns are also available.

The following match patterns can be used in <match> and <filter> tags.
1. *: matches a single tag part. For example, a.* matches a.b but not a or a.b.c.
2. **: matches zero or more tag parts. For example, a.** matches a, a.b, and a.b.c.
3. {X,Y,Z}: matches X, Y, or Z. For example, {a,b} matches a and b but not c. This pattern can be combined with * and **, as in a.{b,c}.* and a.{b,c.**}.
4. /regular expression/: matches tags against a regular expression. This was introduced in v1.11.2, so it requires that version or later. For example, /(?!a\.).*/ matches any tag that does not start with a., such as b.xxx.
5. #{...}: evaluates the string inside the brackets as a Ruby expression, i.e. it embeds Ruby code in the pattern.
6. Multiple patterns: when several patterns are listed separated by whitespace, an event matches if its tag matches any one of them. For example, <match a b> matches a and b, and <match a.** b.**> matches a, a.b, and a.b.c (from the first pattern) as well as b.d (from the second).
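
A few of these patterns written out as match blocks, as a rough sketch. The stdout output is used only to keep the example short; the tags are the same placeholder tags used in the list above.

```conf
# "*" matches exactly one tag part: a.b matches, a and a.b.c do not.
<match a.*>
  @type stdout
</match>

# "{b,c}" combined with "**": matches a.b, a.c, a.b.x.y, and so on.
<match a.{b,c}.**>
  @type stdout
</match>

# Whitespace-separated patterns: matches the tag a or the tag b.
<match a b>
  @type stdout
</match>
```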

One thing to watch out for: Fluentd tries to match tags in the order the patterns appear in the configuration file, top to bottom, so an event is routed to the first pattern its tag matches.

# ** matches all tags. Bad :(
<match **>
  @type blackhole_plugin
</match>

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

Here the <match myapp.access> block is never reached, because <match **> matches every tag and swallows all events first.

So put the tighter match conditions first, followed by progressively wider patterns.

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

# Capture all unmatched tags. Good :)
<match **>
  @type blackhole_plugin
</match>

Likewise, a filter must be placed above the match it applies to. A filter that comes after the match is never executed, again because Fluentd evaluates the configuration from top to bottom.

# You should NOT put this <filter> block after the <match> block below.
# If you do, Fluentd will just emit events without applying the filter.

<filter myapp.access>
  @type record_transformer
  ...
</filter>

<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>

Write the filter first and the match after it, as shown above. If <match myapp.access> came first and <filter myapp.access> after it, events would go straight to the match and be emitted without passing through the filter.

Embedding Ruby expressions

Since Fluentd v1.4.0, arbitrary Ruby code can be embedded in match patterns using #{...}. Consider the following example.

<match "app.#{ENV['FLUENTD_TAG']}">
  @type stdout
</match>

If the environment variable FLUENTD_TAG is set to dev, the pattern evaluates to app.dev.
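
The same #{...} syntax also appears in double-quoted parameter values, as the host_param line in the filter example showed. The sketch below combines both ideas. The env field name and the 'dev' fallback are assumptions made for this illustration.

```conf
<filter app.**>
  @type record_transformer
  <record>
    # Evaluated as Ruby when the configuration is loaded.
    host_param "#{Socket.gethostname}"
    # Hypothetical field; falls back to "dev" when FLUENTD_TAG is unset.
    env "#{ENV['FLUENTD_TAG'] || 'dev'}"
  </record>
</filter>
```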

Data Types for values

Each Fluentd plugin has its own specific set of parameters. For example, in_tail has parameters such as rotate_wait and pos_file. Each parameter has a specific type associated with it. The types are defined as follows.

string, integer, float
size: parsed as a number of bytes, with the following suffixes.
- k or K: N kilobytes
- m or M: N megabytes
- g or G: N gigabytes
- t or T: N terabytes
time: parsed as a time duration, with the following suffixes.
- s: seconds
- m: minutes
- h: hours
- d: days
- Without a unit suffix, the value is parsed as a float number of seconds. For example, 0.1 means 0.1 second, i.e. 100 ms.
array: parsed as a JSON array. A shorthand syntax is also supported.
- Normal: ["key1", "key2"]
- Shorthand: key1,key2
hash: parsed as a JSON object. A shorthand syntax is also supported.
- Normal: {"key1": "value1", "key2": "value2"}
- Shorthand: key1:value1,key2:value2

Parameters can be written as follows.

str_param "foo bar"

array_param [
  "a", "b"
]

hash_param {
  "k": "v",
  "k1": 10
}
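
To tie the types above to real plugin parameters, here is a rough sketch using buffer settings (size and time types) and the remove_keys parameter of record_transformer (array type). The specific values and the field names debug_info and tmp are made up for this example.

```conf
<filter myapp.access>
  @type record_transformer
  # array type, shorthand form of ["debug_info", "tmp"] (hypothetical field names).
  remove_keys debug_info,tmp
</filter>

<match myapp.access>
  @type file
  path /var/log/fluent/access
  <buffer>
    # size type: 8 megabytes per chunk.
    chunk_limit_size 8m
    # size type: 1 gigabyte in total.
    total_limit_size 1g
    # time type with a unit suffix: 30 seconds.
    flush_interval 30s
    # time type without a suffix: parsed as float seconds (0.5 second).
    retry_wait 0.5
  </buffer>
</match>
```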

Common plugin parameters

The following parameters are reserved and are prefixed with the @ symbol.
1. @type: specifies the plugin type.
2. @id: specifies the plugin id. in_monitor_agent uses this value for its plugin_id field.
3. @label: specifies the label symbol.
4. @log_level: specifies the per-plugin log level.

Once the configuration file is written, it can be validated with the --dry-run option.
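
As a small sketch of the reserved parameters in use, the source below sets an id and a plugin-level log level. The id in_forward_main is a hypothetical name. A file containing it could then be checked with something like fluentd --dry-run -c /path/to/fluent.conf, where the path is a placeholder.

```conf
<source>
  @type forward
  # Hypothetical id.
  @id in_forward_main
  # Only this plugin logs at warn level or above.
  @log_level warn
  port 24224
</source>
```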