Organizing EFK, Day 4 - Fluentd Configuration 3

놀고 싶은데, 왜 다들 공부하는거야 · April 4, 2025

Common Parameters

@type specifies the type of the plugin to use.

<source>
  @type my_plugin_type
</source>

<filter>
  @type my_filter
</filter>

@id assigns a unique name to a plugin configuration. The id is used as a path component for buffer, storage, logging, and other purposes.

<match>
  @type file
  @id service_www_accesslog
  path /path/to/my/access.log
  # ...
</match>

Every plugin needs this parameter in order to use the root_dir feature globally.
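
To make the connection between @id and root_dir more concrete, here is a minimal sketch. It assumes root_dir is set in the <system> section; the directory, the tag pattern, and the id are hypothetical names chosen for illustration. With both in place, a file buffer can leave out its own path, and Fluentd derives the storage location from root_dir and the plugin's @id.

```text
<system>
  # hypothetical directory, used as the base for per-plugin data
  root_dir /var/lib/fluentd
</system>

<match service.www.**>
  @type forward
  # the id becomes part of the storage path under root_dir
  @id service_www_forward
  # ... <server> settings omitted
  <buffer>
    # no path here: it is derived from root_dir and @id
    @type file
  </buffer>
</match>
```

Without @id, the plugin has no stable name to build that location from, which is why the parameter is needed on every plugin that should use root_dir.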

@log_level sets the logging level for an individual plugin. The default is info. To set the level globally, put log_level in the <system> section. Even when a global level is set, a plugin's own @log_level takes precedence for that plugin.

<system>
  log_level info
</system>

<source>
  # ...
  @log_level debug # shows debug log only for this plugin
</source>

In the example above, log_level is info in <system> but debug in <source>, so debug applies to that source.

This parameter is mainly used for two purposes:
1. suppressing logs from a plugin that emits too many of them, or
2. showing debug logs for a specific plugin while debugging.

@label routes input events to the matching <label> section. A <label> section groups <filter> and <match> sections into one processing sequence.

<source>
  @type ...
  @label @access_logs
  # ...
</source>

<source>
  @type ...
  @label @system_metrics
  # ...
</source>

<label @access_logs>
  <match **>
    @type file
    path ...
  </match>
</label>

<label @system_metrics>
  <match **>
    @type file
    path ...
  </match>
</label>

A source with @label @access_logs is routed to <label @access_logs>, and a source with @label @system_metrics goes to <label @system_metrics>.

Note: the value of the @label parameter must start with @.
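
A <label> section can hold <filter> sections as well as <match> sections. The following sketch shows one possible layout; the forward input, the record_transformer filter, and the added service field are assumptions chosen for illustration, not something the examples above prescribe.

```text
<source>
  @type forward
  @label @access_logs
</source>

<label @access_logs>
  # Filters apply only to events routed to this label.
  <filter **>
    @type record_transformer
    <record>
      # hypothetical field added to every event
      service www
    </record>
  </filter>

  # The match receives the events after the filter has run.
  <match **>
    @type stdout
  </match>
</label>
```

Events from this source skip any top-level <filter> and <match> sections and are handled only inside the label.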

Parse Section

Some Fluentd plugins support a <parse> section, which defines how raw data is parsed.

A <parse> section can appear under <source>, <match>, or <filter>, but only for plugins that support the parser plugin feature. In other words, if an input plugin does not support <parse>, the section has no effect.

<source>
  @type tail
  # ...
  <parse>
    # ...
  </parse>
</source>

Parser plugins also have a type, so it is specified with @type inside <parse>:

<parse>
  @type apache2
</parse>

Built-in parser types include regexp, apache2, nginx, syslog, csv, tsv, json, and others. (https://docs.fluentd.org/parser)

There are many parse parameters; see the documentation for the full list. (https://docs.fluentd.org/configuration/parse-section)

The main parameters are as follows.

types (hash, optional): converts fields of an event to the given types. It is written as a hash: {"field1": "type1"} converts the event's field1 to type1. The supported types are:
- string: converts the field to a string.
- integer: converts the field to an integer.
- bool: converts strings such as "true", "yes", and "1" to true, and everything else to false.
- float: converts a string such as "7.45" to the floating-point number 7.45.
- time: converts the field to the EventTime type. The time options are somewhat involved, so refer to the documentation.
- array: converts a string field to an Array. The delimiter is given as the third part and defaults to a comma. For example, if the item_ids field arrives as Adam|Alice|Bob, write types item_ids:array:| and the result is ["Adam", "Alice", "Bob"].
time_key (string, optional): the field from which the event time is taken. If the event has no such field, the current time is used.
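
Putting types and time_key together, a <parse> section might look like the sketch below. The file paths, the tag, and the field names (logged_at, user_id, paid, amount, item_ids) are hypothetical, and the tsv parser is just one choice where every field starts out as a string. The types value uses the field:type form already seen in the array example.

```text
<source>
  @type tail
  # hypothetical paths and tag
  path /var/log/app/orders.tsv
  pos_file /var/log/fluentd/orders.tsv.pos
  tag app.orders
  <parse>
    @type tsv
    keys logged_at,user_id,paid,amount,item_ids
    # field:type pairs; the third part of array is its delimiter
    types user_id:integer,paid:bool,amount:float,item_ids:array:|
    # take the event time from this field instead of the current time
    time_key logged_at
    time_format %Y-%m-%dT%H:%M:%S%z
  </parse>
</source>
```

With this setup, a line whose item_ids column is Adam|Alice|Bob should end up with item_ids as an array of three strings, as described for the array type above.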

Buffer Section

Fluentd provides the <buffer> section to configure how events are buffered.

The @type parameter specifies the type of the buffer plugin. The default is memory, but an output plugin may override it. For example, the default buffer type of the file output plugin is file.

<buffer>
  @type file
  # ...
</buffer>

The <buffer> section goes inside a <match> section, and it can be used only when the output plugin supports buffering.

<match tag.*>
  @type file
  # ...
  <buffer>
    # ...
  </buffer>
</match>

The buffer plugin is selected with the @type parameter of the <buffer> section.

<buffer>
  @type file
</buffer>

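Fluentd ships with two buffer plugins:
1. file
2. memory

Interestingly, @type is not a required parameter. If it is omitted, the default buffer plugin declared by the output plugin is used; if the output plugin declares none, the memory buffer plugin is used.

Chunk keys given to the <buffer> section group events into separate chunks. If no chunk key is given, all events are written into a single chunk until it reaches its size limit, at which point the chunk is flushed.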

<buffer ARGUMENT_CHUNK_KEYS>
  # ...
</buffer>

Events are grouped into chunks according to the values of ARGUMENT_CHUNK_KEYS. If chunk keys are specified, they must be written as a comma-separated string.

If no chunk key is specified, all events go into the same chunk.

<match tag.**>
  # ...
  <buffer>      # <--- No chunk key specified as argument
    # ...
  </buffer>
</match>

# No chunk keys: All events will be appended into the same chunk.

11:59:30 web.access {"key1":"yay","key2":100}  --|
                                                 |
12:00:01 web.access {"key1":"foo","key2":200}  --|---> CHUNK_A
                                                 |
12:00:25 ssh.login  {"key1":"yay","key2":100}  --|

When tag is specified as the chunk key, chunks are created per tag. Events with different tags therefore go into different chunks.

<match tag.**>
  # ...
  <buffer tag>
    # ...
  </buffer>
</match>

# Tag chunk key: The events will be grouped into chunks by tag.

11:59:30 web.access {"key1":"yay","key2":100}  --|
                                                 |---> CHUNK_A
12:00:01 web.access {"key1":"foo","key2":200}  --|

12:00:25 ssh.login  {"key1":"yay","key2":100}  ------> CHUNK_B

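As shown above, events tagged web.access are buffered in CHUNK_A and events tagged ssh.login in CHUNK_B.

time can also be used as a chunk key of <buffer>. When the time chunk key and the timekey parameter are specified, the output plugin groups events into chunks by time ranges of timekey length. For example, with timekey 1h there is one chunk per hour.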

<match tag.**>
  # ...
  <buffer time>
    timekey      1h # chunks per hours ("3600" also available)
    timekey_wait 5m # 5mins delay for flush ("300" also available)
  </buffer>
</match>

# Time chunk key: The events will be grouped by timekey with timekey_wait delay.

11:59:30 web.access {"key1":"yay","key2":100}  ------> CHUNK_A

12:00:01 web.access {"key1":"foo","key2":200}  --|
                                                 |---> CHUNK_B
12:00:25 ssh.login  {"key1":"yay","key2":100}  --|

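Since timekey is 1h, one buffer chunk is created per hour, aligned to the hour. CHUNK_A covers 11:00:00 to 11:59:59, so the event at 11:59:30 goes into CHUNK_A. The event at 12:00:01 falls into the next one-hour range, which starts at 12:00:00, so it goes into CHUNK_B, and so does the event at 12:00:25.

timekey_wait is the delay before a chunk is flushed and written to the output once its time range has ended. In the example above, events start going into CHUNK_B as soon as the hour changes, but CHUNK_A is not flushed right away. With timekey_wait 5m, Fluentd waits 5 more minutes and flushes CHUNK_A at 12:05:00.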

 timekey: 3600
 -------------------------------------------------------
 time range for chunk | timekey_wait | actual flush time
  12:00:00 - 12:59:59 |           0s |          13:00:00
  12:00:00 - 12:59:59 |     60s (1m) |          13:01:00
  12:00:00 - 12:59:59 |   600s (10m) |          13:10:00

If the argument of the <buffer> section is something other than the keywords time and tag, it is treated as a record field name. The output plugin then groups events into chunks by the value of that field.

<match tag.**>
  # ...
  <buffer key1>
    # ...
  </buffer>
</match>

# Chunk keys: The events will be grouped by values of "key1".

11:59:30 web.access {"key1":"yay","key2":100}  --|---> CHUNK_A
                                                 |
12:00:01 web.access {"key1":"foo","key2":200}  --|---> CHUNK_B
                                                 |
12:00:25 ssh.login  {"key1":"yay","key2":100}  --|---> CHUNK_A

For more complex records, nested fields can be accessed as well.

<match tag.**>
  # ...
  <buffer $.nest.field> # access record['nest']['field']
    # ...
  </buffer>
</match>

$.nest.field means that chunks are grouped by record['nest']['field'].

Keywords such as tag and time can also be combined. For example, with timekey 1h, events are divided into chunks as follows.

# <buffer tag,time>

11:58:01 ssh.login  {"key1":"yay","key2":100}  ------> CHUNK_A

11:59:13 web.access {"key1":"yay","key2":100}  --|
                                                 |---> CHUNK_B
11:59:30 web.access {"key1":"yay","key2":100}  --|

12:00:01 web.access {"key1":"foo","key2":200}  ------> CHUNK_C

12:00:25 ssh.login  {"key1":"yay","key2":100}  ------> CHUNK_D

Note that too many chunk keys degrade I/O performance, so keep their number under control.

The syntax <buffer []> is also available; it sets the chunk keys to empty. This is a convenient way to remove the default chunk keys that an output plugin uses.

<match tag.**>
  # ...
  <buffer []>
    # ...
  </buffer>
</match>

If an output plugin calls the extract_placeholders method on its configuration values, chunk key values can be extracted through placeholders.

The following example uses tag as the chunk key and references it in the path of the file output plugin. For example, if the tag is log.map, ${tag} is replaced with log.map.

# chunk_key: tag
# ${tag} will be replaced with actual tag string
<match log.*>
  @type file
  path /data/${tag}/access.log  #=> "/data/log.map/access.log"
  <buffer tag>
    # ...
  </buffer>
</match>

The time can be extracted as well. In this case the buffer section must use the time chunk key, and the path uses time format placeholders such as %Y, %m, and %d. See the following.

# chunk_key: tag and time
# ${tag[1]} will be replaced with 2nd part of tag ("map" of "log.map"), zero-origin index
# %Y, %m, %d, %H, %M, %S: strptime placeholder are available when "time" chunk key specified

<match log.*>
  @type file
  path /data/${tag[1]}/access.%Y-%m-%d.%H%M.log #=> "/data/map/access.2017-02-28.2048.log"

  <buffer tag,time>
    timekey 1m
  </buffer>
</match>

Here access.%Y-%m-%d.%H%M.log is expanded to access.2017-02-28.2048.log.

Record fields specified as chunk keys can be referenced in the same way.

<match log.*>
  @type file
  path /data/${tag}/access.${key1}.log #=> "/data/log.map/access.yay.log"
  <buffer tag,key1>
    # ...
  </buffer>
</match>

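The chunk key key1 of the buffer section is available as the ${key1} placeholder in the path of the match section.

There is also a ${chunk_id} placeholder. The chunk id is assigned internally by Fluentd, so there is nothing to configure or specify for it.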

<match test.**>
  @type file
  path /path/to/app_${tag}_${chunk_id}
  append true
  <buffer tag>
    flush_interval 5s
  </buffer>
</match>

If the tag of the matched event is test.foo, the result is as follows.

# 5b35967b2d6c93cb19735b7f7d19100c is chunk id
/path/to/app_test.foo_5b35967b2d6c93cb19735b7f7d19100c.log

Nested fields of a record are supported as well.

<match log.*>
  @type file
  path /data/${tag}/access.${$.nest.field}.log #=> "/data/log.map/access.nested_yay.log"
  <buffer tag,$.nest.field> # access record['nest']['field']
    # ...
  </buffer>
</match>

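With the record {"nest": {"field": "nested_yay"}}, $.nest.field in the buffer section resolves to "nested_yay".

One thing to keep in mind: the argument of <buffer> is either blank or a comma-separated list of chunk keys, and when tag, time, and record-field chunk keys are used together, it is recommended to write them in the following order.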

<buffer> # blank
  # ...
</buffer>

<buffer tag, time, key1> # keys
  # ...
</buffer>

Note: the chunk keys tag and time are reserved words and cannot be used to refer to record fields. Also, when time is used, the parameters timekey, timekey_wait, timekey_use_utc, and timekey_zone become available.
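
As a sketch of how the four time-related parameters fit together, the fragment below buffers per tag and per day. The path and all values, including the Asia/Seoul timezone, are illustrative assumptions rather than recommendations.

```text
<match log.*>
  @type file
  # hypothetical path; the time placeholders require the time chunk key
  path /data/${tag}/access.%Y-%m-%d.log
  <buffer tag,time>
    # one chunk per tag per day
    timekey 1d
    # wait 10 more minutes for late events before flushing
    timekey_wait 10m
    # do not use UTC for the time ranges and placeholders
    timekey_use_utc false
    # timezone used instead of the server's local one
    timekey_zone Asia/Seoul
  </buffer>
</match>
```

The timezone matters here because it decides where the daily boundary falls and which date appears in the file name.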

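The <buffer> section also has the following buffering parameters; they are listed here for reference.
1. chunk_limit_size: the maximum size of each chunk.
2. chunk_limit_records: the maximum number of events that each chunk can hold.
3. total_limit_size: the size limit of the buffer plugin instance. Once the total size of the stored buffer reaches this limit, append operations fail and no more data is stored.
4. queue_limit_length: the queue length limit of the buffer plugin instance. The default is nil.
5. chunk_full_threshold: the fraction of chunk_limit_size at which a chunk is considered full and gets flushed.
6. queued_chunks_limit_size: the limit on the number of queued chunks.
7. compress: either text or gzip; with gzip, events are compressed when they are written into the buffer chunk. The default is text. Fluentd decompresses the chunks again before passing them to the output, so there is no need to worry about restoring the data.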
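
Several of these parameters are often set together. The fragment below is only a sketch: the buffer path is hypothetical and the sizes are arbitrary example values, not tuned recommendations.

```text
<match tag.**>
  # ...
  <buffer>
    @type file
    # hypothetical buffer directory
    path /var/log/fluentd/buffer/app
    # each chunk holds at most 8 MiB
    chunk_limit_size 8m
    # treat a chunk as full at 95% of chunk_limit_size
    chunk_full_threshold 0.95
    # stop accepting data once 2 GiB is buffered in total
    total_limit_size 2g
    # store chunks gzip-compressed
    compress gzip
  </buffer>
</match>
```

The important relationship is between chunk_limit_size and total_limit_size: the first bounds a single chunk, the second bounds everything this buffer instance may hold.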

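The following parameters control flushing and are mostly relevant to performance tuning.
1. flush_at_shutdown: whether to flush all buffer chunks at shutdown. The default is false for persistent buffers such as file and true for non-persistent buffers such as memory.
2. flush_mode: one of default, lazy, interval, or immediate; it determines when chunks are flushed.
3. flush_interval: the interval between flushes. The default is 60s.
4. flush_thread_count: the number of threads that flush/write chunks in parallel. The default is 1.
5. flush_thread_interval: the time a flush thread sleeps before checking for the next flush. The default is 1.0 seconds.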
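
A possible combination of the flushing parameters is sketched below. The values are assumptions for illustration; suitable numbers depend on the destination and the event volume.

```text
<match tag.**>
  # ...
  <buffer>
    # flush on a fixed schedule instead of the default behavior
    flush_mode interval
    # every 10 seconds rather than the default 60s
    flush_interval 10s
    # write chunks with 4 parallel threads
    flush_thread_count 4
    # write out whatever is still buffered when Fluentd stops
    flush_at_shutdown true
  </buffer>
</match>
```

A shorter flush_interval lowers delivery latency at the cost of more, smaller writes, while a higher flush_thread_count mainly helps when the destination is slow to respond.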

There are also retry-related parameters; see the documentation for details. https://docs.fluentd.org/configuration/buffer-section#buffering-parameters
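
For orientation, a few of the retry parameters from that page are shown below. This post does not cover them in detail, so take the fragment as a rough sketch with example values, and check the linked documentation for the exact semantics and defaults.

```text
<match tag.**>
  # ...
  <buffer>
    # increase the wait between retries exponentially
    retry_type exponential_backoff
    # wait before the first retry
    retry_wait 1s
    # upper bound for the wait between retries
    retry_max_interval 60s
    # give up on a chunk after retrying for this long
    retry_timeout 72h
  </buffer>
</match>
```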
