Let's Learn Prometheus, Day 1: Prometheus Overview

놀고 싶은데, 왜 다들 공부하는거야 · March 10, 2025

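Like most infra work, Prometheus is not especially hard, but it is tedious, hands-on, and time-consuming.

Plenty of teams skip proper metrics, eyeball performance with a few Linux commands, and then announce "we did X in Java and performance improved!" Once you check the actual metrics in Prometheus, though, it often turns out that memory usage doubled or the request volume grew several times over.

Managers tend not to care: they can gloss over it, report upward, and claim the result as their own achievement. The developers who used Prometheus metrics to find the truth (the fire) are the ones left sweating over how to fix it. They bring the truth to the team, and everyone knows what happens next: Zeus (the manager) sends disaster down on humanity (the developers).

Prometheus is a really well-chosen name. There are truths in development you would rather not face. Prometheus brings you close to those truths, and like the eagle pecking at the liver, you may suffer for it. But just as fire opened up a new world, Prometheus opens up a new world for development, so do use it.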

Prometheus

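A Kubernetes cluster has a set of metrics that are available more or less out of the box. cAdvisor, which is built into the kubelet, exposes metrics for each container, and node-exporter exposes information about the node that hosts them. Cluster-level information comes from kube-apiserver and is exposed as metrics through kube-state-metrics.

Note that Prometheus uses a pull model: it makes an HTTP request to each target's /metrics endpoint and fetches the metrics. The scrape rules can be customized, but by convention metrics are served from the /metrics endpoint.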
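
To make the pull model a bit more concrete, here is a minimal sketch of the kind of text a /metrics endpoint returns. The metric name demo_requests_total and its labels are made up for illustration, and label-value escaping is left out, so treat this as a rough picture of the text exposition format rather than a client library.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Labels are rendered as key="value" pairs, sorted for stable output.
            body = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{body}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with two labels.
page = render_metric(
    "demo_requests_total",
    "Requests handled by the demo app.",
    "counter",
    [({"path": "/", "code": "200"}, 42)],
)
print(page)
```

Prometheus would fetch text like this over HTTP on every scrape and store each line as a sample.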

-------------
|cAdvisor   |
|(kubelet)  | ---> collects and exposes container metrics
-------------

-----------------
|   /metrics    |
|       ^       | --> collects and exposes node metrics
|       |       |
| node-exporter |
-----------------

----------------------
|       /metrics     |
|          ^         |
|          |         |
| kube-state-metrics | --> collects and exposes
|          ^         |     Kubernetes cluster metrics
|          |         |
|   kube-apiserver   |
----------------------

These three sources are what Prometheus scrapes by default in a typical Kubernetes setup, and it stores the samples in its TSDB.

Typing up in the Graph tab shows, for every scrape target, whether it is currently up (1) or down (0).

On the Prometheus server, Status -> Targets lists the Prometheus targets. These are the endpoints that hand over metrics. In the configuration, each job_name defines a group of targets.

Status -> Service Discovery shows what Prometheus discovered while looking for targets. kubernetes-apiservers (1 / 20 active targets) means that 20 candidates were discovered, 1 of them is an active target being scraped, and the other 19 are inactive.

Whatever survives service discovery becomes a target, and Prometheus pulls metrics from each target's endpoint.

If an app needs to push metrics instead of being pulled, install Pushgateway separately. The typical case is short-lived jobs: they push their metrics to Pushgateway, Pushgateway holds them, and Prometheus scrapes Pushgateway.

The overall picture looks like this.

  ----------              --------------
  |job, ...|              |alertmanager| --webhook--> slack...
  ----------              --------------
      |                         ^
     push                       |
      |                         |
      v                         v
-------------             ------------
|pushgateway| <----pull---|prometheus|
-------------        |    ------------
                     | 
-----------          |
|app, ... | <--pull--|
-----------

Collecting Prometheus metrics

For Prometheus to fetch metrics, service discovery has to work first: service discovery finds the endpoints that exporters expose, and Prometheus then pulls metrics from them.

In the configuration, the *_sd_configs entries are the service discovery settings, and whatever service discovery turns up becomes a target.

cAdvisor(container metric exporter)

The most basic metrics are container metrics, meaning the metrics of the containers inside a pod. They are exposed by cAdvisor, which is built into the kubelet, and carry the container_ prefix.

container_network_receive_bytes_total

This is the number of bytes a container has received over the network. The query returns every container; to look at a specific one, write it like this.

container_network_receive_bytes_total{pod="nginx-0"}

This shows the received network bytes for the containers in the pod nginx-0.

node exporter

Information about the node a container is deployed on comes from node-exporter, which collects node-level (OS and hardware) data.

The key point is that node-exporter reads from the OS's /sys and /proc.

ll /sys/fs/cgroup

Information about cgroups lives under /sys, so you can see what computing resources have been allocated.

Most of the key information, such as OS, CPU, and memory details, is found under /proc rather than /sys.

Node metrics have the node_ prefix.

node_memory_Active_bytes

The metric above is the amount of actively used memory on the node, in bytes.

node_cpu_seconds_total{mode="user"}

This shows the cumulative CPU time spent in user mode, in seconds.

node-exporter is what you use to see node-wide CPU, memory, disk, and network figures.

kube-state-metrics

----------------------
|       /metrics     |
|          ^         |
|          |         |
| kube-state-metrics | --> collects and exposes
|          ^         |     Kubernetes cluster metrics
|          |         |
|   kube-apiserver   |
----------------------

kube-state-metrics exposes metrics about Kubernetes objects such as pods, ConfigMaps, and Services. These objects are managed and reconciled through kube-apiserver, so kube-state-metrics reads the object state from kube-apiserver and exposes it for Prometheus to scrape.

These metrics use the kube_ prefix.

kube_deployment_status_replicas

Application-specific exporters

Container metrics are what cAdvisor collects; application metrics are custom metrics specific to the application.

For nginx, that would be things like the number of requests and the number of failed requests.

If discovery and scraping are configured for an application, Prometheus can get the address of that application's pods through kube-apiserver and collect metrics from it. The catch is that the application may not expose a /metrics endpoint.

In that case, add a sidecar container to the pod that reads the main container's metrics and exposes them to Prometheus.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        kubectl.kubernetes.io/default-container: nginx
    spec:
      containers:
      - image: nginx:1.21
        name: nginx
...

In the nginx Deployment, kubectl.kubernetes.io/default-container: nginx marks nginx as the default container. The exporter runs as a sidecar.

    spec:
      containers:
      - image: nginx:1.21
        name: nginx
        volumeMounts:
        - name: nginx-vol
          mountPath: /etc/nginx/templates
        env:
        - name: PORT
          value: '8080'
      - image: nginx/nginx-prometheus-exporter:0.10.0
        name: exporter
        ports:
        - containerPort: 9113
        args:
        - -nginx.scrape-uri=http://localhost:8080/stub_status
    ...

The exporter is laid out as a sidecar. It reads data from stub_status, which the main nginx container serves on localhost:8080, and the nginx exporter exposes it to Prometheus.

Quite a few vendors provide metric exporters, and some applications expose metrics themselves without an exporter.

Metrics already implemented by the application

How does Prometheus end up scraping an application's metrics? On Kubernetes, you set annotations.

To have Prometheus scrape the application's metrics on port 7777, add annotations like these.

annotations:
  prometheus.io/port: "7777"
  prometheus.io/scrape: "true"

The Prometheus configuration contains the following job_name.

- job_name: kubernetes-service-endpoints
  honor_labels: true
  honor_timestamps: true
  track_timestamps_staleness: false
  scrape_interval: 1m
  scrape_timeout: 10s
  scrape_protocols:
  - OpenMetricsText1.0.0
  - OpenMetricsText0.0.1
  - PrometheusText0.0.4
  metrics_path: /metrics
  scheme: http
  enable_compression: true
  follow_redirects: true
  enable_http2: true
  http_headers: null
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
...
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: (.+?)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace

The kubernetes-service-endpoints job lists the rules for scraping endpoints as targets. The default metrics path is /metrics, and the source_labels entries refer to annotations (in this job, the ones on the Service).

Prometheus reads what is written in the annotations, parses it, and applies the scrape rules.

annotations:
  prometheus.io/port: "7777"
  prometheus.io/scrape: "true"

If these annotations are on a pod or Service, Prometheus collects metrics from that IP on port 7777 at the /metrics endpoint. One thing to watch out for: when you put the annotation on a Service, it is the Service's targetPort, not its port, that has to match prometheus.io/port.

The path can be changed as well.

annotations:
  prometheus.io/path: /metrics_path

With this setting, /metrics_path becomes the endpoint. In other words, custom application metrics are fetched from ip:7777/metrics_path.
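
As a rough sketch of how the annotations turn into a scrape URL, the snippet below replays the keep rule and the __address__ rewrite from the job above in plain Python. The IP 10.0.0.5:8080 is made up, and the prometheus.io/path handling is an assumption, since the matching relabel rule is elided in the configuration shown here. Python's re module is also only standing in for Prometheus's RE2 engine.

```python
import re

# Same pattern as the relabel rule in the job above. Prometheus anchors
# relabel regexes, which fullmatch() imitates here.
ADDRESS_RE = re.compile(r"(.+?)(?::\d+)?;(\d+)")


def scrape_url(address, annotations, scheme="http"):
    """Return the URL to scrape, or None if the target is not opted in."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None  # action: keep drops targets that do not match "true"
    port = annotations.get("prometheus.io/port", "")
    # source_labels are joined with the separator ";" before matching.
    m = ADDRESS_RE.fullmatch(f"{address};{port}")
    if m:
        address = f"{m.group(1)}:{m.group(2)}"  # replacement: $1:$2
    # Assumed behaviour: fall back to /metrics when no path annotation is set.
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"{scheme}://{address}{path}"


annotations = {"prometheus.io/scrape": "true", "prometheus.io/port": "7777"}
print(scrape_url("10.0.0.5:8080", annotations))
```

The discovered port 8080 is replaced by the annotated 7777, which is why the annotation has to point at the port the metrics are really served on.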

Of course, configuration alone is not enough: the application source code has to pull in a Prometheus client library, define its custom metrics, and expose them.

PromQL

There are four metric types.

Gauge: a value at a particular point in time, such as CPU or memory usage.
Counter: a cumulative value that only increases. It is used to work out the rate of change per interval and to spot sudden spikes or shifts.
Summary: reports quantiles of the observed values. The quantile label ranges from 0 to 1 and shows where observations fall in the distribution, so it is a statistical view of the data.
Histogram: the raw data behind a summary-style view. Observations are counted into predefined buckets, and a function later turns those counts into quantiles.

Gauges and counters are used a lot; summaries and histograms much less.

node_memory_Active_bytes is a gauge.

node_memory_Active_bytes

prometheus_target_interval_length_seconds is a summary; it reports the value at each quantile between 0 and 1.

prometheus_target_interval_length_seconds


prometheus_target_interval_length_seconds{instance="localhost:9090", interval="1m0s", job="prometheus", quantile="0.01"}
59.999079346
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="1m0s", job="prometheus", quantile="0.05"}
59.999264412
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="1m0s", job="prometheus", quantile="0.5"}
59.999960363
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="1m0s", job="prometheus", quantile="0.9"}
60.000631964
prometheus_target_interval_length_seconds{instance="localhost:9090", interval="1m0s", job="prometheus", quantile="0.99"}
60.00084696

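Reading the output: for the 1m0s interval, the 0.01 quantile (the lowest 1%) is 59.999 and the 0.99 quantile is 60.00084. In other words, almost every value is about 60.0. This kind of metric is what you use when setting SLAs and SLOs.

A _bucket suffix, as in prometheus_http_request_duration_seconds_bucket, means the metric is a histogram, although autocomplete shows it as a counter.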

prometheus_http_request_duration_seconds_bucket


prometheus_http_request_duration_seconds_bucket{handler="/-/healthy", instance="localhost:9090", job="prometheus", le="0.1"}
62919
prometheus_http_request_duration_seconds_bucket{handler="/-/healthy", instance="localhost:9090", job="prometheus", le="0.2"}
62919
prometheus_http_request_duration_seconds_bucket{handler="/-/healthy", instance="localhost:9090", job="prometheus", le="0.4"}
62919
prometheus_http_request_duration_seconds_bucket{handler="/-/healthy", instance="localhost:9090", job="prometheus", le="1"}
62919

To get summary-style quantiles out of a histogram like this, use histogram_quantile.

histogram_quantile(0.99, rate(prometheus_http_request_duration_seconds_bucket[5m]))

{handler="/metrics", instance="localhost:9090", job="prometheus"}
0.09900000000000002
{handler="/api/v1/query", instance="localhost:9090", job="prometheus"}
0.09900000000000002

This computes the 99th percentile (0.99) from the prometheus_http_request_duration_seconds_bucket histogram over the last 5 minutes.

For /metrics, it means that 99% of the HTTP requests in the last 5 minutes were handled within about 0.099 seconds.
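
Where does 0.099 come from? A simplified sketch of the interpolation behind histogram_quantile shows it. The function below is my own reduction of the idea, not Prometheus's code: it works on raw cumulative counts instead of rate() output, assumes non-negative bucket bounds, and assumes the +Inf bucket (not shown in the output above) holds the same 62919 as the others.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Every request landed in the first bucket (le="0.1").
buckets = [(0.1, 62919), (0.2, 62919), (0.4, 62919), (1.0, 62919), (math.inf, 62919)]
p99 = histogram_quantile(0.99, buckets)
print(p99)
```

Because all observations sit in the 0 to 0.1 bucket, the 0.99 quantile is simply 99% of the way through that bucket, which is about 0.099. So the number says more about the bucket layout than about the real latency.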

PromQL label matcher

A label matcher has the form {key="value"}.

container_network_receive_bytes_total{pod="nginx-0"}

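container_network_receive_bytes_total is the metric, and {pod="nginx-0"} is a label matcher that filters on a specific value.

Label matchers support four operators.

{key="value"}: the key equals value
{key!="value"}: the key does not equal value
{key=~"w.+"}: regular-expression match; here it finds every value that starts with w
{key!~"w.+"}: negated regular-expression match; here it finds every value that does not start with w

Remember that in PromQL regular expressions, . is the character that matches anything, so "starts with w" is written w.+ rather than w*.

To combine several alternatives, use | inside the regular expression.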

container_cpu_user_seconds_total{pod=~"nginx.+|haproxy.+"}

Several label matchers can also be used at once.

{key="value", key="value"}

For example, pods whose name starts with nginx and whose namespace is nginx can be selected like this.

container_cpu_user_seconds_total{pod=~"nginx.+", namespace="nginx"}
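
The four matchers can be mimicked in a few lines of Python. This is only a sketch of the matching rules: Python's re module stands in for Prometheus's RE2 syntax, and the label sets are invented. The one detail worth noticing is that regex matchers are fully anchored, so nginx.+ does not match the bare string nginx.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every (name, op, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # fully anchored
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical series labels.
series = {"pod": "nginx-0", "namespace": "nginx"}
selected = matches(series, [("pod", "=~", "nginx.+"), ("namespace", "=", "nginx")])
print(selected)
```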

Binary operators

free prints values in kibibytes by default, which is hard to read.

free
              total        used        free      shared  buff/cache   available
Mem:       32715036     8709372     8880496       92864    15125168    23446204
Swap:             0           0           0

You may want to convert these numbers into something human-readable.

free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi       8.3Gi       8.5Gi        90Mi        14Gi        22Gi
Swap:            0B          0B          0B

Prometheus metrics can be adjusted the same way.

Arithmetic binary operators: +, -, *, /, %, ^
Comparison binary operators: ==, !=, >, <, >=, <=
Logical/set binary operators: and, or, unless (set difference: elements on the left with no match on the right)

node_memory_Active_bytes/1024/1024

Written like this, the value is shown in MiB.

To see how many times pod containers have restarted in Kubernetes, use the following.

kube_pod_container_status_restarts_total

To find pods that have restarted more than 3 times, write it like this.

kube_pod_container_status_restarts_total > 3

Aggregation operators

sum
min
max
avg
group
stddev
stdvar
count
count_values
bottomk
topk
quantile

These all exist, though not all of them are equally useful. topk returns the top k series and bottomk the bottom k.

node_cpu_seconds_total, which is the basis for total CPU usage, returns a value per CPU.

node_cpu_seconds_total

Let's narrow this down. First, pick the three highest.

topk(3, node_cpu_seconds_total)

This returns the largest accumulated CPU times per core. If idle shows up, that core has spent a long time not running anything.

Now the three lowest.

bottomk(3, node_cpu_seconds_total > 0)
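
As a small sketch of what topk and bottomk do, including the effect of the > 0 filter, here is the same selection over a handful of made-up series. The CPU numbers are hypothetical.

```python
import heapq


def topk(k, series):
    """Return the k (labels, value) pairs with the largest values."""
    return heapq.nlargest(k, series, key=lambda s: s[1])


def bottomk(k, series):
    """Return the k (labels, value) pairs with the smallest values."""
    return heapq.nsmallest(k, series, key=lambda s: s[1])


# Hypothetical per-core, per-mode CPU seconds.
cpu = [
    ({"cpu": "0", "mode": "idle"}, 9000.0),
    ({"cpu": "0", "mode": "user"}, 120.0),
    ({"cpu": "1", "mode": "idle"}, 8800.0),
    ({"cpu": "1", "mode": "steal"}, 0.0),
    ({"cpu": "1", "mode": "system"}, 45.0),
]

highest = topk(3, cpu)
# "> 0" drops zero-valued series before picking the lowest ones.
lowest = bottomk(3, [s for s in cpu if s[1] > 0])
print(highest)
print(lowest)
```

Without the filter, the steal series with value 0 would take one of the three bottomk slots.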

Next, by works like GROUP BY: it runs the aggregation per group.

avg(node_cpu_seconds_total{mode="user"}) by (node)

This takes the average (avg) of user-mode CPU per node.

Besides grouping with by, there is also without, which groups by everything except the listed labels.

sum(kubelet_http_requests_total) without (instance)

This sums over all labels except instance.
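
The difference between by and without is easier to see on toy data. The sketch below groups made-up series the way sum(...) by (...) and sum(...) without (...) would. The label names are illustrative, and the metric name label is left out of the picture.

```python
from collections import defaultdict


def sum_by(series, by=None, without=None):
    """Sum sample values, grouping by the kept labels.

    series is a list of (labels_dict, value) pairs. Pass either by= (labels
    to keep) or without= (labels to drop).
    """
    groups = defaultdict(float)
    for labels, value in series:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            dropped = set(without or ())
            kept = {k: v for k, v in labels.items() if k not in dropped}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)


# Hypothetical request counters from two kubelet instances.
requests = [
    ({"instance": "a", "path": "/pods"}, 3.0),
    ({"instance": "b", "path": "/pods"}, 4.0),
    ({"instance": "a", "path": "/metrics"}, 1.0),
]

per_path = sum_by(requests, without=["instance"])
per_instance = sum_by(requests, by=["instance"])
print(per_path)
print(per_instance)
```

Here without (instance) and by (path) give the same result, because path is the only other label; on real metrics with many labels, without saves you from listing them all.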

Instant vectors and range vectors

Let's look at the difference between metric types and data types.

Metric types are the gauge, counter, and so on covered above. Data types are about how the data relates to time, which matters because Prometheus is built on a time-series database.

instant vector: holds a single metric value per series at one point in time.
range vector: holds values over a span of time, that is, a run of samples per series.
scalar: a simple floating-point number, used to transform vector values.
string: a string value; currently unused.

Instant and range vectors are used the most. A scalar means little on its own and is used only to transform vector values.

An instant vector, such as node_memory_Active_bytes, expresses the metric value at a specific time.

A practical instant-vector query is one that checks how much memory each pod is using right now.

sum(container_memory_working_set_bytes{pod!=""}) by (pod)

container_memory_working_set_bytes shows the current memory of each container. Grouping by pod sums the containers that make up a pod, so the result is the total memory per pod.

{pod!=""} is there because the metric also returns series that are not tied to a pod, including the node-wide total. Since only pods are of interest, pod!="" filters those out.

Range vectors cover a time range, and the time units are ms, s, m, h, d, w, and y.

rate(container_cpu_user_seconds_total[])

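The [] part is the range, shown empty above as a placeholder. To look at the data over 1 minute, put 1m inside.

The example below is the per-container CPU usage rate over the last 5 minutes.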

rate(container_cpu_user_seconds_total[5m])

To sum up: an instant vector expresses a metric as its value at one point in time, and a range vector expresses it as the values over a given span.

sum, for instance, totals metric values at one instant, so it works on and returns an instant vector. rate, covered below, takes the values within a span and computes the rate of change over it.

Both instant and range vectors can hold gauge or counter metrics; what changes with the function you apply is the data type of the result.

Combining the two looks like this.

sum(rate(container_cpu_user_seconds_total[5m])) by (pod)

This query computes the CPU usage rate per container over 5 minutes; since that result is an instant vector for a point in time, sum then totals it per pod.

The order of evaluation is: range vector --> rate --> instant vector --> sum.

Modifier

A TSDB is queried by time, and by default queries are evaluated at the current time. Modifiers are for asking about a specific time instead.

The offset modifier returns the value from the given duration before the current time.

node_memory_Active_bytes offset 10d

This evaluates as of 10 days ago.

The @ modifier returns the value at a specific time given as a Unix timestamp.

node_memory_Active_bytes @1662529236
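
Both modifiers boil down to "evaluate at a different time". A rough sketch of that lookup: take the newest sample at or before the shifted evaluation time, as long as it is not too old. The 300-second lookback mirrors Prometheus's default lookback window, and the sample data is made up.

```python
import bisect


def value_at(samples, eval_time, offset=0, lookback=300):
    """Newest sample value at or before eval_time - offset.

    samples is a time-ordered list of (unix_seconds, value) pairs. Returns
    None when the newest candidate is older than the lookback window.
    """
    target = eval_time - offset
    times = [t for t, _ in samples]
    i = bisect.bisect_right(times, target) - 1
    if i < 0 or target - times[i] > lookback:
        return None
    return samples[i][1]


# Hypothetical samples scraped once a minute.
samples = [(0, 1.0), (60, 2.0), (120, 3.0)]

now_value = value_at(samples, eval_time=120)              # plain query
earlier = value_at(samples, eval_time=120, offset=60)     # like "offset 1m"
pinned = value_at(samples, eval_time=60)                  # like "@ 60"
print(now_value, earlier, pinned)
```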

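Useful functions

You do not need every function; find the useful ones and use those.

histogram_quantile: turns histogram data into summary-style quantiles
irate: instantaneous rate of change
rate: rate of change
predict_linear: linear prediction of a future value

rate is the rate of change: it looks at the difference between the start and end of a window. It is used with a range vector to see how fast a value changes over that window.

Suppose samples were collected every 15 seconds.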

offset | 2m | 1m 45s | 1m 30s | 1m 15s | 1m | 45s | 30s | 15s | 0 |
metric | 8  |   11   |   12   |   13   | 14 | 15  | 16  | 20  | 20|

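This is a 2-minute range vector. rate works from the two ends of the window: the increase is 20 - 8 = 12 over 120 seconds, so the per-second rate is 12 / 120 = 0.1.

irate uses the last two samples in the window, the one just before the end and the one at the end.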

offset | 2m | 1m 45s | 1m 30s | 1m 15s | 1m | 45s | 30s | 15s | 0 |
metric | 8  |   11   |   12   |   13   | 14 | 15  | 16  | 20  | 20|

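This is the same 2-minute range vector. irate is the instantaneous rate, the difference between the latest sample and the one before it: 20 - 20 = 0.

Use irate for values that change sharply from moment to moment, and rate when you need to see the trend over a longer period.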
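
The two calculations on the table can be written out directly. This is a deliberately simplified sketch: the real rate and irate also handle counter resets, and rate extrapolates to the edges of the window, neither of which is modeled here.

```python
def simple_rate(samples):
    """Per-second increase between the first and last sample in the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(samples):
    """Per-second increase between the last two samples in the window."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


# (seconds, value) pairs matching the table: one sample every 15 s for 2 min.
values = [8, 11, 12, 13, 14, 15, 16, 20, 20]
window = [(i * 15, v) for i, v in enumerate(values)]

print(simple_rate(window))   # whole-window view
print(simple_irate(window))  # last-two-samples view
```

The counter jumped by 4 between the 30s and 15s marks, yet irate reports 0 because only the final pair counts. That is the trade-off between the two.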

predict_linear projects future values from the trend line of past data. It is a simple linear fit, so do not expect machine-learning accuracy.

offset | 2m | 1m 45s | 1m 30s | 1m 15s | 1m | 45s | 30s | 15s | 0 | ~ | +1h | ~ | +2h |
metric | 10 |   11   |   12   |   13   | 14 | 15  | 16  | 20  | 25| ~ |  ~  | ~ | predict |

For example, the following shows the rate of change of system-mode CPU time over 5 minutes.

rate(node_cpu_seconds_total{mode="system"}[5m])

These values can also be averaged.

avg(rate(node_cpu_seconds_total{mode="system"}[5m]))

This gives the average 5-minute rate of system-mode CPU time across all CPUs.

Next is predict_linear.

predict_linear(node_memory_Active_bytes[5m], 60*60*2)

The second argument says how far ahead to predict. It is in seconds, so 60*60*2 means 2 hours.
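
The idea behind predict_linear is an ordinary least-squares line extended into the future. Here is a minimal sketch of that, assuming the evaluation time equals the timestamp of the last sample and that there are at least two samples at different times. The sample values are invented.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (t, v) samples, evaluated seconds_ahead
    after the last sample."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


# Hypothetical gauge growing by 1 every 15 seconds.
history = [(0, 10.0), (15, 11.0), (30, 12.0)]
forecast = predict_linear(history, 60)
print(forecast)
```

A perfectly linear series like this one is predicted exactly. Anything with spikes or daily cycles will not be, which is the limitation mentioned above.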

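Changing Prometheus settings and adding useful features

You do not have to use the Prometheus configuration exactly as deployed; it can be customized to some extent.

Set the scrape interval
Collect data from outside the deployed Kubernetes cluster, not just metrics inside it
Turn complex PromQL into simple expressions
Configure alerts

Can the configuration of a running Prometheus be changed? Yes. The settings live in a ConfigMap, so edit values such as scrape_timeout there.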

kubectl edit -n monitoring configmaps prometheus-server

Run the command above and change the values as shown below.

apiVersion: v1
data:
  alerting_rules.yml: |
    {}
  alerts: |
    {}
  allow-snippet-annotations: "false"
  prometheus.yml: |
    global:
      evaluation_interval: 15s
      scrape_interval: 15s
      scrape_timeout: 10s
...

In the Prometheus UI, Status -> Configuration shows that scrape_interval has changed.

Besides edit, there are kubectl patch and kubectl replace. kubectl patch changes only part of an object, so you supply only a fragment of the YAML; kubectl replace swaps the whole object, so it needs the full file.

How does the ConfigMap change get applied automatically? A configmap-reload container runs alongside Prometheus and triggers the reload.

Jobs can be added too. To collect metrics from an external server, do something like this.

    - job_name: harbor
      metrics_path: /metrics
      static_configs:
      - targets:
        - 192.168.1.64:9090

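Metrics are collected from 192.168.1.64:9090/metrics. Instead of a static IP you can also configure discovery rules, but it is best to keep this as simple as possible.

Recording rule

A complex PromQL expression can be turned into a recording rule and then queried by a short name.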

data:
  recording_rules.yaml: |
    groups:
      - name: prometheus-recording.rules
        interval: 10s
        rules:
          - record: container:memory_working_set:topk3
            expr: topk(3, sum(container_memory_working_set_bytes{pod!=""}/1024/1024) by (pod))

The recording rule above takes

topk(3, sum(container_memory_working_set_bytes{pod!=""}/1024/1024) by (pod))

and replaces that complex PromQL with a simple record name:

container:memory_working_set:topk3

Look closely at the name and you will see it follows a pattern.

By convention, recording rule names are built as follows.

level:metric:operations

level: the aggregation level (pod, container, cluster, node)
metric: a name that represents the metric being collected, usually the middle of the metric name with the prefix and suffix removed
operations: the operations actually applied

Here topk picks the 3 highest memory consumers, so topk3 is the operations part.

Add recording_rules with kubectl edit -n monitoring configmaps prometheus-server. Note that it does not show up under Configuration in the UI; it appears under Rules.

Let's add another record as well.

data:
  recording_rules.yaml: |
    groups:
      - name: prometheus-recording.rules
        interval: 10s
        rules:
          - record: node:node_memory:usage
            expr: |-
              100 - 100*((node_memory_MemTotal_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes - node_memory_SReclaimable_bytes) / node_memory_MemTotal_bytes)
          - record: container:memory_working_set:topk3
            expr: topk(3, sum(container_memory_working_set_bytes{pod!=""}/1024/1024) by (pod))

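node_memory_MemTotal_bytes is the total memory. The expression subtracts buffers, page cache, and reclaimable slab from it, turns the remainder into a percentage of the total, and subtracts that percentage from 100. What is left is the share of memory held by buffers, cache, and reclaimable slab; node_memory_MemFree_bytes is not part of the expression.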

The result can then be queried as node:node_memory:usage.

Alert

Alertmanager sends out the alerts that the Prometheus server raises from its metrics according to alert rules. Alerts can be delivered to a specific URL via webhook.

--------------
|AlertManager| (metric-based) ---> sends alerts via webhook URL
--------------
      ^
      | (sync)
      v
------------
|Prometheus|
------------

In short, it works like this.
1. Prometheus collects metrics
2. An alert rule for a metric is added to Prometheus
3. When the alert fires, it is sent to Alertmanager
4. Alertmanager manages the alerts and can route them elsewhere

Before writing an alert rule, let's set up an nginx that produces metrics.

nginx has its own built-in metrics, but the stub_status module has to be turned on.

nginx.conf

http {
  ...
  server {
    listen 80;
    server_name metrics;

    location /metrics {
      stub_status on;
      allow all;
    }
  }
}

Open the /metrics path and turn stub_status on. allow all accepts requests from any client.

A request to that URL returns the following.

curl localhost/metrics
Active connections: 1
server accepts handled requests
 82 82 51

This output does not follow the Prometheus metric format, though, so an exporter that converts nginx's metrics is needed: nginx-prometheus-exporter. On Kubernetes, attach it as a sidecar like this.

apiVersion: apps/v1
kind: Deployment
...
spec:
...
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-metric-exporter 
        image: nginx/nginx-prometheus-exporter:0.10.0
        args:
        - '-nginx.scrape-uri=http://localhost/metrics'
        ports:
        - containerPort: 9113
          protocol: TCP
      - name: nginx 
        image: nginx:latest
      ...

With this in place, the nginx exporter fetches nginx's metrics from the URL set in -nginx.scrape-uri, converts them to the Prometheus metric format, and serves them to Prometheus. 9113 is the port the nginx exporter listens on by default.
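
To get a feel for what the exporter does, here is a toy version of the conversion applied to the stub_status output shown earlier. It is not the exporter's code. The metric names are modeled on the ones nginx-prometheus-exporter publishes, but treat them as assumptions, and the parser only handles the three lines from the curl example.

```python
def stub_status_to_metrics(text):
    """Convert nginx stub_status text into Prometheus-style sample lines."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # First line: "Active connections: N"
    active = int(lines[0].split(":")[1])
    # Third line: three counters for accepts, handled, requests.
    accepts, handled, requests = (int(x) for x in lines[2].split())
    samples = {
        "nginx_connections_active": active,
        "nginx_connections_accepted": accepts,
        "nginx_connections_handled": handled,
        "nginx_http_requests_total": requests,
    }
    return "\n".join(f"{name} {value}" for name, value in samples.items()) + "\n"


raw = "Active connections: 1\nserver accepts handled requests\n 82 82 51\n"
converted = stub_status_to_metrics(raw)
print(converted)
```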

Finally, create the Service.

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: "9113"
    prometheus.io/scrape: "true"
  labels:
    app: nginx-metric-service
spec:
  type: ClusterIP
  ports:
  - name: metrics
    port: 9113
    protocol: TCP
    targetPort: 9113
  selector:
    app: nginx

Prometheus pulls metrics from port 9113 of this Service at the /metrics path.

Everything is ready now. To add an alert rule, define it under alerting_rules.yml in the Prometheus ConfigMap.

apiVersion: v1
data:
  alerting_rules.yml: |
    {}

Write the rules under alerting_rules.yml.

apiVersion: v1
data:
  alerting_rules.yml: |
    groups:
    - name: nginx-status.alert
      rules:
      - alert: '[P2] NginxDown'
        for: 30s
        annotations:
          title: 'Nginx pod down unexpectedly'
          description: 'nginx가 비정상 종료됨, 빠른 조치 필요!'
          summary: '[P2, warnning!]: Nginx pod has been shutdown unexpectedly'
        expr: |
          (sum(nginx_up) OR vector(0)) == 0

groups defines groups of alerts, and each alert goes under rules. In expr, write the PromQL condition that should trigger the alert.

In the Prometheus UI, the Alerts page shows the alert that was just added.

[P2] NginxDown (0 active)
name: [P2] NginxDown
expr: (sum(nginx_up) or vector(0)) == 0
for: 30s
annotations:
description: nginx가 비정상 종료됨, 빠른 조치 필요!
summary: [P2, warnning!]: Nginx pod has been shutdown unexpectedly
title: Nginx pod down unexpectedly

It is shown in green because the alert has not fired yet. Force it to fire and it turns red.

kubectl scale deployment nginx --replicas=0

After a while the alert goes to Pending, and from Pending it moves to Firing.
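
The Pending-to-Firing step is driven by the for: 30s setting. A small sketch of that state logic follows, assuming the rule is evaluated every 15 seconds as in the evaluation_interval shown earlier. The timeline is made up.

```python
def alert_states(condition_by_time, for_seconds):
    """Return the alert state at each evaluation.

    condition_by_time is a time-ordered list of (timestamp_seconds, bool)
    evaluations; the alert fires once the condition has held for for_seconds.
    """
    states = []
    active_since = None
    for ts, is_true in condition_by_time:
        if not is_true:
            active_since = None
            states.append("inactive")
        else:
            if active_since is None:
                active_since = ts  # condition just became true
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
    return states


# Hypothetical timeline: nginx goes down at t=15 and comes back at t=60.
timeline = [(0, False), (15, True), (30, True), (45, True), (60, False)]
states = alert_states(timeline, for_seconds=30)
print(states)
```

A short blip that recovers before 30 seconds have passed never leaves Pending, which is the whole point of the for clause.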

Scale replicas back up to 1 or more and the alert clears.

kubectl scale deployment nginx --replicas=1
