Monitoring AWS CloudWatch Metrics with SigNoz

Adam · August 28, 2024

We decided to use SigNoz to monitor the applications our company runs on AWS.
That meant integrating SigNoz with CloudWatch, and I want to write down what was difficult along the way and what I learned.

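What is SigNoz?

SigNoz is an open-source application performance monitoring (APM) and log management platform. It lets developers and operations teams monitor application performance in real time and detect and resolve problems. Its main features are:

Distributed tracing: SigNoz helps you trace how a request flows through an application built on a microservice architecture, so you can quickly find the cause of latency or errors in a specific service.
Metrics monitoring: It collects and analyzes application performance indicators in real time. You can monitor metrics such as CPU usage, memory usage, request rate, and error rate.
Log management: It collects and analyzes the logs your application produces so you can identify the cause of a problem and fix it. You can build dashboards from log data and search logs in real time.
OpenTelemetry support: SigNoz supports OpenTelemetry natively, so it integrates easily with many languages and frameworks.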

What is CloudWatch Exporter?

CloudWatch Exporter is a tool that converts metric data collected by AWS CloudWatch into the Prometheus format. CloudWatch is AWS's monitoring service; it collects and monitors metrics from AWS resources (for example EC2, RDS, and ElastiCache) and from applications. Prometheus does not integrate with CloudWatch directly, so CloudWatch Exporter is used to connect the two.

The main features of CloudWatch Exporter are:

Conversion to the Prometheus format: It converts metrics collected by CloudWatch into a format Prometheus understands, so a Prometheus server can scrape them.
Support for many AWS services: CloudWatch Exporter can deliver metrics from EC2, RDS, Lambda, S3, ElastiCache, and other AWS services to Prometheus.
Configurable: A configuration file controls which metrics are collected and which AWS resources are monitored, so you can collect only the metrics you need.
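To make "the Prometheus format" concrete, here is a minimal Python sketch that renders one sample in the Prometheus text exposition format (`name{label="value"} value`). The metric name and label used in the usage note are illustrative placeholders, not necessarily the exact names the exporter emits.

```python
def render_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""

    def escape(text):
        # Label values must escape backslashes, double quotes, and newlines.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    label_text = ",".join(
        f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


if __name__ == "__main__":
    # Hypothetical metric name and label, for illustration only.
    print(render_sample("example_cpu_utilization_average",
                        {"cache_cluster_id": "my-redis"}, 12.5))
```

Each line served on the exporter's `/metrics` endpoint has this general shape, which is what the Prometheus receiver later scrapes.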

Monitoring ElastiCache in SigNoz with CloudWatch Exporter

I tried to follow the instructions at https://signoz.io/docs/integrations/aws-elasticache-redis/.

However, that documentation downloads and runs a jar file, while the SigNoz instance we use runs from Docker images.

A CloudWatch Exporter image is available:

https://hub.docker.com/r/prom/cloudwatch-exporter/

Add the following under services in docker-compose.yaml:

services:
  cloudwatch-exporter:
    image: prom/cloudwatch-exporter:latest
    container_name: cloudwatch-exporter
    volumes:
      - ./cloudwatch-exporter-config.yaml:/config/config.yml
      - ~/.aws:/root/.aws:ro
    environment:
      - AWS_REGION=ap-northeast-2
    ports:
      - "9106:9106"
    restart: on-failure

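Caveats

The config file must be mounted at exactly /config/config.yml inside the container, with that exact path and file name.
When the file was named config.yaml, it was not recognized and the server did not start.
The AWS CLI configuration (~/.aws) must be mounted into the container so that cloudwatch-exporter can fetch data from CloudWatch successfully.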
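Since a missing credentials mount only shows up once the exporter fails to read CloudWatch, it can help to check the host side before starting the container. The following is a minimal sketch, assuming the standard AWS CLI credentials file layout (`~/.aws/credentials` with a `[default]` profile); the function name `has_credentials` is hypothetical.

```python
import configparser
from pathlib import Path


def has_credentials(credentials_path="~/.aws/credentials", profile="default"):
    """Return True if the file exists and the profile holds an access key pair."""
    path = Path(credentials_path).expanduser()
    if not path.is_file():
        return False
    parser = configparser.ConfigParser()
    parser.read(path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    # configparser lowercases option names, matching the AWS CLI key names.
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


if __name__ == "__main__":
    print("credentials found:", has_credentials())
```

This only checks that the file and keys exist. It does not verify that the keys are valid or that they have permission to read CloudWatch.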

Configure cloudwatch-exporter-config.yaml as shown in the SigNoz documentation:

---
region: ap-northeast-2
metrics:
 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CPUUtilization
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: FreeableMemory
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: NetworkBytesIn
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: NetworkBytesOut
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: NetworkPacketsIn
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: NetworkPacketsOut
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: SwapUsage
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: BytesUsedForCache
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CacheHits
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CacheMisses
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CacheHitRate
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CurrConnections
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CurrItems
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: CurrVolatileItems
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: ReplicationLag
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: SaveInProgress
   aws_dimensions: [CacheClusterId, CacheNodeId]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: TrafficManagementActive
   aws_dimensions: [CacheClusterId, CacheNodeId]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: DatabaseCapacityUsagePercentage
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: DatabaseMemoryUsagePercentage
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: EngineCPUUtilization
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: Evictions
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: GlobalDatastoreReplicationLag
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: MemoryFragmentationRatio
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Average, Maximum]

 - aws_namespace: AWS/ElastiCache
   aws_metric_name: MemoryFragmentationRatio
   aws_dimensions: [CacheClusterId, CacheNodeId]
   aws_statistics: [Sum, Average]
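Every entry in the config above repeats the same namespace and dimensions, so the file can also be generated from a short mapping instead of written by hand. The sketch below is one way to do that, not part of the SigNoz documentation; the helper name `render_config` is hypothetical. Keying the mapping by metric name also makes it hard to list the same metric twice by accident.

```python
DIMENSIONS = ["CacheClusterId", "CacheNodeId"]


def render_config(region, metrics):
    """Build cloudwatch-exporter config text from {metric_name: [statistics]}."""
    lines = ["---", f"region: {region}", "metrics:"]
    for name, statistics in metrics.items():
        lines.append(" - aws_namespace: AWS/ElastiCache")
        lines.append(f"   aws_metric_name: {name}")
        lines.append(f"   aws_dimensions: [{', '.join(DIMENSIONS)}]")
        # An empty list leaves aws_statistics out, as in the SaveInProgress entry.
        if statistics:
            lines.append(f"   aws_statistics: [{', '.join(statistics)}]")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "CPUUtilization": ["Average", "Maximum"],
        "CacheHits": ["Sum"],
        "SaveInProgress": [],
    }
    print(render_config("ap-northeast-2", sample))
```

Redirecting the output to cloudwatch-exporter-config.yaml gives a file with the same layout as the hand-written one.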

Finally, otel-collector-config also needs to scrape the Prometheus-format metrics that cloudwatch-exporter exposes. Add a job to the prometheus scrape config that targets port 9106, the port cloudwatch-exporter uses:

prometheus:
    config:
      global:
        scrape_interval: 60s
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['localhost:8888']
              labels:
                job_name: otel-collector
        - job_name: 'cloudwatch-exporter'
          static_configs:
            - targets: ['cloudwatch-exporter:9106']

After changing the configuration files, start the containers:

docker compose up -d

Then verify that metrics are being fetched correctly with the following command:

curl http://localhost:9106/metrics
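The curl output can be long, so a small script that lists only the distinct metric names may be easier to read. This is a minimal sketch using only the Python standard library; it assumes the exporter is reachable at the same URL as the curl command, and the function names are hypothetical.

```python
import urllib.request


def parse_metric_names(text):
    """Collect distinct metric names from Prometheus text exposition output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" or at the first whitespace.
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def fetch_metric_names(url="http://localhost:9106/metrics", timeout=10):
    """Fetch the exporter's metrics page and return the metric names on it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name in sorted(fetch_metric_names()):
        print(name)
```

If the list contains no ElastiCache-related names, the caveats above (config file path and AWS credentials mount) are the first things to recheck.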
