Let's Build a Kubernetes Operator, Day 4: Advanced Features

놀고 싶은데, 왜 다들 공부하는거야 · October 28, 2024

Series: Kubernetes Operator (part 4 of 5)

Operator development: advanced features

Let's now go beyond simple installation and deployment and build an operator that offers more advanced features.

status conditions

A status condition is a way to report the health of an operator to administrators efficiently and in a human-readable form. By exposing status conditions through the CRD, we can report state more precisely and efficiently than with error logs.

Many conventions for status conditions have already been established by the Kubernetes API standardization effort (KEP-1623). https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1623-standardize-conditions

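The Kubernetes API machinery implements this spec: the condition type itself is metav1.Condition (k8s.io/apimachinery/pkg/apis/meta/v1), and the k8s.io/apimachinery/pkg/api/meta package provides helper functions for working with it. With these, condition types are easy to define and use. The Operator Framework, both the Operator SDK and the Operator Lifecycle Manager (OLM), builds its conditions on the same types. As a result, conditions can be set either on the operator's custom resource or on an additional OperatorCondition resource that OLM creates. Here we only look at setting them through the Operator SDK; how to manage and set conditions through OLM is left for you to look up. ~~The author does not prefer OLM.~~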

Operator CRD conditions

As mentioned earlier, an operator's CRD must implement the spec and status fields. In spec we defined the data the operator takes as input parameters. We have not defined status yet, so let's add new fields there.

api/v1alpha1/nginxoperator_types.go

// NginxOperatorStatus defines the observed state of NginxOperator
type NginxOperatorStatus struct {
	Conditions []metav1.Condition `json:"conditions"`
}

Next, run make generate to regenerate the generated Go code (such as the DeepCopy methods), and run make manifests to add the new field to the operator's CRD. Alternatively, run plain make, which runs both.

config/crd/bases/operator.example.com_nginxoperators.yaml

            properties:
              conditions:
                items:
                  description: "Condition contains details for one aspect of the current
                    state of this API Resource. --- This struct is intended for direct
                    use as an array at the field path .status.conditions.  For example,
                    \n type FooStatus struct{ // Represents the observations of a
                    foo's current state. // Known .status.conditions.type are: \"Available\",
                    \"Progressing\", and \"Degraded\" // +patchMergeKey=type // +patchStrategy=merge
                    // +listType=map // +listMapKey=type Conditions []metav1.Condition
                    `json:\"conditions,omitempty\" patchStrategy:\"merge\" patchMergeKey:\"type\"
                    protobuf:\"bytes,1,rep,name=conditions\"` \n // other fields }"
                  properties:
                    lastTransitionTime:
                      description: lastTransitionTime is the last time the condition
                        transitioned from one status to another. This should be when
                        the underlying condition changed.  If that is not known, then
                        using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: message is a human readable message indicating
                        details about the transition. This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: observedGeneration represents the .metadata.generation
                        that the condition was set based upon. For instance, if .metadata.generation
                        is currently 12, but the .status.conditions[x].observedGeneration
                        is 9, the condition is out of date with respect to the current
                        state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: reason contains a programmatic identifier indicating
                        the reason for the condition's last transition. Producers
                        of specific condition types may define expected values and
                        meanings for this field, and whether the values are considered
                        a guaranteed API. The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
                        --- Many .condition.type values are consistent across resources
                        like Available, but because arbitrary conditions can be useful
                        (see .node.status.conditions), the ability to deconflict is
                        important. The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt)
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
            required:
            - conditions
            type: object

Because we used the condition type defined by the Kubernetes API, several validation requirements are embedded in the generated CRD.

The operator's CRD now has a field for reporting its latest status conditions, and we can set them in code with the SetStatusCondition() helper from the k8s.io/apimachinery/pkg/api/meta package. In our case we add a condition called OperatorDegraded: False means reconciliation ran normally and the changes were applied, and True means the operator ran into an error.

nginx-operator/internal/controller/nginxoperator_controller.go

package controller

import (
	...
	"k8s.io/apimachinery/pkg/api/meta" // provides meta.SetStatusCondition
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	utilerrors "k8s.io/apimachinery/pkg/util/errors"
	...
)

func (r *NginxOperatorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)
	operatorCR := &operatorv1alpha1.NginxOperator{}

	err := r.Get(ctx, req.NamespacedName, operatorCR)
	if err != nil && errors.IsNotFound(err) {
		logger.Info("Operator resource object not found.")
		return ctrl.Result{}, nil
	} else if err != nil {
		logger.Error(err, "Error getting operator resource object")

		meta.SetStatusCondition(&operatorCR.Status.Conditions, metav1.Condition{
			Type:               "OperatorDegraded",
			Status:             metav1.ConditionTrue,
			Reason:             "OperatorResourceNotAvailable",
			LastTransitionTime: metav1.NewTime(time.Now()),
			Message:            fmt.Sprintf("unable to get operator custom resource: %s", err.Error()),
		})

		return ctrl.Result{}, utilerrors.NewAggregate([]error{err, r.Status().Update(ctx, operatorCR)})
	}
    ...
}

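The code above handles an error while fetching the operator's custom resource: it sets the OperatorDegraded condition to True and writes the updated status. The status update is passed to utilerrors.NewAggregate, which combines the original error and any error from the update into a single error; nil entries are dropped.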
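To make the behavior of the condition helper more concrete, here is a minimal sketch that uses only the standard library. The Condition struct, setCondition, and runScenario are hypothetical, simplified stand-ins for metav1.Condition and meta.SetStatusCondition, not the real implementation. The sketch assumes conditions are keyed by Type and that the transition time moves only when Status changes.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified stand-in for metav1.Condition (assumption:
// only the fields needed for this illustration are modeled).
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates the condition with the same Type.
// The transition time is only touched when Status actually changes.
func setCondition(conds []Condition, next Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		if conds[i].Status != next.Status {
			conds[i].Status = next.Status
			conds[i].LastTransitionTime = now
		}
		conds[i].Reason = next.Reason
		conds[i].Message = next.Message
		return conds
	}
	next.LastTransitionTime = now
	return append(conds, next)
}

// runScenario replays two failed reconciles followed by a successful one.
func runScenario() []Condition {
	t0 := time.Date(2024, 7, 15, 7, 0, 0, 0, time.UTC)
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "OperatorDegraded", Status: "True", Reason: "OperandDeploymentNotAvailable"}, t0)
	conds = setCondition(conds, Condition{Type: "OperatorDegraded", Status: "True", Reason: "OperandDeploymentFailed"}, t0.Add(time.Minute))
	conds = setCondition(conds, Condition{Type: "OperatorDegraded", Status: "False", Reason: "OperatorSucceeded"}, t0.Add(2*time.Minute))
	return conds
}

func main() {
	for _, c := range runScenario() {
		fmt.Println(c.Type, c.Status, c.Reason, c.LastTransitionTime.Format(time.RFC3339))
	}
}
```

Repeated failures with different Reasons keep a single OperatorDegraded entry, and the transition time only moves when Status flips from True to False.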

Next is the handling for an error while fetching the Deployment.

...
func (r *NginxOperatorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    ...
	deployment := &appsv1.Deployment{}
	create := false
	err = r.Get(ctx, req.NamespacedName, deployment)
	if err != nil && errors.IsNotFound(err) {
		create = true
		deployment = assets.GetDeploymentFromFile("manifests/nginx_deployment.yaml")
	} else if err != nil {
		logger.Error(err, "Error getting existing Nginx deployment.")

		meta.SetStatusCondition(&operatorCR.Status.Conditions, metav1.Condition{
			Type:               "OperatorDegraded",
			Status:             metav1.ConditionTrue,
			Reason:             "OperandDeploymentNotAvailable",
			LastTransitionTime: metav1.NewTime(time.Now()),
			Message:            fmt.Sprintf("unable to get operand deployment: %s", err.Error()),
		})

		return ctrl.Result{}, utilerrors.NewAggregate([]error{err, r.Status().Update(ctx, operatorCR)})
	}
    ...
}

The two blocks look similar, but the Reason and Message differ. The first error occurred while fetching the operator's custom resource and the second while fetching the Deployment, so each gets its own Reason and Message.

...
func (r *NginxOperatorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    ...
	deployment.Namespace = req.Namespace
	deployment.Name = req.Name
	if operatorCR.Spec.Replicas != nil {
		deployment.Spec.Replicas = operatorCR.Spec.Replicas
	}
	if operatorCR.Spec.Port != nil {
		deployment.Spec.Template.Spec.Containers[0].Ports[0].ContainerPort = *operatorCR.Spec.Port
	}
	ctrl.SetControllerReference(operatorCR, deployment, r.Scheme)

	if create {
		err = r.Create(ctx, deployment)
	} else {
		err = r.Update(ctx, deployment)
	}

	if err != nil {
		meta.SetStatusCondition(&operatorCR.Status.Conditions, metav1.Condition{
			Type:               "OperatorDegraded",
			Status:             metav1.ConditionTrue,
			Reason:             "OperandDeploymentFailed",
			LastTransitionTime: metav1.NewTime(time.Now()),
			Message:            fmt.Sprintf("unable to update operand deployment: %s", err.Error()),
		})
		return ctrl.Result{}, utilerrors.NewAggregate([]error{err, r.Status().Update(ctx, operatorCR)})
	}

	meta.SetStatusCondition(&operatorCR.Status.Conditions, metav1.Condition{
		Type:               "OperatorDegraded",
		Status:             metav1.ConditionFalse,
		Reason:             "OperatorSucceded",
		LastTransitionTime: metav1.NewTime(time.Now()),
		Message:            "operator successfully reconciling",
	})

	return ctrl.Result{}, utilerrors.NewAggregate([]error{err, r.Status().Update(ctx, operatorCR)})
}

If creating or updating the Deployment fails, Reason is set to OperandDeploymentFailed and OperatorDegraded is set to True.

If creating or updating the Deployment succeeds, OperatorDegraded is set to False with the Reason OperatorSucceded (as spelled in the code above), indicating success.
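The return statements above rely on the aggregate dropping nil errors, so on the success path the result depends only on the status update. The following sketch illustrates that behavior with the standard library's errors.Join (Go 1.20+) as an analog; it is not the utilerrors.NewAggregate implementation, and reconcileResult, errGet, and errUpdate are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors standing in for a failed Get and a failed status update.
var (
	errGet    = errors.New("unable to get operand deployment")
	errUpdate = errors.New("status update conflict")
)

// reconcileResult mimics the return statements above: it combines the
// reconcile error with the error from writing the status.
func reconcileResult(reconcileErr, statusUpdateErr error) error {
	// errors.Join drops nil entries and returns nil when nothing is left.
	return errors.Join(reconcileErr, statusUpdateErr)
}

func main() {
	// Success path: both inputs are nil, so the combined error is nil.
	fmt.Println(reconcileResult(nil, nil) == nil)

	// Only the reconcile step failed.
	fmt.Println(reconcileResult(errGet, nil))

	// Both failed: each original error is still discoverable.
	both := reconcileResult(errGet, errUpdate)
	fmt.Println(errors.Is(both, errGet), errors.Is(both, errUpdate))
}
```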

Each Reason carries important information, so it is better to collect them in one place for reuse. In our case, let's define them in api/v1alpha1/nginxoperator_types.go as follows.

api/v1alpha1/nginxoperator_types.go

const (
	ReasonCRNotAvailable          = "OperatorResourceNotAvailable"
	ReasonDeploymentNotAvailable  = "OperandDeploymentNotAvailable"
	ReasonOperandDeploymentFailed = "OperandDeploymentFailed"
	ReasonSucceeded               = "OperatorSucceeded"
)

These constants can then be used from nginxoperator_controller.go (for example, operatorv1alpha1.ReasonCRNotAvailable).

sudo make docker-build
sudo make deploy
sudo kubectl create -f ./config/samples/operator_v1alpha1_nginxoperator.yaml

kubectl get po -n nginx-operator-system
NAME                                                 READY   STATUS    RESTARTS   AGE
nginx-operator-controller-manager-69b8fccc98-stbrr   2/2     Running   0          41m
nginxoperator-sample-6899cc8684-4h4pg                1/1     Running   0          41m
nginxoperator-sample-6899cc8684-jbtvb                1/1     Running   0          41m
nginxoperator-sample-6899cc8684-x6sm9                1/1     Running   0          41m

Now that the operator is deployed, let's check its status.

kubectl describe -n nginx-operator-system nginxoperators.operator.example.com nginxoperator-sample

Name:         nginxoperator-sample
Namespace:    nginx-operator-system
...
Spec:
  Port:      8082
  Replicas:  3
Status:
  Conditions:
    Last Transition Time:  2024-07-15T07:17:06Z
    Message:               operator successfully reconciling
    Reason:                OperatorSucceded
    Status:                False
    Type:                  OperatorDegraded
Events:                    <none>

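The Conditions under Status show that OperatorDegraded is False with the Reason OperatorSucceded. The output reflects the spelling of the string literal in the controller code; the ReasonSucceeded constant spells it OperatorSucceeded.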

Implementing metrics reporting

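Metrics provide insight through measurable data about the cluster. Core components such as kube-scheduler and kube-controller-manager already expose metrics; kube-scheduler, for example, reports schedule_attempts_total, the number of attempts to schedule Pods onto Nodes.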

Metrics fall into two groups: service metrics, which describe the logic of an individual service, and core metrics, which every component has. Service metrics focus on what each individual component does, like schedule_attempts_total for kube-scheduler above. Core metrics describe resource usage such as CPU and memory for each component. Core metrics are scraped and served by the Kubernetes metrics-server application.

An operator can expose service metrics about its operand to provide more insight. By default, operator-sdk already opens a metrics endpoint at the /metrics path on port 8080 and registers a handler for it. All you need to do is write the code that registers your metrics. This works the same way as exposing metrics to Prometheus.

https://prometheus.io/docs/guides/go-application/#adding-your-own-metrics
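As a rough illustration of what a scraper sees at such an endpoint, the sketch below serves a single counter in the Prometheus text exposition format using only the standard library. The counter type, expose, and scrape are hypothetical stand-ins written for this example; in the operator itself the Prometheus client and controller-runtime do this work.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// counter is a hypothetical stand-in for a Prometheus counter.
type counter struct {
	name, help string
	value      atomic.Int64
}

// Inc adds one to the counter.
func (c *counter) Inc() { c.value.Add(1) }

// expose renders the counter in the Prometheus text exposition format.
func (c *counter) expose() string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %d\n",
		c.name, c.help, c.name, c.name, c.value.Load())
}

// scrape performs a GET /metrics against an in-memory handler and
// returns the response body, the way a scraper would read it.
func scrape(c *counter) string {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, c.expose())
	})
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	c := &counter{name: "reconciles_total", help: "Number of total reconciliation attempts"}
	c.Inc()
	c.Inc()
	fmt.Print(scrape(c))
}
```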

In fact, the built-in metrics handler comes from Kubebuilder, which uses sigs.k8s.io/controller-runtime to provide it. controller-runtime makes it easy to register and update new metrics from operator code. https://book.kubebuilder.io/reference/metrics.html

The controller-runtime library already provides the operator with the following metrics, all prefixed with controller_runtime_.

controller_runtime_reconcile_errors_total: the number of Reconcile() calls that returned an error
controller_runtime_reconcile_time_seconds_bucket: a histogram of the latency of individual reconciliation attempts
controller_runtime_reconcile_total: the total number of times Reconcile() was called

In our case, we will add a custom metric modeled on controller_runtime_reconcile_total that counts how many times the operator has attempted to reconcile the operand state in the cluster.

RED metrics

The Operator SDK points to the RED method as guidance for which metrics each service should expose.

Rate: how many times something happens per second, for example the number of requests received per second.
Errors: the number of attempts that failed with an error.
Duration (latency): how long the operator takes to do some work. controller_runtime_reconcile_time_seconds, for example, shows how many seconds each reconcile took.
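The three RED signals can be sketched with a small wrapper around any operation. This is a minimal standard-library model, not how controller-runtime records its metrics; redStats, observe, ratePerSecond, and runScenario are hypothetical names, and the clock is injected so the example is deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// redStats collects the three RED signals for one operation.
type redStats struct {
	total    int           // Rate: divide by the observation window
	failed   int           // Errors: calls that returned an error
	duration time.Duration // Duration: sum of all call latencies
}

// observe runs fn, times it with the supplied clock, and records the outcome.
func (s *redStats) observe(fn func() error, clock func() time.Time) error {
	start := clock()
	err := fn()
	s.duration += clock().Sub(start)
	s.total++
	if err != nil {
		s.failed++
	}
	return err
}

// ratePerSecond converts the call count into a per-second rate.
func (s *redStats) ratePerSecond(window time.Duration) float64 {
	return float64(s.total) / window.Seconds()
}

// runScenario simulates three reconcile calls, one of which fails,
// observed over a six-second window.
func runScenario() (rate float64, failed int, avg time.Duration) {
	now := time.Unix(0, 0)
	clock := func() time.Time { // fake clock: every reading is 50ms later
		now = now.Add(50 * time.Millisecond)
		return now
	}
	stats := &redStats{}
	stats.observe(func() error { return nil }, clock)
	stats.observe(func() error { return errors.New("update failed") }, clock)
	stats.observe(func() error { return nil }, clock)
	return stats.ratePerSecond(6 * time.Second), stats.failed, stats.duration / time.Duration(stats.total)
}

func main() {
	rate, failed, avg := runScenario()
	fmt.Printf("rate=%.2f/s errors=%d avg=%s\n", rate, failed, avg)
}
```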

Before adding custom metrics to the operator, it is a good idea to keep the metric definitions in one place. In our case we put them in internal/controller/metrics/metrics.go. This package defines the new custom metrics and registers them with the global registry in sigs.k8s.io/controller-runtime/pkg/metrics.

internal/controller/metrics/metrics.go

package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	ReconcilesTotal = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "reconciles_total",
			Help: "Number of total reconciliation attempts",
		},
	)
)

func init() {
	metrics.Registry.MustRegister(ReconcilesTotal)
}

This creates the reconciles_total metric. Because the metric uses the Prometheus client implementation, it is compatible with other open-source tools that use the same format.

In a real environment it is good practice to prefix a metric name with the component that produces it. Since this is an operator, a name such as operator_reconciles_total would be better. See the Prometheus naming conventions: https://prometheus.io/docs/practices/naming/
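The Prometheus Go client's CounterOpts also has Namespace and Subsystem fields, which are joined with Name by underscores to form the full metric name. The fqName helper below is a hypothetical standard-library model of that joining, meant only to show how a component prefix ends up in the name.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with underscores, so a component
// prefix can be kept separate from the metric name itself.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(fqName("operator", "", "reconciles_total"))
	fmt.Println(fqName("nginx", "operator", "reconciles_total"))
}
```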

Now let's increment the metric in the operator's Reconcile function.

internal/controller/nginxoperator_controller.go

package controller

import (
	...
	"github.com/example/nginx-operator/internal/controller/metrics"
  ...
)

...

func (r *NginxOperatorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	metrics.ReconcilesTotal.Inc()
  ...
}

Each call to metrics.ReconcilesTotal.Inc() increases the reconciles_total metric by one. All that remains is to create a Prometheus job that scrapes it. Setting up Prometheus and checking the metric is covered in more detail in the next chapter.

Implementing leader election

In a distributed system, high availability (HA) can be achieved by running several replicas of a workload. HA setups use a leader election algorithm: once a leader is elected it does the main work, and the other replicas stay idle. If the leader becomes unable to run or can no longer handle requests, leadership passes to another replica, which takes over the work. An operator can likewise run several replicas for HA and use leader election to provide its service.

In other words, proper leader election manages failover and keeps the application continuously accessible.

The Operator SDK makes leader election simple. The boilerplate code has a --leader-elect option that defaults to false, so leader election is turned off. This flag is passed to the LeaderElection option, which is set on the operator's controller manager.

The code below shows how enableLeaderElection is wired in to turn on leader election.

cmd/main.go

func main() {
  var metricsAddr string
  var enableLeaderElection bool
  var probeAddr string
  var secureMetrics bool
  var enableHTTP2 bool
  flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
  flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
  flag.BoolVar(&enableLeaderElection, "leader-elect", false,
    "Enable leader election for controller manager. "+
      "Enabling this will ensure there is only one active controller manager.")

  ...

  mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	...
		WebhookServer:          webhookServer,
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "a7e021da.example.com",
	})
	if err != nil {
		setupLog.Error(err, "unable to start manager")
		os.Exit(1)
	}
}

LeaderElectionID is the resource name the operator uses as the identifier for the lock it creates for leader election. This leader election algorithm is called leader-with-lease. The lock is also scoped by namespace, so for a namespaced operator you should set LeaderElectionNamespace as well.

There are two leader election algorithms: leader-with-lease and leader-for-life. The code above uses leader-with-lease. https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#leader-election

Leader-with-lease: the default leader election algorithm. The current leader periodically renews its lease to stay leader; if a renewal fails, it gives up leadership and another replica becomes leader. This allows fast failover, but it can cause a split-brain problem, where several replicas each believe they are the leader.
Leader-for-life: the leader gives up leadership only when the operator Pod acting as leader is deleted, at which point the lock resource owned by the deleted Pod is removed by garbage collection. This removes the possibility of replicas competing for leadership, but electing a new leader can be delayed. For example, if the Pod is unresponsive or on a partitioned Node, the pod-eviction-timeout is about 5 minutes, so there may be no leader for 5 minutes.
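The lease-based approach can be sketched as follows. This is a simplified in-memory model, not the controller-runtime or client-go implementation; the lease struct, tryAcquireOrRenew, runScenario, and the 15-second duration are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a simplified stand-in for the lock object a leader keeps renewing.
type lease struct {
	holder    string
	renewedAt time.Time
	duration  time.Duration
}

// tryAcquireOrRenew reports whether candidate holds the lease afterwards.
// A different candidate can only take over once the lease has expired.
func (l *lease) tryAcquireOrRenew(candidate string, now time.Time) bool {
	expired := now.Sub(l.renewedAt) > l.duration
	if l.holder != "" && l.holder != candidate && !expired {
		return false // someone else holds a lease that is still valid
	}
	l.holder = candidate
	l.renewedAt = now
	return true
}

// runScenario replays a takeover: replica-a leads, stops renewing,
// and replica-b takes the lease after it expires.
func runScenario() (results []bool, holder string) {
	t0 := time.Date(2024, 7, 15, 7, 0, 0, 0, time.UTC)
	l := &lease{duration: 15 * time.Second}

	results = append(results, l.tryAcquireOrRenew("replica-a", t0))                     // a acquires
	results = append(results, l.tryAcquireOrRenew("replica-b", t0.Add(5*time.Second)))  // b is rejected
	results = append(results, l.tryAcquireOrRenew("replica-a", t0.Add(10*time.Second))) // a renews
	results = append(results, l.tryAcquireOrRenew("replica-b", t0.Add(30*time.Second))) // lease expired, b takes over
	results = append(results, l.tryAcquireOrRenew("replica-a", t0.Add(31*time.Second))) // a is now rejected
	return results, l.holder
}

func main() {
	results, holder := runScenario()
	fmt.Println(results, holder)
}
```

If replica-a were merely slow rather than stopped, it could keep acting as leader for a moment after replica-b takes over, which is the split-brain risk described above.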

The following is a leader-for-life implementation.

import (
  ...
  "github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
  ...
  err = leader.Become(context.TODO(), "nginx-lock")
  if err != nil {
    log.Error(err, "Failed to retry for leader lock")
    os.Exit(1)
  }
  ...
}

Calling leader.Become creates a ConfigMap named "nginx-lock" as the lock, and the operator then runs as the leader. Other operator replicas cannot become leader because the nginx-lock ConfigMap already exists, so they block inside Become.

When the leader operator terminates, garbage collection deletes the "nginx-lock" ConfigMap, and another operator replica acquires the lock and runs as the leader.
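The blocking behavior can be modeled with a small in-memory lock store. This is a hypothetical sketch, not the operator-sdk leader package: lockStore, become, and runScenario are invented for illustration, and become gives up after maxAttempts only to keep the example finite, whereas the text above describes Become as blocking until the lock is free.

```go
package main

import "fmt"

// lockStore is a hypothetical in-memory stand-in for the ConfigMap API:
// creating a lock fails if a lock with that name already exists.
type lockStore struct {
	owner map[string]string // lock name -> owning pod
}

// create stores the lock for pod unless the name is already taken.
func (s *lockStore) create(name, pod string) bool {
	if _, exists := s.owner[name]; exists {
		return false
	}
	s.owner[name] = pod
	return true
}

// deletePod models garbage collection: locks owned by the pod disappear.
func (s *lockStore) deletePod(pod string) {
	for name, o := range s.owner {
		if o == pod {
			delete(s.owner, name)
		}
	}
}

// become retries creating the lock, calling wait between attempts.
func become(s *lockStore, name, pod string, maxAttempts int, wait func()) bool {
	for i := 0; i < maxAttempts; i++ {
		if s.create(name, pod) {
			return true
		}
		wait()
	}
	return false
}

// runScenario: pod-a becomes leader, pod-b is blocked while pod-a exists,
// and pod-b succeeds once pod-a is deleted during its retries.
func runScenario() (aLeads, bWhileAAlive, bAfterADeleted bool) {
	store := &lockStore{owner: map[string]string{}}
	noWait := func() {}

	aLeads = become(store, "nginx-lock", "pod-a", 1, noWait)
	bWhileAAlive = become(store, "nginx-lock", "pod-b", 3, noWait)

	waits := 0
	bAfterADeleted = become(store, "nginx-lock", "pod-b", 5, func() {
		waits++
		if waits == 2 {
			store.deletePod("pod-a") // the old leader pod goes away
		}
	})
	return aLeads, bWhileAAlive, bAfterADeleted
}

func main() {
	fmt.Println(runScenario())
}
```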
