Elasticsearch Notes, Day 2 - Managing Documents

놀고 싶은데, 왜 다들 공부하는거야 · April 4, 2025

Managing Documents

Let's start by creating an index and then deleting it. As before, open the Dev Tools page in Kibana and enter the following commands.

Creating an index

PUT /products

Creating an index is very simple: send an HTTP PUT request to /{index}. The request above creates an index named products.

Going a step further, you can add a request body to configure sharding and replication.

PUT /products
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}

Add a settings key to the request body to set the number of shards and replicas. Here we ask for 2 primary shards and 2 replicas of each primary. Running the command returns the following result.

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "products"
}
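
As a rough check on what these settings ask the cluster to host, the arithmetic can be sketched in Python. The function name shard_copies is hypothetical and not part of any Elasticsearch client; the sketch only relies on number_of_replicas being counted per primary shard.

```python
def shard_copies(number_of_shards: int, number_of_replicas: int) -> dict:
    """Count the shard copies an index asks the cluster to host."""
    primaries = number_of_shards
    # number_of_replicas is per primary shard, not a total for the index.
    replicas = number_of_shards * number_of_replicas
    return {
        "primaries": primaries,
        "replicas": replicas,
        "total": primaries + replicas,
    }


# The settings used for the products index: 2 shards, 2 replicas each.
print(shard_copies(2, 2))
```

With the values above this gives 2 primaries and 4 replicas, 6 shard copies in total. On a single-node cluster only the primaries can be assigned, because a replica is never placed on the same node as its primary.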

Let's check that it was created correctly. To do so, send a GET request to the index.

GET /products

The result looks like this.

{
  "products" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "2",
        "provided_name" : "products",
        "creation_date" : "1698284885481",
        "number_of_replicas" : "2",
        "uuid" : "NDPqAz8yQUKhFxdxRp_AvQ",
        "version" : {
          "created" : "7171499"
        }
      }
    }
  }
}

Both number_of_shards and number_of_replicas are 2, so the settings were applied as requested.

Now let's delete it. To delete an index, send an HTTP DELETE request to it.

DELETE /products

If you get the following response, the deletion succeeded.

{
  "acknowledged" : true
}

Indexing documents

Before indexing documents, recreate the products index we made earlier.

PUT /products
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  }
}

Once the products index has been created, let's add a document to it. To add a document, send a POST request to the index; the request body becomes the document. In summary:

path: /{index}/_doc
method: POST
body: document

POST /products/_doc
{
  "name": "Coffee Maker",
  "price": 64,
  "in_stock": 10
}

The request goes to POST /{index}/_doc, and the entire request body becomes the document. You should get a result like the following.

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "YjSyaYsBI-HjrEGdJxAT",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

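If result is created, the request succeeded. In _shards, total counts the shard copies the write should reach (the primary plus its replicas), and successful is 1 because there is no node for the replica shards to be allocated to. (With number_of_replicas set to 2, total would normally be 3; the value 2 suggests this output was captured on an index with a single replica.)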
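
A minimal sketch, in Python, of how the _shards section of a write response could be read. summarize_write is a hypothetical helper, and treating the leftover copies as unassigned replicas is an assumption that fits a single-node cluster.

```python
def summarize_write(response: dict) -> str:
    """Describe how many shard copies a write response reports as written."""
    shards = response["_shards"]
    # Copies that were neither written nor reported as failed, for example
    # replicas that are not assigned to any node.
    not_reached = shards["total"] - shards["successful"] - shards["failed"]
    return (
        f"{response['result']}: {shards['successful']}/{shards['total']} "
        f"shard copies written, {shards['failed']} failed, "
        f"{not_reached} not reached"
    )


# Values copied from the response shown above.
response = {
    "result": "created",
    "_shards": {"total": 2, "successful": 1, "failed": 0},
}
print(summarize_write(response))
```

The point of the sketch is that a successful count below total is not an error by itself: failed stays at 0, and the write to the primary went through.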

_id is the unique identifier of the document. It is generated automatically, but the client can also specify it when needed.

To specify the _id yourself, change the request above as follows.

POST /products/_doc/101
{
  "name": "Coffee Maker",
  "price": 64,
  "in_stock": 10
}

By specifying an _id you can also change the contents of a particular document, that is, update it. This is conventionally sent with the PUT method. If the document does not exist yet, it is created.

path: /{index}/_doc/{id}
method: PUT
body: document

PUT /products/_doc/100
{
  "name": "Toaster",
  "price": 49,
  "in_stock": 4
}

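The response looks like this. (_version is 8 and result is updated here, presumably because the request had already been run several times against this id; a first run would return created with _version 1.)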

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 8,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}
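
The create-or-replace behaviour of PUT /{index}/_doc/{id} can be modelled with a plain dictionary. This is only an in-memory sketch under simplifying assumptions: index_document and store are hypothetical names, and shards, sequence numbers and deletes are ignored.

```python
def index_document(store: dict, doc_id: str, body: dict) -> dict:
    """Mimic PUT /{index}/_doc/{id}: create the document or replace it whole."""
    previous = store.get(doc_id)
    # The version starts at 1 and grows by one on every write to the same id.
    version = 1 if previous is None else previous["_version"] + 1
    store[doc_id] = {"_version": version, "_source": dict(body)}
    return {
        "_id": doc_id,
        "_version": version,
        "result": "created" if previous is None else "updated",
    }


store = {}
toaster = {"name": "Toaster", "price": 49, "in_stock": 4}
first = index_document(store, "100", toaster)
second = index_document(store, "100", toaster)
print(first["result"], first["_version"])
print(second["result"], second["_version"])
```

Running the same request twice returns created and then updated, with the version going from 1 to 2, which is why repeated runs of the request above end up at a higher _version.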

To fetch the document:

path: /{index}/_doc/{id}
method: GET

GET /products/_doc/100

Check that the response contains the document we wanted.

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 8,
  "_seq_no" : 8,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 49,
    "in_stock" : 4
  }
}

If the document does not exist, found is false. The _source field holds the document's contents.
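
On the application side, a GET response is usually reduced to either the document or nothing. A small sketch of that, where source_or_none is a hypothetical helper operating on the already parsed JSON response:

```python
from typing import Optional


def source_or_none(get_response: dict) -> Optional[dict]:
    """Return the document body from a GET response, or None if not found."""
    if not get_response.get("found", False):
        return None
    return get_response["_source"]


# Shapes taken from GET responses: one hit, one miss.
hit = {
    "_id": "100",
    "found": True,
    "_source": {"name": "Toaster", "price": 49, "in_stock": 4},
}
miss = {"_id": "100", "found": False}

print(source_or_none(hit))
print(source_or_none(miss))
```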

Update Document

Now let's decrease the document's in_stock by one. The document update API makes this easy.

path: /{index}/_update/{document_id}
method: POST
body: the fields to change (under the "doc" key)

POST /products/_update/100
{
  "doc": {
    "in_stock": 3
  }
}

This tells Elasticsearch to set in_stock to 3 in the document with id 100 in the products index. The response looks like this.

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 11,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 12,
  "_primary_term" : 1
}

A result of updated means the change was applied.

Let's verify that the document changed as intended.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 11,
  "_seq_no" : 12,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 49,
    "in_stock" : 3
  }
}

in_stock has been changed to 3.

Interestingly, if a request to the update API includes a field that the document does not have, the field is still applied; in other words, it gets added. Let's add tags, which the document above did not originally have.

POST /products/_update/100
{
  "doc": {
    "tags": ["electorinics"]
  }
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 12,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 13,
  "_primary_term" : 1
}

result is updated. Let's confirm with a GET.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 12,
  "_seq_no" : 13,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 49,
    "in_stock" : 3,
    "tags" : [
      "electorinics"
    ]
  }
}

The change has been applied.
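
The way a partial "doc" is folded into the stored document can be sketched as a recursive merge. The helper names below are hypothetical, and the noop case assumes the update API's default behaviour of detecting updates that change nothing.

```python
from typing import Tuple


def deep_merge(base: dict, patch: dict) -> dict:
    """Merge patch into a copy of base; nested objects merge, the rest is replaced."""
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and arrays are replaced as a whole.
            merged[key] = value
    return merged


def merge_partial(source: dict, partial: dict) -> Tuple[dict, str]:
    """Return the merged document and the result the update would report."""
    merged = deep_merge(source, partial)
    result = "noop" if merged == source else "updated"
    return merged, result


doc = {"name": "Toaster", "price": 49, "in_stock": 3}
with_tags, result_tags = merge_partial(doc, {"tags": ["electorinics"]})
same_doc, result_same = merge_partial(doc, {"in_stock": 3})
print(result_tags, with_tags)
print(result_same)
```

Adding tags reports updated and leaves the other fields alone, while sending a value that is already stored reports noop.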

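Strictly speaking, though, there is no in-place update, because documents are immutable. What we have been calling an update is really a replace: Elasticsearch does not modify the stored data but marks the old document as deleted and indexes a new one. Whether you use PUT or POST, a new document is built and stored.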

This looks harmless on the surface, but it means the update API does a fair amount of work on every call (fetch, modify, reindex). At the application level you should therefore avoid issuing updates excessively; heavy update traffic inevitably adds overhead.

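Above we updated with POST; doing it with PUT is a different operation. POST to _update merges the new fields into the existing data and stores the result, whereas PUT to _doc/{id} discards the previous data and overwrites the document with the request body.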

path: /{index}/_doc/{id}
method: PUT
body: the document data to overwrite with

PUT /products/_doc/100
{
  "name": "Toaster",
  "price": 78,
  "in_stock": 4
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 19,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 22,
  "_primary_term" : 1
}

result is updated. Now let's check whether the existing data was really overwritten. Some values differ from before, but the key point is tags: the PUT request body has no tags field, so it should be gone.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 19,
  "_seq_no" : 22,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 78,
    "in_stock" : 4
  }
}

The document has been completely overwritten with the new data.

Script

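You can also use a script when updating a document. The script is included in the request, executed by Elasticsearch, and its result is applied to the document. For example, to decrease in_stock of the document above by one, you can use the -- operator.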

POST /products/_update/100
{
  "script": {
    "source": "ctx._source.in_stock--"
  }
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 17,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 18,
  "_primary_term" : 1
}

ctx._source is the data of the document currently stored in Elasticsearch. We decremented it with in_stock--, and the result of updated shows that it was applied. Let's confirm with a GET.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 17,
  "_seq_no" : 18,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 49,
    "in_stock" : 2,
    "tags" : [
      "electorinics"
    ]
  }
}

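in_stock is now 2. (Judging by the _version values, the outputs in this section were captured before the PUT replace shown earlier, which is why price and tags still have their old values.) Scripts are most useful when the change is dynamic, so let's try that next: pass a quantity as a parameter and decrease in_stock by that amount. Parameters go in a JSON object named "params", and the script accesses them by key.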

POST /products/_update/100
{
  "script": {
    "source": "ctx._source.in_stock -= params.quantity",
    "params": {
      "quantity": 4
    }
  }
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 18,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 19,
  "_primary_term" : 1
}

params.quantity is set to 4 and the source reads it as a parameter. In other words, source is being used like a function. Since result is updated, let's check that the change was applied.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 18,
  "_seq_no" : 19,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Toaster",
    "price" : 49,
    "in_stock" : -2,
    "tags" : [
      "electorinics"
    ]
  }
}

in_stock went from 2 to -2.
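
When such a request is built from application code, one reasonable pattern is to keep the script source fixed and vary only params, so that Elasticsearch can reuse the compiled script instead of compiling a new one for every quantity. The sketch below only builds the request body; decrement_stock_request is a hypothetical helper and nothing is sent to a cluster.

```python
import json


def decrement_stock_request(quantity: int) -> dict:
    """Build the body for POST /{index}/_update/{id} that lowers in_stock."""
    return {
        "script": {
            # The source never changes; only the parameter value does.
            "source": "ctx._source.in_stock -= params.quantity",
            "params": {"quantity": quantity},
        }
    }


body = decrement_stock_request(4)
print(json.dumps(body, indent=2))
```

Formatting the quantity directly into the source string would also work, but every distinct quantity would then be a distinct script.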

Going further, source can hold more elaborate scripts like the following.

POST /products/_update/100
{
  "script": {
    "source": """
        if (ctx._source.in_stock < 0) {
          ctx.op = 'noop';
        }
        
        ctx._source.in_stock--;
      """
  }
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 18,
  "result" : "noop",
  "_shards" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "_seq_no" : 19,
  "_primary_term" : 1
}

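The script checks ctx._source.in_stock < 0 and, if the value is below 0, sets ctx.op = 'noop'. Setting ctx.op is similar to a function's return value: it tells Elasticsearch what to do once the script finishes. A value of noop means no operation is performed, so the document is left untouched even though the decrement on the following line still runs. Accordingly, result in the response is noop.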

You can also set ctx.op = 'delete' to have the script delete the document.
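
How ctx.op decides the outcome can be modelled in a few lines of Python. This is a sketch of the behaviour seen above, not of Elasticsearch internals: run_update_script is a hypothetical name, the script is an ordinary Python function, and the default op of "index" is an assumption.

```python
import copy


def run_update_script(stored_source: dict, script) -> dict:
    """Run script on a copy of the document and apply whatever ctx.op asks for."""
    ctx = {"op": "index", "_source": copy.deepcopy(stored_source)}
    script(ctx)
    if ctx["op"] == "noop":
        # Changes made by the script are thrown away.
        return {"result": "noop", "_source": stored_source}
    if ctx["op"] == "delete":
        return {"result": "deleted", "_source": None}
    return {"result": "updated", "_source": ctx["_source"]}


def decrement_unless_negative(ctx: dict) -> None:
    """Python version of the script above."""
    if ctx["_source"]["in_stock"] < 0:
        ctx["op"] = "noop"
    # Still executes after the if, exactly like the original script.
    ctx["_source"]["in_stock"] -= 1


negative = run_update_script({"name": "Toaster", "in_stock": -2}, decrement_unless_negative)
positive = run_update_script({"name": "Toaster", "in_stock": 2}, decrement_unless_negative)
print(negative)
print(positive)
```

With in_stock at -2 the result is noop and the stored value stays -2; with in_stock at 2 the result is updated and the value becomes 1.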

Upsert

Upsert is update + insert: if the document does not exist, the given data is inserted as a new document; if it already exists, the script's source is executed instead. Consider the following example.

path: /{index}/_update/{id}
method: POST
body: script, upsert

POST /products/_update/102
{
  "script": {
    "source": "ctx._source.in_stock++"
  },
  "upsert": {
    "name": "blander",
    "price": 500,
    "in_stock": 5
  }
}

In the example above, if document id 102 does not exist yet, the contents of upsert are stored as the document. If document id 102 exists, script.source is executed instead. It does not exist yet, so a new document will be created.

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "102",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 20,
  "_primary_term" : 1
}

result is created. Let's check with GET that it was created properly.

GET /products/_doc/102

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "102",
  "_version" : 1,
  "_seq_no" : 20,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "blander",
    "price" : 500,
    "in_stock" : 5
  }
}

The upsert contents were stored as-is. So what happens if we run the same upsert command again? Document id 102 now exists, so this time script.source is executed.

POST /products/_update/102
{
  "script": {
    "source": "ctx._source.in_stock++"
  },
  "upsert": {
    "name": "blander",
    "price": 500,
    "in_stock": 5
  }
}

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "102",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 21,
  "_primary_term" : 1
}

result changed from created to updated. And because script.source was executed, in_stock should have increased.

GET /products/_doc/102

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "102",
  "_version" : 2,
  "_seq_no" : 21,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "blander",
    "price" : 500,
    "in_stock" : 6
  }
}

It increased from 5 to 6.
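
The two runs above can be reproduced with a small in-memory model. update_with_upsert and increment_stock are hypothetical names; the sketch follows the behaviour shown here, where the script is not run on the request that creates the document.

```python
def increment_stock(source: dict) -> None:
    """Python stand-in for the script ctx._source.in_stock++."""
    source["in_stock"] += 1


def update_with_upsert(store: dict, doc_id: str, script, upsert: dict) -> str:
    """Insert the upsert body if the id is new, otherwise run the script."""
    if doc_id not in store:
        # First request: the upsert body is stored as-is, the script is skipped.
        store[doc_id] = dict(upsert)
        return "created"
    script(store[doc_id])
    return "updated"


store = {}
upsert_body = {"name": "blander", "price": 500, "in_stock": 5}
first_result = update_with_upsert(store, "102", increment_stock, upsert_body)
stock_after_first = store["102"]["in_stock"]
second_result = update_with_upsert(store, "102", increment_stock, upsert_body)
stock_after_second = store["102"]["in_stock"]
print(first_result, stock_after_first)
print(second_result, stock_after_second)
```

The first call reports created with in_stock at 5, and the second reports updated with in_stock at 6, matching the responses above.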

Delete documents

Deleting is very easy.

path: /{index}/_doc/{id}
method: DELETE

DELETE /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "_version" : 20,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 23,
  "_primary_term" : 1
}

result is deleted, meaning the document was removed. Let's check that it is really gone.

GET /products/_doc/100

{
  "_index" : "products",
  "_type" : "_doc",
  "_id" : "100",
  "found" : false
}

found is false.
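
The delete-then-get sequence can also be sketched against a dictionary. Both helpers are hypothetical, and the not_found result for deleting a missing document reflects my understanding of the delete API; the walkthrough above only shows the first delete.

```python
def delete_document(store: dict, doc_id: str) -> dict:
    """Mimic DELETE /{index}/_doc/{id} against an in-memory dict."""
    if doc_id in store:
        del store[doc_id]
        return {"_id": doc_id, "result": "deleted"}
    # Assumption: a missing document is reported, not raised as an exception.
    return {"_id": doc_id, "result": "not_found"}


def get_document(store: dict, doc_id: str) -> dict:
    """Mimic GET /{index}/_doc/{id} against the same dict."""
    if doc_id in store:
        return {"_id": doc_id, "found": True, "_source": store[doc_id]}
    return {"_id": doc_id, "found": False}


store = {"100": {"name": "Toaster", "price": 78, "in_stock": 4}}
first_delete = delete_document(store, "100")
after_delete = get_document(store, "100")
second_delete = delete_document(store, "100")
print(first_delete)
print(after_delete)
print(second_delete)
```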
