Elasticsearch 4. join, aggregation

yo · July 16, 2021

Joining queries

Intro

  • Using ES as a primary data store is not recommended.
  • Data is usually not normalized in ES, although simple joins are supported.
  • ES optimizes search performance by denormalizing data.
  • Performance is favored over disk space.
  • Only simple joins are supported, and joins are expensive.

Querying nested objects

Creating the index with mapping

PUT /department
{
  "mappings": {  
    "properties": {
      "name": {
        "type": "text"
      },
      "employees": {
        "type": "nested"
      }
    }
  }
}

Adding test documents

PUT /department/_doc/1
{
  "name": "Development",
  "employees": [
    {
      "name": "Eric Green",
      "age": 39,
      "gender": "M",
      "position": "Big Data Specialist"
    },
    {
      "name": "James Taylor",
      "age": 27,
      "gender": "M",
      "position": "Software Developer"
    },
    {
      "name": "Gary Jenkins",
      "age": 21,
      "gender": "M",
      "position": "Intern"
    },
    {
      "name": "Julie Powell",
      "age": 26,
      "gender": "F",
      "position": "Intern"
    },
    {
      "name": "Benjamin Smith",
      "age": 46,
      "gender": "M",
      "position": "Senior Software Engineer"
    }
  ]
}
PUT /department/_doc/2
{
  "name": "HR & Marketing",
  "employees": [
    {
      "name": "Patricia Lewis",
      "age": 42,
      "gender": "F",
      "position": "Senior Marketing Manager"
    },
    {
      "name": "Maria Anderson",
      "age": 56,
      "gender": "F",
      "position": "Head of HR"
    },
    {
      "name": "Margaret Harris",
      "age": 19,
      "gender": "F",
      "position": "Intern"
    },
    {
      "name": "Ryan Nelson",
      "age": 31,
      "gender": "M",
      "position": "Marketing Manager"
    },
    {
      "name": "Kathy Williams",
      "age": 49,
      "gender": "F",
      "position": "Senior Marketing Manager"
    },
    {
      "name": "Jacqueline Hill",
      "age": 28,
      "gender": "F",
      "position": "Junior Marketing Manager"
    },
    {
      "name": "Donald Morris",
      "age": 39,
      "gender": "M",
      "position": "SEO Specialist"
    },
    {
      "name": "Evelyn Henderson",
      "age": 24,
      "gender": "F",
      "position": "Intern"
    },
    {
      "name": "Earl Moore",
      "age": 21,
      "gender": "M",
      "position": "Junior SEO Specialist"
    },
    {
      "name": "Phillip Sanchez",
      "age": 35,
      "gender": "M",
      "position": "SEM Specialist"
    }
  ]
}

Querying nested fields

  • Let's find the employees who are both interns and female.
GET /department/_search
{
  "query": {
    "nested": {
      "path": "employees",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "employees.position": "intern"
              }
            },
            {
              "term": {
                "employees.gender.keyword": {
                  "value": "F"
                }
              }
            }
          ]
        }
      }
    }
  }
}
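  • The nested type (together with the nested query) is required because a plain array of objects is flattened when indexed: the values of each field are merged into a single array, so Elasticsearch no longer knows which values belonged to the same object.
  • Wouldn't it be easier to manage employees and departments if they were stored as separate documents?
  • That is possible with the join field, which works somewhat like a foreign key in a relational database.
  • First let's look at inner hits, and then we'll learn about the join field.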
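
To make the flattening concrete, here is roughly how the second department document would be indexed if employees were mapped as a plain object instead of nested. This is a conceptual view for illustration, not the literal on-disk format.

```
{
  "name": "HR & Marketing",
  "employees.name": [
    "Patricia Lewis", "Maria Anderson", "Margaret Harris", "Ryan Nelson", "Kathy Williams",
    "Jacqueline Hill", "Donald Morris", "Evelyn Henderson", "Earl Moore", "Phillip Sanchez"
  ],
  "employees.age": [42, 56, 19, 31, 49, 28, 39, 24, 21, 35],
  "employees.gender": ["F", "F", "F", "M", "F", "F", "M", "F", "M", "M"],
  "employees.position": [
    "Senior Marketing Manager", "Head of HR", "Intern", "Marketing Manager",
    "Senior Marketing Manager", "Junior Marketing Manager", "SEO Specialist",
    "Intern", "Junior SEO Specialist", "SEM Specialist"
  ]
}
```

With this layout, a query for position "intern" and gender "M" would match this department, because both values appear somewhere in the arrays, even though every intern in HR & Marketing is female. With the nested mapping used above, each employee is indexed as its own hidden document, so both conditions must be satisfied by the same employee.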

Nested inner hits

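  • Inner hits show which nested objects actually caused each document to match. By default they are sorted by relevance score.
  • To customize the order, add a sort option inside inner_hits.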
GET /department/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "employees",
      "inner_hits": {},
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "employees.position": "intern"
              }
            },
            {
              "term": {
                "employees.gender.keyword": {
                  "value": "F"
                }
              }
            }
          ]
        }
      }
    }
  }
}
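
A sketch of the sort option mentioned above, assuming employees.age was dynamically mapped as a numeric field (which is what dynamic mapping does for the integer ages indexed earlier). It returns up to five matching interns per department, oldest first. size limits the number of inner hits per document; by default only the top three are returned.

```
GET /department/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "employees",
      "inner_hits": {
        "size": 5,
        "sort": [
          { "employees.age": "desc" }
        ]
      },
      "query": {
        "match": {
          "employees.position": "intern"
        }
      }
    }
  }
}
```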

Mapping document relationships

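  • To define relationships between documents, we first need to update the mapping.
  • The keys in relations do not have to match the index name (department).
  • The mapping below creates a parent-child relationship between department and employee (parent: department).
  • The child side of a relation can be a single string, as here, or an array when a parent has several child types.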
PUT /department/_mapping
{
  "properties": {
    "join_field": {
      "type": "join",
      "relations": {
        "department": "employee"
      }
    }
  }
}

Adding documents

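  • Parent and child documents must be stored on the same shard.
  • That is why, when adding employees, we pass the parent's ID as the routing value. By default a document is routed by its own ID, so without this the child could land on a different shard than its parent.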

Adding departments

PUT /department/_doc/1
{
  "name": "Development",
  "join_field": "department"
}
PUT /department/_doc/2
{
  "name": "Marketing",
  "join_field": "department"
}

Adding employees for departments

PUT /department/_doc/3?routing=1
{
  "name": "Bo Andersen",
  "age": 28,
  "gender": "M",
  "join_field": {
    "name": "employee",
    "parent": 1
  }
}
PUT /department/_doc/4?routing=2
{
  "name": "John Doe",
  "age": 44,
  "gender": "M",
  "join_field": {
    "name": "employee",
    "parent": 2
  }
}
PUT /department/_doc/5?routing=1
{
  "name": "James Evans",
  "age": 32,
  "gender": "M",
  "join_field": {
    "name": "employee",
    "parent": 1
  }
}
PUT /department/_doc/6?routing=1
{
  "name": "Daniel Harris",
  "age": 52,
  "gender": "M",
  "join_field": {
    "name": "employee",
    "parent": 1
  }
}
PUT /department/_doc/7?routing=2
{
  "name": "Jane Park",
  "age": 23,
  "gender": "F",
  "join_field": {
    "name": "employee",
    "parent": 2
  }
}
PUT /department/_doc/8?routing=1
{
  "name": "Christina Parker",
  "age": 29,
  "gender": "F",
  "join_field": {
    "name": "employee",
    "parent": 1
  }
}
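
Because each employee is stored on the shard chosen by its parent's ID, the same routing value is needed to fetch, update, or delete it by ID later. A minimal sketch for the employee indexed above as document 3:

```
GET /department/_doc/3?routing=1
```

Without routing, Elasticsearch hashes the document's own ID (3) instead, which on a multi-shard index may point at the wrong shard and report "found": false. On a single-shard index the lookup happens to succeed either way.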

Querying by parent ID
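
A sketch, assuming the documents indexed above: the parent_id query returns child documents of the given relation type that belong to one specific parent. Here it should return the employees of the Development department (ID 1).

```
GET /department/_search
{
  "query": {
    "parent_id": {
      "type": "employee",
      "id": "1"
    }
  }
}
```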

Querying child documents by parent
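
To select children by a condition on their parent, a has_parent query can be used. In this sketch parent_type names the parent relation, and the inner query runs only against parent documents; the match on name relies on the text mapping of name defined at the start of this post. By default the parent's score is not passed on; setting "score": true makes each child inherit its parent's relevance score.

```
GET /department/_search
{
  "query": {
    "has_parent": {
      "parent_type": "department",
      "query": {
        "match": {
          "name": "Development"
        }
      }
    }
  }
}
```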

Querying parent by child documents
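
The reverse direction uses has_child, which returns parent documents that have at least one child matching the inner query. A sketch that looks for departments employing someone aged 50 or older, assuming age was dynamically mapped as a number. score_mode controls how the matching children's scores are combined into the parent's score (none by default; avg, max, min, and sum are also accepted), and min_children / max_children can restrict how many children must match.

```
GET /department/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "score_mode": "max",
      "query": {
        "range": {
          "age": {
            "gte": 50
          }
        }
      }
    }
  }
}
```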

Multi-level relations
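
A join field is not limited to a single parent/child pair. The sketch below assumes a new, hypothetical company index (the index name, the supplier relation, and the company name are illustrative only). A parent can list several child relations in an array, and a child relation can itself be a parent, giving company → department → employee. Every document in the hierarchy must sit on the same shard, so a grandchild is routed by the ID of the root (the company), while parent still points at its direct parent.

```
PUT /company
{
  "mappings": {
    "properties": {
      "join_field": {
        "type": "join",
        "relations": {
          "company": ["department", "supplier"],
          "department": "employee"
        }
      }
    }
  }
}

PUT /company/_doc/1
{
  "name": "My Company Inc.",
  "join_field": "company"
}

PUT /company/_doc/2?routing=1
{
  "name": "Development",
  "join_field": {
    "name": "department",
    "parent": 1
  }
}

PUT /company/_doc/3?routing=1
{
  "name": "Bo Andersen",
  "join_field": {
    "name": "employee",
    "parent": 2
  }
}
```

Note that routing=1 on document 3 refers to the company, not the department. Each additional level adds query-time overhead, so deep hierarchies are best avoided.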

Parent/child inner hits
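
Inner hits also work with has_child and has_parent, which is handy because a has_child query otherwise returns only the parents. A sketch, assuming the documents indexed above: each matching department also lists the female employees that caused it to match. gender is assumed to have the dynamic keyword sub-field, as in the nested example.

```
GET /department/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "inner_hits": {},
      "query": {
        "term": {
          "gender.keyword": "F"
        }
      }
    }
  }
}
```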

Terms lookup mechanism
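
The terms lookup mechanism lets a terms query fetch its list of values from a field of another document instead of spelling them out in the request. A sketch under stated assumptions: the teams index, its document 1, and the members field are hypothetical, and the lookup document stores employee IDs from the department index.

```
PUT /teams/_doc/1
{
  "members": ["3", "5", "7"]
}

GET /department/_search
{
  "query": {
    "terms": {
      "_id": {
        "index": "teams",
        "id": "1",
        "path": "members"
      }
    }
  }
}
```

index, id, and path identify the lookup document and field; a routing parameter can be added if that document was indexed with custom routing. The lookup reads the document's _source, so _source must be enabled there, and the number of terms is capped by the index.max_terms_count setting (65,536 by default).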

Join limitations

Join field performance considerations
