QA ( SQuAD1.1 / BERT, LUKE )

Hyun · May 10, 2022

NLP

NLU

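Natural language understanding (NLU) is an important area of natural language processing that covers a range of tasks such as text classification, natural language inference, and story comprehension. Applications built on NLU range from question answering (QA) to automated reasoning.

Source: paperswithcode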

QA ( Question Answering )

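View the BERT-QA (Question-Answering) task workflow

Designing a model that reads a context and answers questions

QA means producing an answer based on an understanding of the given question (Question) and the information in its context (Context).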

Data: SQuAD1.1

Go to the SQuAD page

SQuAD (Stanford Question Answering Dataset): a reading-comprehension dataset
➡ A dataset whose entries consist of a passage (P), questions (Q) about that passage, and their answers (A)

Contains 100,000+ question-answer pairs on 500+ Wikipedia articles
Input data: Passage (P, Document, Context), Question (Q)
- Question: posed by crowdworkers (humans, not machines)
Label (Answer): a sub-sequence of the Passage (the start and end indices of the answer within the Context)
- In SQuAD1.1, the Answer must always be a sub-sequence of the Passage.

Below is a sample context with QA pairs provided by the SQuAD dataset.

P	In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls under gravity. The main forms of precipitation include drizzle, rain, sleet, snow, graupel and hail... Precipitation forms as smaller droplets coalesce via collision with other raindrops or ice crystals within a cloud. Short, intense periods of rain in scattered locations are called “showers”.
QA 1	What causes precipitation to fall? gravity
QA 2	What is another main form of precipitation besides drizzle, rain, snow, sleet and hail? graupel
QA 3	Where do water droplets collide with ice crystals to form precipitation? within a cloud
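
To make the "answer is a sub-sequence of the passage" property concrete, the sample above can be turned into span labels with a few lines of Python. This is only a sketch: the dictionary keys (`question`, `answer`, `answer_start`, `answer_end`) and the helper `to_span_label` are names chosen here for illustration, and the passage is shortened to the part the three questions need.

```python
from typing import Dict, List

passage = (
    "In meteorology, precipitation is any product of the condensation of "
    "atmospheric water vapor that falls under gravity. The main forms of "
    "precipitation include drizzle, rain, sleet, snow, graupel and hail... "
    "Precipitation forms as smaller droplets coalesce via collision with "
    "other raindrops or ice crystals within a cloud."
)

qa_pairs: List[Dict[str, str]] = [
    {"question": "What causes precipitation to fall?",
     "answer": "gravity"},
    {"question": "What is another main form of precipitation besides "
                 "drizzle, rain, snow, sleet and hail?",
     "answer": "graupel"},
    {"question": "Where do water droplets collide with ice crystals to "
                 "form precipitation?",
     "answer": "within a cloud"},
]


def to_span_label(context: str, answer: str) -> Dict[str, int]:
    """Return character offsets of the answer inside the context.

    Uses the first occurrence, which is an assumption: an annotated
    dataset would record which occurrence the annotator meant.
    """
    start = context.find(answer)
    if start == -1:
        raise ValueError(f"answer {answer!r} is not a sub-sequence of the context")
    return {"answer_start": start, "answer_end": start + len(answer)}


labels = [to_span_label(passage, qa["answer"]) for qa in qa_pairs]
for qa, label in zip(qa_pairs, labels):
    span = passage[label["answer_start"]:label["answer_end"]]
    print(f"{qa['question']} -> {span!r} {label}")
```

Slicing the passage with the computed offsets gives back exactly the answer text, which is the property a span-based label relies on.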

SOTA models

1. BERT

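Let's look at BERT, which sits near the top of the benchmarks for almost every NLU sub-task.

task	dataset	rank	dataset characteristics
NLU	LexGLUE	1	dataset of legal cases
QA	SQuAD1.1	8	100,000+ question-answer pairs on 500+ Wikipedia articles
Text classification	AG News	2	English news articles classified into 4 category labels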

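Background to BERT
Earlier language models could only use a unidirectional architecture during pre-training, and this limitation kept them from performing well on token-level tasks.

BERT learns in both directions (bidirectionally) during pre-training rather than in a single direction, so it also performs strongly on token-level tasks.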
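
One way to picture the difference is through the attention mask: a unidirectional model lets each position see only itself and what came before, while a bidirectional model lets each position see the whole sequence. The sketch below only illustrates that visibility pattern with plain lists. The function names are made up for this example and no real model is involved.

```python
from typing import List


def causal_mask(n: int) -> List[List[int]]:
    """Unidirectional: position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[int]]:
    """Bidirectional: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def visible_tokens(tokens: List[str], mask: List[List[int]], position: int) -> List[str]:
    """Tokens that the given position is allowed to look at."""
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


tokens = ["the", "rain", "falls", "under", "gravity"]
i = 2  # the position of "falls"

left_only = visible_tokens(tokens, causal_mask(len(tokens)), i)
both_sides = visible_tokens(tokens, bidirectional_mask(len(tokens)), i)

print("unidirectional:", left_only)
print("bidirectional: ", both_sides)
```

For a token-level task such as marking an answer span, the right-hand context ("under gravity" here) is often as informative as the left-hand context, which is why being able to see both sides matters.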
Key concepts and keywords
- Because BERT learns the language itself during pre-training rather than any particular task, it performs very well once it is fine-tuned for a task.
  - Bidirectional: learns from context in both directions
  - pre-trained: trained in advance
  - Feature-based Approach: two variants
    - 1. Use the pre-trained model as features
    - 2. Concatenate the pre-trained features and use them as the model's input
  - Fine-tuning Approach: start from weights initialized with what was learned about the language, add a task-specific layer, then begin fine-tuning
    - fine-tuning: finishes much faster than pre-training
- BERT is pre-trained with two unsupervised tasks, MLM and NSP.
  - MLM (Masked Language Model): predict words that were masked at random
  - NSP (Next Sentence Prediction): predict the next sentence
    1. Checks whether two sentences follow naturally one after the other.
    2. The label is binary: IsNext or NotNext.
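
The two pre-training tasks can be sketched as simple data-preparation steps. The code below is a simplified illustration, not BERT's actual pipeline: it masks each token with a 15% probability (the rate used in the BERT paper) but always substitutes `[MASK]`, and it draws NSP negatives from the same small sentence list instead of from a different document. The function names `mask_tokens` and `make_nsp_example` are hypothetical. NSP labels in the BERT paper are the two classes `IsNext` and `NotNext`.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: List[str],
                mask_prob: float = 0.15,
                rng: Optional[random.Random] = None
                ) -> Tuple[List[str], List[Optional[str]]]:
    """MLM input preparation: hide some tokens, keep the originals as labels.

    labels[i] is the original token where a mask was applied, else None.
    """
    if rng is None:
        rng = random.Random()
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def make_nsp_example(sentences: List[str], index: int,
                     rng: random.Random) -> Tuple[str, str, str]:
    """NSP input preparation: pair a sentence with its true successor
    (IsNext) or with a randomly chosen other sentence (NotNext).

    Assumes index + 1 < len(sentences) and at least three sentences.
    """
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], "IsNext"
    candidates = [s for k, s in enumerate(sentences) if k not in (index, index + 1)]
    return first, rng.choice(candidates), "NotNext"


rng = random.Random(0)
tokens = "precipitation forms as smaller droplets coalesce within a cloud".split()
masked, labels = mask_tokens(tokens, rng=rng)
print(masked)
print(labels)

sentences = [
    "It rained all night.",
    "The streets were wet in the morning.",
    "Graupel is sometimes called soft hail.",
    "The library closes at six.",
]
print(make_nsp_example(sentences, 0, rng))
```

Neither task needs human annotation: the labels come from the text itself, which is what lets pre-training run on large unlabeled corpora.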

2. LUKE ( Language Understanding with Knowledge-based Embeddings )

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

Go to LUKE

We propose LUKE, a new contextualized representations specifically designed to address entity-related tasks. LUKE is trained to predict randomly masked words and entities using a large amount of entity-annotated corpus obtained from Wikipedia.

We introduce an entity-aware self-attention mechanism, an effective extension of the original mechanism of transformer. The proposed mechanism considers the type of the tokens (words or entities) when computing attention scores.
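
As a rough illustration of "considering the type of the tokens", the sketch below computes attention scores where the query matrix is chosen by the (attending token type, attended token type) pair, while a single key matrix is shared. This follows the idea described in the LUKE paper, but the code itself is a toy written for this post: the tiny 2-dimensional vectors, the scaled identity matrices, and all function names are assumptions, and the softmax and value projection are left out.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def entity_aware_scores(xs: List[Vector],
                        types: List[str],
                        queries: Dict[Tuple[str, str], Matrix],
                        key: Matrix) -> Matrix:
    """Pre-softmax attention scores.

    queries[(t_i, t_j)] is the query matrix used when a token of type t_i
    attends to a token of type t_j; the key matrix is shared by all pairs.
    """
    d = len(key)
    keys = [matvec(key, x) for x in xs]
    scores: Matrix = []
    for x_i, t_i in zip(xs, types):
        row = []
        for k_j, t_j in zip(keys, types):
            q = matvec(queries[(t_i, t_j)], x_i)
            row.append(dot(q, k_j) / math.sqrt(d))
        scores.append(row)
    return scores


def scaled_identity(c: float) -> Matrix:
    return [[c, 0.0], [0.0, c]]


# Four query matrices, one per type pair (toy values for illustration).
queries = {
    ("word", "word"): scaled_identity(1.0),
    ("word", "entity"): scaled_identity(2.0),
    ("entity", "word"): scaled_identity(3.0),
    ("entity", "entity"): scaled_identity(4.0),
}
key = scaled_identity(1.0)

xs = [[1.0, 0.0], [1.0, 0.0]]   # identical vectors, different types
types = ["word", "entity"]

scores = entity_aware_scores(xs, types, queries, key)
for row in scores:
    print(row)
```

The two input vectors are deliberately identical, so any difference between the four scores comes purely from the token types, which is the effect the mechanism is meant to add on top of ordinary self-attention.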

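Proposes a new BERT-style pretrained masked language model suited to entity-related tasks
During pretraining, embeddings for entities are learned together with those for words
LUKE is a transformer-based model trained on a large entity-annotated corpus obtained from Wikipedia.
The model is built on RoBERTa-LARGE
Keywords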
- entity
- Static Entity Representations
- Bidirectional
- Contextualized Word Representations
- transformer