Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (2023)

박상우 · September 7, 2023

Chameleon LLM

Paper Review

Introduction

LLMs have become a highly prominent tool
Despite methods such as CoT, the fundamental limitations of LLMs remain: no access to up-to-date information, weak mathematical reasoning, and inability to utilize specialized models
No general model yet exists that can adapt to new domains
To address these limitations, the paper proposes Chameleon, a plug-and-play compositional reasoning framework
- Unlike existing tool-augmented LLMs, Chameleon uses a wider range of tools, including LLMs themselves
- An LLM serves as the natural language planner
- The composed modules run as a kind of sequential program, updating the query and context
Achieves SOTA on TabMWP, a mathematical benchmark with tabular context, and ScienceQA, a multi-modal QA benchmark
Summary
- Proposes Chameleon, a plug-and-play compositional reasoning framework designed to overcome the inherent limitations of LLMs
- Uses an LLM as a natural language planner to combine diverse modules
- Achieves SOTA

Compositional Reasoning

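Compositional module networks such as NMN (neural module networks) already existed, but they were constrained in their module configurations
Chameleon is motivated by NMN
- Unlike NMN, Chameleon does not need expensive supervision of task-specific programs for modeling
- It needs no training at all; the LLM generates the program purely through in-context learning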

Tool-Augmented Language Models

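To address the inherent limitations of LLMs (weakness on new data, inability to use external tools, mathematical reasoning), some work draws on external resources such as web search and domain-specific knowledge
There are also LLM systems that make use of vision models, Hugging Face models, and Azure models
- However, most of them rely on supervision and use a limited set of tools
- As a result, they are inflexible and adapt poorly
- Chameleon defines each module's role through natural language instructions and supplies examples
  - That is, it is flexible about the type and source of the tools it uses
Shares the same spirit as AutoGPT
- AutoGPT is still at an early stage of development, whereas the paper applies the idea and demonstrates its effectiveness and benchmark performance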

General Framework: Chameleon

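Chameleon, an automatic compositional reasoning framework
- Integrates modules with the capacity to solve a broad range of problems
- An LLM-based planner decomposes the original problem into sub-tasks so that task-specific tools can solve them
- Programs are generated under natural language guidance, so they are less error-prone, easy to extend, and user-friendly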

Formulate

input query $x_0$
module inventory $\mathcal{M}$, with each module $M_i \in \mathcal{M}$
constraints $\mathcal{G}$
natural language planner $\mathcal{P}$
planning task instruction $\mathcal{I}$
generated plan $p = M^1, \dots, M^T$
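
As a rough illustration of how the planner's inputs fit together, the sketch below assembles a prompt from the instruction, the module inventory, and the constraints, then parses the LLM's reply into a plan. The module names, the constraint pairs, the prompt layout, and the helper names `build_planner_prompt` and `parse_plan` are all assumptions made for this example, not the paper's actual prompt or code, and the LLM call itself is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical module inventory: name -> one-line description shown to the planner.
MODULES: Dict[str, str] = {
    "Knowledge_Retrieval": "retrieve background knowledge for the question",
    "Query_Generator": "write a search query for the question",
    "Bing_Search": "search the web with the generated query",
    "Solution_Generator": "write a step-by-step solution from cached information",
    "Answer_Generator": "extract and normalize the final answer",
}

# Hypothetical constraints, expressed as (earlier, later) ordering pairs.
CONSTRAINTS: List[Tuple[str, str]] = [
    ("Query_Generator", "Bing_Search"),
    ("Solution_Generator", "Answer_Generator"),
]


def build_planner_prompt(query: str, instruction: str) -> str:
    """Concatenate instruction, inventory, constraints, and query into one prompt."""
    lines = [instruction, "", "Modules:"]
    lines += [f"- {name}: {desc}" for name, desc in MODULES.items()]
    lines += ["", "Constraints:"]
    lines += [f"- {first} must run before {second}" for first, second in CONSTRAINTS]
    lines += ["", f"Question: {query}", "Plan:"]
    return "\n".join(lines)


def parse_plan(llm_output: str) -> List[str]:
    """Parse a reply such as '["A", "B"]' or 'A, B' and check it against the inventory."""
    cleaned = llm_output.strip().strip("[]")
    names = [part.strip().strip("\"'") for part in cleaned.split(",")]
    plan = [name for name in names if name]
    unknown = [name for name in plan if name not in MODULES]
    if unknown:
        raise ValueError(f"unknown modules in plan: {unknown}")
    for first, second in CONSTRAINTS:
        if first in plan and second in plan and plan.index(first) > plan.index(second):
            raise ValueError(f"{first} must come before {second}")
    return plan


prompt = build_planner_prompt(
    "Which planet is closest to the sun?",
    "Select the modules needed to answer the question, in order.",
)
plan = parse_plan('["Query_Generator", "Bing_Search", "Solution_Generator", "Answer_Generator"]')
```

Checking the reply against the inventory and the ordering pairs is one simple way to reject a malformed plan before any module runs.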

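The plan is generated as follows

$p \leftarrow \mathcal{P}(x_0; \mathcal{M}; \mathcal{G}; \mathcal{I})$

The modules are executed sequentially, each producing an output
$c^{t-1}$ is the cached information (image semantics, retrieved knowledge, and other results from earlier modules)

$y^t \leftarrow M^t(x^{t-1}; c^{t-1})$

The input and the cache are then updated according to the new output
The update_input and update_cache functions are hand-designed for each state

$x^t \leftarrow \text{update\_input}(x^{t-1}, y^t)$
$c^t \leftarrow \text{update\_cache}(c^{t-1}, y^t)$

Finally, the result is produced

$r = y^T \leftarrow M^T(x^{T-1}; c^{T-1})$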
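
The execution loop described above can be sketched in a few lines. This is a minimal sketch under assumptions: modules are modeled as plain Python callables taking the current input and the cache, and the `update_input` / `update_cache` rules here (append to the input, store under the module name) are placeholders for whatever hand-designed rules a real system uses. The two toy modules are hypothetical stand-ins for LLM-backed ones.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, str]
Module = Callable[[str, Cache], str]  # output = module(current input, cache)


def update_input(x: str, name: str, y: str) -> str:
    # Assumed rule: append the module's output to the running input text.
    return f"{x}\n{name}: {y}"


def update_cache(c: Cache, name: str, y: str) -> Cache:
    # Assumed rule: store each output under the name of the module that produced it.
    new_cache = dict(c)
    new_cache[name] = y
    return new_cache


def run_plan(x0: str, plan: List[str], inventory: Dict[str, Module]) -> Tuple[str, Cache]:
    """Run the modules in plan order; the last module's output is the result."""
    x, c, y = x0, {}, ""
    for name in plan:
        y = inventory[name](x, c)    # module sees the current input and cache
        x = update_input(x, name, y)
        c = update_cache(c, name, y)
    return y, c


# Toy stand-ins for LLM-backed modules (hypothetical behavior).
def toy_solution_generator(x: str, c: Cache) -> str:
    return "2 + 2 = 4. The answer is 4."


def toy_answer_generator(x: str, c: Cache) -> str:
    return c["Solution_Generator"].rstrip(".").split()[-1]


inventory = {
    "Solution_Generator": toy_solution_generator,
    "Answer_Generator": toy_answer_generator,
}
result, cache = run_plan("What is 2 + 2?", ["Solution_Generator", "Answer_Generator"], inventory)
```

Because every module shares the same call signature, adding a tool only means adding an entry to the inventory, which is the plug-and-play property the notes refer to.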

Application of Chameleon

Module Inventory

Knowledge Retrieval

$M_{kr}$
Retrieves additional background knowledge
Helpful for domain-specific tasks such as science and mathematics

Bing Search

$M_{bs}$
Like Knowledge Retrieval, obtains wide-ranging task-relevant knowledge
Aims at broader or more up-to-date information

Query Generator

$M_{qg}$
The original problem usually lacks a query tailored for retrieving task-relevant information
This module generates a search query suited to the problem
Works well when run before Bing Search

Image Captioner

$M_{ic}$
Generates a caption for the image
Converts visual data into language

Text Detector

$M_{td}$
Used when text has to be extracted from diagrams, charts, tables, maps, or other visual elements

Row Lookup

$M_{rl}$
When the query includes tabular context, retrieves the rows relevant to the query

Column Lookup

$M_{cl}$
Like Row Lookup, retrieves the relevant columns
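
To make the effect of the two lookup modules concrete, here is a small sketch that trims a table down to the rows and columns mentioned in the question. Note the swapped-in technique: plain keyword matching stands in for the relevance decision, and the table representation, the `key_column` parameter, and the function names are assumptions for this example only.

```python
from typing import Dict, List

Table = List[Dict[str, str]]  # one dict per row, keyed by column name


def row_lookup(table: Table, question: str, key_column: str) -> Table:
    """Keep rows whose key cell is mentioned in the question (keyword stand-in)."""
    q = question.lower()
    kept = [row for row in table if row[key_column].lower() in q]
    return kept or table  # if nothing matches, fall back to the full table


def column_lookup(table: Table, question: str, key_column: str) -> Table:
    """Keep the key column plus any column whose name is mentioned in the question."""
    if not table:
        return []
    q = question.lower()
    cols = [col for col in table[0] if col == key_column or col.lower() in q]
    return [{col: row[col] for col in cols} for row in table]


fruit_table = [
    {"Name": "Ada", "Apples": "3", "Pears": "5"},
    {"Name": "Ben", "Apples": "7", "Pears": "2"},
]
question = "How many apples does Ben have?"
rows = row_lookup(fruit_table, question, key_column="Name")
trimmed = column_lookup(rows, question, key_column="Name")
```

Shrinking the table this way leaves less irrelevant content for the downstream modules to read.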

Table Verbalizer

$M_{tv}$
Converts a structured table into text so that downstream modules can understand it more easily
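
What "table to text" can look like is easiest to see on a tiny example. The sketch below uses a fixed sentence template as a stand-in for the verbalizer; the template, the `key_column` parameter, and the function name `verbalize_table` are assumptions for illustration.

```python
from typing import Dict, List


def verbalize_table(table: List[Dict[str, str]], key_column: str) -> str:
    """Turn each row into one sentence of the form 'For <key>, <col> is <val>, ...'."""
    sentences = []
    for row in table:
        facts = [f"{col} is {val}" for col, val in row.items() if col != key_column]
        sentences.append(f"For {row[key_column]}, " + ", ".join(facts) + ".")
    return " ".join(sentences)


text = verbalize_table(
    [
        {"Name": "Ada", "Apples": "3", "Pears": "5"},
        {"Name": "Ben", "Apples": "7", "Pears": "2"},
    ],
    key_column="Name",
)
```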

Program Generator

$M_{pg}$
Generates a Python program that can solve the query (using if-else statements, etc.)

Program Verifier

$M_{pv}$
Checks the program for hallucinations and errors
Improves reliability and accuracy

Program Executor

$M_{pe}$
Executes the program and produces the result
Serves as a bridge from the generated program to its result
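
The generate, verify, execute chain can be illustrated with the standard library alone. In this sketch the "generated" program is a hard-coded string standing in for the Program Generator's output, the verifier is reduced to a syntax check with `ast.parse` (an assumption; it cannot catch logical errors or hallucinated values), and the executor captures whatever the program prints. The function names are hypothetical.

```python
import ast
import contextlib
import io


def verify_program(source: str) -> bool:
    """Cheap static check: does the text parse as Python at all?"""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def execute_program(source: str) -> str:
    """Run the program and return its printed output as the module result."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # no sandboxing here; only suitable for trusted toy input
    return buffer.getvalue().strip()


# Stand-in for the text a Program Generator might return.
generated = (
    "apples = 7\n"
    "pears = 2\n"
    "if apples > pears:\n"
    "    print(apples - pears)\n"
    "else:\n"
    "    print(pears - apples)\n"
)

output = execute_program(generated) if verify_program(generated) else None
```

Running the verifier first means a malformed program is rejected before execution, so the pipeline can fall back or retry instead of crashing.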

Solution Generator

$M_{sg}$
Generates the solution using all cached information
Uses CoT prompting, and the planner can employ this module directly

Answer Generator

$M_{ag}$
Rule-based approach (extracts and normalizes the answer)
As the last module in the pipeline, produces a concise, task-specific answer
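
A rule-based extract-and-normalize step might look like the sketch below. The "answer is ..." pattern, the number clean-up rules, and the option matching are assumptions chosen for this example; a real system would tune its rules to each benchmark's answer format.

```python
import re
from typing import List, Optional


def extract_answer(solution: str, options: Optional[List[str]] = None) -> Optional[str]:
    """Pull the text after 'answer is' and normalize it to an option or a bare number."""
    match = re.search(r"answer is[:\s]*(.+?)\.?\s*$", solution.strip(), flags=re.IGNORECASE)
    if match is None:
        return None
    raw = match.group(1).strip()
    if options:
        # Multiple-choice case: return the matching option, ignoring letter case.
        for option in options:
            if option.lower() == raw.lower():
                return option
        return None
    # Free-text case: strip currency signs, thousands separators, and percent signs.
    number = re.fullmatch(r"\$?(-?[\d,]*\.?\d+)%?", raw)
    if number:
        return number.group(1).replace(",", "")
    return raw


numeric = extract_answer("Total cost is 5 * 250. So the answer is $1,250.")
choice = extract_answer(
    "Magnet B is closer. Therefore, the answer is magnet b.",
    options=["Magnet A", "Magnet B"],
)
```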

Science Question Answering

Answering these questions requires a variety of tools and skills
