LM2: Large Memory Models

ㅇㅇ · February 18, 2025


Notes to self

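The design is interesting and I like it, but isn't it incomplete if the memory bank responds purely to the input?
Shouldn't it remember not only external sensations but also my own 'thoughts', i.e. cross-attend to inner speech as well?
In that case, should the forget/input gates be applied once more after 'thinking', so that memory is updated twice within a single timestep? That is, feed the output sentence into multi-head attention and process it exactly like the input?
Or generate the output first and then update memory once, using the input and output together? That is also a valid implementation choice.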
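The two options in this note can be compared side by side. The following is only a rough sketch of my own idea, not anything from the LM2 paper: `write`, `two_updates`, `joint_update`, the weights `W_i`/`W_f`, and the simplification of each memory slot to a d-dimensional vector are all hypothetical, and `E_out` is a random stand-in for the embeddings of a generated output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def write(M, X, W_i, W_f):
    """Hypothetical gated write: slots M (N, d) absorb embeddings X (T, d)."""
    # Each slot attends over the embeddings to build a candidate update.
    cand = softmax(M @ X.T / np.sqrt(M.shape[-1])) @ X  # (N, d)
    g_in = sigmoid(cand @ W_i)      # input gate: how much new content to admit
    g_forget = sigmoid(cand @ W_f)  # forget gate: how much old content to keep
    return g_forget * M + g_in * np.tanh(cand)

def two_updates(M, E_in, E_out, W_i, W_f):
    # Option 1: update on the input, then again on the model's own output.
    M = write(M, E_in, W_i, W_f)
    return write(M, E_out, W_i, W_f)

def joint_update(M, E_in, E_out, W_i, W_f):
    # Option 2: a single update over input and output concatenated.
    return write(M, np.concatenate([E_in, E_out], axis=0), W_i, W_f)

rng = np.random.default_rng(0)
N, d = 4, 8
M = rng.normal(size=(N, d))
E_in = rng.normal(size=(6, d))
E_out = rng.normal(size=(3, d))  # stand-in for generated-output embeddings
W_i = rng.normal(scale=d ** -0.5, size=(d, d))
W_f = rng.normal(scale=d ** -0.5, size=(d, d))

M_twice = two_updates(M, E_in, E_out, W_i, W_f)
M_joint = joint_update(M, E_in, E_out, W_i, W_f)
print(M_twice.shape, M_joint.shape)
```

Option 1 passes memory through the gates twice, so the output can overwrite what the input just stored; option 2 lets input and output compete inside one attention pass.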

Emotion/shock could be tied in through RL.

LM2 outperforms state-of-the-art (SOTA) memory model Recurrent Memory Transformer (RMT) Bulatov et al. (2022) by up to 80.4%

The decoder block processes input sequences using positional embeddings, while the memory module interacts with these embeddings via cross attention mechanisms. We use a skip connection between the multi-head attention and the memory modules to facilitate learning and maintain the original intermediate embeddings of the Transformer. The memory updates are controlled by learnable control gates, denoted as F, I, and O, which correspond to the forget, input, and output gates, respectively.

As depicted in Figure 1, we introduce an explicit memory module, named the memory bank M ∈ R^(N×d×d), designed to store long-term memory.
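So they say, but this is an auxiliary module and it does not update the parameters of the whole network, so wouldn't it be more accurate to call it short-term memory rather than long-term memory?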

Here, N denotes the number of memory slots, while d represents the hidden dimension of each slot. For simplicity, each memory slot is initialized as an identity matrix: M_r = I_(d×d), where r ∈ {1, ..., N} and I_(d×d) is the identity matrix.
They simply initialize with the identity matrix. (Is that okay?)
And why d×d?

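The reasons for setting M ∈ R^(N×d×d) are as follows:

Flexible transformation capacity:

A d×d matrix can represent a linear transformation that maps a d-dimensional vector to another d-dimensional vector.
This lets each memory slot act not as a simple vector store but as something that can dynamically transform incoming information.

Meaning of the initialization:

By initializing each slot as an identity matrix:

Early in training, the input is passed through unchanged (preserved).
As training progresses, the necessary transformations can be learned.

Compatibility with the gating mechanism:

The d×d structure gives the gating mechanism richer expressive power for controlling information flow.
Specific parts of the input can be selectively transformed and stored.

With this design, each memory slot goes beyond simply storing information and can dynamically process and transform what it stores.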
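The identity-initialization point can be checked in a few lines. This is a small sketch, assuming NumPy; `init_memory_bank` is a name I made up, and treating a slot as a matrix applied to a vector follows the interpretation above rather than anything the paper states explicitly.

```python
import numpy as np

def init_memory_bank(num_slots, hidden_dim):
    # Stack N copies of the d×d identity matrix: shape (N, d, d).
    return np.tile(np.eye(hidden_dim), (num_slots, 1, 1))

N, d = 4, 3
M = init_memory_bank(N, d)

x = np.array([1.0, -2.0, 0.5])
y = M[0] @ x  # an identity slot leaves the vector unchanged

print(M.shape)               # (4, 3, 3)
print(np.array_equal(y, x))  # True
```

So at initialization every slot is a no-op transformation, and any structure in memory has to come from the gated updates.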

Rather than extracting a specific E (a memory vector?) from the memory bank and cross-attending it with the input E, the model cross-attends over the entire memory bank.
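The difference between the two readings can be sketched as follows. This is an illustration under my own assumptions, not the paper's code: slots are simplified to d-dimensional vectors, the learned projections are omitted, and `read_whole_bank` and `read_top1` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def read_whole_bank(E, M):
    """Soft read: every token attends over all N slots at once."""
    scores = E @ M.T / np.sqrt(E.shape[-1])  # (T, N)
    weights = softmax(scores)                # each row sums to 1
    return weights @ M, weights

def read_top1(E, M):
    """Hard read: pick the single best-matching slot per token."""
    idx = (E @ M.T).argmax(axis=-1)          # (T,)
    return M[idx], idx

rng = np.random.default_rng(0)
T, N, d = 5, 4, 8
E = rng.normal(size=(T, d))
M = rng.normal(size=(N, d))

soft_read, weights = read_whole_bank(E, M)
hard_read, idx = read_top1(E, M)
print(soft_read.shape, hard_read.shape)
```

The soft read is differentiable with respect to every slot, while the hard read only touches the selected one, which is presumably why attending over the whole bank is the simpler choice to train.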

The transformer memory papers cited in the related work section look worth reading.
