Resurrecting Recurrent Neural Networks for Long Sequences

About_work·2023년 7월 25일

딥러닝

목록 보기

1/16

RNN
- fast inference on long sequences.
- hard to optimize and slow to train.
Deep state-space model(SSMs)
- long sequence modeling tasks에서 매우 성능 좋음.
- fast parallelizable training + fast inference.
we show that careful design of deep RNNs using standard signal propagation arguments can
- recover the impressive performance of deep SSMs on long-range reasoning tasks,
- while also matching their training speed.
we analyze and ablate a series of changes to standard RNNs including
- linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and
- ensuring proper normalization of the forward pass.
이를 통해, SSMs의 성능에 대한 insight를 제공할 수 있게 되었고, Linear Recurrent Unit 이라는 모듈도 소개한다.(Long Range 성능도 좋고, computational efficiency도 갖췄다.)

새로운 것이 들어오면 이미 있는 것과 충돌을 시도하라.