Neural Machine Translation by Jointly Learning to Align and Translate (a.k.a. Bahdanau Attention)
Effective Approaches to Attention-based Neural Machine Translation (a.k.a. Luong Attention)
Attention Is All You Need (a.k.a. Transformer; see the attention sketch after this list)
Improving Language Understanding by Generative Pre-Training (a.k.a. GPT-1)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (a.k.a. BERT)
Language Models are Unsupervised Multitask Learners (a.k.a. GPT-2)
Language Models are Few-Shot Learners (a.k.a. GPT-3)
Multi-Task Deep Neural Networks for Natural Language Understanding (a.k.a. MT-DNN)
MASS: Masked Sequence to Sequence Pre-training for Language Generation (a.k.a. MASS)
XLNet: Generalized Autoregressive Pretraining for Language Understanding (a.k.a. XLNet)
RoBERTa: A Robustly Optimized BERT Pretraining Approach (a.k.a. RoBERTa)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a.k.a. BART)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (a.k.a. T5)
Robust Speech Recognition via Large-Scale Weak Supervision (a.k.a. Whisper)
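
Nearly every paper above builds on the attention mechanism, so a minimal sketch may help orient first-time readers. Below is a NumPy version of the scaled dot-product attention from "Attention Is All You Need"; the function name and toy shapes are mine, chosen only for illustration, not taken from any of the papers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v)
    Returns the context vectors (seq_q, d_v) and the attention weights.
    """
    d_k = Q.shape[-1]
    # Similarity of each query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy usage: 3 query positions, 4 key/value positions, d_k = d_v = 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context, weights = scaled_dot_product_attention(Q, K, V)
print(context.shape, weights.shape)  # (3, 8) (3, 4)
```

Bahdanau and Luong attention differ mainly in how the scores are computed (an additive MLP vs. dot-product variants); the Transformer keeps the dot-product form above and applies it in multiple heads.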
If you come across this post, please recommend papers to add to this list.
your recommended paper 1
your recommended paper 2
your recommended paper 3
Interspeech Conference
data2vec (Meta AI)
NVIDIA - Language Understanding Model
ERNIE
ULMFiT
ELMo
HyperCLOVA (Naver)
KoGPT (Kakao)
KorBERT (ETRI)
Learning to Identify Ambiguous and Misleading News Headlines
Why Does Unsupervised Pre-training Help Deep Learning?
PaLM: Scaling Language Modeling with Pathways
LaMDA
Gopher
GLaM
DALL-E (text-to-image, but worth studying for language work)
YOLO (computer vision, but worth studying for language work)
Chinchilla
BIG-bench
DialogBERT
A Neural Conversational Model (Google) - a generative conversational model
Meena
LUKE (Deep Contextualized Entity Representations with Entity-aware Self-attention)
RNN