[U] Week 3 Day 2

이동찬 · October 12, 2022

Naver Boostcamp AI Tech

1. Lecture Review

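Goals:
- Watch the Week 3 deep learning lectures up to Lecture 4 and write up notes
- Solve 2 quizzes
- Try the mentoring assignment (converting to PyTorch Lightning)
- Review the study material before the peer session

Results:
- Watch the Week 3 deep learning lectures up to Lecture 4 and write up notes (O)
- Solve 2 quizzes (O)
- Try the mentoring assignment (converting to PyTorch Lightning) (O)
- Review the study material before the peer session (O)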

(1) Historical Review

What makes you a good deep learner?

Implementation Skills (e.g., PyTorch)

Math Skills (Linear Algebra, Probability)

Knowing a lot of recent Papers

Key components of DL

- Data: what the model can learn from
- Model: how to transform the data
- Loss function: quantifies the badness of the model
  - a proxy of what we actually want to achieve
- Algorithm: adjusts the parameters to minimize the loss

Deep Learning's Most Important Ideas - Denny Britz

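| Year | Idea | Description |
| --- | --- | --- |
| 2012 | `AlexNet` | First ImageNet challenge (ILSVRC) win by a deep learning method |
| 2013 | `DQN` | Deep reinforcement learning that learned to play Atari games; the paper that made DeepMind what it is today, a precursor to AlphaGo |
| 2014 | `Encoder / Decoder` | Machine translation (sequence-to-sequence) |
| 2014 | `Adam Optimizer` | Generally works well with its default settings |
| 2015 | `GAN` (Generative Adversarial Network) | A generator and a discriminator trained against each other |
| 2015 | `ResNet` (Residual Networks) | Made it possible to train much deeper networks |
| 2017 | `Transformer` | Attention Is All You Need |
| 2018 | `BERT` (Bidirectional Encoder Representations from Transformers) | Pre-trained, then fine-tuned NLP models |
| 2019 | OpenAI, `BIG Language Models (GPT-3)` | GPT-3: an autoregressive language model with 175 billion parameters |
| 2020 | `Self-Supervised Learning` | SimCLR: a simple framework for contrastive learning of visual representations |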

(2) Neural Networks & Multi-Layer Perceptron

Definition of Neural Networks

Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
Neural networks are function approximators that stack affine transformations followed by nonlinear transformations.
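- stack: the transformations are applied repeatedly, layer after layer
- affine transformation: multiplying by a matrix and adding a bias, i.e. $Wx + b$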

Linear Neural Networks

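For a problem where the input and output are both one-dimensional, find a linear model that connects them
→ i.e., find two parameters: the slope $w$ and the intercept $b$ in $\hat{y} = wx + b$
How do we find $w$ and $b$?
→ compute the partial derivatives w.r.t. the optimization variables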

Gradient with respect to the weight $w$:
$$\begin{aligned} \frac{\partial loss}{\partial w} &= \frac{\partial}{\partial w}\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \\ &= \frac{\partial}{\partial w}\frac{1}{N}\sum_{i=1}^{N}(y_i - wx_i - b)^2 \\ &= -\frac{2}{N}\sum_{i=1}^{N}(y_i - wx_i - b)x_i \end{aligned}$$
Gradient with respect to the bias $b$:
$$\begin{aligned} \frac{\partial loss}{\partial b} &= \frac{\partial}{\partial b}\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 \\ &= \frac{\partial}{\partial b}\frac{1}{N}\sum_{i=1}^{N}(y_i - wx_i - b)^2 \\ &= -\frac{2}{N}\sum_{i=1}^{N}(y_i - wx_i - b) \end{aligned}$$

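Update (←) the optimization variables by gradient descent, where $\eta$ is the step size (learning rate):
$$\begin{aligned} w &\leftarrow w - \eta\frac{\partial loss}{\partial w} \\ b &\leftarrow b - \eta\frac{\partial loss}{\partial b} \end{aligned}$$
The same approach extends to the multi-dimensional case, where $W$ becomes a matrix and $x$, $b$ become vectors ($\hat{y} = W^{\top}x + b$).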
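Below is a minimal sketch of the 1-D case in plain Python. It runs full-batch gradient descent on the squared-error loss and plugs the two partial derivatives in directly. The synthetic data (true slope 2, intercept 1, small Gaussian noise), the learning rate and the step count are my own illustrative choices and do not come from the lecture.

```python
import random

def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b with full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # residuals r_i = y_i - (w*x_i + b)
        residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
        # dL/dw = -(2/N) * sum(r_i * x_i),  dL/db = -(2/N) * sum(r_i)
        grad_w = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = -2.0 / n * sum(residuals)
        # gradient descent update with step size eta = lr
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.05) for x in xs]  # true w=2, b=1 plus small noise
w, b = fit_linear(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")  # should land close to w=2, b=1
```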

One way to interpret a matrix is as a mapping between two vector spaces.

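What if we simply stack more layers?

Stacking two linear layers amounts to multiplying two matrices. This is effectively no different from a single layer, because $W_2(W_1x) = (W_2W_1)x$.
→ Then what do we need in order to stack multiple layers meaningfully?
=> A nonlinear transformation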
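Here is a quick numerical check of the claim in plain Python, using arbitrary example matrices of my own (not from the lecture). Applying $W_1$ and then $W_2$ gives the same result as applying the single matrix $W_2W_1$.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # maps R^2 -> R^3
W2 = [[1.0, -1.0, 2.0], [0.0, 4.0, 1.0]]     # maps R^3 -> R^2
x = [0.7, -1.3]

two_layers = matvec(W2, matvec(W1, x))   # apply W1, then W2
one_layer = matvec(matmul(W2, W1), x)    # a single matrix W = W2 W1
print(two_layers, one_layer)             # equal up to floating-point rounding
```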

MLP(Multi-Layer Perceptron)

For the stack to be more than a single matrix product, we need a nonlinear activation function between the layers.
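The following is a minimal two-layer MLP sketch (affine → ReLU → affine) with hand-picked, hypothetical weights. With these particular weights the network computes the absolute difference of its two inputs. The check at the end shows that the network is no longer linear: for input vectors `u` and `v`, $f(u + v) \neq f(u) + f(v)$.

```python
def affine(W, b, x):
    """Affine transformation W x + b (W given as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP: affine -> ReLU -> affine."""
    h = [max(0.0, z) for z in affine(W1, b1, x)]  # nonlinear activation (ReLU)
    return affine(W2, b2, h)

# Hypothetical weights: this network outputs |x[0] - x[1]|.
W1, b1 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]

# A linear map would satisfy f(u + v) == f(u) + f(v); the ReLU breaks this.
u, v = [1.0, 0.0], [-1.0, 0.0]
print(mlp([a + c for a, c in zip(u, v)], W1, b1, W2, b2))                    # f(u + v) -> [0.0]
print([p + q for p, q in zip(mlp(u, W1, b1, W2, b2), mlp(v, W1, b1, W2, b2))])  # f(u) + f(v) -> [2.0]
```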

Activation functions
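The original slide is not reproduced here, so I chose ReLU, sigmoid and tanh myself as three activation functions commonly shown alongside MLPs. Each takes only a few lines of plain Python. The sigmoid uses a branch so that very negative inputs do not overflow `math.exp`.

```python
import math

def relu(z):
    """ReLU: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^{-z}), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.4f}  sigmoid={sigmoid(z):.4f}  tanh={tanh(z):.4f}")
```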

Loss functions
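For reference, here is a sketch of the three losses in their usual textbook form. It assumes $N$ samples with $D$ output dimensions each, one-hot targets $y_i^{(d)}$ for classification, and a Gaussian likelihood with fixed variance $\sigma^2$ for the probabilistic case. Minimizing the negative log-likelihood is the same as maximizing the likelihood (MLE).

$$\begin{aligned} \text{Regression (MSE)}: &\quad \frac{1}{N}\sum_{i=1}^{N}\sum_{d=1}^{D}\left(y_i^{(d)} - \hat{y}_i^{(d)}\right)^2 \\ \text{Classification (CE)}: &\quad -\frac{1}{N}\sum_{i=1}^{N}\sum_{d=1}^{D} y_i^{(d)}\log \hat{y}_i^{(d)} \\ \text{Probabilistic (MLE)}: &\quad -\frac{1}{N}\sum_{i=1}^{N}\sum_{d=1}^{D}\log \mathcal{N}\left(y_i^{(d)};\ \hat{y}_i^{(d)},\ \sigma^2\right) \end{aligned}$$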

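Why use CE (Cross-Entropy)?
→ In classification tasks, the target label is usually represented as a one-hot vector
→ so when computing the loss, only the predicted value for the correct class contributes, and training only needs to push that value up
MLE (Maximum Likelihood Estimation)
- Probabilistic task: when the output should be a probability distribution rather than a single number
→ used when we also want information such as confidence intervals or uncertainty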
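The one-hot argument can be checked in a short plain-Python sketch. The logits are arbitrary example values, and `softmax` and `cross_entropy` are helper names I made up for illustration. Because every non-target entry of the one-hot vector is 0, the full sum $-\sum_c y_c \log p_c$ reduces to $-\log p_{\text{true}}$.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy -sum_c y_c * log(p_c) against a one-hot target.

    Terms with y_c = 0 contribute nothing, so only the true class remains.
    """
    one_hot = [1.0 if c == target_index else 0.0 for c in range(len(probs))]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

probs = softmax([2.0, 0.5, -1.0])
print(cross_entropy(probs, 0), -math.log(probs[0]))  # the two values are identical
```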
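Here is a sketch of the probabilistic case. It assumes the model outputs both a mean $\mu$ and a standard deviation $\sigma$ of a Gaussian for each target; this is one common choice, and other output distributions are possible. With $\sigma$ fixed, minimizing this negative log-likelihood matches minimizing the squared error up to a constant and a scale factor. If $\sigma$ is predicted as well, the model also reports how uncertain it is.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

y = 3.0
for mu, sigma in [(3.0, 1.0), (2.0, 1.0), (2.0, 2.0)]:
    print(f"mu={mu}, sigma={sigma}: NLL={gaussian_nll(y, mu, sigma):.4f}")
```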

(3) Optimization

2. Questions I Thought About While Studying, and What I Found

Why do regression tasks and classification tasks use different loss functions?
How are the loss functions for regression, classification, and probabilistic tasks implemented in PyTorch?
→ See the PyTorch official docs
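One commonly cited part of the answer to the first question can be sketched for a single binary output passed through a sigmoid. With MSE, the gradient with respect to the logit is scaled by the sigmoid's slope $p(1-p)$, so it nearly vanishes when the model is confidently wrong. Binary cross-entropy keeps the gradient at $p - y$. On the PyTorch side, the corresponding built-ins are `torch.nn.MSELoss`, `torch.nn.CrossEntropyLoss` (which takes raw logits and applies log-softmax internally) and `torch.nn.GaussianNLLLoss`. The plain-Python version below avoids depending on them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_mse(z, y):
    """d/dz of (sigmoid(z) - y)^2: the sigmoid's slope p(1-p) multiplies the error."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def grad_bce(z, y):
    """d/dz of binary cross-entropy -[y log p + (1-y) log(1-p)]: simplifies to p - y."""
    return sigmoid(z) - y

# True label is 1, but the model is confidently wrong at large negative logits.
for z in (-6.0, -2.0, 0.0):
    print(f"z={z:+.1f}  MSE grad={grad_mse(z, 1.0):+.4f}  CE grad={grad_bce(z, 1.0):+.4f}")
```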

Reference: the concepts of feedforward (forward propagation) vs. backpropagation (backward propagation)
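This sketch makes the distinction concrete with a hypothetical one-unit network $\hat{y} = w_2 \tanh(w_1 x)$ and squared-error loss; the architecture and numbers are my own illustrative choices. The feedforward pass computes and caches intermediate values. Backpropagation reuses those values to apply the chain rule from the loss back to each weight. A central finite-difference check confirms the hand-derived gradients.

```python
import math

def forward(w1, w2, x, y):
    """Feedforward: x -> a = w1*x -> h = tanh(a) -> y_hat = w2*h -> squared error."""
    a = w1 * x
    h = math.tanh(a)
    y_hat = w2 * h
    loss = (y_hat - y) ** 2
    return loss, (a, h, y_hat)

def backward(w1, w2, x, y, cache):
    """Backpropagation: chain rule from the loss back to each weight."""
    a, h, y_hat = cache
    d_yhat = 2.0 * (y_hat - y)   # dL/dy_hat
    d_w2 = d_yhat * h            # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2            # dL/dh
    d_a = d_h * (1.0 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    d_w1 = d_a * x               # dL/dw1 = dL/da * da/dw1
    return d_w1, d_w2

w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
loss, cache = forward(w1, w2, x, y)
g_w1, g_w2 = backward(w1, w2, x, y, cache)

# Sanity check against central finite differences.
eps = 1e-6
num_w1 = (forward(w1 + eps, w2, x, y)[0] - forward(w1 - eps, w2, x, y)[0]) / (2 * eps)
num_w2 = (forward(w1, w2 + eps, x, y)[0] - forward(w1, w2 - eps, x, y)[0]) / (2 * eps)
print(g_w1, num_w1)
print(g_w2, num_w2)
```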

3. Assignment Process and Results

4. Peer Session

[ Daily Scrum ]

Selected three presenters for the cs224n and Deep Learning Book presentations
→ I volunteered to present cs224n, and 남규 and 윤호 will present Deep Learning Book Chapters 2 and 3, respectively

[ Peer Session ]

Presented cs224n Lecture 1 (동찬)
Presented Deep Learning Book Chapter 2: Linear Algebra (남규)
- A blog post on SVD (shared by 남규): https://darkpgmr.tistory.com/106

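5. Retrospective

Today's moderator: 남규
In today's peer session, I presented to my teammates based on the notes I had taken on cs224n Lecture 1.
The cs224n lectures are good, but I'm debating whether we should also keep studying the Deep Learning Book.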
