[UoT] Introduction to Deep Learning (2)

지유경 · February 24, 2025

Week 3

ANNs Part 2

  1. Neural Network Architecture
    Architecture: describes the structure of the neurons and connections inside a neural network
    Multi-Layer Perceptron (MLP):
  • Feed-forward and fully connected
  • Linear layers + nonlinear activations

Ex) MLP, CNN, RNN, ... (a minimal MLP sketch follows)
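A minimal PyTorch sketch of such an MLP (my own illustration, not from the lecture; the layer sizes 784 → 100 → 10 are arbitrary, e.g. flattened 28×28 images and 10 output classes):

```python
import torch.nn as nn

# A feed-forward, fully-connected MLP: linear layers interleaved with
# nonlinear activations. Sizes are illustrative only.
mlp = nn.Sequential(
    nn.Linear(784, 100),  # input -> hidden
    nn.ReLU(),            # nonlinear activation
    nn.Linear(100, 10),   # hidden -> output
)
```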

  2. Neural Network Training
  • Training: the process of effectively learning the model's weights and parameters from data
  • Loss function: measures how far the model's predictions are from the actual values
  • Optimizer: adjusts the model's weights to produce the best possible output
    - Methods: (1) pick weights at random;
    (2) change one weight at a time, adjusting in the direction that reduces the error;
    (3) use Gradient Descent (a single training step is sketched below)
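A hedged sketch of how the loss function and optimizer cooperate in one PyTorch training step; the toy model, random data, and MSE loss are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                     # a one-layer model for illustration
loss_fn = nn.MSELoss()                      # measures prediction vs. target gap
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3)                       # toy inputs (batch of 8)
y = torch.randn(8, 1)                       # toy targets

optimizer.zero_grad()                       # clear old gradients
loss = loss_fn(model(x), y)                 # how wrong are we?
loss.backward()                             # compute gradients of the loss
optimizer.step()                            # adjust weights to reduce the loss
```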

Gradient Descent

  • 1-layer case: follow the gradient direction, adjusting the weights toward the point of minimum loss.
  • 2-layer case: the optimization must be carried out over a non-convex loss surface.
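To see the mechanics framework-free, a tiny sketch of gradient descent on the convex function f(w) = (w − 3)²; the starting point and learning rate are arbitrary choices:

```python
# f(w) = (w - 3)^2 has gradient 2 * (w - 3); in this convex (1-layer-like)
# case, stepping against the gradient converges to the global minimum w = 3.
w, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (w - 3)  # derivative of the loss at the current weight
    w -= lr * grad      # move against the gradient
print(w)  # ~3.0
```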

Types of Critical Points

  • Local minima
  • Local maxima
  • Saddle points
  • Plateaus (flat regions)
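A toy autograd check (my own example, not from the lecture) of why these points matter: at the saddle point of f(x, y) = x² − y² the gradient vanishes, so gradient descent can stall there even though it is not a minimum:

```python
import torch

# f(x, y) = x^2 - y^2: a minimum along x, a maximum along y at the origin.
x = torch.tensor(0.0, requires_grad=True)
y = torch.tensor(0.0, requires_grad=True)
(x**2 - y**2).backward()
print(x.grad, y.grad)  # tensor(0.) tensor(-0.) -- zero gradient, yet not a minimum
```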
  3. Auto Differentiation
  • Computing gradients in a neural network by hand is complicated and error-prone.
  • Frameworks that support automatic differentiation: PyTorch, TensorFlow, Keras, Theano, etc.
  • Derivatives are computed over a computation graph (a minimal example follows).
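A minimal sketch of PyTorch's autograd: operations on tensors with requires_grad=True are recorded in a computation graph, and backward() walks that graph with the chain rule:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)

loss = (w * x - 1) ** 2  # each operation adds a node to the graph

loss.backward()          # traverse the graph, applying the chain rule
print(w.grad)            # d(loss)/dw = 2 * (w*x - 1) * x = 30.0
```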
  4. Multi-Class Classification
  • e.g., classifying a digit as one of 0–9 (a sketch of this setup follows)
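A hedged sketch of that setup: 10 output neurons (one per digit) emitting raw logits, with CrossEntropyLoss applying Softmax internally; the 784-feature input assumes flattened 28×28 images:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 10),  # one raw logit per class 0-9
)
loss_fn = nn.CrossEntropyLoss()  # applies Softmax internally

x = torch.randn(32, 784)              # toy batch of flattened images
labels = torch.randint(0, 10, (32,))  # integer class labels
loss = loss_fn(model(x), labels)
preds = model(x).argmax(dim=1)        # predicted digit per sample
```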
  5. Hyperparameter Tuning
  • Hyperparameters:
    - Batch size
    - Learning rate
    - Size of the network (number of layers, number of neurons)
    - Activation function
  • Batching
    - Use n samples per training step and average their loss
    - Too small a batch -> the optimization process becomes unstable
    - Too large a batch -> the computational cost grows
  • Learning rate
    - Too small -> training takes a long time
    - Too large -> training diverges (unstable)
    - A common scheme is to decay the learning rate as training progresses
  • Optimization methods (see the sketch after this list)
    - SGD
    - Momentum
    - RMSProp
    - Adam (SGD + Momentum + RMSProp)
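A sketch tying these hyperparameters together in PyTorch (all specific values are illustrative assumptions): batch size via DataLoader, a decaying learning rate via StepLR, and the optimizers listed above:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)  # batch size

opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # SGD + Momentum
# opt = torch.optim.RMSprop(model.parameters(), lr=0.01)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Decay the learning rate as training progresses (x0.1 every 10 epochs).
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.1)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(30):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    scheduler.step()  # one learning-rate update per epoch
```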
  6. Regularization
  • L2 regularization (weight decay)
    - Shrinks the weights to keep the model from becoming overly complex
  • L1 regularization (Lasso)
    - Drives weights to exactly zero, enabling feature selection
  • Dropout
    - Randomly deactivates neurons to prevent overfitting (all three are sketched below)
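A hedged sketch of the three regularizers in PyTorch (the coefficients are illustrative): L2 via the optimizer's weight_decay argument, L1 as a manual penalty on the loss, and Dropout as a layer:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # randomly deactivate neurons during training
    nn.Linear(64, 1),
)

# L2 regularization (weight decay) is built into the optimizer.
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x, y = torch.randn(16, 20), torch.randn(16, 1)
loss = nn.MSELoss()(model(x), y)

# L1 regularization: penalize the sum of absolute weights by hand.
loss = loss + 1e-5 * sum(p.abs().sum() for p in model.parameters())

opt.zero_grad()
loss.backward()
opt.step()

model.eval()  # Dropout is disabled in evaluation mode
```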

Expected Questions

  1. What is an ANN?

An ANN is a machine learning model inspired by the way neurons in the human brain process information. It consists of interconnected neurons with weights and can learn patterns from data to perform tasks such as prediction, classification, and regression.

  2. Describe the key features of a multilayer perceptron (MLP).

An MLP is a feed-forward neural network consisting of at least three layers (input layer, hidden layer, output layer). Neurons in each layer have weights, and non-linearity is introduced through activation functions. MLPs are fully connected and typically consist of linear layers followed by non-linear activation functions such as ReLU or sigmoid.

  3. Describe the role of loss functions in the learning process of neural networks.

A loss function measures the difference between a model's predictions and the actual values. It is used in optimization algorithms like Gradient Descent to help the model find the optimal weights. Common loss functions include MSE and Cross-Entropy Loss.

  4. What is Gradient Descent?

Gradient Descent is an optimization algorithm that updates weights in a neural network to minimize the loss function. It computes the gradient of the loss function and adjusts weights accordingly. Common variants include Stochastic Gradient Descent (SGD), Momentum, RMSProp, and Adam.

  5. Explain the difference between binary classification and multi-class classification in PyTorch.

In binary classification, there is a single output neuron with a Sigmoid activation function, and the loss function used is BCEWithLogitsLoss(), which applies the Sigmoid internally.
In multi-class classification, the number of output neurons equals the number of classes, and Softmax turns the outputs into class probabilities. The loss function used is CrossEntropyLoss(), which likewise applies Softmax internally, so the model outputs raw logits.
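The contrast in code (a sketch; the 100-feature input and batch size are assumptions):

```python
import torch
import torch.nn as nn

# Binary: one output neuron; BCEWithLogitsLoss applies the Sigmoid
# internally, so the model emits a single raw logit per sample.
binary_head = nn.Linear(100, 1)
loss_b = nn.BCEWithLogitsLoss()(
    binary_head(torch.randn(4, 100)),
    torch.randint(0, 2, (4, 1)).float(),  # float targets in {0, 1}
)

# Multi-class: one neuron per class; CrossEntropyLoss applies Softmax
# internally and expects integer class labels.
multi_head = nn.Linear(100, 10)
loss_m = nn.CrossEntropyLoss()(
    multi_head(torch.randn(4, 100)),
    torch.randint(0, 10, (4,)),  # integer labels 0-9
)
```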

  6. What is batch size? Explain the problems that arise when it is too large or too small.

Batch size refers to the number of samples used in a single optimization step.
If too small, training becomes unstable, and the loss function fluctuates frequently.
If too large, computational cost increases, and optimization may slow down.

  7. Explain overfitting and how to prevent it.

Overfitting occurs when a model fits the training data too closely and fails to generalize to new data. It can be prevented using Dropout, L1/L2 regularization, data augmentation, and early stopping (sketched below).
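Early stopping is the only technique here not sketched earlier, so here is a minimal hedged version: stop once the validation loss has failed to improve for `patience` epochs. train_one_epoch() and validation_loss() are hypothetical placeholders for a real training/evaluation loop:

```python
# train_one_epoch() and validation_loss() are hypothetical placeholders.
best, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    train_one_epoch()
    val = validation_loss()
    if val < best:
        best, bad_epochs = val, 0   # improved: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no improvement for `patience` epochs
            break
```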
