🤖 Understanding the Perceptron and Implementing an MLP in PyTorch | AI Notes I Keep for Myself 🧠

HipJaengYiCat · March 26, 2023

DeepLearning


preview

  • The previous chapter covered deep learning and its key components; this chapter looks at one of those components: the model.
  • We'll start with the perceptron, the model that began artificial neural networks, then implement its deeper descendant, the Multi-Layer Perceptron, in PyTorch!
  1. What is a Perceptron?
  2. What is a Multi-Layer Perceptron (MLP)?
  3. Implementing a Multi-Layer Perceptron in PyTorch

What is a Perceptron?

A perceptron is the simplest form of neural network: a single neuron. It was devised to operate much like a neuron in the human brain.

๐Ÿ†€ ํผ์…‰ํŠธ๋ก ์€ ์ธ๊ฐ„์˜ ๋‰ด๋Ÿฐ๊ณผ ๋น„์Šทํ•˜๊ฒŒ ์ž‘๋™ํ•œ๋‹ค๊ณ  ํ•˜์˜€๋Š”๋ฐ ์ธ๊ฐ„์˜ ๋‰ด๋Ÿฐ์€ ์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ ์ž‘๋™ํ• ๊นŒ?
๐Ÿ…ฐ ์ƒ๋ฌผํ•™์  ๋‰ด๋Ÿฐ์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์ˆ˜์ƒ๋Œ๊ธฐ๊ฐ€ ์„œ๋กœ ๋‹ค๋ฅธ ์„ธ๊ธฐ์˜ ์ „๊ธฐ์  ์‹ ํ˜ธ๋ฅผ ๋ฐ›๊ณ  ์ด ์‹ ํ˜ธ ์„ธ๊ธฐ์˜ ํ•ฉ์ด ์ •ํ•ด์ง„ ์ž„๊ณ—๊ฐ’์„ ๋„˜์œผ๋ฉด ์‹œ๋ƒ…์Šค๋ฅผ ํ†ตํ•ด ์ถœ๋ ฅ ์‹ ํ˜ธ๋ฅผ ๋ณธ๋‚ธ๋‹ค.

๐Ÿ†€ ํผ์…‰ํŠธ๋ก ์€ ์–ด๋–ป๊ฒŒ ์ธ๊ฐ„์˜ ๋‰ด๋Ÿฐ์„ ๋ชจ๋ฐฉํ•˜์˜€์„๊นŒ?
๐Ÿ…ฐ ํผ์…‰ํŠธ๋ก ์€ ์ „์ฒด ์ž…๋ ฅ ์‹ ํ˜ธ ์„ธ๊ธฐ์˜ ํ•ฉ์„ ๊ตฌํ•˜๋Š” ์„ ํ˜•๊ฒฐํ•ฉ(linear combination)๊ณผ ์ž…๋ ฅ์˜ ์‹ ํ˜ธ ์„ธ๊ธฐ์˜ ํ•ฉ์ด ์ž„๊ณ—๊ฐ’์„ ์ดˆ๊ณผํ•  ๋•Œ๋งŒ ์ถœ๋ ฅ ์‹ ํ˜ธ๋ฅผ ๋ณด๋‚ด๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜(activate function)๋ฅผ ํ†ตํ•ด ์ธ๊ฐ„์˜ ๋‰ด๋Ÿฐ์„ ๋ชจํ˜•ํ™”ํ•œ๋‹ค.

👉 What is a linear combination?

  • It is defined as the sum of each input multiplied by its weight, plus a bias.

  • The bias is added because, viewing the weighted sum as a function, the bias acts as the y-intercept: adjusting it shifts the line up or down so that its predictions fit the data more accurately.

  • In other words, without a bias the line always passes through the origin, which makes the predictions correspondingly less accurate. A small numeric sketch follows below.
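A minimal sketch of the linear combination in PyTorch (the inputs, weights, and bias here are made-up numbers, chosen only for illustration):

import torch

x = torch.tensor([0.5, -1.0, 2.0])   # made-up input signals
w = torch.tensor([0.8, 0.3, -0.5])   # made-up weights
b = 0.1                              # bias, i.e. the y-intercept term

z = torch.dot(w, x) + b              # linear combination: w·x + b
print(z)                             # tensor(-0.8000)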

👉 What is an activation function?

  • The activation function plays the brain's decision-making role: if the weighted sum of the inputs is greater than the set threshold, it activates the neuron.
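Continuing the sketch above, a unit step function (threshold 0, an illustrative choice) turns the weighted sum into a fire/don't-fire decision:

def step(z, threshold=0.0):
    # fire (1) only when the weighted sum exceeds the threshold
    return (z > threshold).float()

print(step(torch.tensor(-0.8)))  # tensor(0.) - below the threshold, the neuron stays silent
print(step(torch.tensor(0.3)))   # tensor(1.) - above the threshold, the neuron fires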

๐Ÿ‘‰ ํผ์…‰ํŠธ๋ก ์˜ ํ•™์Šต ๋ฐฉ๋ฒ•

  1. ๋‰ด๋Ÿฐ์ด ์ž…๋ ฅ์„ ์„ ํ˜•๊ฒฐํ•ฉํ•˜์—ฌ ๊ฐ€์ค‘ํ•ฉ์„ ๊ณ„์‚ฐํ•œ ๋’ค ํ™œ์„ฑํ™” ํ•จ์ˆ˜์— ์ž…๋ ฅํ•ด์„œ ์˜ˆ์ธก๊ฐ’ ^y์„ ๊ฒฐ์ •ํ•œ๋‹ค. ์ด ๊ณผ์ •์„ ์ˆœ์ „ํŒŒ๋ผ๊ณ  ํ•œ๋‹ค
  2. ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์„ ๋น„๊ตํ•ด ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค
  3. ์˜ค์ฐจ์— ๋”ฐ๋ผ ๊ฐ€์ค‘์น˜๋ฅผ ์กฐ์ •ํ•œ๋‹ค. ์˜ˆ์ธก๊ฐ’์ด ๋„ˆ๋ฌด ๋†’์œผ๋ฉด ์˜ˆ์ธก๊ฐ’์ด ๋‚ฎ์•„์ง€๋„๋ก ์กฐ์ •ํ•˜๊ณ , ์˜ˆ์ธก๊ฐ’์ด ๋„ˆ๋ฌด ๋‚ฎ์œผ๋ฉด ์˜ˆ์ธก๊ฐ’์ด ๋†’์•„์ง€๋„๋ก ๊ฐ€์ค‘์น˜๋ฅผ ์กฐ์ •ํ•œ๋‹ค
  4. ์ด ๊ณผ์ •์„ ๋ฐ˜๋ณตํ•˜๋ฉฐ ์˜ค์ฐจ๊ฐ€ 0์— ๊ฐ€๊น๋„๋ก ๊ฐ€์ค‘์น˜๋ฅผ ์กฐ์ •ํ•œ๋‹ค.
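These four steps are exactly the classic perceptron learning rule, w ← w + lr·(y − ŷ)·x. Below is a minimal sketch, not from the original post, that trains a single perceptron on the AND gate; the data, learning rate, and epoch count are illustrative choices:

import numpy as np

# AND-gate truth table as toy training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w, b, lr = np.zeros(2), 0.0, 0.1  # weights, bias, learning rate

for epoch in range(10):
    for xi, yi in zip(X, y):
        y_hat = float(np.dot(w, xi) + b > 0)  # step 1: forward pass, step(w·x + b)
        error = yi - y_hat                    # step 2: compare prediction and label
        w += lr * error * xi                  # step 3: nudge the weights
        b += lr * error                       #         ... and the bias
                                              # step 4: repeat until the error reaches 0

print(w, b)                                  # converges to roughly w=[0.2, 0.1], b=-0.2
for xi in X:
    print(xi, float(np.dot(w, xi) + b > 0))  # reproduces the AND gate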

🆀 We've covered the single-neuron perceptron, but can one neuron solve the world's complex problems?
🅰 A perceptron is ultimately trained as a linear function, so the boundary it draws through the data is a straight line, and it cannot accurately separate nonlinear data; the classic example is XOR, whose two classes no single straight line can split. Adding more neurons, rather than relying on one, yields boundaries far better suited to nonlinear data.


What is a Multi-Layer Perceptron (MLP)?

๋‹ค์ธตํผ์…‰ํŠธ๋ก (Multi Layer Perceptron)์€ ํผ์…‰ํŠธ๋ก ์„ ์ด๋ฃจ์–ด์ง„ ์ธต(Layer) ์—ฌ๋Ÿฌ ๊ฐœ๋ฅผ ์ˆœ์ฐจ์ ์œผ๋กœ ๋ถ™์—ฌ ๋†’์€ ํ˜•ํƒœ ์ด๋‹ค.

Implementing a Multi-Layer Perceptron in PyTorch

  • Let's implement an actual Multi-Layer Perceptron using PyTorch.
  • The code below is reworked from code provided by the Naver BoostCamp AI Tech course.
  • Since I am an M1 user, I added code that uses the M1 GPU.

1. Import the required packages and assign a device

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
%matplotlib inline # render plots inside the notebook cell
%config InlineBackend.figure_format='retina' # retina rendering - keeps text and lines in figures from looking blurry

print ("PyTorch version:[%s]."%(torch.__version__)) # check the torch version

# Assign a regular GPU, the M1 GPU, or the CPU to device
if torch.cuda.is_available():            # NVIDIA GPU available
    device = torch.device('cuda:0')
elif torch.backends.mps.is_available():  # Apple M1 GPU available
    device = torch.device('mps:0')
else:
    device = torch.device('cpu')
print ("device:[%s]."%(device))
'''
<cell output>
PyTorch version:[1.12.1].
device:[mps:0].
'''
  • I ran this code in VSCode, and the M1 GPU was assigned to device.
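If you want to confirm that tensors actually land on the chosen device, a quick check (a sketch, assuming device was assigned above) looks like this:

x = torch.randn(2, 3).to(device)  # move a random tensor to the assigned device
print(x.device)                   # e.g. mps:0 on an M1 Mac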

2. Create DataLoaders from the MNIST dataset

# Download the MNIST dataset
from torchvision import datasets,transforms
mnist_train = datasets.MNIST(root='./data/',train=True,transform=transforms.ToTensor(),download=True)
mnist_test = datasets.MNIST(root='./data/',train=False,transform=transforms.ToTensor(),download=True)

# Create the DataLoaders
BATCH_SIZE = 256
train_iter = torch.utils.data.DataLoader(mnist_train,batch_size=BATCH_SIZE,shuffle=True,num_workers=1)
test_iter = torch.utils.data.DataLoader(mnist_test,batch_size=BATCH_SIZE,shuffle=True,num_workers=1)
  • With a batch size of 256 and shuffle set to True, the loader will serve the downloaded MNIST data in random batches of 256; a quick shape check follows below.
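As a sanity check (a sketch, assuming the loaders above have been created), one batch carries the following shapes:

batch_in, batch_out = next(iter(train_iter))  # pull a single batch
print(batch_in.shape)   # torch.Size([256, 1, 28, 28]) - 256 grayscale 28x28 images
print(batch_out.shape)  # torch.Size([256]) - 256 integer labels (0-9)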

3. Build the MLP model

class MultiLayerPerceptronClass(nn.Module):
    """
        Multilayer Perceptron (MLP) Class - nn.Module์„ ์ƒ์†ํ•˜๋Š” ํด๋ž˜์Šค์ž„
        __init__ : ๋ณ€์ˆ˜ ์ดˆ๊ธฐํ™”
            name : ๋ชจ๋ธ๋ช…
            xdim : input ๋ฐ์ดํ„ฐ ํฌ๊ธฐ
            hdim : ํžˆ๋“ ๋ ˆ์ด์–ด ํฌ๊ธฐ
            ydim : output ๋ฐ์ดํ„ฐ ํฌ๊ธฐ
            lin_1 : input - hidden1 ์„ ํ˜•๋ณ€ํ™˜
            lin_2 : hidden1 - output ์„ ํ˜•๋ณ€ํ™˜
            init_param : ํŒŒ๋ผ๋ฏธํ„ฐ ์ดˆ๊ธฐํ™”

        init_param : ํŒŒ๋ผ๋ฏธํ„ฐ ์ดˆ๊ธฐํ™”
            nn.init.kaiming_normal_(weight) : ๊ฐ€์ค‘์น˜ ํ…์„œ์— ์ •๊ทœ๋ถ„ํฌ N(0, std^2) ๋ฅผ ๋”ฐ๋ฅด๋Š” He ์ดˆ๊ธฐํ™”๋ฅผ ์‹คํ–‰ํ•จ
            nn.init.zeros_(bia)             : ํŽธํ–ฅ ํ…์„œ์— ์Šค์นผ๋ผ 0์œผ๋กœ ์ฑ„์›€

        forward : ์ˆœ์ „ํŒŒ ์‹คํ–‰
            input - ์„ ํ˜•๋ณ€ํ™˜1 - ํ™œ์„ฑํ™”ํ•จ์ˆ˜(๋ ๋ฃจ) - ์„ ํ˜•๋ณ€ํ™˜2 - output
    """
    def __init__(self,name='mlp',xdim=784,hdim=256,ydim=10):
        super(MultiLayerPerceptronClass,self).__init__()
        self.name = name
        self.xdim = xdim
        self.hdim = hdim
        self.ydim = ydim
        self.lin_1 = nn.Linear(self.xdim, self.hdim)
        self.lin_2 = nn.Linear(self.hdim, self.ydim)
        self.init_param() # initialize parameters

    def init_param(self):
        nn.init.kaiming_normal_(self.lin_1.weight)
        nn.init.zeros_(self.lin_1.bias)
        nn.init.kaiming_normal_(self.lin_2.weight)
        nn.init.zeros_(self.lin_2.bias)

    def forward(self,x):
        net = x
        net = self.lin_1(net)
        net = F.relu(net)
        net = self.lin_2(net)
        return net

M = MultiLayerPerceptronClass(name='mlp',xdim=784,hdim=256,ydim=10).to(device)
loss = nn.CrossEntropyLoss() # cross-entropy loss
optm = optim.Adam(M.parameters(),lr=1e-3) # optimizer: Adam, learning rate 1e-3
  • We created a class named MultiLayerPerceptronClass, which inherits from nn.Module.
  • The init_param method initializes the parameters: He initialization for the weight tensors and zeros for the biases.
  • The forward method passes the input through Layer 1, the ReLU activation, and Layer 2 to produce the output.
  • The instantiated model is assigned to M; the loss is the cross-entropy loss, and the optimizer is Adam with a learning rate of 1e-3. A quick parameter-count check follows below.
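As a quick check (not part of the original code), the model's parameter count can be verified: lin_1 holds 784·256 + 256 values and lin_2 holds 256·10 + 10, for 203,530 in total.

n_params = sum(p.numel() for p in M.parameters())  # count every trainable parameter
print(n_params)  # 203530 = (784*256 + 256) + (256*10 + 10)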

4. With the model class built, let's also create a function to evaluate it


def func_eval(model,data_iter,device):
    '''
    model
        the model to evaluate
    data_iter
        a DataLoader created with torch.utils.data.DataLoader
    device
        the GPU or CPU to run on
    '''
    with torch.no_grad():
    # the main purpose of torch.no_grad() is to turn autograd off, cutting memory usage and speeding up computation
        model.eval() # evaluate (affects DropOut and BN)
        n_total,n_correct = 0,0
        for batch_in,batch_out in data_iter:
        # feed the dataset batch_size samples at a time -> X.shape : (256,1,28,28), Y.shape : (256)
            y_trgt = batch_out.to(device)
            model_pred = model(batch_in.view(-1,28*28).to(device))
            # model(x): input reshaped to (256, 1*28*28) - think of each (1,28,28) image unrolled into a single row
            _,y_pred = torch.max(model_pred.data,1)
            # take the class with the highest score among the predictions
            n_correct += (y_trgt == y_pred).sum().item()
            # count only the cases where label and prediction match
            n_total += batch_in.size(0) # count the number of samples fed
        val_accr = (n_correct/n_total) # accuracy = number of matches / number of samples fed
        model.train() # back to train mode (affects DropOut and BN)
    return val_accr
  • To elaborate on the batch_in.view(-1,28*28) reshaping of the input data that feeds model_pred above, see the example below.
# batch_in.view(-1,1*28*28) example
num = [1,2,3,4,5,6,7,8]
tensor = torch.tensor(num)
a = tensor.view(-1,1,2,2) # a.shape : (2, 1, 2, 2)
print('before view')
print(a)
print('after view')
a.view(-1,2*2*1)
'''
<output>
before view
tensor([[[[1, 2],
          [3, 4]]],

        [[[5, 6],
          [7, 8]]]])
after view
tensor([[1, 2, 3, 4],
        [5, 6, 7, 8]])
'''
  • Here the whole tensor holds 2 samples, each of shape (1,2,2).
  • Unrolling each sample into a single row gives a final shape of (2, 4).

5. Run the training

print ("Start training.")
M.init_param() # initialize parameters
M.train()
EPOCHS,print_every = 10,1 # number of epochs, print interval
for epoch in range(EPOCHS):
    loss_val_sum = 0
    for batch_in,batch_out in train_iter:
    # batch_in - X, batch_out - Y
        # Forward path
        y_pred = M.forward(batch_in.view(-1, 28*28).to(device)) # get the predictions
        loss_out = loss(y_pred,batch_out.to(device)) # compute the loss between labels and predictions

        # Update - backward path
        optm.zero_grad()    # reset gradient - clear the previously computed gradients before storing new ones
        loss_out.backward() # backpropagate - compute the gradients
        optm.step()         # optimizer update - update the weights with the computed gradients

        # accumulate the loss (.item() keeps a plain Python float rather than a graph-carrying tensor)
        loss_val_sum += loss_out.item()
    loss_val_avg = loss_val_sum/len(train_iter)
    # print the loss and accuracy every print_every epochs
    if ((epoch%print_every)==0) or (epoch==(EPOCHS-1)):
        train_accr = func_eval(M,train_iter,device)
        test_accr = func_eval(M,test_iter,device)
        print ("epoch:[%d] loss:[%.3f] train_accr:[%.3f] test_accr:[%.3f]."%
               (epoch,loss_val_avg,train_accr,test_accr))
print ("Done")

'''
<output>
Start training.
epoch:[0] loss:[0.387] train_accr:[0.941] test_accr:[0.941].
epoch:[1] loss:[0.173] train_accr:[0.963] test_accr:[0.960].
epoch:[2] loss:[0.123] train_accr:[0.972] test_accr:[0.967].
epoch:[3] loss:[0.094] train_accr:[0.979] test_accr:[0.971].
epoch:[4] loss:[0.076] train_accr:[0.983] test_accr:[0.974].
epoch:[5] loss:[0.062] train_accr:[0.987] test_accr:[0.976].
epoch:[6] loss:[0.051] train_accr:[0.989] test_accr:[0.978].
epoch:[7] loss:[0.042] train_accr:[0.991] test_accr:[0.978].
epoch:[8] loss:[0.035] train_accr:[0.993] test_accr:[0.979].
epoch:[9] loss:[0.030] train_accr:[0.995] test_accr:[0.979].
Done
'''
  • We trained for a total of 10 epochs.
  • Training repeats the forward and backward passes, steadily reducing the loss.

6. Check the model's performance on the test data

# randomly sample 25 examples from the test set
n_sample = 25
sample_indices = np.random.choice(len(mnist_test.targets), n_sample, replace=False)
test_x = mnist_test.data[sample_indices]
test_y = mnist_test.targets[sample_indices]

# ๋žœ๋ค์ถ”์ถœํ•œ test ๋ฐ์ดํ„ฐ์…‹ ์˜ˆ์ธก๊ฐ’ ์ถ”์ถœ
with torch.no_grad():
    y_pred = M.forward(test_x.view(-1, 28*28).type(torch.float).to(device)/255.)
y_pred = y_pred.argmax(axis=1)

# visualize labels, predictions, and images
plt.figure(figsize=(10,10))
for idx in range(n_sample):
    plt.subplot(5, 5, idx+1)
    plt.imshow(test_x[idx], cmap='gray')
    plt.axis('off')
    plt.title("Pred:%d, Label:%d"%(y_pred[idx],test_y[idx]))
plt.show()
print ("Done")

  • ๋ชจ๋ธ์ด ์ •ํ™•ํžˆ ๋ฐ์ดํ„ฐ๋ฅผ ๋งž์ถ”๋Š” ๊ฑธ๋กœ ๋ณด์—ฌ์ง„๋‹ค