📌 These are personal notes I took while following the University of Michigan course 'Deep Learning for Computer Vision'. If you spot any errors or have feedback, please let me know and I will gladly incorporate it.
(The content is almost the same as Stanford's cs231n, so that course is a helpful reference as well.) 📌




0. Last time: Backprop

  • Problem so far: a plain linear classifier or fully connected network does not respect the 2D spatial structure of the input image
    • the input always had to be flattened into a 1D vector
  • Solution: define new operators that know how to handle images and their spatial structure




1. Components

1) Fully Connected network

2) Convolutional Network




2. Fully Connected Layer

  • vectorization: the input image is flattened into a single vector before the linear layer (see the sketch below)
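As a minimal sketch of what this means in PyTorch (the 3x32x32 shape and layer sizes are illustrative assumptions), a fully connected layer first flattens the image, destroying its 2D structure:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)       # one RGB 32x32 image (N, C, H, W)
fc = nn.Linear(3 * 32 * 32, 10)     # fully connected layer: 3072 -> 10 scores

x_flat = x.flatten(start_dim=1)     # (1, 3072): the 2D spatial structure is lost here
scores = fc(x_flat)                 # (1, 10)
print(x_flat.shape, scores.shape)
```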




3. Convolution Layer

๐Ÿ“filter๊ฐ€ weight์—ญํ•  ํ•ด์คŒ(input์— ๋Œ€ํ•œ ์˜ํ–ฅ๋ ฅ ์ „๋‹ฌ)
1) ๊ตฌ์กฐ

  • input volume 3์ฐจ์› (3x32x32) (RGB x Height x Width)
  • filter
    • weight matrix์™€ ๋™์ผ ์—ญํ• 
    • filter์˜ RGB์™€ input์˜ RGB๋งž์ถ”๋Š”๊ฒƒ ์ค‘์š”!
    • input image์˜ ๋ชจ๋“  ๊ณต๊ฐ„์œ„์น˜๋กœ ์Šฌ๋ผ์•„๋“œํ•˜์—ฌ, ๋˜ ๋‹ค๋ฅธ 3์ฐจ์› ๊ณ„์‚ฐ
    • ์ž…๋ ฅ tensor์˜ ์ „์ฒด ๊นŠ์ด์— ๊ฑธ์ณ ํ™•์žฅ๋จ

2) First filter

  • filter
    • is placed somewhere inside the input image

    • i.e. assigned to one local spatial position of the input tensor

      ⇒ take the inner product between the filter and that chunk of the input tensor

  • output
    • a single element (a single number)
    • computed from one filter and one small local chunk of the input tensor
    • effectively a single scalar that says how well this position of the input image matches that one filter

3) Second filter

  • activation map
    • another name for the result of filtering with one filter
  • green filter
    • a second filter whose weight values differ from the first

4) Multiple filters

  • a bank of filters (6x3x5x5)
    • a set of 6 three-dimensional filters
    • 6 x 3 x 5 x 5 = (number of filters) x (number of input channels) x (filter size)
    • produces 6 activation maps
  • stack the activation maps (6 of them)
    • the size changes, but the same kind of 3D spatial structure as the input is preserved
    • each map shows how strongly the input image responds to that filter at every position
    • this is how to think about the spatial structure of a conv layer's output
  • 28 x 28 grid
    • corresponds to the same spatial grid as the input tensor (at each position the conv layer computes a feature vector)
  • 6-dim bias vector
    • there is one bias per filter
    • so in total the bias is a 6-dimensional vector

5) Batches of 3D tensors

6) Generalization

  • input
    • N x C_in x H x W = (batch size x number of channels in each input image x spatial size of each input)
  • output
    • N x C_out x H' x W' = (batch size x number of filters (can differ from C_in) x spatial size (H', W' can differ from the input's H, W)) — see the sketch below
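A minimal PyTorch sketch of these shapes; the numbers follow the running 3x32x32 input / six 5x5-filter example, everything else is illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 3, 32, 32)                                    # N x C_in x H x W
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)   # 6 filters of size 3x5x5

y = conv(x)                    # N x C_out x H' x W'
print(conv.weight.shape)       # torch.Size([6, 3, 5, 5])
print(conv.bias.shape)         # torch.Size([6]) -- one bias per filter
print(y.shape)                 # torch.Size([4, 6, 28, 28]) since 32 - 5 + 1 = 28
```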




4. Stacking Convolutions

๐Ÿ“filter(conv)๋’ค์— ํ™œ์„ฑํ™”ํ•จ์ˆ˜ ๋„ฃ์–ด์„œ ์„ ํ˜• ๊ทน๋ณต

  • Convolution layer๋ฅผ stacking๊ฐ€๋Šฅ (๋”์ด์ƒ fully connected layer X)

  • ํ•ด์„
    • input
      • 3์ฐจ์› tensor batch N๊ฐœ
    • W1 : 6x3x5x5
      • ์˜๋ฏธ: 6๊ฐœ์˜ convolution filter
    • Nx3x32x32
      • ๋ถ€๋ฅด๋Š” ์ด๋ฆ„: 3 layer CNN with input in Red, first hidden layer blue, second hidden layer green
  • Q. 2๊ฐœ convolution layers stackํ•˜๋ฉด ์–ด์ผ€๋จ? A. ๋˜ ๋‹ค๋ฅธ convolution ์–ป์Œ (y=W2W1x๋„ ์—ฌ์ „ํžˆ linear classifier์ž„)
    • ๊ฐ convolution ์ž‘์—…์ž์ฒด๊ฐ€ linear ์—ฐ์‚ฐ์ž์ด๋ฏ€๋กœ, ํ•˜๋‚˜์˜ convolution์„ ๋˜ stackํ•˜๋ฉด, ๋˜ ๋‹ค๋ฅธ ํ•ฉ์„ฑ๊ณฑ ๋งŒ๋“ค์–ด์ง

      โ‡’ ๊ทน๋ณต) ๊ฐ ์„ ํ˜• ์—ฐ์‚ฐ ์‚ฌ์ด์— ๋น„์„ ํ˜• ํ™œ์„ฑํ™”ํ•จ์ˆ˜ ์‚ฝ์ž…

      (fully connected layer๊ฐ™์ด 3์ฐจ์› tensor์˜ ๊ฐ ์š”์†Œ์— ๋Œ€ํ•ด ์ž‘๋™)
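A minimal sketch of a two-conv stack with a ReLU in between (shapes follow the slide's example; the second conv's size is my own illustrative choice):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 3, 32, 32)          # N x 3 x 32 x 32

# Without the ReLU, conv2(conv1(x)) would collapse into a single convolution.
model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),    # W1: 6 x 3 x 5 x 5
    nn.ReLU(),                         # elementwise nonlinearity between the linear ops
    nn.Conv2d(6, 10, kernel_size=3),   # W2: 10 x 6 x 3 x 3 (illustrative)
)
print(model(x).shape)                  # torch.Size([2, 10, 26, 26])
```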




5. Learnable convolutional filters

๐Ÿ“MLP: ๋ชจ๋“  ์ด๋ฏธ์ง€ template์„ ํ•™์Šตํ•จ- 1๋ฒˆ์งธ์—์„œ๋Š” ๋ชจ์„œ๋ฆฌ ์œ„์ฃผ
1) ๊ธฐ์กด linear classifier

  • what does it learn? → one template per class (since only a single layer of full-image templates is possible)

2) MLP

  • what does it learn? → a bank of whole-image templates (multiple templates are possible)
  • fully connected: each weight vector extends over the full size of the input image
    = the first layer of a fully connected network has a bank of templates, each the same size as the input image

3) First layer conv filters

  • what do they learn? → local image templates (e.g. oriented edges and opposing colors)

  • ex. a green blob next to a red blob
    • means this filter is looking for opposing colors in the image
    • how to interpret the features after the first convolution: in the 3D output tensor, each activation map gives, at every position, the degree to which
      that local chunk of the input matches the corresponding template learned in the first layer (e.g. the kinds of filters visualized from the first layer of a trained conv net)




6. Padding

๐Ÿ“๊ธฐ์กด: W-K+1 โ†’ ํŒจ๋”ฉํ›„: W-K+1+2P

  • ๊นŠ์ด, ์ฑ„๋„ ๊ณ ๋ คX

1) ๊ธฐ๋ณธ

  • ์ถœ๋ ฅํฌ๊ธฐ (์ผ๋ฐ˜ํ™”) : W-K+1
  • ๋ฌธ์ œ์  : feature map์ด ๊ฐ layer๋งˆ๋‹ค ์ค„์–ด๋“ฆ(๊ณต๊ฐ„ ์ฐจ์›์ด ์ค„์–ด๋“ฆ)

2) ํ•ด๊ฒฐ์ฑ…: ํŒจ๋”ฉ

  • zero-padding
  • ํŒจ๋”ฉ ์ถ”๊ฐ€ ํ›„
  • output ์ผ๋ฐ˜ํ™” ์‹ : W-K+1+2P
  • Same padding
    • ์ž…์ถœ๋ ฅ์ด ๋™์ผ ๊ณต๊ฐ„ ํฌ๊ธฐ ๊ฐ€์ ธ์„œ โ†’ ๊ณต๊ฐ„ ํฌ๊ธฐ ์ถ”๋ก ์ด ์‰ฌ์›Œ์ง
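A small sketch of the output-size formulas in PyTorch (channel counts and kernel size are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # W = 32

# No padding: output size is W - K + 1 = 32 - 5 + 1 = 28
print(nn.Conv2d(3, 6, kernel_size=5, padding=0)(x).shape)   # torch.Size([1, 6, 28, 28])

# Padding P = 2: W - K + 1 + 2P = 32 - 5 + 1 + 4 = 32  ("same" padding for K = 5)
print(nn.Conv2d(3, 6, kernel_size=5, padding=2)(x).shape)   # torch.Size([1, 6, 32, 32])
```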




7. Receptive fields

1) Applying a single conv layer

(figure: input region (1) and output location (2) of a single conv layer)

  • Interpretation
    • each spatial position of the output image depends only on a local region of the input image
    • ex. (2) depends only on region (1)
      • region (1) is the receptive field of that value of the output tensor

2) Stacking several conv layers

  • Interpretation
    • the green region (in the output) transitively depends on the orange region of spatial positions in the leftmost input tensor
  • Two interpretations
    • receptive field in the input: after several conv layers, the spatial region of the input image that can influence the value of that neuron
    • receptive field in the previous layer: the region of the previous layer that influences it
  • Problem
    • to work with very high-resolution images → we have to stack many conv layers
      = maintaining a very large receptive field requires a huge number of conv layers (each layer only grows it by K-1; see the sketch below)
  • Solution
    • use stride to downsample
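A tiny sketch of how slowly the receptive field grows without downsampling, using the standard formula RF = 1 + L·(K−1) for L stacked stride-1 convs with kernel size K (the layer counts are illustrative):

```python
# Receptive field of a stack of L stride-1 convolutions with kernel size K.
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    return 1 + num_layers * (kernel_size - 1)

# To cover a 224-pixel-wide input with 3x3 convs alone, we need over 100 layers:
for layers in (1, 2, 10, 50, 112):
    print(layers, receptive_field(layers))   # 3, 5, 21, 101, 225
```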




8. Strided convolution

📍 builds up the receptive field faster → output size: ((W - K + 2P) / S) + 1
1) Stride

  • Idea
    • instead of placing the conv filter at every possible position, place it only at every S-th position (the stride)
    • ex. stride = 2 → output = 3x3
  • output
    • downsampled
    • the receptive field can be built up much faster
      • because (with stride 2) the receptive field roughly doubles at every layer
    • output size: ((W - K + 2P) / S) + 1 (see the sketch below)
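A quick sketch of the strided output-size formula in PyTorch (the particular values are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)   # W = 32

# W=32, K=5, P=2, S=2  ->  ((32 - 5 + 2*2) / 2) + 1 = 16 (with floor division)
conv = nn.Conv2d(3, 6, kernel_size=5, stride=2, padding=2)
print(conv(x).shape)            # torch.Size([1, 6, 16, 16])
```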




9. Recap: Convolution Example

๐Ÿ“๊ฑ ์ผ๋ฐ˜ํ™” ์‹๋“ค ์ •๋ฆฌ
1) output volume size?

  • ์ฃผ์˜์ 
    • output์€ filter ๊ฐœ์ˆ˜์™€ ๋™์ผํ•ด์•ผ๋จ !!!!

2) Number of learnable parameters?

  • General formula
    • (number of filters) × (number of input channels × filter size (K*K) + 1 for the bias)

3) Number of multiply-add operations?

  • General formula
    • (output volume size) × (operations for one filter at one position, i.e. C_in × K × K) — see the sketch below
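A sketch of both counts for the usual recap example (3x32x32 input, ten 5x5 filters, stride 1, pad 2); treat the concrete numbers as an illustration:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, padding=2)

# Learnable parameters: filters x (C_in * K * K + 1 bias) = 10 * (3*5*5 + 1) = 760
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)                       # 760

# Multiply-adds: (output volume C_out*H'*W') x (C_in*K*K) = (10*32*32) * (3*5*5) = 768,000
macs = (10 * 32 * 32) * (3 * 5 * 5)
print(macs)                           # 768000
```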

4) example: 1x1 convolution




10. Convolution Summary

  • Comparison
    • fully connected layer: destroys the spatial structure (flattens the whole tensor into one vector output)
    • 1x1 conv layer: preserves the spatial structure; used as an adapter inside the network




11. Other types of convolution




12. Pooling Layers

๐Ÿ“pooling๋” ์“ฐ๋Š” ์ด์œ : ํ•™์ŠตํŒŒ๋ผ๋ฏธํ„ฐX, ๊ฐ’์ด ์•ˆ๋ณ€ํ•จ

  • Idea

    • no learnable parameters
  • Hyperparameters

    • kernel size
    • stride
    • pooling function
  • Max pooling

    a. Max pooling with 2x2
    (= kernel size (2,2) & stride=2)
    → if kernel size = stride, the pooled regions do not overlap
    b. Why pooling is used more often than strided convolution

    • reason 1) max pooling has a certain amount of invariance to translation (the value doesn't change within the pooled region)
    • reason 2) it has no learnable parameters
  • pooling summary (see the sketch below)
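A minimal max-pooling sketch; the input values are made up to show both points (only the max within each 2x2 window is kept, and there are no parameters):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)        # non-overlapping 2x2 windows

x = torch.tensor([[[[1., 2., 3., 4.],
                    [5., 6., 7., 8.],
                    [3., 2., 1., 0.],
                    [1., 2., 3., 4.]]]])            # shape (1, 1, 4, 4)
print(pool(x))                                      # [[[[6., 8.], [3., 4.]]]], shape (1, 1, 2, 2)

# No parameters to learn:
print(sum(p.numel() for p in pool.parameters()))    # 0
```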




13. Convolutional Networks: combining what we've learned so far

1) Basic CNN structure

  • there are many ways to wire together the components covered above
    • because there are many hyperparameter choices

2) Example: LeNet-5

  • Interpretation
    • it is standard to put a ReLU after each conv (see the sketch below)
  • Q. Max pooling already adds nonlinearity, so why use ReLU? A. Adding ReLU anyway is simply the common practice (it provides more regularity)
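A rough LeNet-5-style sketch in PyTorch for 1x28x28 inputs (the exact layer sizes here are an approximation for illustration, not the original architecture):

```python
import torch
import torch.nn as nn

# Conv -> ReLU -> Pool blocks followed by fully connected layers, LeNet-5 style.
lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),   # 1x28x28 -> 6x28x28
    nn.MaxPool2d(2),                                        # -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),             # -> 16x10x10
    nn.MaxPool2d(2),                                        # -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, 10),
)
print(lenet(torch.randn(1, 1, 28, 28)).shape)               # torch.Size([1, 10])
```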




14. Batch Normalization (in a Fully Connected Network)

๐Ÿ“์„ ํ˜•์ ์œผ๋กœ ํ• ์ˆ˜์žˆ์Œ
1) ๊ฐœ๋…

  • ๋„คํŠธ์›Œํฌ ๋‚ด๋ถ€์— ์ผ์ข…์˜ layer์ถ”๊ฐ€ํ•˜์—ฌ deep network๋ฅผ trainํ• ์ˆ˜ ์žˆ๋„๋ก
  • (ํ‰๊ท =0, ๋‹จ์œ„๋ถ„์‚ฐ๋ถ„ํฌ ์žˆ๋„๋ก) ์ด์ „ layer๋กœ๋ถ€ํ„ฐ ๋‚˜์˜จ ๊ฒฐ๊ณผ๋ฅผ ์–ด๋–ค ๋ฐฉ์‹์œผ๋กœ๋“  ์ •๊ทœํ™”ํ•˜๊ธฐ

2) Why normalize?

  • reduces internal covariate shift (ICS) → stabilizes training, improves optimization
    • ICS: the distribution of each layer's inputs changes during training
    • in general, as training proceeds the weights and biases get updated, and this changes the input data that each layer sees
      → so the representations learned by earlier layers shift, which in turn affects the following layers
    • ICS reduces the stability and speed of training

3) Batch Norm equations

  • how does this work with backprop?
    • it is a differentiable function, so gradients can flow through it

4) Properties

  • (to remove ICS) re-center and re-scale the distribution of each layer's input features
  • prevents the input distribution from drifting at every layer

  • batch axis (↕)

    • the statistics are averaged down the batch axis, i.e. over the N input vectors
  • first equation

    • per-channel mean: $\mu_j = \frac{1}{N}\sum_i x_{i,j}$
  • second equation

    • per-channel variance: $\sigma_j^2 = \frac{1}{N}\sum_i (x_{i,j} - \mu_j)^2$
  • third equation

    • normalize using the first two: $\hat{x}_{i,j} = \frac{x_{i,j} - \mu_j}{\sqrt{\sigma_j^2 + \epsilon}}$
    • ε : a small constant so we never divide by zero
    • the result is zero-centered
  • Problem (the ultimate reason learnable parameters are added)

    • Q. isn't forcing zero mean (and unit variance) too hard a constraint?
  • Solution

    • after the basic normalization, an extra step is needed: add learnable parameters (a scale and a shift)

      ⇒ this lets the network cover the identity function if it wants to

5) Batch normalization at training time

  • Final batch normalization formula

    • $\hat{x}_{i,j}$ : the normalized input
    • $y_{i,j} = \gamma_j \hat{x}_{i,j} + \beta_j$ : for each element of the vector, the network can learn for itself what mean and variance it wants to see (see the sketch below)
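A minimal from-scratch sketch of the training-time computation on an N x D activation; γ and β are kept as plain tensors just to mirror the equations (this is an illustration, not library code):

```python
import torch

N, D = 8, 4
x = torch.randn(N, D)
gamma, beta = torch.ones(D), torch.zeros(D)   # learnable scale and shift (initial values)
eps = 1e-5

mu = x.mean(dim=0)                            # per-channel mean over the batch
var = x.var(dim=0, unbiased=False)            # per-channel variance over the batch
x_hat = (x - mu) / torch.sqrt(var + eps)      # zero mean, unit variance per channel
y = gamma * x_hat + beta                      # learnable scale and shift

print(x_hat.mean(dim=0), x_hat.std(dim=0, unbiased=False))   # ~0 and ~1 per channel
```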

6) Batch normalization at test time

a. Problem

  • the output depends on the other elements of the batch: different inputs can end up with the same score
  • and the same input can end up with a different output
  • ex. the score for a cat photo and the score for a dog photo come out the same
  • ex. two customers of a web service upload the same file at the same time but get different outputs

b. Solution

  • the model should be independent of the other elements in the batch (= it should behave well at both train and test time)

  • at training time the statistics are computed empirically from the batch, but at test time they must not be (= do not compute them from the batch)

  • Method

    • train
      • keep a running exponentially weighted average of the per-batch μ and σ vectors
    • test
      • instead of using the batch elements, use those **fixed scalars $\mu_j$, $\sigma_j$** (constants)
      • this restores independence between the elements of a batch at test time
  • Equations

    • Interpretation
      • if $\mu_j$, $\sigma_j$ are constants, then $y_{i,j}$ becomes a linear function of the input
        • so at test time batch normalization is independent of the batch
        • and it can be fused with the adjacent linear operator, so the test-time overhead becomes zero
        • ex. in a CNN, if batch normalization is followed by a conv layer, the two linear operations can be fused into one
        • $y_{i,j}$ : in the scale-and-shift step, multiply by the learned weight $\gamma_j$ and shift by the learned value $\beta_j$ (see the sketch below)
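A short sketch of the train/test difference using PyTorch's nn.BatchNorm1d, whose running mean/var play the role of the running averages above (the random tensors are purely illustrative):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)

# Training mode: statistics come from each batch, and running averages are updated.
bn.train()
for _ in range(10):
    bn(torch.randn(8, 4))

# Eval mode: the fixed running mean/var are used, so the output no longer depends on the batch.
bn.eval()
x = torch.randn(1, 4)
print(torch.allclose(bn(x), bn(torch.cat([x, torch.randn(5, 4)]))[0:1]))   # True
```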




15. Batch Normalization (in a Convolutional Network)

📍 advantages (easier training, higher LR, regularization, no extra cost at test time), but downsides exist
1) Comparison

  • fully connected
    • as described above (average over the batch dimension)
  • convolutional
    • average over the batch dimension + average over the spatial dimensions of the input

2) Position

  • placed after the FC (or conv) layer, i.e. in front of the activation function (see the sketch below)
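A minimal sketch of that placement as a Conv → BatchNorm → ReLU block (channel counts are illustrative):

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # the conv bias is redundant before BN
    nn.BatchNorm2d(16),     # statistics over the batch AND spatial dims, per channel
    nn.ReLU(),
)
print(block(torch.randn(2, 3, 32, 32)).shape)   # torch.Size([2, 16, 32, 32])
```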

3) Properties

  • Advantages
    • makes deep networks much easier to train
    • allows higher learning rates and faster convergence
    • makes networks more robust to initialization
    • acts like regularization during training
    • zero overhead at test time: can be fused with the conv layer
  • Downsides
    • not well understood theoretically: no precise understanding of why it helps optimization
    • behaves differently at train and test time → a common source of bugs




16. Layer Normalization

๐Ÿ“๋ฐฐ์น˜์ฐจ์›ํ‰๊ท X, D ํ‰๊ท O

  • Batch norm์˜ train- test์—์„œ ๋‹ค๋ฅธ ์ž‘์—…ํ•˜๋Š” ๊ฒƒ์— ๋Œ€ํ•œ ํ•ด๊ฒฐ์ฑ…
  • ๊ธฐ์กด์— ๋Œ€ํ•œ ๋ณ€ํ˜• โ†’ transformer, RNN์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ

1) ํŠน์ง•

  • ๊ธฐ์กด๊ณผ ๊ณตํ†ต์ 
    • MM, ฯƒ\sigma ๊ตฌํ•˜๊ณ  ์ •๊ทœํ™” ๊ณผ์ •์€ ๋˜‘๊ฐ™์Œ
  • ์ฐจ์ด์ 
    • ๋ฐฐ์น˜์ฐจ์›์— ๋Œ€ํ•œ ํ‰๊ท  ๋Œ€์‹ , ๊ธฐ๋Šฅ ์ฐจ์›(D)์— ๋Œ€ํ•œ ํ‰๊ท ๊ณ„์‚ฐ (โ†”)
    • train ์š”์†Œ์— ์˜์กด ์•ˆํ•˜๋ฏ€๋กœ, train๊ณผ test์— ๊ฐ™์€ ์ž‘์—… ๊ฐ€๋Šฅ
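A small sketch with nn.LayerNorm on an N x D batch, checking that each sample (row) is normalized on its own (shapes are illustrative):

```python
import torch
import torch.nn as nn

N, D = 8, 16
ln = nn.LayerNorm(D)                  # normalizes over the feature dimension D, per sample

x = torch.randn(N, D)
y = ln(x)
print(y.mean(dim=1))                  # ~0 for every sample: statistics are per-row, not per-batch
print(y.std(dim=1, unbiased=False))   # ~1 for every sample
```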




17. Instance Normalization

๐Ÿ“๋ฐฐ์น˜์ฐจ์›ํ‰๊ท X, D ํ‰๊ท X, ๊ณต๊ฐ„์ฐจ์› ํ‰๊ท O

  • (CNN์—์„œ) ๊ณต๊ฐ„ ์ฐจ์›์— ๋Œ€ํ•ด์„œ๋งŒ ํ‰๊ท  ๊ตฌํ•จ
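A corresponding sketch with nn.InstanceNorm2d, checking that every (image, channel) slice is normalized separately over its spatial positions (shapes are illustrative):

```python
import torch
import torch.nn as nn

inorm = nn.InstanceNorm2d(3)        # statistics over H x W, separately per image and channel

x = torch.randn(2, 3, 64, 64)
y = inorm(x)
print(y.mean(dim=(2, 3)))           # ~0 for every (image, channel) pair, shape (2, 3)
print(y.std(dim=(2, 3)))            # ~1 for every (image, channel) pair
```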




18. Final comparison

📍 Batch norm vs. Layer norm vs. Instance norm vs. Group norm in a CNN

(for example, suppose the input is a set of images of shape 2x3x64x64...)

  • Batch norm: normalizes over the whole set of images, separately per channel (the 2 images with R, the 2 images with G, the 2 images with B) (each normalization group: 2x1x64x64)
  • Layer norm: normalizes over one whole image (1x3x64x64)
  • Instance norm: normalizes over one image, separately per channel (one image's R channel at a time) (1x1x64x64)
  • Group norm: normalizes over one image, over a group of channels (e.g. one image's R and G channels together) (1x2x64x64)
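A compact sketch of which axes each scheme averages over, computed by hand on a 2x3x64x64 tensor; for group norm I lump all 3 channels into one group, since the group count must divide the channel count (my own illustrative choice):

```python
import torch

x = torch.randn(2, 3, 64, 64)        # N x C x H x W

# Which axes the mean/variance are taken over for each normalization scheme:
print(x.mean(dim=(0, 2, 3)).shape)   # Batch norm:    per channel          -> torch.Size([3])
print(x.mean(dim=(1, 2, 3)).shape)   # Layer norm:    per image            -> torch.Size([2])
print(x.mean(dim=(2, 3)).shape)      # Instance norm: per image & channel  -> torch.Size([2, 3])

# Group norm: split the channels into groups, then average per image & group.
g = x.view(2, 1, 3, 64, 64)          # here all 3 channels form a single group
print(g.mean(dim=(2, 3, 4)).shape)   # per image & group                   -> torch.Size([2, 1])
```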