FUNDAMENTAL | 22. Regularization

20211101

💡Key Point💡
1. Understand the concept of regularization and distinguish it from normalization
2. The difference between L1 regularization and L2 regularization
3. Learn about the Lp norm, Dropout, and Batch Normalization



1. Regularization


Regularization (정칙화): one way of dealing with overfitting.
It restrains the model from overfitting (train loss may go up, but validation/test loss goes down).

Overfitting: the model performs well on the train set but poorly on the validation/test set.






2. Normalization


Normalization (정규화): the process of preprocessing data into a suitable form.
It rescales the data distribution, either into the [0, 1] range with a min-max scaler or to zero mean and unit variance with z-scores (standardization).

The goal is to keep distance measurements between data points from being distorted by features whose value ranges differ widely, which would otherwise hinder training.






3. L1 Regularization (Lasso)


$$\hat{\beta}^{lasso} := \arg\min_{\beta} \frac{1}{2N} \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^p |\beta_j|$$






L1 norm

  • norm: a measure of the size (distance) of a vector, matrix, function, etc.
$$\|x\|_p := \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p}$$






  • For $p=1$, the L1 norm can be written as $\|x\|_1 := \sum_{i=1}^n |x_i|$.

  • This matches the penalty term at the end of the lasso formula above!
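For instance, the L1 norm of a vector can be checked with numpy (a tiny sketch with a made-up vector):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

# L1 norm: the sum of absolute values
print(np.abs(x).sum())           # 6.0
print(np.linalg.norm(x, ord=1))  # 6.0, the same value
```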






4. L2 Regularization (Ridge)


$$\hat{\beta}^{ridge} := \arg\min_{\beta} \frac{1}{2N} \sum_{i=1}^N \Big(y_i - \beta_0 - \sum_{j=1}^p x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^p \beta_j^2$$






5. L1 vs L2 Regularization


	(Figure: L1 vs. L2 constraint regions. Source: https://en.wikipedia.org/wiki/Lasso_(statistics))



  • L1 Regularization

    • Sends the coefficients of features with small weights to exactly 0 (see the sketch after this list)

    • Plays a role similar to dimensionality reduction (feature selection)

    • Uses $|\beta|$ → a diamond-shaped constraint region

    • The solution is the point where the loss contours of the problem meet the constraint region

    • Along some axes, $|\beta|$ is pushed all the way to 0

  • L2 regularization

    • The constraint region is a circle

    • Shrinks weights close to 0 (but never exactly 0)

    • Converges faster than Lasso
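To see the difference in practice, here is a minimal sketch comparing scikit-learn's Lasso and Ridge on synthetic data (the data, seed, and alpha=0.1 are arbitrary assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features actually influence y
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Lasso (L1) zeroes out the 8 irrelevant coefficients;
# Ridge (L2) only shrinks them toward 0
print("Lasso:", np.round(Lasso(alpha=0.1).fit(X, y).coef_, 3))
print("Ridge:", np.round(Ridge(alpha=0.1).fit(X, y).coef_, 3))
```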






6. Lp norm


Norm: a measure of the magnitude of a vector, function, or matrix






vector norm

$$\|x\|_p := \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p}$$






Infinity norm ($p=\infty$)

$$\|x\|_\infty := \max_i |x_i|$$






matrix norm

  • For an $m \times n$ matrix $A$:
  • When $p = 1$: the largest absolute column sum
    $$\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^m |a_{ij}|$$

  • When $p = \infty$: the largest absolute row sum
    $$\|A\|_\infty = \max_{1\le i\le m}\sum_{j=1}^n |a_{ij}|$$
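These vector and matrix norms can be verified with numpy (an arbitrary small example):

```python
import numpy as np

v = np.array([1.0, -2.0, 2.0])
print(np.linalg.norm(v, 1))       # L1 norm: 5.0
print(np.linalg.norm(v, 2))       # L2 norm: 3.0
print(np.linalg.norm(v, np.inf))  # infinity norm, max|x_i|: 2.0

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])
print(np.linalg.norm(A, 1))       # largest absolute column sum: 6.0
print(np.linalg.norm(A, np.inf))  # largest absolute row sum: 7.0
```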






7. Dropout


A technique that randomly drops part of the information instead of passing it to every neuron

  • Previously, fully connected architectures connected all neurons to one another

  • Since the introduction of dropout, only a randomly selected subset of neurons passes information on

  • It is a regularization layer

  • If the drop probability is too high, the network fails to learn; if it is too low, it behaves just like a fully connected layer

  • When a fully connected layer overfits → add a Dropout layer, as shown below
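A minimal Keras sketch, assuming a toy classifier (the layer sizes and the 0.3 drop rate are arbitrary choices):

```python
import tensorflow as tf

# Dropout randomly zeroes 30% of the previous layer's activations
# at each training step; it is automatically disabled at inference.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```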






8. Batch Normalization


A method for addressing the gradient vanishing/exploding problem: it normalizes the activations of each mini-batch to zero mean and unit variance, then applies a learned scale and shift.
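A minimal Keras sketch, assuming the common Dense → BatchNormalization → activation ordering (layer sizes are arbitrary):

```python
import tensorflow as tf

# BatchNormalization standardizes each mini-batch's activations,
# then applies a learned scale (gamma) and shift (beta), which
# helps keep gradients in a stable range during training.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```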






References


Norm (mathematics)

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Dropout layer

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift






To be honest,,, this node just didn't stick in my head.. I think I need to study it more ㅠㅠ

0๊ฐœ์˜ ๋Œ“๊ธ€