Deep Learning Basics: Classifying mnist Images with an MLP

Henry Lee · August 15, 2021

Deep Learning A to Z (MLP, mnist)

Prepare & Pre-process

import tensorflow as tf

Load mnist dataset

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train.shape

(60000, 28, 28)

print(x_train.shape, ': train set dim')
print(x_test.shape, ': test set dim')
(60000, 28, 28) : train set dim
(10000, 28, 28) : test set dim
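
Before modeling, a quick look at the labels and the pixel range is useful; the 0-255 range in particular matters for the scaling experiments later.

import numpy as np

# Labels are integer class indices 0-9, one per image.
print(y_train.shape, ': train labels')              # (60000,)
print(np.unique(y_train))                           # [0 1 2 3 4 5 6 7 8 9]

# Pixels are uint8 grayscale values in [0, 255].
print(x_train.dtype, x_train.min(), x_train.max())  # uint8 0 255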

MLP (Multi Layer Perceptron)

  • Dataset : mnist
  • Goal : assume the test-set labels are unknown, and aim to maximize accuracy on the test set.
  • train_epochs is fixed at 10 for every model.
  • The validation step is omitted.

Modeling I

  • Structure : MLP (Multi Layer Perceptron)
  • activation : sigmoid
  • optimizer : SGD (Stochastic Gradient Descent)
  • loss : MAE (Mean Absolute Error)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='sigmoid'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='sgd',
              loss='mae',
              metrics=['accuracy'])

model.summary()

Model: "sequential"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1195
313/313 [==============================] - 1s 1ms/step - loss: 4.3630 - accuracy: 0.1241

[4.3629984855651855, 0.12409999966621399]

Accuracy around 12%.. we've built a useless model. Toss it!

What should we improve in the next model?

  • Activation function in the model structure (activation)
    • sigmoid
    • relu
  • Optimizer at the compile step (optimizer)
    • SGD
    • Adam
  • Loss function at the compile step (loss func)
    • MAE
    • Sparse Categorical Crossentropy

Modeling II

  • Structure : MLP (Multi Layer Perceptron)
  • activation : relu (Rectified Linear Unit)
  • optimizer : SGD (Stochastic Gradient Descent)
  • loss : MAE (Mean Absolute Error)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='sgd',
              loss='mae',
              metrics=['accuracy'])

model.summary()

Model: "sequential_1"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0967
313/313 [==============================] - 1s 1ms/step - loss: 4.3630 - accuracy: 0.1005

[4.3629984855651855, 0.10050000250339508]

Changing the activation function gives accuracy around 10%.. an even more useless model. Toss it!

Why didn't it improve?

  • Switching the activation from sigmoid to relu prevents gradient vanishing during backpropagation, but this model structure isn't deep at all, so the change presumably has no effect; see the sketch below.
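
For intuition, a quick numeric sketch: sigmoid's derivative is at most 0.25, so each extra sigmoid layer can shrink the backward signal by up to 4x, while relu's derivative is exactly 1 for positive inputs. With only one hidden layer, that difference barely matters.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.linspace(-6, 6, 1001)
print((sigmoid(z) * (1 - sigmoid(z))).max())   # 0.25, reached at z = 0

# Worst-case multiplicative factor after L sigmoid layers:
for L in (1, 2, 20):
    print(L, 'layer(s):', 0.25 ** L)           # 0.25, 0.0625, ~9.1e-13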

What should we improve in the next model?

  • Optimizer at the compile step (optimizer)
    • SGD
    • Adam
  • Loss function at the compile step (loss func)
    • MAE
    • Sparse Categorical Crossentropy

Modeling III

  • Structure : MLP (Multi Layer Perceptron)
  • activation : relu (Rectified Linear Unit)
  • optimizer : Adam (Adaptive Moment Estimation)
  • loss : MAE (Mean Absolute Error)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='mae',
              metrics=['accuracy'])

model.summary()

Model: "sequential_2"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_2 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1275
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1085
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.1021
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0990
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0982
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0983
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0955
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0899
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0708
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 4.3737 - accuracy: 0.0740
313/313 [==============================] - 0s 1ms/step - loss: 4.3630 - accuracy: 0.0818

[4.3629984855651855, 0.08179999887943268]

Changing the optimizer gives accuracy around 8%.. an even more useless model. Toss it!

Why didn't it improve?

  • Switching the optimizer from SGD to Adam is a way to escape local optima or minima, but in this problem the loss function itself isn't functioning, so we presumably never even reach that issue; see the sketch below.
  • In fact, the moment the epoch logs showed the loss not budging, we should have changed the loss function.
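
What loss='mae' actually computes here, as I read the setup: y_true is an integer label (0-9) while y_pred is a 10-way softmax, so Keras broadcasts the label against all ten probabilities. The result is dominated by the label's magnitude and, because the softmax outputs always sum to 1, the prediction can barely change it -- which matches the frozen 4.3737 in the logs. A sketch:

import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=(32, 1))   # labels, as Keras expands them
logits = rng.normal(size=(32, 10))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# loss='mae' => mean_j |p_j - y| per example.
# For y >= 1 every p_j <= 1 <= y, so this equals y - mean_j(p_j) = y - 0.1:
# fixed by sum(p) = 1, no matter how good the prediction is.
print(np.abs(y_pred - y_true).mean())        # ~4.4 for near-uniform labels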

What should we improve in the next model?

  • Loss function at the compile step (loss func)
    • MAE
    • Sparse Categorical Crossentropy

Modeling IV

  • Structure : MLP (Multi Layer Perceptron)
  • activation : sigmoid
  • optimizer : SGD (Stochastic Gradient Descent)
  • loss : Sparse Categorical Crossentropy (takes integer labels directly, vs. Categorical Crossentropy, which expects one-hot labels)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='sigmoid'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_3"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_3 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_6 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_7 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.6591 - accuracy: 0.8305
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3703 - accuracy: 0.8993
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3165 - accuracy: 0.9128
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2899 - accuracy: 0.9199
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2742 - accuracy: 0.9224
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2565 - accuracy: 0.9288
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2499 - accuracy: 0.9292
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2375 - accuracy: 0.9315
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2316 - accuracy: 0.9339
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2233 - accuracy: 0.9364
313/313 [==============================] - 0s 1ms/step - loss: 0.2237 - accuracy: 0.9365

[0.22368119657039642, 0.9365000128746033]

Changing the loss function gives accuracy around 94%.. a huge improvement. Gotcha!

Why did it improve?

  • For a classification problem with 10 classes (labels), a cross-entropy loss presumably made the decisive difference; a worked example follows.
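
What the new loss computes: sparse categorical crossentropy is the negative log of the probability the softmax assigns to the true class, so it directly rewards mass on the correct label and punishes confident mistakes. A small check by hand:

import numpy as np
import tensorflow as tf

y_true = np.array([3])                        # integer class label
y_pred = np.array([[.05, .05, .05, .60, .05, .05, .05, .03, .03, .04]])

print(-np.log(0.60))                          # 0.5108... by hand
print(tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy())
# same value: the loss is -log(p[true class])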

What should we improve in the next model?

  • Activation function in the model structure (activation)
    • sigmoid
    • relu
  • Optimizer at the compile step (optimizer)
    • SGD
    • Adam

Modeling V

  • Structure : MLP (Multi Layer Perceptron)
  • activation : relu
  • optimizer : SGD (Stochastic Gradient Descent)
  • loss : Sparse Categorical Crossentropy (VS. Categorical Crossentropy)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='sgd',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_4"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_4 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_9 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 44.9805 - accuracy: 0.1229
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 2.3024 - accuracy: 0.1128
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 2.2928 - accuracy: 0.1173
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 2.2690 - accuracy: 0.1347
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 2.1969 - accuracy: 0.1732
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 2.1317 - accuracy: 0.2011
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 2.1366 - accuracy: 0.2027
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 2.1529 - accuracy: 0.1903
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 2.1590 - accuracy: 0.1882
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 2.1221 - accuracy: 0.1962
313/313 [==============================] - 1s 1ms/step - loss: 2.1777 - accuracy: 0.1714

[2.1777477264404297, 0.17139999568462372]

Changing the activation function in the improved model (90%+ accuracy) drops accuracy to around 17%.. it has become a useless model again. Toss it!

Why didn't it improve?

  • The model isn't deep enough for gradient vanishing to appear. In particular, relu has zero gradient in the negative region (-), which presumably hurt here.
  • The model also presumably failed to escape a local optimum or minimum. A further plausible factor, sketched below: the inputs are still raw 0-255 pixels.
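
That last point in more detail (my own reading, not certain): sigmoid squashes the raw pixels' influence into (0, 1) outputs, but relu passes their full magnitude through, so the initial logits are huge, the softmax saturates, and crossentropy starts enormous -- compare the first-epoch loss of 44.98. A rough sketch of the scale involved:

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=784).astype(np.float32)  # one raw-pixel input
W = rng.normal(0, 0.05, size=(784, 128))               # typical small init

pre = x @ W
print(np.abs(pre).mean())   # pre-activations in the hundreds

# relu(pre) keeps these huge values, so the next layer's logits explode;
# one oversized SGD step can then pin units in the dead (negative) region.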

What should we improve in the next model?

  • Optimizer at the compile step (optimizer)
    • SGD
    • Adam

Modeling VI

  • Structure : MLP (Multi Layer Perceptron)
  • activation : relu
  • optimizer : Adam (Adaptive Moment Estimation)
  • loss : Sparse Categorical Crossentropy (VS. Categorical Crossentropy)
  • metric : accuracy
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_5"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_5 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 128)               100480    
_________________________________________________________________
dense_11 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 5s 2ms/step - loss: 2.2622 - accuracy: 0.8564
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3762 - accuracy: 0.9103
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2804 - accuracy: 0.9274
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2414 - accuracy: 0.9388
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2220 - accuracy: 0.9453
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2057 - accuracy: 0.9478
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1980 - accuracy: 0.9525
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1906 - accuracy: 0.9527
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1898 - accuracy: 0.9545
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1732 - accuracy: 0.9576
313/313 [==============================] - 1s 1ms/step - loss: 0.2694 - accuracy: 0.9499

[0.2694116234779358, 0.9498999714851379]

Changing the optimizer on the regressed model gives accuracy around 95%.. performance is back. Gotcha!

Why did it improve?

  • The Adam optimizer presumably let the model escape the local optimum or minimum and reach a global one again; a sketch of its update rule follows.
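
A minimal sketch of Adam's update rule, for reference: each parameter's step is normalized by a running estimate of its own gradient scale, which is also why Adam tolerates the unscaled inputs that broke plain SGD here.

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    """One Adam update for parameter w given gradient grad."""
    m = b1 * m + (1 - b1) * grad          # 1st moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2     # 2nd moment (gradient scale)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Tiny and huge gradients produce steps of similar size (~lr):
for g in (1e-4, 1e4):
    w, m, v = adam_step(0.0, g, 0.0, 0.0, t=1)
    print(g, '->', w)                     # both about -0.001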

What should we improve in the next model?

  • This architecture has presumably hit its performance ceiling.
  • Improvements to the model structure are needed, such as adding other layers.

Normalization

model = tf.keras.models.Sequential([
                                    tf.keras.layers.experimental.preprocessing.Normalization(input_shape=(28,28)),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_6"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
normalization (Normalization (None, 28, 28)            57        
_________________________________________________________________
flatten_6 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 128)               100480    
_________________________________________________________________
dense_13 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,827
Trainable params: 101,770
Non-trainable params: 57
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 5s 2ms/step - loss: 2.4881 - accuracy: 0.8688
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3483 - accuracy: 0.9146
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2768 - accuracy: 0.9309
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2467 - accuracy: 0.9376
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2291 - accuracy: 0.9438
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2164 - accuracy: 0.9467
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2065 - accuracy: 0.9483
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2010 - accuracy: 0.9507
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1862 - accuracy: 0.9539
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1811 - accuracy: 0.9551
313/313 [==============================] - 1s 2ms/step - loss: 0.3318 - accuracy: 0.9413

[0.33178871870040894, 0.9412999749183655]

No effect was observed.

Why?

  • We assumed that normalizing the data (making its distribution follow a normal distribution) would make training easier, but saw no real effect.
  • Let's think about why.
  • Normalizing batch by batch, the mean & variance would differ across the 1,875 batches, which may have hindered training -- though see the note below for a likelier explanation.
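
In hindsight, a likelier culprit (my reading of the Keras API, flagged as an assumption): this layer is not batch normalization. It standardizes with fixed statistics that must be learned beforehand by calling adapt() on the training data, and we never called it, so in the TF 2.x versions used here it presumably kept its default mean 0 / variance 1 and acted as roughly an identity -- consistent with the 57 frozen non-trainable params and the unchanged results. A sketch of the intended usage, under that assumption:

# Hypothetical fix: give the Normalization layer its statistics via adapt().
norm = tf.keras.layers.experimental.preprocessing.Normalization(input_shape=(28, 28))
norm.adapt(x_train.astype('float32'))   # learns per-feature mean & variance

model = tf.keras.models.Sequential([
    norm,                               # now actually standardizes the pixels
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])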

Dropout

  • Acts as a regularizer that prevents over-fitting.
  • Dropout is usually placed after the activation.
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dropout(0.2),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_7"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_7 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_14 (Dense)             (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_15 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train, y_train, epochs=10)
model.evaluate(x_test, y_test)

Epoch 1/10
1875/1875 [==============================] - 5s 2ms/step - loss: 2.2989 - accuracy: 0.7486
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5936 - accuracy: 0.8425
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4868 - accuracy: 0.8708
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.4313 - accuracy: 0.8868
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3994 - accuracy: 0.8959
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3773 - accuracy: 0.9021
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3629 - accuracy: 0.9077
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3489 - accuracy: 0.9120
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3404 - accuracy: 0.9135
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3254 - accuracy: 0.9153
313/313 [==============================] - 1s 1ms/step - loss: 0.3147 - accuracy: 0.9367

[0.31474700570106506, 0.9366999864578247]

Applying Dropout as a regularization technique brought no improvement in headline performance, but performance on the test set was better than during training. Gotcha!

Why is that?

  • Dropout presumably acted as a regularizer, generalizing the model's performance; see the sketch of its train/test behavior below.
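
The train/test gap also has a mechanical side: Keras Dropout is only active while training (where the reported training metrics are computed) and is an identity at evaluation. A quick way to see the switch:

import tensorflow as tf

drop = tf.keras.layers.Dropout(0.2)
x = tf.ones((1, 8))

print(drop(x, training=True).numpy())    # ~20% of units zeroed, rest scaled to 1.25
print(drop(x, training=False).numpy())   # identity: all ones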

Scaling features

  • Each pixel in the mnist data takes a value between 0 and 255, so dividing by 255 maps it into 0-1.
model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_8"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_8 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 128)               100480    
_________________________________________________________________
dense_17 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train/255, y_train, epochs=10)
model.evaluate(x_test/255, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2543 - accuracy: 0.9276
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1116 - accuracy: 0.9665
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0765 - accuracy: 0.9774
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0567 - accuracy: 0.9826
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0441 - accuracy: 0.9863
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0350 - accuracy: 0.9894
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0281 - accuracy: 0.9914
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0220 - accuracy: 0.9930
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0190 - accuracy: 0.9939
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0160 - accuracy: 0.9951
313/313 [==============================] - 1s 1ms/step - loss: 0.0847 - accuracy: 0.9797

[0.08471287041902542, 0.9797000288963318]

Applying scaling gives 99% accuracy in training and around 97% on the test set. Gotcha!

Why did it improve?

  • Shrinking the range of the feature values presumably kept the weights (and their gradients) at an appropriate scale; see the sketch below.
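
A sketch of that intuition: for a dense layer the weight gradient is the input times the backpropagated error, so dividing the inputs by 255 divides the weight gradients by 255 as well, keeping fixed-learning-rate updates well behaved.

import numpy as np

rng = np.random.default_rng(0)
x_raw = rng.integers(0, 256, size=(32, 784)).astype(np.float32)
delta = rng.normal(size=(32, 128))          # stand-in for the backprop error

# For a dense layer, dL/dW = x^T @ delta: the gradient is linear in x.
g_raw = x_raw.T @ delta / 32
g_scaled = (x_raw / 255).T @ delta / 32
print(np.abs(g_raw).mean() / np.abs(g_scaled).mean())   # exactly 255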

Ensemble

(Strictly, this stacks the techniques above rather than averaging separate models.)

I. Scaling + Dropout

model = tf.keras.models.Sequential([
                                    tf.keras.layers.Flatten(input_shape=(28,28)),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dropout(0.2),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_9"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_9 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_18 (Dense)             (None, 128)               100480    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_19 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
model.fit(x_train/255, y_train, epochs=10)
model.evaluate(x_test/255, y_test)

Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3022 - accuracy: 0.9118
Epoch 2/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1445 - accuracy: 0.9574
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.1075 - accuracy: 0.9667
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0872 - accuracy: 0.9736
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0754 - accuracy: 0.9758
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0681 - accuracy: 0.9789
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0584 - accuracy: 0.9814
Epoch 8/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0528 - accuracy: 0.9828
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0482 - accuracy: 0.9842
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0439 - accuracy: 0.9854
313/313 [==============================] - 0s 1ms/step - loss: 0.0672 - accuracy: 0.9814

[0.06720677018165588, 0.9814000129699707]

II. Scaling + Normalization + Dropout

model = tf.keras.models.Sequential([
                                    tf.keras.layers.experimental.preprocessing.Normalization(input_shape=(28,28)),
                                    tf.keras.layers.Flatten(),
                                    tf.keras.layers.Dense(128, activation='relu'),
                                    tf.keras.layers.Dropout(0.2),
                                    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_10"

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
normalization_1 (Normalizati (None, 28, 28)            57        
_________________________________________________________________
flatten_10 (Flatten)         (None, 784)               0         
_________________________________________________________________
dense_20 (Dense)             (None, 128)               100480    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 10)                1290      
=================================================================
Total params: 101,827
Trainable params: 101,770
Non-trainable params: 57
_________________________________________________________________
model.fit(x_train/255, y_train, epochs=10)
model.evaluate(x_test/255, y_test)

Epoch 1/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.3075 - accuracy: 0.9100
Epoch 2/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1455 - accuracy: 0.9576
Epoch 3/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.1072 - accuracy: 0.9668
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0892 - accuracy: 0.9728
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0765 - accuracy: 0.9764
Epoch 6/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.0661 - accuracy: 0.9794
Epoch 7/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.0586 - accuracy: 0.9813
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0555 - accuracy: 0.9823
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0497 - accuracy: 0.9838
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0443 - accuracy: 0.9852
313/313 [==============================] - 1s 1ms/step - loss: 0.0669 - accuracy: 0.9798

[0.06686274707317352, 0.9797999858856201]

Conclusion

The Ensemble I model (Scaling + Dropout) achieved the highest performance, at about 98%.

Thanks to the dropout layer, this is also a generalized performance (not over-fitting).

Also, in the process of arriving at the model, we learned the effects of

  • Loss Func
  • Optimizer
  • Activation Func
  • Scaling
  • Dropout

and that the Normalization layer, as applied here, had no effect.