🔶 [Day 18] : EXPLORATION 6. Movie Review Sentiment Analysis

๋ฐฑ๊ฑด · January 20, 2022

IMDB ๋ฐ์ดํ„ฐ์…‹ ๋ถ„์„

  • The IMDb Large Movie Review Dataset consists of 50,000 movie review texts written in English.
  • Positive reviews are labeled 1, negative reviews 0.
  • Half of the 50,000 reviews (25,000) are designated as training data,
  • and the remaining 25,000 as test data.
# imports used throughout this post
import os
import numpy as np
import tensorflow as tf

imdb = tf.keras.datasets.imdb

# Download the IMDb dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
print("Number of training samples: {}, number of test samples: {}".format(len(x_train), len(x_test)))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step
17473536/17464789 [==============================] - 0s 0us/step
Number of training samples: 25000, number of test samples: 25000

  • When imdb.load_data() is called with the vocabulary size num_words set to 10000, the dataset is built together with a word_to_index dictionary containing that many entries (see the sanity check below).
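A quick sanity check (not in the original code) confirms what num_words does: no encoded index reaches 10000, because rarer words are replaced by the out-of-vocabulary index 2.

# Sanity check: with num_words=10000, the largest encoded index is below 10000
print(max(max(seq) for seq in x_train))   # expected: 9999 or less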

๋ฐ์ดํ„ฐ ํ™•์ธ

print(x_train[0])  # the first review in the training set
print('Label: ', y_train[0])  # label of the first review
print('Length of the 1st review: ', len(x_train[0]))
print('Length of the 2nd review: ', len(x_train[1]))

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
Label: 1
Length of the 1st review: 218
Length of the 2nd review: 189

ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋‹Œ ์ด๋ฏธ ์ˆซ์ž๋กœ Encoding ๋œ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์ž„.

  • Inspect the dictionary used for the encoding
word_to_index = imdb.get_word_index()
index_to_word = {index:word for word, index in word_to_index.items()}
print(index_to_word[1])     # prints 'the'
print(word_to_index['the'])  # prints 1

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
1654784/1641221 [==============================] - 0s 0us/step
the
1

๋ฐ์ดํ„ฐ ๋ณด์ •

IMDb ๋ฐ์ดํ„ฐ์…‹์˜ ํ…์ŠคํŠธ ์ธ์ฝ”๋”ฉ์„ ์œ„ํ•œ word_to_index, index_to_word๋Š” ์•„๋ž˜์™€ ๊ฐ™์ด ๋ณด์ •
๋ฐ์ดํ„ฐ ์ œ๊ณต์ž๊ฐ€ ๊ทธ๋ ‡๊ฒŒ ํ•˜๋ž˜

  • word_to_index is sorted in descending order of word frequency in the IMDb text data.
# The actual encoding indices are shifted back by 3 relative to the provided word_to_index.
word_to_index = {k:(v+3) for k,v in word_to_index.items()}

์ฒ˜์Œ ๋ช‡ ๊ฐœ ์ธ๋ฑ์Šค๋Š” ์‚ฌ์ „์— ์ •์˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค

word_to_index[""] = 0
word_to_index[""] = 1
word_to_index[""] = 2 # unknown
word_to_index[""] = 3

index_to_word = {index:word for word, index in word_to_index.items()}

print(index_to_word[1]) # prints '<BOS>'
print(word_to_index['the']) # prints 4
print(index_to_word[4]) # prints 'the'


<BOS>
4
the
Decoding the Reviews
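The cell below calls get_decoded_sentence() without showing its definition. A minimal sketch of such a helper, assuming the adjusted index_to_word dictionary above and skipping the leading <BOS> token, could look like this (treat it as a hypothetical reconstruction matching the call below):

# Hypothetical helper: rebuild a readable sentence from an encoded review.
# Skips the first token (<BOS>); unknown indices fall back to '<UNK>'.
def get_decoded_sentence(encoded_sentence, index_to_word):
    return ' '.join(index_to_word.get(index, '<UNK>') for index in encoded_sentence[1:])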
print(get_decoded_sentence(x_train[0], index_to_word))
print('๋ผ๋ฒจ: ', y_train[0])  # 1๋ฒˆ์งธ ๋ฆฌ๋ทฐ๋ฐ์ดํ„ฐ์˜ ๋ผ๋ฒจ

this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert \ is an amazing actor and now the same being director \ father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for \ and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also \ to the two little boy's that played the \ of norman and paul they were just brilliant children are often left out of the \ list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all
Label: 1

  • pad_sequences must be used to make all sentences in the dataset the same length.
  • The choice of maximum sentence length (maxlen) also affects overall model performance.
  • To pick an appropriate value, look at the length distribution of the whole dataset.

๋ฐ์ดํ„ฐ์…‹์˜ ๋ถ„ํฌ ํ™•์ธ

total_data_text = list(x_train) + list(x_test)
# Build a list of the sentence lengths in the text dataset,
num_tokens = [len(tokens) for tokens in total_data_text]
num_tokens = np.array(num_tokens)
# then compute the mean, maximum, and standard deviation of the lengths.
print('Mean sentence length : ', np.mean(num_tokens))
print('Maximum sentence length : ', np.max(num_tokens))
print('Std. dev. of sentence length : ', np.std(num_tokens))

# For example, if we set the maximum length to (mean + 2 * standard deviation),
max_tokens = np.mean(num_tokens) + 2 * np.std(num_tokens)
maxlen = int(max_tokens)
print('pad_sequences maxlen : ', maxlen)
print('{:.2%} of all sentences fit within this maxlen setting.'.format(np.sum(num_tokens < max_tokens) / len(num_tokens)))

๋ฌธ์žฅ๊ธธ์ด ํ‰๊ท  : 234.75892
๋ฌธ์žฅ๊ธธ์ด ์ตœ๋Œ€ : 2494
๋ฌธ์žฅ๊ธธ์ด ํ‘œ์ค€ํŽธ์ฐจ : 172.91149458735703
pad_sequences maxlen : 580
์ „์ฒด ๋ฌธ์žฅ์˜ 0.94536%๊ฐ€ maxlen ์„ค์ •๊ฐ’ ์ด๋‚ด์— ํฌํ•จ๋ฉ๋‹ˆ๋‹ค.

์œ„์˜ ๊ฒฝ์šฐ maxlen=580

ํŒจํŒ…์„ pre๋กœ ํ• ๊นŒ post๋กœ ํ• ๊นŒ

post padding

x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train,
                                                        value=word_to_index["<PAD>"],
                                                        padding='post', # or 'pre'
                                                        maxlen=maxlen)

x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test,
                                                       value=word_to_index["<PAD>"],
                                                       padding='post', # or 'pre'
                                                       maxlen=maxlen)

print(x_train.shape)

(25000, 580)

pre padding

## Use 'pre' padding for the RNN
# Note: x_train and x_test were already post-padded to maxlen above, so re-padding
# them here changes nothing visible; to compare 'pre' vs 'post' properly, re-pad
# the original (unpadded) sequences instead.
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train,
                                                        value=word_to_index["<PAD>"],
                                                        padding='pre', # or 'post'
                                                        maxlen=maxlen)

x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test,
                                                       value=word_to_index["<PAD>"],
                                                       padding='pre', # or 'post'
                                                       maxlen=maxlen)

print(x_train.shape)

(25000, 580)

The printed shape alone cannot show the difference between the two padding modes, can it?

  • RNN์€ ์ž…๋ ฅ๋ฐ์ดํ„ฐ๊ฐ€ ์ˆœ์ฐจ์ ์œผ๋กœ ์ฒ˜๋ฆฌ๋˜์–ด, ๊ฐ€์žฅ ๋งˆ์ง€๋ง‰ ์ž…๋ ฅ์ด ์ตœ์ข… state ๊ฐ’์— ๊ฐ€์žฅ ์˜ํ–ฅ์„ ๋งŽ์ด ๋ฏธ์น˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฏ€๋กœ ๋งˆ์ง€๋ง‰ ์ž…๋ ฅ์ด ๋ฌด์˜๋ฏธํ•œ padding์œผ๋กœ ์ฑ„์›Œ์ง€๋Š” ๊ฒƒ์€ ๋น„ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ 'pre'๊ฐ€ ํ›จ์”ฌ ์œ ๋ฆฌํ•˜๋ฉฐ, 10% ์ด์ƒ์˜ ํ…Œ์ŠคํŠธ ์„ฑ๋Šฅ ์ฐจ์ด๋ฅผ ๋ณด์ด๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

RNN ๋ชจ๋ธ ์„ค๊ณ„

vocab_size = 10000    # size of the vocabulary (10,000 words)
word_vector_dim = 16  # dimensionality of the word vectors (a tunable hyperparameter)

# Model design - write the deep learning model code yourself.
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size, word_vector_dim, input_shape=(None,)))
model.add(tf.keras.layers.GlobalMaxPooling1D())
model.add(tf.keras.layers.Dense(8, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_6"


Layer (type) Output Shape Param #
\=================================================================
embedding_6 (Embedding) (None, None, 16) 160000


global_max_pooling1d_2 (Glob (None, 16) 0


dense_8 (Dense) (None, 8) 136


dense_9 (Dense) (None, 1) 9
\=================================================================
Total params: 160,145
Trainable params: 160,145
Non-trainable params: 0


๊ทธ๋ž˜์„œ ์ด ๊ฒฐ๊ณผ๊ฐ€ ๋ชจ๋‹ค?

Before training the model, split off 10,000 of the 25,000 training examples to use as a validation set.

# Split off 10,000 examples as the validation set
x_val = x_train[:10000]
y_val = y_train[:10000]

# The remaining 15,000 examples, excluding the validation set
partial_x_train = x_train[10000:]
partial_y_train = y_train[10000:]

print(partial_x_train.shape)
print(partial_y_train.shape)

(15000, 580)
(15000,)

Start training the model

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
              
epochs=20  # adjust the number of epochs based on the results.

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=epochs,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)

Epoch 1/20
30/30 [==============================] - 8s 13ms/step - loss: 0.6919 - accuracy: 0.4991 - val_loss: 0.6889 - val_accuracy: 0.5412
Epoch 2/20
30/30 [==============================] - 0s 8ms/step - loss: 0.6846 - accuracy: 0.6340 - val_loss: 0.6796 - val_accuracy: 0.6835
Epoch 3/20
30/30 [==============================] - 0s 7ms/step - loss: 0.6701 - accuracy: 0.7469 - val_loss: 0.6618 - val_accuracy: 0.7336
Epoch 4/20
30/30 [==============================] - 0s 7ms/step - loss: 0.6429 - accuracy: 0.7901 - val_loss: 0.6296 - val_accuracy: 0.7706
Epoch 5/20
30/30 [==============================] - 0s 6ms/step - loss: 0.5986 - accuracy: 0.8169 - val_loss: 0.5863 - val_accuracy: 0.7821
Epoch 6/20
30/30 [==============================] - 0s 6ms/step - loss: 0.5440 - accuracy: 0.8301 - val_loss: 0.5386 - val_accuracy: 0.7953
Epoch 7/20
30/30 [==============================] - 0s 6ms/step - loss: 0.4867 - accuracy: 0.8430 - val_loss: 0.4944 - val_accuracy: 0.8038
Epoch 8/20
30/30 [==============================] - 0s 6ms/step - loss: 0.4342 - accuracy: 0.8543 - val_loss: 0.4580 - val_accuracy: 0.8120
Epoch 9/20
30/30 [==============================] - 0s 6ms/step - loss: 0.3889 - accuracy: 0.8669 - val_loss: 0.4301 - val_accuracy: 0.8173
Epoch 10/20
30/30 [==============================] - 0s 6ms/step - loss: 0.3508 - accuracy: 0.8789 - val_loss: 0.4081 - val_accuracy: 0.8233
Epoch 11/20
30/30 [==============================] - 0s 6ms/step - loss: 0.3185 - accuracy: 0.8877 - val_loss: 0.3916 - val_accuracy: 0.8276
Epoch 12/20
30/30 [==============================] - 0s 6ms/step - loss: 0.2912 - accuracy: 0.8984 - val_loss: 0.3796 - val_accuracy: 0.8309
Epoch 13/20
30/30 [==============================] - 0s 6ms/step - loss: 0.2678 - accuracy: 0.9060 - val_loss: 0.3706 - val_accuracy: 0.8361
Epoch 14/20
30/30 [==============================] - 0s 6ms/step - loss: 0.2473 - accuracy: 0.9137 - val_loss: 0.3644 - val_accuracy: 0.8371
Epoch 15/20
30/30 [==============================] - 0s 6ms/step - loss: 0.2296 - accuracy: 0.9202 - val_loss: 0.3603 - val_accuracy: 0.8391
Epoch 16/20
30/30 [==============================] - 0s 6ms/step - loss: 0.2137 - accuracy: 0.9257 - val_loss: 0.3578 - val_accuracy: 0.8413
Epoch 17/20
30/30 [==============================] - 0s 6ms/step - loss: 0.1996 - accuracy: 0.9314 - val_loss: 0.3570 - val_accuracy: 0.8418
Epoch 18/20
30/30 [==============================] - 0s 6ms/step - loss: 0.1867 - accuracy: 0.9367 - val_loss: 0.3573 - val_accuracy: 0.8428
Epoch 19/20
30/30 [==============================] - 0s 6ms/step - loss: 0.1750 - accuracy: 0.9422 - val_loss: 0.3583 - val_accuracy: 0.8430
Epoch 20/20
30/30 [==============================] - 0s 6ms/step - loss: 0.1639 - accuracy: 0.9479 - val_loss: 0.3604 - val_accuracy: 0.8423

ํ•™์Šต์ด ๋๋‚œ ๋ชจ๋ธ์„ ํ…Œ์ŠคํŠธ ์…‹์œผ๋กœ ํ‰๊ฐ€

results = model.evaluate(x_test,  y_test, verbose=2)

print(results)

782/782 - 1s - loss: 0.3870 - accuracy: 0.8310
[0.3870142102241516, 0.8309599757194519]

  • The train/validation loss and accuracy of every epoch during model.fit() are stored in the history variable.

  • Plotting this data is a good way to check whether training went well, whether the model overfit or underfit, and to gather ideas for improving performance.

Epoch์˜ hostory ๊ทธ๋ž˜ํ”„๋กœ( ์„ฑ๋Šฅ๊ฐœ์„  ํ™•์ธ)

history_dict = history.history
print(history_dict.keys()) # items that can be plotted against the epoch

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

import matplotlib.pyplot as plt

acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

# "bo"๋Š” "ํŒŒ๋ž€์ƒ‰ ์ "์ž…๋‹ˆ๋‹ค
plt.plot(epochs, loss, 'bo', label='Training loss')
# "b" draws a solid blue line
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()

  • Plotting the training and validation loss lets us estimate the optimal number of training epochs.
  • Once the validation loss starts to diverge from the training loss, further training is pointless (a sketch of automating this with EarlyStopping follows below).

Training and validation accuracy

plt.clf()   # clear the current figure

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()

Word2Vec์˜ ์ ์šฉ

  • Word embedding
    - A word embedding represents the characteristics of each word as a low-dimensional vector; it is an NLP technique that can greatly improve accuracy while reducing the cost of labeling-heavy machine-learning-based sentiment analysis.

  • The first layer of the model is the Embedding layer.
    - It holds trainable parameters of size (number of words in the vocabulary × word vector size).
    - The word vectors learned in the Embedding layer should also end up arranged meaningfully in the embedding space.

  • Check the version of the gensim package, which is handy for working with word vectors (see the snippet below).
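A one-line version check (not shown in the original post) is enough here; the KeyedVectors API differs slightly between gensim 3.x and 4.x:

# Print the installed gensim version
import gensim
print(gensim.__version__)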

embedding_layer = model.layers[0]
weights = embedding_layer.get_weights()[0]
print(weights.shape)    # shape: (vocab_size, embedding_dim)

(10000, 300)

# Save the learned Embedding parameters to a file.
word2vec_file_path = os.getenv('HOME')+'/aiffel/sentiment_classification/data/word2vec.txt'
f = open(word2vec_file_path, 'w')
f.write('{} {}\n'.format(vocab_size-4, word_vector_dim))  # header line: how many vectors of what size follow

# Write the word vectors (excluding the 4 special tokens) to the file, one word per line.
vectors = model.get_weights()[0]
for i in range(4,vocab_size):
    f.write('{} {}\n'.format(index_to_word[i], ' '.join(map(str, list(vectors[i, :])))))
f.close()

gensim์—์„œ ์ œ๊ณตํ•˜๋Š” ํŒจํ‚ค์ง€๋ฅผ ์ด์šฉํ•ด, ์œ„์— ๋‚จ๊ธด ์ž„๋ฒ ๋”ฉ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ฝ์–ด์„œ word vector๋กœ ์‚ฌ์šฉ

from gensim.models.keyedvectors import Word2VecKeyedVectors

word_vectors = Word2VecKeyedVectors.load_word2vec_format(word2vec_file_path, binary=False)
vector = word_vectors['computer']
vector

์œ„์™€ ๊ฐ™์ด ์–ป์€ ์›Œ๋“œ ๋ฒกํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žฌ๋ฏธ์žˆ๋Š” ์‹คํ—˜์„ ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์›Œ๋“œ ๋ฒกํ„ฐ๊ฐ€ ์˜๋ฏธ ๋ฒกํ„ฐ ๊ณต๊ฐ„์ƒ์— ์œ ์˜๋ฏธํ•˜๊ฒŒ ํ•™์Šต๋˜์—ˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ๋ฐฉ๋ฒ• ์ค‘์—, ๋‹จ์–ด๋ฅผ ํ•˜๋‚˜ ์ฃผ๊ณ  ๊ทธ์™€ ๊ฐ€์žฅ ์œ ์‚ฌํ•œ ๋‹จ์–ด์™€ ๊ทธ ์œ ์‚ฌ๋„๋ฅผ ํ™•์ธํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ์Šต๋‹ˆ๋‹ค. gensim์„ ์‚ฌ์šฉํ•˜๋ฉด ์•„๋ž˜์™€ ๊ฐ™์ด ํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

word_vectors.similar_by_word("love")

word2Vec ๋ชจ๋ธ ๊ฐ€์ ธ์˜ค๊ธฐ

$ ln -s ~/data/GoogleNews-vectors-negative300.bin.gz ~/aiffel/sentiment_classification/data

from gensim.models import KeyedVectors

word2vec_path = os.getenv('HOME')+'/aiffel/sentiment_classification/data/GoogleNews-vectors-negative300.bin.gz'
word2vec = KeyedVectors.load_word2vec_format(word2vec_path, binary=True, limit=1000000)
vector = word2vec['computer']
vector     # a full 300-dimensional word vector

array([ 1.07421875e-01, -2.01171875e-01, 1.23046875e-01, 2.11914062e-01,
-9.13085938e-02, 2.16796875e-01, -1.31835938e-01, 8.30078125e-02,
2.02148438e-01, 4.78515625e-02, 3.66210938e-02, -2.45361328e-02,
2.39257812e-02, -1.60156250e-01, -2.61230469e-02, 9.71679688e-02,
-6.34765625e-02, 1.84570312e-01, 1.70898438e-01, -1.63085938e-01,
-1.09375000e-01, 1.49414062e-01, -4.65393066e-04, 9.61914062e-02,
1.68945312e-01, 2.60925293e-03, 8.93554688e-02, 6.49414062e-02,
3.56445312e-02, -6.93359375e-02, -1.46484375e-01, -1.21093750e-01,
-2.27539062e-01, 2.45361328e-02, -1.24511719e-01, -3.18359375e-01,
-2.20703125e-01, 1.30859375e-01, 3.66210938e-02, -3.63769531e-02,
-1.13281250e-01, 1.95312500e-01, 9.76562500e-02, 1.26953125e-01,
6.59179688e-02, 6.93359375e-02, 1.02539062e-02, 1.75781250e-01,
-1.68945312e-01, 1.21307373e-03, -2.98828125e-01, -1.15234375e-01,
5.66406250e-02, -1.77734375e-01, -2.08984375e-01, 1.76757812e-01,
2.38037109e-02, -2.57812500e-01, -4.46777344e-02, 1.88476562e-01,
5.51757812e-02, 5.02929688e-02, -1.06933594e-01, 1.89453125e-01,
-1.16210938e-01, 8.49609375e-02, -1.71875000e-01, 2.45117188e-01,
-1.73828125e-01, -8.30078125e-03, 4.56542969e-02, -1.61132812e-02,
1.86523438e-01, -6.05468750e-02, -4.17480469e-02, 1.82617188e-01,
2.20703125e-01, -1.22558594e-01, -2.55126953e-02, -3.08593750e-01,
9.13085938e-02, 1.60156250e-01, 1.70898438e-01, 1.19628906e-01,
7.08007812e-02, -2.64892578e-02, -3.08837891e-02, 4.06250000e-01,
-1.01562500e-01, 5.71289062e-02, -7.26318359e-03, -9.17968750e-02,
-1.50390625e-01, -2.55859375e-01, 2.16796875e-01, -3.63769531e-02,
2.24609375e-01, 8.00781250e-02, 1.56250000e-01, 5.27343750e-02,
1.50390625e-01, -1.14746094e-01, -8.64257812e-02, 1.19140625e-01,
-7.17773438e-02, 2.73437500e-01, -1.64062500e-01, 7.29370117e-03,
4.21875000e-01, -1.12792969e-01, -1.35742188e-01, -1.31835938e-01,
-1.37695312e-01, -7.66601562e-02, 6.25000000e-02, 4.98046875e-02,
-1.91406250e-01, -6.03027344e-02, 2.27539062e-01, 5.88378906e-02,
-3.24218750e-01, 5.41992188e-02, -1.35742188e-01, 8.17871094e-03,
-5.24902344e-02, -1.74713135e-03, -9.81445312e-02, -2.86865234e-02,
3.61328125e-02, 2.15820312e-01, 5.98144531e-02, -3.08593750e-01,
-2.27539062e-01, 2.61718750e-01, 9.86328125e-02, -5.07812500e-02,
1.78222656e-02, 1.31835938e-01, -5.35156250e-01, -1.81640625e-01,
1.38671875e-01, -3.10546875e-01, -9.71679688e-02, 1.31835938e-01,
-1.16210938e-01, 7.03125000e-02, 2.85156250e-01, 3.51562500e-02,
-1.01562500e-01, -3.75976562e-02, 1.41601562e-01, 1.42578125e-01,
-5.68847656e-02, 2.65625000e-01, -2.09960938e-01, 9.64355469e-03,
-6.68945312e-02, -4.83398438e-02, -6.10351562e-02, 2.45117188e-01,
-9.66796875e-02, 1.78222656e-02, -1.27929688e-01, -4.78515625e-02,
-7.26318359e-03, 1.79687500e-01, 2.78320312e-02, -2.10937500e-01,
-1.43554688e-01, -1.27929688e-01, 1.73339844e-02, -3.60107422e-03,
-2.04101562e-01, 3.63159180e-03, -1.19628906e-01, -6.15234375e-02,
5.93261719e-02, -3.23486328e-03, -1.70898438e-01, -3.14941406e-02,
-8.88671875e-02, -2.89062500e-01, 3.44238281e-02, -1.87500000e-01,
2.94921875e-01, 1.58203125e-01, -1.19628906e-01, 7.61718750e-02,
6.39648438e-02, -4.68750000e-02, -6.83593750e-02, 1.21459961e-02,
-1.44531250e-01, 4.54101562e-02, 3.68652344e-02, 3.88671875e-01,
1.45507812e-01, -2.55859375e-01, -4.46777344e-02, -1.33789062e-01,
-1.38671875e-01, 6.59179688e-02, 1.37695312e-01, 1.14746094e-01,
2.03125000e-01, -4.78515625e-02, 1.80664062e-02, -8.54492188e-02,
-2.48046875e-01, -3.39843750e-01, -2.83203125e-02, 1.05468750e-01,
-2.14843750e-01, -8.74023438e-02, 7.12890625e-02, 1.87500000e-01,
-1.12304688e-01, 2.73437500e-01, -3.26171875e-01, -1.77734375e-01,
-4.24804688e-02, -2.69531250e-01, 6.64062500e-02, -6.88476562e-02,
-1.99218750e-01, -7.03125000e-02, -2.43164062e-01, -3.66210938e-02,
-7.37304688e-02, -1.77734375e-01, 9.17968750e-02, -1.25000000e-01,
-1.65039062e-01, -3.57421875e-01, -2.85156250e-01, -1.66992188e-01,
1.97265625e-01, -1.53320312e-01, 2.31933594e-02, 2.06054688e-01,
1.80664062e-01, -2.74658203e-02, -1.92382812e-01, -9.61914062e-02,
-1.06811523e-02, -4.73632812e-02, 6.54296875e-02, -1.25732422e-02,
1.78222656e-02, -8.00781250e-02, -2.59765625e-01, 9.37500000e-02,
-7.81250000e-02, 4.68750000e-02, -2.22167969e-02, 1.86767578e-02,
3.11279297e-02, 1.04980469e-02, -1.69921875e-01, 2.58789062e-02,
-3.41796875e-02, -1.44042969e-02, -5.46875000e-02, -8.78906250e-02,
1.96838379e-03, 2.23632812e-01, -1.36718750e-01, 1.75781250e-01,
-1.63085938e-01, 1.87500000e-01, 3.44238281e-02, -5.63964844e-02,
-2.27689743e-05, 4.27246094e-02, 5.81054688e-02, -1.07910156e-01,
-3.88183594e-02, -2.69531250e-01, 3.34472656e-02, 9.81445312e-02,
5.63964844e-02, 2.23632812e-01, -5.49316406e-02, 1.46484375e-01,
5.93261719e-02, -2.19726562e-01, 6.39648438e-02, 1.66015625e-02,
4.56542969e-02, 3.26171875e-01, -3.80859375e-01, 1.70898438e-01,
5.66406250e-02, -1.04492188e-01, 1.38671875e-01, -1.57226562e-01,
3.23486328e-03, -4.80957031e-02, -2.48046875e-01, -6.20117188e-02],
dtype=float32)

300dim์˜ ๋ฒกํ„ฐ๋กœ ์ด๋ฃจ์–ด์ง„ 300๋งŒ ๊ฐœ์˜ ๋‹จ์–ด์ž…๋‹ˆ๋‹ค. ์ด ๋‹จ์–ด ์‚ฌ์ „์„ ๋ฉ”๋ชจ๋ฆฌ์— ๋ชจ๋‘ ๋กœ๋”ฉํ•˜๋ฉด ์•„์ฃผ ๋†’์€ ํ™•๋ฅ ๋กœ ์—ฌ๋Ÿฌ๋ถ„์˜ ์‹ค์Šตํ™˜๊ฒฝ์— ๋ฉ”๋ชจ๋ฆฌ ์—๋Ÿฌ๊ฐ€ ๋‚  ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ KeyedVectors.load_word2vec_format ๋ฉ”์„œ๋“œ๋กœ ์›Œ๋“œ ๋ฒกํ„ฐ๋ฅผ ๋กœ๋”ฉํ•  ๋•Œ ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ์ƒ์œ„ 100๋งŒ ๊ฐœ๋งŒ limt์œผ๋กœ ์กฐ๊ฑด์„ ์ฃผ์–ด ๋กœ๋”ฉํ–ˆ์Šต๋‹ˆ๋‹ค.
๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์ถฉ๋ถ„ํ•˜๋‹ค๋ฉด limt=None์œผ๋กœ ํ•˜์‹œ๋ฉด 300๋งŒ ๊ฐœ๋ฅผ ๋ชจ๋‘ ๋กœ๋”ฉ

# This operation uses quite a lot of memory, so be careful.
word2vec.similar_by_word("love")

[('loved', 0.6907791495323181),
('adore', 0.6816873550415039),
('loves', 0.661863386631012),
('passion', 0.6100708842277527),
('hate', 0.600395679473877),
('loving', 0.5886635780334473),
('affection', 0.5664337873458862),
('undying_love', 0.5547304749488831),
('absolutely_adore', 0.5536840558052063),
('adores', 0.5440906882286072)]

Word2Vec์—์„œ ์ œ๊ณตํ•˜๋Š” ์›Œ๋“œ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋“ค๋ผ๋ฆฌ๋Š” ์˜๋ฏธ์  ์œ ์‚ฌ๋„๊ฐ€ ๊ฐ€๊นŒ์šด ๊ฒƒ์ด ์„œ๋กœ ๊ฐ€๊น๊ฒŒ ์ œ๋Œ€๋กœ ํ•™์Šต๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ ์šฐ๋ฆฌ๋Š” ์ด์ „ ์Šคํ…์—์„œ ํ•™์Šตํ–ˆ๋˜ ๋ชจ๋ธ์˜ ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด๋ฅผ Word2Vec์˜ ๊ฒƒ์œผ๋กœ ๊ต์ฒดํ•˜์—ฌ ๋‹ค์‹œ ํ•™์Šต์‹œ์ผœ ๋ณผ ๊ฒƒ

vocab_size = 10000     # size of the vocabulary (10,000 words)
word_vector_dim = 300  # dimensionality of the word vectors
embedding_matrix = np.random.rand(vocab_size, word_vector_dim)

# Copy the Word2Vec vector for each word into embedding_matrix, one word at a time.
for i in range(4,vocab_size):
    if index_to_word[i] in word2vec:
        embedding_matrix[i] = word2vec[index_to_word[i]]
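As an optional check (not in the original code), it can be useful to see how many of the 10,000 vocabulary words actually received a pretrained vector; the rest keep their random initialization:

# Count how many words were found in the pretrained Word2Vec vocabulary
found = sum(1 for i in range(4, vocab_size) if index_to_word[i] in word2vec)
print('pretrained vectors copied for {} of {} words'.format(found, vocab_size - 4))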
from tensorflow.keras.initializers import Constant

vocab_size = 10000     # size of the vocabulary (10,000 words)
word_vector_dim = 300  # dimensionality of the word vectors

# Build the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size, 
                                 word_vector_dim, 
                                 embeddings_initializer=Constant(embedding_matrix),  # use the copied embeddings here
                                 input_length=maxlen, 
                                 trainable=True))   # trainable=True enables fine-tuning
model.add(tf.keras.layers.Conv1D(16, 7, activation='relu'))
model.add(tf.keras.layers.MaxPooling1D(5))
model.add(tf.keras.layers.Conv1D(16, 7, activation='relu'))
model.add(tf.keras.layers.GlobalMaxPooling1D())
model.add(tf.keras.layers.Dense(8, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential_11"


Layer (type) Output Shape Param #
\=================================================================
embedding_11 (Embedding) (None, 580, 300) 3000000


conv1d_10 (Conv1D) (None, 574, 16) 33616


max_pooling1d_5 (MaxPooling1 (None, 114, 16) 0


conv1d_11 (Conv1D) (None, 108, 16) 1808


global_max_pooling1d_7 (Glob (None, 16) 0


dense_18 (Dense) (None, 8) 136


dense_19 (Dense) (None, 1) 9
\=================================================================
Total params: 3,035,569
Trainable params: 3,035,569
Non-trainable params: 0


# ํ•™์Šต์˜ ์ง„ํ–‰
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
              
epochs=20  # adjust the number of epochs based on the results.

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=epochs,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    verbose=1)
# Evaluate the model on the test set
results = model.evaluate(x_test,  y_test, verbose=2)

print(results)

Epoch 1/20
30/30 [==============================] - 3s 78ms/step - loss: 0.7040 - accuracy: 0.4949 - val_loss: 0.6931 - val_accuracy: 0.5030
Epoch 2/20
30/30 [==============================] - 2s 73ms/step - loss: 0.6928 - accuracy: 0.5125 - val_loss: 0.6934 - val_accuracy: 0.5054
Epoch 3/20
30/30 [==============================] - 2s 72ms/step - loss: 0.6916 - accuracy: 0.5375 - val_loss: 0.6938 - val_accuracy: 0.4943
Epoch 4/20
30/30 [==============================] - 2s 72ms/step - loss: 0.6899 - accuracy: 0.5364 - val_loss: 0.6944 - val_accuracy: 0.4943
Epoch 5/20
30/30 [==============================] - 2s 73ms/step - loss: 0.6878 - accuracy: 0.5438 - val_loss: 0.6928 - val_accuracy: 0.5160
Epoch 6/20
30/30 [==============================] - 2s 74ms/step - loss: 0.6850 - accuracy: 0.6023 - val_loss: 0.6931 - val_accuracy: 0.5050
Epoch 7/20
30/30 [==============================] - 2s 74ms/step - loss: 0.6814 - accuracy: 0.6922 - val_loss: 0.6927 - val_accuracy: 0.5073
Epoch 8/20
30/30 [==============================] - 2s 74ms/step - loss: 0.6747 - accuracy: 0.7303 - val_loss: 0.6916 - val_accuracy: 0.5308
Epoch 9/20
30/30 [==============================] - 2s 73ms/step - loss: 0.6658 - accuracy: 0.7083 - val_loss: 0.6911 - val_accuracy: 0.5227
Epoch 10/20
30/30 [==============================] - 2s 73ms/step - loss: 0.6477 - accuracy: 0.7989 - val_loss: 0.6901 - val_accuracy: 0.5374
Epoch 11/20
30/30 [==============================] - 2s 73ms/step - loss: 0.6121 - accuracy: 0.8204 - val_loss: 0.6895 - val_accuracy: 0.5319
Epoch 12/20
30/30 [==============================] - 2s 72ms/step - loss: 0.5450 - accuracy: 0.8669 - val_loss: 0.6987 - val_accuracy: 0.5382
Epoch 13/20
30/30 [==============================] - 2s 72ms/step - loss: 0.4603 - accuracy: 0.8816 - val_loss: 0.7146 - val_accuracy: 0.5394
Epoch 14/20
30/30 [==============================] - 2s 72ms/step - loss: 0.3577 - accuracy: 0.9413 - val_loss: 0.7043 - val_accuracy: 0.5624
Epoch 15/20
30/30 [==============================] - 2s 72ms/step - loss: 0.2657 - accuracy: 0.9613 - val_loss: 0.7190 - val_accuracy: 0.5716
Epoch 16/20
30/30 [==============================] - 2s 72ms/step - loss: 0.1955 - accuracy: 0.9753 - val_loss: 0.7560 - val_accuracy: 0.5704
Epoch 17/20
30/30 [==============================] - 2s 72ms/step - loss: 0.1357 - accuracy: 0.9881 - val_loss: 0.8460 - val_accuracy: 0.5593
Epoch 18/20
30/30 [==============================] - 2s 72ms/step - loss: 0.0985 - accuracy: 0.9936 - val_loss: 0.8920 - val_accuracy: 0.5555
Epoch 19/20
30/30 [==============================] - 2s 72ms/step - loss: 0.0710 - accuracy: 0.9954 - val_loss: 0.8277 - val_accuracy: 0.5821
Epoch 20/20
30/30 [==============================] - 2s 72ms/step - loss: 0.0540 - accuracy: 0.9965 - val_loss: 0.8434 - val_accuracy: 0.5819

  • Word2Vec์„ ์ •์ƒ์ ์œผ๋กœ ์ž˜ ํ™œ์šฉํ•˜๋ฉด ๊ทธ๋ ‡์ง€ ์•Š์€ ๊ฒฝ์šฐ๋ณด๋‹ค 5% ์ด์ƒ์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์ ์ ˆํ•œ ๋ชจ๋ธ ๊ตฌ์„ฑ, ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ณ ๋ คํ•˜์—ฌ ๊ฐ์ • ๋ถ„์„ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ์ตœ๋Œ€ํ•œ์œผ๋กœ ๋Œ์–ด์˜ฌ๋ ค ๋ด…์‹œ๋‹ค