🧩 Getting the TensorFlow Certification - Part 10. Hands-on (Household Electric Power Consumption)

vinca · January 3, 2023

Household Electric Power Consumption

  • Forecasting with the Individual Household Electric Power Consumption dataset

ABOUT THE DATASET
Original Source:
https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption
The original 'Individual Household Electric Power Consumption Dataset'
has measurements of electric power consumption in one household with
a one-minute sampling rate over a period of almost 4 years.
Different electrical quantities and some sub-metering values are available.
For the purpose of the examination we have provided a subset containing
the data for the first 60 days in the dataset. We have also cleaned the
dataset beforehand to remove missing values. The dataset is provided as a
csv file in the project.

The dataset has a total of 7 features ordered by time.

INSTRUCTIONS
Complete the code in following functions:
1. windowed_dataset()
2. solution_model()
The model input and output shapes must match the following
specifications.
1. Model input_shape must be (BATCH_SIZE, N_PAST = 24, N_FEATURES = 7),
since the testing infrastructure expects a window of past N_PAST = 24
observations of the 7 features to predict the next 24 observations of
the same features.
2. Model output_shape must be (BATCH_SIZE, N_FUTURE = 24, N_FEATURES = 7)
3. DON'T change the values of the following constants
N_PAST, N_FUTURE, SHIFT in the windowed_dataset()
BATCH_SIZE in solution_model() (See code for additional note on
BATCH_SIZE).
4. Code for normalizing the data is provided - DON'T change it.
Changing the normalizing code will affect your score.
HINT: Your neural network must have a validation MAE of approximately 0.055 or
less on the normalized validation dataset for top marks.
WARNING: Do not use lambda layers in your model, they are not supported
on the grading infrastructure.
WARNING: If you are using the GRU layer, it is advised not to use the
'recurrent_dropout' argument (you can alternatively set it to 0),
since it has not been implemented in the cuDNN kernel and may
result in much longer training times.

Solution

Summary of steps

  1. Import: import the required modules.
  2. Preprocessing: perform the data preprocessing needed for training.
  3. Modeling (model): define the model.
  4. Compile: configure the model with a loss and optimizer.
  5. Train (fit): train the model.

1. Imports

Import the required modules.

import urllib.request
import os
import zipfile
import pandas as pd

import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv1D, LSTM, Bidirectional
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint

2.1 Preprocessing (Load dataset)

Download the dataset archive provided for the exam and load the CSV with pandas.

def download_and_extract_data():
    # Download the zip provided for the exam and extract the CSV in place.
    url = 'https://storage.googleapis.com/download.tensorflow.org/data/certificate/household_power.zip'
    urllib.request.urlretrieve(url, 'household_power.zip')
    with zipfile.ZipFile('household_power.zip', 'r') as zip_ref:
        zip_ref.extractall()

download_and_extract_data()
df = pd.read_csv('household_power_consumption.csv', sep=',',
                 infer_datetime_format=True, index_col='datetime', header=0)
df.head(10)
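As an optional sanity check (not part of the graded code), you can confirm that the seven features loaded correctly; the UCI source lists them as Global_active_power, Global_reactive_power, Voltage, Global_intensity, and three sub-metering values:

# Optional check: the exam subset should have 7 feature columns,
# one row per minute for the first 60 days.
print(df.shape)
print(df.columns.tolist())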

2.2 Preprocessing (Data normalization)

Scale the data to (roughly) the 0-1 range. This normalization code is provided by the exam; changing it will affect your score.

def normalize_series(data, min, max):
    data = data - min
    data = data / max
    return data
# N_FEATURES is the number of columns in the DataFrame
N_FEATURES = len(df.columns)

# Take the DataFrame's values as a numpy array
data = df.values

# Normalize the data column-wise
data = normalize_series(data, data.min(axis=0), data.max(axis=0))
data
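To make the provided normalization concrete, here is a toy example. Note that the exam's code divides by the column max after subtracting the column min (not by max - min), and the instructions say to leave it that way:

import numpy as np

toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])
# Column-wise (x - min) / max: each column becomes 0.0, 0.333..., 0.666...
print(normalize_series(toy, toy.min(axis=0), toy.max(axis=0)))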

2.3 Preprocessing (Data split)

# Split the dataset at a 0.8 ratio.
# Changed from the original 0.5 to 0.8 // other ratios are possible
split_time = int(len(data) * 0.8)
x_train = data[:split_time]
x_valid = data[split_time:]

2.4 Preprocessing (Create the windowed dataset)

This function converts the series into a windowed dataset, where each window contains both the observations used as features and the observations used as targets.

Don't change the shift parameter: the test windows are created with the specified shift, so changing it can affect your score. Calculate the window size so that, based on the past 24 observations (time steps t=1, t=2, ..., t=24) of the 7 variables, you predict the next 24 observations (time steps t=25, t=26, ..., t=48) of the same 7 variables.

Hint: each window must include both the past observations and the future observations to be predicted, so calculate the window size from n_past and n_future.

def windowed_dataset(series, batch_size, n_past=24, n_future=24, shift=1):
    ds = tf.data.Dataset.from_tensor_slices(series)
    # Each window holds n_past input steps plus n_future target steps
    ds = ds.window(size=n_past + n_future, shift=shift, drop_remainder=True)
    # Flatten each nested window dataset into a single (n_past + n_future, features) tensor
    ds = ds.flat_map(lambda w: w.batch(n_past + n_future))
    ds = ds.shuffle(len(series))
    # Split each window into (past, future) pairs
    ds = ds.map(lambda w: (w[:n_past], w[n_past:]))
    return ds.batch(batch_size).prefetch(1)
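To see what the windowing produces, here is a minimal sketch with a toy series (this only illustrates the function above; each window splits into an (n_past, features) input and an (n_future, features) target):

import numpy as np

toy_series = np.arange(10, dtype='float32').reshape(-1, 1)  # 10 steps, 1 feature
toy_ds = windowed_dataset(toy_series, batch_size=1, n_past=3, n_future=3, shift=1)
for x, y in toy_ds.take(1):
    print(x.numpy().squeeze(), '->', y.numpy().squeeze())
    # e.g. [2. 3. 4.] -> [5. 6. 7.] (order varies because of the shuffle)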

Create the train_set and valid_set with windowed_dataset().

# The following 4 constants are given by the exam.
BATCH_SIZE = 32  # Can be changed, but raising it is not recommended (lowering it works but training takes much longer)
N_PAST = 24      # Do not change.
N_FUTURE = 24    # Do not change.
SHIFT = 1        # Do not change.
train_set = windowed_dataset(series=x_train, 
                             batch_size=BATCH_SIZE,
                             n_past=N_PAST, 
                             n_future=N_FUTURE,
                             shift=SHIFT)

valid_set = windowed_dataset(series=x_valid, 
                             batch_size=BATCH_SIZE,
                             n_past=N_PAST, 
                             n_future=N_FUTURE,
                             shift=SHIFT)
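A quick shape check (optional) confirms the windows match the required model specs:

for x, y in train_set.take(1):
    print(x.shape, y.shape)  # expected: (32, 24, 7) (32, 24, 7)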

3. Model definition (Sequential)

Now it's time to build the model.

model = Sequential([
    Conv1D(filters=32,
           kernel_size=3,
           padding="causal",
           activation="relu",
           input_shape=[N_PAST, N_FEATURES]),
    Bidirectional(LSTM(32, return_sequences=True)),
    Dense(32, activation="relu"),
    Dense(16, activation="relu"),
    Dense(N_FEATURES)
])

๋ชจ๋ธ ๊ฒฐ๊ณผ ์š”์•ฝ

model.summary()
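Because the Conv1D layer uses causal padding and the LSTM returns sequences, the 24 time steps are preserved end to end, and the final Dense(N_FEATURES) is applied per time step, which gives the required (BATCH_SIZE, N_FUTURE, N_FEATURES) output:

print(model.output_shape)  # expected: (None, 24, 7)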

4. Compile

# Adam optimizer with learning_rate=0.0005
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0005)

model.compile(loss='mae',
              optimizer=optimizer,
              metrics=["mae"]
              )

ModelCheckpoint: create a checkpoint

Create a ModelCheckpoint so that the best model so far (by val_loss) is saved at each epoch.

  • checkpoint_path sets the file name the weights are saved to.
  • Declare the ModelCheckpoint with the appropriate options.
checkpoint_path='model/my_checkpoint.ckpt'

checkpoint = ModelCheckpoint(checkpoint_path,
                             save_weights_only=True,
                             save_best_only=True,
                             monitor='val_loss',
                             verbose=1,
                             )
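Optionally (this is an addition, not part of the original solution), an EarlyStopping callback can cut training short once val_loss stops improving; a minimal sketch:

from tensorflow.keras.callbacks import EarlyStopping

# Stop if val_loss hasn't improved for 5 epochs; restore the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True)
# Pass it alongside the checkpoint: callbacks=[checkpoint, early_stop]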

5. Training (fit)

model.fit(train_set,
          validation_data=valid_set,
          epochs=20,
          callbacks=[checkpoint],
          )

Load weights after training (ModelCheckpoint)

After training finishes, you must call load_weights;
otherwise, setting up the ModelCheckpoint was pointless.

model.load_weights(checkpoint_path)

If you want to evaluate the model:

# HINT: Your neural network must have a validation MAE of approximately 0.055 or
# less on the normalized validation dataset for top marks.
model.evaluate(valid_set)
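You can also sanity-check a prediction against the spec; each predicted batch should have the same shape as the targets:

for x, y in valid_set.take(1):
    pred = model.predict(x)
    print(pred.shape, y.shape)  # both (32, 24, 7) for a full batch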