[Tensorflow] WARNING:tensorflow:Your input ran out of data

Junmo KIM · January 17, 2021

tensorflow


While studying for the TF2 certification in a Jupyter notebook, I ran into the following warning:

WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 10000 batches). You may need to use the repeat() function when building your dataset.

The training data consists of 1,027 images in two classes, and the arguments I passed to fit were as follows:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# train_dir, validation_dir, model and callbacks are assumed to be defined elsewhere in the notebook.

# Add our data-augmentation parameters to ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255.,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1.0/255.)

# Flow training images in batches of 20 using train_datagen generator
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(150, 150))

# Flow validation images in batches of 20 using test_datagen generator
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        batch_size=20,
                                                        class_mode='binary',
                                                        target_size=(150, 150))

history = model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=100,   # 100 steps x 20 images = 2,000 images/epoch, but only 1,027 exist
    epochs=100,            # 100 epochs x 100 steps = the 10,000 batches named in the warning
    validation_steps=50,
    verbose=2,
    callbacks=[callbacks])
```

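In other words, I had set steps_per_epoch = 100 and batch_size = 20. After some googling, I learned that steps_per_epoch is normally set to (training size) / (batch_size), rounded up. So if steps_per_epoch had not been set in model.fit, it would have been inferred automatically as ceil(1027 / 20) = 52 (51 full batches of 20 plus a final batch of 7 images), and the training progress bar would show 52 steps per epoch. The same applies to the validation set and validation_steps.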
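To double-check the arithmetic, here is a small sketch that computes the inferred number of steps. The helper name `batches_per_epoch` is hypothetical; it assumes, as described above, that the final partial batch is kept, which is why the result is 52 rather than the 51 that floor division gives.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches needed to cover every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


num_train = 1027
batch_size = 20

steps = batches_per_epoch(num_train, batch_size)
last_batch = num_train - (steps - 1) * batch_size

print(steps)                    # 52
print(num_train // batch_size)  # 51: floor division drops the partial batch
print(last_batch)               # 7: images in the final batch
```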

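So the number of steps is determined by the batch size and the size of the training data. The warning occurs because I asked for far more steps than the data can supply: 100 steps × 20 images = 2,000 images per epoch, but the training directory only holds 1,027, so the iterator is exhausted after 52 steps and training is interrupted.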
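The sketch below imitates the situation in plain Python rather than Keras. `FiniteBatches`, `run_epoch`, and `repeat_batches` are hypothetical names: a finite batch source runs dry after 52 of the 100 requested steps, while a source that loops over the data forever (the idea behind the `repeat()` hint in the warning) can deliver 100 steps every epoch. It illustrates the mechanism only; it is not how Keras implements `fit`.

```python
import math


class FiniteBatches:
    """Hypothetical stand-in for a finite, indexable batch source (like a Keras Sequence)."""

    def __init__(self, num_samples: int, batch_size: int):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __len__(self) -> int:
        # Round up so the final partial batch is included
        return math.ceil(self.num_samples / self.batch_size)

    def __getitem__(self, idx: int) -> range:
        # Sample indices that make up batch `idx`
        start = idx * self.batch_size
        return range(start, min(start + self.batch_size, self.num_samples))


def run_epoch(batches, steps_per_epoch: int) -> int:
    """Pull up to steps_per_epoch batches and return how many were actually delivered."""
    delivered = 0
    for _ in range(steps_per_epoch):
        try:
            next(batches)
        except StopIteration:
            break  # the input "ran out of data"
        delivered += 1
    return delivered


def repeat_batches(seq):
    """Cycle over the batch source forever, similar in spirit to tf.data's repeat()."""
    while True:
        for i in range(len(seq)):
            yield seq[i]


data = FiniteBatches(num_samples=1027, batch_size=20)
print(100 * 100)  # 10000: total batches requested by steps_per_epoch * epochs

# A single pass over the data: only 52 of the 100 requested steps succeed.
single_pass = (data[i] for i in range(len(data)))
print(run_epoch(single_pass, steps_per_epoch=100))  # 52

# A repeating source can deliver 100 steps in every epoch.
endless = repeat_batches(data)
print([run_epoch(endless, steps_per_epoch=100) for _ in range(3)])  # [100, 100, 100]
```

In the actual notebook, the simpler fix is probably to drop `steps_per_epoch` and `validation_steps` so Keras infers them from the generator lengths. Wrapping `train_generator` in something like `repeat_batches` and passing it to `model.fit` with an explicit `steps_per_epoch` is another option, but that is an untested assumption here, and such a wrapper would not reshuffle the data between passes the way the Keras iterator does at the end of each epoch.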

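However, yesterday's post on ImageDataGenerator said that steps_per_epoch is adjusted for the sake of augmentation. If so, shouldn't setting it above the default of 52 make augmentation kick in automatically so that training continues? Why doesn't augmentation happen automatically?
Searching around, the answers say that ImageDataGenerator applies augmentation but does not increase the number of samples... I need to look into this part further.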
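One way to picture what those search results describe: ImageDataGenerator applies random transforms on the fly as each batch is produced, so an epoch still contains the same 1,027 images, only in differently transformed versions each time. The toy sketch below models this with a hypothetical `random_flip` acting on short lists of numbers; it is an illustration under that assumption, not the Keras implementation.

```python
import random


def random_flip(image: list) -> list:
    """Hypothetical augmentation: reverse the pixel order with probability 0.5."""
    return image[::-1] if random.random() < 0.5 else list(image)


def augmented_epoch(images: list, transform) -> list:
    """One pass over the data; each image is transformed at the moment it is served."""
    return [transform(img) for img in images]


random.seed(0)
# 1,027 toy "images", each just three pixel values
dataset = [[i, i + 1, i + 2] for i in range(1027)]

epoch1 = augmented_epoch(dataset, random_flip)
epoch2 = augmented_epoch(dataset, random_flip)

print(len(dataset), len(epoch1), len(epoch2))  # 1027 1027 1027: no new samples are created
print(epoch1 != epoch2)  # True: the same images come back as different random variants
```

Because each epoch is still one pass over 1,027 images, asking for more than 52 steps per epoch does not by itself make the generator produce extra augmented samples.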

Junmo KIM

Integrated M.S.-Ph.D. (Korea Univ.)
