【Question title】: Why does Keras' fit_generator sometimes not call on_epoch_end() on validation data generator?
【Posted】: 2020-01-15 13:26:34
【Question】:

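I noticed that Keras sometimes fails to call the on_epoch_end() method of my keras.utils.Sequence validation data generator, especially when each model evaluation step is fast (for example, when the batch size is small).

For example, here is a minimal working example that demonstrates how Keras behaves differently with a batch size of 1 and a batch size of 64: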

    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.utils import Sequence

    FEATURE_SIZE = 512 ** 2


    class DataGenerator(Sequence):

        def __init__(self, batch_size, log=False):
            self.batch_size = batch_size
            self.log = log

        def __len__(self):
            return 1

        def __getitem__(self, i):
            return np.ones((self.batch_size, FEATURE_SIZE)), np.ones((self.batch_size, 1))  # Some dummy data

        def on_epoch_end(self):
            if self.log:
                print('on_epoch_end() called')


    def train(batch_size):
        print('Training with batch_size =', batch_size)
        training_generator = DataGenerator(batch_size)
        test_generator = DataGenerator(batch_size, log=True)

        model = models.Sequential()
        model.add(layers.Dense(4, activation='sigmoid', input_shape=[FEATURE_SIZE]))
        model.add(layers.Dense(1, activation='sigmoid', input_shape=[FEATURE_SIZE]))
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

        model.fit_generator(generator=training_generator, validation_data=test_generator, epochs=5, verbose=0)


    train(batch_size=1)
    train(batch_size=64)

I get the output:

    Training with batch_size = 1
    on_epoch_end() called

    Training with batch_size = 64
    on_epoch_end() called
    on_epoch_end() called
    on_epoch_end() called
    on_epoch_end() called
    on_epoch_end() called

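The output makes it clear that the number of times on_epoch_end() is called depends on the batch size.

This is very problematic, because my data generator expects on_epoch_end() to be called reliably once after every epoch.

Does anyone know how to fix this?


I am using Keras version: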

    tensorflow.keras.__version__
    Out[143]: '2.1.6-tf'

【Comments】:

  • Hi @mchen, I tried running your code on Google Colab and it works fine. TF version 2.0.0

Tags: tensorflow machine-learning keras tf.keras


【Answer 1】:

I added the following statements to your code and ran it. on_epoch_end was called only once for batch_size=64 and not even once for batch_size=1. I am also using the same tensorflow.keras.__version__.

import tensorflow as tf
print(tf.keras.__version__)

Code -

import numpy as np
import tensorflow as tf
print(tf.keras.__version__)  # the module is imported as tf, so tensorflow.keras would raise NameError
from tensorflow.keras import layers, models
from tensorflow.keras.utils import Sequence

FEATURE_SIZE = 512 ** 2


class DataGenerator(Sequence):

    def __init__(self, batch_size, log=False):
        self.batch_size = batch_size
        self.log = log

    def __len__(self):
        return 1

    def __getitem__(self, i):
        return np.ones((self.batch_size, FEATURE_SIZE)), np.ones((self.batch_size, 1))  # Some dummy data

    def on_epoch_end(self):
        if self.log:
            print('on_epoch_end() called')


def train(batch_size):
    print('Training with batch_size =', batch_size)
    training_generator = DataGenerator(batch_size)
    test_generator = DataGenerator(batch_size, log=True)

    model = models.Sequential()
    model.add(layers.Dense(4, activation='sigmoid', input_shape=[FEATURE_SIZE]))
    model.add(layers.Dense(1, activation='sigmoid', input_shape=[FEATURE_SIZE]))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit_generator(generator=training_generator, validation_data=test_generator, epochs=5, verbose=0)

train(batch_size=1)
train(batch_size=64)

Output -

2.1.6-tf
Training with batch_size = 1
Training with batch_size = 64
on_epoch_end() called

I made a few modifications to your code to use callbacks, as shown below, and it works as expected.

Fixed code -

import numpy as np
import tensorflow as tf
print(tf.keras.__version__)  # the module is imported as tf, so tensorflow.keras would raise NameError
from tensorflow.keras import layers, models
from tensorflow.keras.utils import Sequence

FEATURE_SIZE = 512 ** 2

# Define the required callback; Keras calls its on_epoch_end() once per training epoch
class printepoch(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        print("on_epoch_end() called")

printepochs = printepoch()

class DataGenerator(Sequence):

    def __init__(self, batch_size, log=False):
        self.batch_size = batch_size
        self.log = log

    def __len__(self):
        return 1

    def __getitem__(self, i):
        return np.ones((self.batch_size, FEATURE_SIZE)), np.ones((self.batch_size, 1))  # Some dummy data

    # def on_epoch_end(self):
    #     if self.log:
    #         print('on_epoch_end() called')


def train(batch_size):
    print('Training with batch_size =', batch_size)
    training_generator = DataGenerator(batch_size)
    test_generator = DataGenerator(batch_size)

    model = models.Sequential()
    model.add(layers.Dense(4, activation='sigmoid', input_shape=[FEATURE_SIZE]))
    model.add(layers.Dense(1, activation='sigmoid', input_shape=[FEATURE_SIZE]))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit_generator(generator=training_generator, validation_data=test_generator, epochs=5, verbose=0, callbacks=[printepochs])


train(batch_size=1)
train(batch_size=64)

Output -

2.1.6-tf
Training with batch_size = 1
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called
Training with batch_size = 64
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called
on_epoch_end() called

However, when I ran your code with version 2.3.0-tf, it worked fine. It was probably fixed in a later release.

Hope this answers your question.
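
If the generator does real per-epoch work (reshuffling, for instance) rather than just printing, the callback approach above can drive that work directly. What follows is only a minimal sketch, assuming the per-epoch logic can be moved out of the Sequence's on_epoch_end() into a separately named method, so it runs exactly once per epoch whether or not Keras calls on_epoch_end() on the validation generator. The names ShufflingSequence, RefreshSequences and refresh() are hypothetical and not part of Keras.

```python
import numpy as np
import tensorflow as tf


class ShufflingSequence(tf.keras.utils.Sequence):
    """Illustrative Sequence: per-epoch work lives in refresh(), not on_epoch_end()."""

    def __init__(self, n_samples, batch_size):
        super().__init__()
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.indices = np.arange(n_samples)
        self.refresh_count = 0  # how many times refresh() has run

    def __len__(self):
        # Number of batches per epoch
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, i):
        idx = self.indices[i * self.batch_size:(i + 1) * self.batch_size]
        x = idx.reshape(-1, 1).astype("float32")  # dummy features
        y = np.ones((len(idx), 1), dtype="float32")  # dummy labels
        return x, y

    def refresh(self):
        # Hypothetical per-epoch hook: reshuffle the sample order
        np.random.shuffle(self.indices)
        self.refresh_count += 1


class RefreshSequences(tf.keras.callbacks.Callback):
    """Calls refresh() on each given Sequence at the end of every training epoch."""

    def __init__(self, sequences):
        super().__init__()
        self.sequences = list(sequences)

    def on_epoch_end(self, epoch, logs=None):
        for seq in self.sequences:
            seq.refresh()
```

To use it, pass the callback alongside the generators, for example callbacks=[RefreshSequences([training_generator, test_generator])] in the fit_generator call. The method is deliberately not named on_epoch_end() so that Keras versions which do call that method on the generator will not trigger the work twice.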

【Comments】:
