How to properly migrate from Keras fit_generator() to fit()? Answers

【Question title】: How to migrate from keras fit_generator() to fit() properly?
【Posted】: 2020-12-25 10:03:48
【Question】:

I have two datasets and a weight array (train_X, validation_X, train_Y, validation_Y and sampleW). The X sets are 3-dimensional and the Y sets are 2-dimensional numpy arrays. sampleW is a 1-dimensional numpy array.
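A quick sanity check of the shapes described above, reusing the constructors from the reproducible example further down. Note that `np.random.random((100, 1))`, as written in that example, is technically 2-D with shape `(100, 1)`; `ravel()` gives the truly 1-D `(100,)` form. The name `sampleW_1d` is illustrative only.

```python
import numpy as np

Input_shape, labels = (20, 4), 6  # same values as in the reproducible example
train_X = np.asarray([np.random.random(Input_shape) for _ in range(100)])
train_Y = np.random.random((100, labels))
sampleW = np.random.random((100, 1))  # as written in the example: 2-D, shape (100, 1)

print(train_X.shape, train_Y.shape, sampleW.shape)

# Flatten the weights to one scalar per sample, shape (100,)
sampleW_1d = sampleW.ravel()

# Every array must have exactly one entry per sample
assert len(sampleW_1d) == len(train_X) == len(train_Y)
```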

How do I successfully migrate from fit_generator() to fit()?

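Specifically:

Do x and y in "fit(x=None, y=None, ...)" stand for train_X and train_Y?
How do I pass the validation data (validation_X, validation_Y) separately?
Can I pass sampleW the same way as before?
How do I train on data split into batches with fit()?
Most importantly: how can I do all this without a generator?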

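Here is a minimal reproducible example (I am currently trying to find out why any batch size other than 1 raises an error, even though batch sizes > 1 should work as well):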

# -*- coding: utf-8 -*-
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,LSTM,BatchNormalization
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint
tensorboard_path= r"C:\Users\user\documents\session"  # <--- your path
checkpoint_path = tensorboard_path 

BATCH_SIZE = 1
EPOCHS, Input_shape, labels =  3, (20,4),6
train_X,train_Y = np.asarray([np.random.random(Input_shape) for x in range(100)]), np.random.random((100,labels))
validation_X,validation_Y = np.asarray([np.random.random(Input_shape) for x in range(50)]), np.random.random((50,labels))
sampleW = np.random.random((100,1)) 

class CustomGenerator_SampleW(tf.keras.utils.Sequence) :
    def __init__(self, list_x, labels, batch_size, sample_weights=None) : 
        self.labels         = labels
        self.batch_size     = batch_size
        self.list_x         = list_x
        self.sample_weights = sample_weights
        
    def __len__(self) :
        return int(np.ceil(len(self.list_x) / float(self.batch_size)))  # np.int was removed in NumPy 1.24
    def __getitem__(self, idx) :
        batch_x      = self.list_x[idx * self.batch_size : (idx+1) * self.batch_size]
        batch_y      = self.labels[idx * self.batch_size : (idx+1) * self.batch_size]
        batch_weight = self.sample_weights[idx * self.batch_size : (idx+1) * self.batch_size]
        return np.array(batch_x),np.array(batch_y), np.array(batch_weight)

class CustomGenerator(tf.keras.utils.Sequence) :
    def __init__(self, list_x, labels, batch_size) : 
        self.labels         = labels
        self.batch_size     = batch_size
        self.list_x         = list_x 
        
    def __len__(self) :
        return int(np.ceil(len(self.list_x) / float(self.batch_size)))  # np.int was removed in NumPy 1.24
    def __getitem__(self, idx) :
        batch_x      = self.list_x[idx * self.batch_size : (idx+1) * self.batch_size]
        batch_y      = self.labels[idx * self.batch_size : (idx+1) * self.batch_size] 
        return np.array(batch_x),np.array(batch_y)
 

model = Sequential()
model.add(LSTM(242, input_shape=Input_shape, return_sequences=True))
model.add(Dropout(0.3)); model.add(BatchNormalization())  

model.add(LSTM(242, return_sequences=True))
model.add(Dropout(0.3)); model.add(BatchNormalization())

model.add(Dense(labels, activation='tanh')); model.add(Dropout(0.3))

opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
model.compile(loss='mean_absolute_error',optimizer=opt,metrics=['mse'])

if sampleW is not None:
    train_batch_gen   = CustomGenerator_SampleW(train_X, train_Y, BATCH_SIZE, sample_weights=sampleW)
else: train_batch_gen = CustomGenerator(train_X, train_Y, BATCH_SIZE)
validation_batch_gen  = CustomGenerator(validation_X, validation_Y, BATCH_SIZE)

tensorboard = TensorBoard(tensorboard_path)
checkpoint = ModelCheckpoint(checkpoint_path, monitor='val_loss', verbose=1, save_best_only=True, mode='min') 

model.fit_generator(train_batch_gen, steps_per_epoch=None,  epochs=EPOCHS, 
                    validation_data = validation_batch_gen, callbacks=[tensorboard,checkpoint])
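The batching logic in `CustomGenerator` itself copes with batch sizes above 1: `__len__` rounds the number of batches up, and the slicing in `__getitem__` simply returns a shorter final batch. A minimal NumPy sketch of that arithmetic, assuming 100 samples and a batch size of 32 chosen purely for illustration:

```python
import math

import numpy as np

n_samples, batch_size = 100, 32
x = np.arange(n_samples)

# Same formula as __len__ above: round up so the remainder gets its own batch
n_batches = math.ceil(n_samples / batch_size)

# Same slicing as __getitem__: slicing past the end just yields a shorter batch
sizes = [len(x[i * batch_size:(i + 1) * batch_size]) for i in range(n_batches)]
print(n_batches, sizes)
```

With these numbers there are four batches and the last one holds the remaining 4 samples, so a batch size other than 1 does not break the generator itself.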

【Comments on the question】:

Tags: python machine-learning keras data-science lstm

【Answer 1】:

This is caused by a mismatch between the shape of your model's output and the shape of the labels you provide.

[Figure: architecture of the original model]

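As you can see, the model's output shape is (batch_size, 20, 6), while the labels have shape (batch_size, 6), so the two are incompatible. The second LSTM uses return_sequences=True, so it emits one 242-dimensional vector per time step, and Dense is applied only along the last axis, which keeps all 20 time steps in the output.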
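For inputs with rank greater than 2, Keras' Dense layer computes the dot product between the input and its kernel along the last axis of the input. A NumPy sketch of that shape behaviour, using random arrays as stand-ins for the LSTM output and the trained kernel (both are illustrative assumptions):

```python
import numpy as np

batch, timesteps, units, labels = 10, 20, 242, 6

# Stand-in for the second LSTM's output with return_sequences=True
lstm_out = np.random.random((batch, timesteps, units))

# Stand-ins for the Dense layer's kernel and bias
kernel = np.random.random((units, labels))
bias = np.zeros(labels)

# matmul contracts only the last axis, so the time axis survives
dense_out = lstm_out @ kernel + bias
print(dense_out.shape)
```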

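Why does this work with batch_size = 1?
Because TensorFlow, like NumPy, uses broadcasting: shapes are aligned from the right, and any dimension that is 1 or missing is stretched to match the other operand. For example: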

x = np.ones(shape = (1,20,6))
array([[[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]]])


y = np.ones(shape = (1,6))
array([[1., 1., 1., 1., 1., 1.]])


y-x
array([[[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]]])

For more information, see this.
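The broadcasting rule can be written out explicitly. The helper `broadcast_shape` below is a hypothetical, illustrative re-implementation; the result is cross-checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later):

```python
import numpy as np


def broadcast_shape(a, b):
    """Illustrative re-implementation of NumPy's broadcasting rule for two shapes."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both are aligned from the right
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"cannot broadcast dimensions {da} and {db}")
    return tuple(out)


# Labels (1, 6) against predictions (1, 20, 6), as in the example above
print(broadcast_shape((1, 6), (1, 20, 6)))
print(np.broadcast_shapes((1, 6), (1, 20, 6)))
```

Both calls give (1, 20, 6): the single label row is compared against every one of the 20 time steps.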

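But with batch_size = 10, broadcasting is no longer possible: aligned from the right, (10, 6) is treated as (1, 10, 6), and its middle dimension 10 cannot be matched against 20.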

Code:

x = np.ones(shape = (10,20,6))
y = np.ones(shape = (10,6))
y-x

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-102-4a65323a80fa> in <module>
      1 x = np.ones(shape = (10,20,6))
      2 y = np.ones(shape = (10,6))
----> 3 y-x

ValueError: operands could not be broadcast together with shapes (10,6) (10,20,6)
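Sweeping the batch size in the same NumPy analogy shows which label/prediction pairs broadcast at all. With 20 time steps, only batch sizes 1 and 20 pass; at 20 the subtraction would silently produce a (20, 20, 6) result instead of raising an error, so in this analogy "no error" would not mean "correct". The helper name `broadcasts` is illustrative:

```python
import numpy as np


def broadcasts(batch, timesteps=20, labels=6):
    """Return True if labels (batch, labels) broadcast against predictions (batch, timesteps, labels)."""
    try:
        np.broadcast_shapes((batch, labels), (batch, timesteps, labels))
        return True
    except ValueError:
        return False


ok = [b for b in range(1, 65) if broadcasts(b)]
print(ok)
```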

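You can fix the model's output shape by adding a Flatten layer after the LSTM layers. It turns each sample's 2D output of shape (20, 242) into a 1D vector of 20 × 242 = 4840 values, so the final Dense layer produces (batch_size, 6), matching the labels.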
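A NumPy sketch of the shape change described above. Flatten keeps the batch axis and merges all remaining axes, which is modelled here with `reshape`; the random kernel is only a stand-in for the trained Dense weights:

```python
import numpy as np

batch, timesteps, units, labels = 10, 20, 242, 6
lstm_out = np.random.random((batch, timesteps, units))

# What Flatten does: keep the batch axis, merge the rest into one axis
flat = lstm_out.reshape(batch, -1)
print(flat.shape)

# Dense on a 2-D input: one output row of size `labels` per sample
kernel = np.random.random((timesteps * units, labels))
out = flat @ kernel
print(out.shape)
```

The output now has the same (batch_size, 6) shape as train_Y, so no broadcasting is involved whatever the batch size.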

Code:

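from tensorflow.keras.layers import Flatten  # not imported in the question's code

model = Sequential()
model.add(LSTM(242, input_shape=Input_shape, return_sequences=True))
model.add(Dropout(0.3)); model.add(BatchNormalization())  

model.add(LSTM(242, return_sequences=True))
model.add(Dropout(0.3)); model.add(BatchNormalization())
model.add(Flatten())  # (batch, 20, 242) -> (batch, 4840)
model.add(Dropout(0.3))
model.add(Dense(labels, activation='tanh'))  # (batch, 4840) -> (batch, 6), matches train_Y

# learning_rate replaces the deprecated lr alias; InverseTimeDecay reproduces the old
# decay=1e-6 behaviour, lr / (1 + 1e-6 * step), which newer Keras optimizers reject
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.001, decay_steps=1, decay_rate=1e-6)
opt = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(loss='mean_absolute_error',optimizer=opt,metrics=['mse'])
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)  # requires pydot and graphviz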

[Figure: architecture of the model with the Flatten layer]

Finally, train with model.fit(), which accepts the same keras.utils.Sequence objects that fit_generator() took:

model.fit(train_batch_gen, epochs=EPOCHS, validation_data = validation_batch_gen)

Output:

Epoch 1/3
2/2 [==============================] - 1s 708ms/step - loss: 0.2891 - mse: 0.5739 - val_loss: 0.4078 - val_mse: 0.2461
Epoch 2/3
2/2 [==============================] - 0s 46ms/step - loss: 0.2229 - mse: 0.3151 - val_loss: 0.3867 - val_mse: 0.2225
Epoch 3/3
2/2 [==============================] - 0s 49ms/step - loss: 0.2315 - mse: 0.3341 - val_loss: 0.3813 - val_mse: 0.2161

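【Discussion】:

Thanks for your answer ^^. To make sure I understand it 100% correctly: 1. The problem occurs because the model's output shape does not match the label shape, and not because the labels are 2D and should be wrapped in another array. (Am I right?) 2. So you suggest adding a Flatten layer between the last hidden layer and the output layer. (Just flattening, right?) 3. If I understand correctly, this means I could also pass 3D labels of shape (10, Phi, 6). Can "Phi" differ from the sequence length of X? (How would that work?)
I just found out that it seems to be related to the sampleWeights in the generator.
Your algorithm will work as long as your loss function can handle the label and prediction shapes. Why do you think it is related to the sample weights?
I overlooked that my sample-weight generation function was spitting out an array that was too short ^^ But that was a second error, which only appeared after fixing the first one with Flatten(). Thanks a lot, and have a nice week!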
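Both problems from this thread, mismatched label/prediction shapes and a sample-weight array that is too short, can be caught before training with explicit checks. The hypothetical helper `weighted_mae` below is a sketch, not Keras' actual loss implementation: it is deliberately stricter than Keras (it refuses to broadcast), takes one weight per sample in either (n,) or (n, 1) form, and averages the weighted per-sample errors over the batch:

```python
import numpy as np


def weighted_mae(y_true, y_pred, sample_weight):
    """Hypothetical helper: mean absolute error with one weight per sample."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    w = np.asarray(sample_weight).reshape(-1)  # accept (n,) or (n, 1)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"label shape {y_true.shape} != prediction shape {y_pred.shape}")
    if w.shape[0] != y_true.shape[0]:
        raise ValueError(f"{w.shape[0]} weights for {y_true.shape[0]} samples")
    # Per-sample MAE over all non-batch axes, then the weighted mean over the batch
    per_sample = np.abs(y_true - y_pred).reshape(len(w), -1).mean(axis=1)
    return float((per_sample * w).sum() / len(w))


print(weighted_mae(np.ones((4, 6)), np.zeros((4, 6)), np.ones(4)))
```

Under this strict check, 3D labels of shape (10, Phi, 6) only work against predictions of shape (10, 20, 6) when Phi equals 20.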