Keras stuck at first epoch only when using validation generator

[Question title]: Keras stuck at first epoch only when using validation generator
[Posted]: 2020-10-08 20:57:36
[Question]:

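I am working through a deep learning book called Deep Learning with Python. The book's code is dated, but I have been reading the official documentation to get through it.

Anyway, this is a program that should train a simple model for time-series forecasting of temperature, using the dataset available here.

The program is as follows: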

import numpy as np
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

#Loading file 
f = open(fname) # fname is the filepath for the csv file
data = f.read()
f.close()
lines = data.split('\n')
header = lines[0].split(',')
lines = lines[1:]

# Converting into numpy array
float_data = np.zeros((len(lines), len(header) - 1))
for i, line in enumerate(lines):
    # Skip the first column (the timestamp) and parse the rest as floats
    values = [float(x) for x in line.split(',')[1:]]
    float_data[i, :] = values

# Normalizing the data 
mean = float_data[:200000].mean(axis=0)
float_data -= mean
std = float_data[:200000].std(axis=0)
float_data /= std

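There is a generator function for creating the dataset (I have read that tensorflow.keras.utils.Sequence is preferred, but I failed to convert this generator into a Sequence subclass):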

def generator(data, lookback, delay, min_index, max_index, shuffle=False, batch_size=128, step=6):
  if max_index is None:
    max_index = len(data) - delay - 1
  i = min_index + lookback

  while 1:
    if shuffle:
      rows = np.random.randint(min_index + lookback, max_index, size=batch_size)
    else:
      if i + batch_size >= max_index:
        i = min_index + lookback

      rows = np.arange(i, min(i + batch_size, max_index))
      i += len(rows)

    samples = np.zeros((len(rows),lookback // step,data.shape[-1]))
    targets = np.zeros((len(rows),))

    for j, row in enumerate(rows):
      indices = range(rows[j] - lookback, rows[j], step)
      samples[j] = data[indices]
      targets[j] = data[rows[j] + delay][1]
    yield samples, targets
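Since the question mentions not managing to convert the generator into a Sequence, here is a minimal sketch of the same windowing logic in index-addressable form. It is an illustration, not the book's code: the class name `WindowBatches` is hypothetical, it only covers the non-shuffled case, and it deliberately does not inherit from `tensorflow.keras.utils.Sequence` so that it runs with NumPy alone. It does implement the `__len__` / `__getitem__` pair that a Sequence subclass is expected to provide. Unlike the generator, it ends with a smaller final batch instead of wrapping around.

```python
import numpy as np


class WindowBatches:
    """Index-addressable batches mirroring the non-shuffled generator.

    Hypothetical helper for illustration. It follows the
    __len__/__getitem__ protocol used by keras.utils.Sequence subclasses
    but does not inherit from Sequence, so it needs only NumPy.
    """

    def __init__(self, data, lookback, delay, min_index, max_index,
                 batch_size=128, step=6):
        if max_index is None:
            max_index = len(data) - delay - 1
        self.data = data
        self.lookback = lookback
        self.delay = delay
        self.step = step
        self.batch_size = batch_size
        # First usable window end point, and one past the last one.
        self.first_row = min_index + lookback
        self.last_row = max_index

    def __len__(self):
        # Number of batches in one pass (ceiling division);
        # the final batch may hold fewer than batch_size rows.
        n_rows = self.last_row - self.first_row
        return max(0, -(-n_rows // self.batch_size))

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = self.first_row + idx * self.batch_size
        rows = np.arange(start, min(start + self.batch_size, self.last_row))
        samples = np.zeros((len(rows), self.lookback // self.step,
                            self.data.shape[-1]))
        targets = np.zeros((len(rows),))
        for j, row in enumerate(rows):
            indices = np.arange(row - self.lookback, row, self.step)
            samples[j] = self.data[indices]
            # Column 1 is the target column, as in the generator.
            targets[j] = self.data[row + self.delay][1]
        return samples, targets
```

Because `len()` of such an object is the number of batches in one pass, the step count no longer has to be computed by hand, which is where the question's code goes wrong.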

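Here are the details of the parameters:

1) data: the original array of floating-point data, normalized.

2) lookback: how many timesteps back the input data should go.

3) delay: how many timesteps in the future the target should be.

4) min_index and max_index: indices in the data array that delimit which timesteps to draw from. This is useful for keeping one segment of the data for validation and another for testing.

5) shuffle: whether to shuffle the samples or draw them in chronological order.

6) batch_size: the number of samples per batch.

7) step: the period, in timesteps, at which you sample data. You set it to 6 in order to draw one data point every hour.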
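To make lookback, step and delay concrete, the sketch below works out which indices a single sample touches. The helper `window_indices` and the example row 2000 are hypothetical and chosen only for illustration. The index expressions are the same ones the generator uses.

```python
lookback = 1440   # how far back each input window reaches, in timesteps
step = 6          # keep every 6th timestep inside the window
delay = 144       # how far past the window's end the target lies


def window_indices(row, lookback, step, delay):
    """Return (input indices, target index) for one sample ending at `row`."""
    inputs = list(range(row - lookback, row, step))
    target = row + delay
    return inputs, target


inputs, target = window_indices(row=2000, lookback=lookback, step=step, delay=delay)
print(len(inputs))            # 240 sampled timesteps per window
print(inputs[0], inputs[-1])  # 560 1994
print(target)                 # 2144
```

Each sample therefore has lookback // step = 240 rows, which matches the input_shape passed to the Flatten layer.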

And these are the generators:

lookback = 1440
step = 6
delay = 144
batch_size = 128

train_gen = generator(float_data,
                      lookback=lookback,
                      delay=delay,
                      min_index=0,
                      max_index=200000,
                      shuffle=True,
                      step=step,
                      batch_size=batch_size)

val_gen = generator(float_data,
                    lookback=lookback,
                    delay=delay,
                    min_index=200001,
                    max_index=300000,
                    step=step,
                    batch_size=batch_size)

test_gen = generator(float_data,
                     lookback=lookback,
                     delay=delay,
                     min_index=300001,
                     max_index=None,
                     step=step,
                     batch_size=batch_size)

val_steps = (300000 - 200001 - lookback)
test_steps = (len(float_data) - 300001 - lookback)

The network layout is as follows:

model = Sequential()
model.add(layers.Flatten(input_shape=(lookback // step, float_data.shape[-1])))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit(train_gen,steps_per_epoch=500,epochs=20,validation_data=test_gen,validation_steps=test_steps)

But the model gets stuck at:

Train for 500 steps, validate for 119110 steps
Epoch 1/20
497/500 [============================>.] - ETA: 0s - loss: 0.3524

My model gets stuck while evaluating on the validation set. To check that train_gen and val_gen work, I tried:

next(train_gen)
next(val_gen)

and they return different values each time:

(array([[[ 0.34593055,  0.49507501,  0.4628141 , ...,  0.16203687,
           0.18470667,  0.84378526],
         [ 0.36243914,  0.6283707 ,  0.59460993, ...,  0.2921889 ,
           0.94414397,  0.60710086],
         [ 0.35182647,  0.64305582,  0.60912981, ...,  1.78242962,
           1.59631612,  0.43507171],
         ...,

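What is the problem here?

[Question comments]:

Are you sure your test_steps is correct? You are not using the batch size to compute it, so it is probably much larger than needed.
Try reducing the batch size.
@DulangaHeshan I tried that, it didn't help.
@Dr.Snoopy I'm not sure, this is an example from the book I'm learning from.
The problem is that these step calculations do not use the batch size, so they are all wrong, because the number of steps changes with the batch size. If you set validation_steps to a value much larger than needed, the validation phase takes much longer, which makes training look stuck. I suggest you divide these steps by the batch size.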

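Tags: python keras deep-learning

[Solution 1]:

The number of validation steps looks suspicious: it is not computed from the batch size, so it is larger than it should be, which greatly lengthens the validation phase. The fix is to divide the number of steps by the batch size: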

val_steps = val_steps // batch_size
test_steps = test_steps // batch_size

This gives the steps the correct value.
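As a rough check of the numbers involved, the sketch below redoes the arithmetic with the values from the question. The test sample count is taken from the log line "validate for 119110 steps" rather than recomputed from the dataset, and floor division is used on the assumption that the generator only yields full batches.

```python
lookback = 1440
batch_size = 128
steps_per_epoch = 500

# Sample counts as computed in the question (one "step" per sample).
val_samples = 300000 - 200001 - lookback
test_samples = 119110  # value Keras printed: "validate for 119110 steps"

# One step consumes a whole batch, so divide by the batch size.
val_steps = val_samples // batch_size
test_steps = test_samples // batch_size

print(val_samples, val_steps)    # 98559 769
print(test_samples, test_steps)  # 119110 930

# Ratio of uncorrected validation steps to training steps per epoch.
print(test_samples / steps_per_epoch)  # 238.22
```

With the uncorrected value, each epoch runs about 238 times as many validation batches as training batches, which is why the progress bar appears frozen near the end of the first epoch.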

[Comments]: