CNN training runs out of RAM because of a large dataset: answers

【Question title】: CNN training runs out of RAM because of a big dataset
【Posted】: 2022-01-19 06:07:56
【Question】:

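I have a large image dataset of more than 30,000 images. When I train the model, my system runs out of memory, and I don't want to downsample the dataset. Is there any way to solve this?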

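import tensorflow as tf
from tensorflow.keras import Sequential, layers

# train_dir, log_dir and save_dir are assumed to be defined elsewhere
# set up the initial parameters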
batch_size = 16
img_height = 512
img_width = 512
color_mode = 'rgba'

# split the dataset into training and validation subsets
# load the dataset with categorical labels
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  train_dir,
  labels='inferred', 
  label_mode='categorical',
  color_mode=color_mode,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
  train_dir,
  labels='inferred', 
  label_mode='categorical',
  color_mode=color_mode,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)


train_ds = train_ds.cache().prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.cache().prefetch(tf.data.AUTOTUNE)

cnn_model = Sequential([
  layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 4)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  #layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(64, activation='relu'),
  layers.Dense(5,activation="softmax")
])
cnn_model.compile(
    optimizer='adam',
  loss=tf.losses.CategoricalCrossentropy(),
  metrics=['accuracy','Recall','Precision','AUC']
  )

def model_train(model,patience,namemodel):
    # callback for early stopping
    callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)
    # TensorBoard callback for profiling
    tboard_callback = tf.keras.callbacks.TensorBoard(log_dir = log_dir,
                                                     histogram_freq = 1,
                                                     profile_batch = '500,520')

    model_save_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=save_dir+'pd/'+namemodel,
        save_weights_only=False,
        monitor='val_loss',
        mode='min',
        save_best_only=True)

    history = model.fit(
      train_ds,
      validation_data=val_ds,
      epochs=1000,
      # batch_size is not passed here: train_ds is already batched
      callbacks=[callback,model_save_callback]
    )
    return history


history = model_train(cnn_model,30,'cnn_v1'.format(img_height,color_mode,batch_size))

I know there is a way to feed the 30,000+ images to the model in parts, but I don't know how to do it. Or is there a better way to do this?

【Comments】:

The simplest approach is to reduce your batch_size.
Why is there a 4 in input_shape=(img_height, img_width, 4)?
Because they are 4-channel PNG images.

Tags: python tensorflow machine-learning deep-learning conv-neural-network

【Solution 1】:

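When you use image_dataset_from_directory, images and labels are fetched in batches for training. In your case the batch size is 16, so only 16 images and their labels are loaded into memory at a time, rather than all 30,000. If you still get an out-of-memory error you can reduce the batch size, but unless you have very little memory a batch size of 16 should be fine. You could also consider reducing the image size. A 512 x 512 image in rgba format has about 1,000,000 values to process (512 x 512 pixels x 4 channels), which takes a lot of memory. Try 256 x 256, which is about 262K values, or better, 128 x 128, which is only about 65K values. The more likely culprit is the caching. cache() with no filename keeps every element it has read in memory, so by the end of the first epoch the whole dataset sits in RAM, whereas prefetch() only prepares the next batch while the network trains on the current one. Try removing the two lines that call cache() and see whether the problem goes away.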
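
To put rough numbers on the batch-versus-whole-dataset difference, here is a back-of-the-envelope estimate. It is only a sketch under two assumptions that are not stated in the thread: that each decoded image is held as float32 (4 bytes per value), and that the dataset has exactly 30,000 images. The helper names image_bytes and to_gib are made up for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: decoded images are stored as float32


def image_bytes(height, width, channels, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to hold one decoded image tensor."""
    return height * width * channels * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


num_images = 30_000  # assumption: the question only says "more than 30,000"
batch_size = 16      # from the question

for side in (512, 256, 128):
    per_image = image_bytes(side, side, channels=4)  # rgba has 4 channels
    per_batch = per_image * batch_size               # what one batch occupies
    everything = per_image * num_images              # what holding all images occupies
    print(
        f"{side}x{side} rgba: "
        f"{per_image / 2**20:.2f} MiB per image, "
        f"{per_batch / 2**20:.0f} MiB per batch of {batch_size}, "
        f"{to_gib(everything):.1f} GiB for all {num_images} images"
    )
```

Under these assumptions a batch of 16 images at 512 x 512 is about 64 MiB, while holding all 30,000 images at once is roughly 117 GiB. If you want to keep caching without paying for it in RAM, note that cache() also accepts a filename argument, in which case the cache is written to disk instead of being kept in memory.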

【Comments】: