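Posted: 2022-01-19 06:07:56
Question:
I have a large image dataset of roughly 30,000+ images. When I train the model, my system runs out of memory, and I don't want to downsample the dataset. Is there any way to solve this?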
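For a sense of scale, here is a rough back-of-the-envelope estimate. It assumes each image is decoded to a 512×512×4 float32 tensor (which is what `image_dataset_from_directory` yields with `color_mode='rgba'`) and ignores any per-tensor overhead; the helper name `dataset_bytes` is hypothetical.

```python
# Rough estimate of the RAM needed to hold the whole decoded dataset at once,
# assuming float32 pixels (4 bytes each) and no per-tensor overhead.
def dataset_bytes(num_images, height, width, channels, bytes_per_value=4):
    return num_images * height * width * channels * bytes_per_value


per_image = dataset_bytes(1, 512, 512, 4)
total = dataset_bytes(30_000, 512, 512, 4)
print(f"per image: {per_image / 2**20:.0f} MiB")   # prints: per image: 4 MiB
print(f"all images: {total / 2**30:.1f} GiB")      # prints: all images: 117.2 GiB
```

Anything that keeps all decoded images in memory at the same time therefore needs well over 100 GiB of RAM; an 80/20 split only divides that between the training and validation sets.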
```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# train_dir, save_dir and log_dir are assumed to be defined elsewhere
# (image folder, checkpoint folder and TensorBoard log folder).

# Hyperparameters
batch_size = 16
img_height = 512
img_width = 512
color_mode = 'rgba'  # 4-channel PNG images

# Split the images in train_dir into training (80%) and validation (20%) subsets,
# with one-hot (categorical) labels inferred from the sub-folder names
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    labels='inferred',
    label_mode='categorical',
    color_mode=color_mode,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    labels='inferred',
    label_mode='categorical',
    color_mode=color_mode,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Note: cache() with no filename keeps every decoded batch in RAM
train_ds = train_ds.cache().prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.cache().prefetch(tf.data.AUTOTUNE)

cnn_model = Sequential([
    layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 4)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(5, activation="softmax")  # 5 output classes
])

cnn_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy', 'Recall', 'Precision', 'AUC']
)


def model_train(model, patience, namemodel):
    # Stop training when val_loss has not improved for `patience` epochs
    callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience)
    # TensorBoard callback for profiling (defined but not passed to fit below)
    tboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir,
                                                     histogram_freq=1,
                                                     profile_batch='500,520')
    # Save only the best model (lowest val_loss)
    model_save_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=save_dir + 'pd/' + namemodel,
        save_weights_only=False,
        monitor='val_loss',
        mode='min',
        save_best_only=True)
    # The datasets are already batched, so batch_size is not passed to fit()
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=1000,
        callbacks=[callback, model_save_callback]
    )
    return history


# Model name with the settings filled in, e.g. 'cnn_v1_512_rgba_16'
history = model_train(cnn_model, 30, 'cnn_v1_{}_{}_{}'.format(img_height, color_mode, batch_size))
```
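A likely source of the memory growth in the code above is `train_ds.cache()` and `val_ds.cache()` called with no filename: `Dataset.cache()` then keeps every element it produces in memory, so over the first epoch it tries to hold all of the decoded float32 images in RAM. A minimal sketch of two alternatives, assuming TensorFlow 2.x: drop the cache so batches are re-read from disk each epoch, or pass a filename so the cache lives on disk. The helper `make_pipeline` and the cache path are hypothetical.

```python
import tensorflow as tf


def make_pipeline(ds, cache_file=None):
    """Hypothetical helper: prefetch batches, optionally caching them on disk.

    cache_file=None -> no caching; files are re-read and decoded every epoch.
    cache_file=path -> cache is written to files with that prefix, not to RAM.
    """
    if cache_file is not None:
        ds = ds.cache(cache_file)  # file-backed cache instead of in-memory
    return ds.prefetch(tf.data.AUTOTUNE)  # overlap data loading with training


# Usage on the datasets returned by image_dataset_from_directory,
# in place of the two .cache().prefetch(...) lines (path is a placeholder):
# train_ds = make_pipeline(train_ds)                            # no cache
# val_ds = make_pipeline(val_ds, cache_file="/tmp/val_cache")   # on-disk cache
```

A file-backed cache needs roughly as much free disk space as the decoded images; skipping the cache avoids that, at the cost of decoding the PNGs again every epoch.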
I know there is a way to feed the 30,000+ images to the model in parts, but I don't know how to do it. Or is there a better way to do this?
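The idea of feeding the images to the model in parts can be sketched in plain Python: keep only the list of file paths in memory and load one batch at a time. This is a minimal illustration, not TensorFlow code; `iter_batches` and the `load_image` callable are hypothetical stand-ins for whatever actually decodes a file. (`image_dataset_from_directory` already reads files lazily in this way, so it is worth checking what the pipeline does with the dataset afterwards, e.g. an in-memory `cache()`.)

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(paths: Sequence[str],
                 batch_size: int,
                 load_image: Callable[[str], T]) -> Iterator[List[T]]:
    """Yield lists of loaded items, reading only `batch_size` files at a time.

    Only the current batch is held by the generator; earlier batches can be
    garbage-collected once the caller no longer references them.
    """
    for start in range(0, len(paths), batch_size):
        chunk = paths[start:start + batch_size]
        yield [load_image(p) for p in chunk]


# Usage with a fake loader (str.upper) standing in for real image decoding:
paths = [f"img_{i}.png" for i in range(5)]
for batch in iter_batches(paths, batch_size=2, load_image=str.upper):
    print(batch)
# ['IMG_0.PNG', 'IMG_1.PNG']
# ['IMG_2.PNG', 'IMG_3.PNG']
# ['IMG_4.PNG']
```

In TensorFlow, a generator like this could be wrapped with `tf.data.Dataset.from_generator`, although for images stored in class folders `image_dataset_from_directory` already provides the streaming.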
Discussion:
- The simplest approach is to reduce your batch_size.
- Why is there a 4 in input_shape=(img_height, img_width, 4)?
- Because they are 4-channel PNG images.
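Lowering `batch_size` shrinks the memory each training step needs roughly linearly, but it does not change how much an in-memory `cache()` of the whole dataset ends up holding. A quick estimate of the input tensor alone per batch, assuming float32 pixels and ignoring activations, weights and optimizer state (the helper name `batch_input_mib` is hypothetical):

```python
def batch_input_mib(batch_size, height=512, width=512, channels=4, bytes_per_value=4):
    """Size in MiB of one float32 input batch (activations and weights excluded)."""
    return batch_size * height * width * channels * bytes_per_value / 2**20


for bs in (16, 8, 4):
    print(bs, batch_input_mib(bs))
# 16 64.0
# 8 32.0
# 4 16.0

print(batch_input_mib(16, channels=3))  # 48.0 for a 3-channel (RGB) batch
```

The 3-channel figure only applies if the alpha channel carries no information the model needs; that is an assumption about the data, not something the question states.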
Tags: python tensorflow machine-learning deep-learning conv-neural-network