[Posted]: 2020-01-13 19:32:41
[Question]:
I am trying to train a deep residual network (ResNet34, 21,302,722 parameters in total) with TensorFlow 2.0 on a GPU (GeForce 940M). The sequential model is defined as follows:
model = keras.models.Sequential()
model.add(DefaultConv2D(64, kernel_size=7, strides=2,
                        input_shape=[224, 224, 3]))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding="SAME"))
prev_filters = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualUnit(filters, strides=strides))
    prev_filters = filters
model.add(keras.layers.GlobalAvgPool2D())
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(2, activation="softmax"))
model.summary()
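For a sense of scale, the parameter count quoted above can be turned into a rough memory figure. This is only a back-of-the-envelope sketch: it assumes every parameter is stored as a 4-byte float32, which the question does not state explicitly, and the names `PARAM_COUNT` and `to_mib` are introduced here for illustration.

```python
# Rough estimate of the memory needed just to hold the model's parameters.
# Assumption: each parameter is a 4-byte float32 value.
PARAM_COUNT = 21_302_722   # total parameter count quoted in the question
BYTES_PER_FLOAT32 = 4


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


weights_bytes = PARAM_COUNT * BYTES_PER_FLOAT32
print(f"weights: {weights_bytes} bytes = {to_mib(weights_bytes):.1f} MiB")
```

The weights alone are therefore fairly small. During training, gradients, optimizer state, and especially the intermediate activations for each batch of 224x224 images usually take considerably more memory than the weights themselves.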
The model is trained like this:
history = model.fit(xtrain, ytrain, epochs=10, validation_data=(xtest, ytest))
xtrain has shape (2000, 224, 224, 3) and xtest has shape (1000, 224, 224, 3).
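The size of these two arrays in memory can be estimated from their shapes. The sketch below assumes the arrays are float32 (4 bytes per element); the question does not give the dtype, and float64 data would be twice as large. The helper `array_bytes` is a name made up for this example.

```python
# Estimate how much memory the training and test arrays occupy.
# Assumption: float32 elements (4 bytes each); the dtype is not given.
def array_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense array with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


GIB = 1024 ** 3

xtrain_bytes = array_bytes((2000, 224, 224, 3))
xtest_bytes = array_bytes((1000, 224, 224, 3))

print(f"xtrain: {xtrain_bytes / GIB:.2f} GiB")
print(f"xtest:  {xtest_bytes / GIB:.2f} GiB")
```

Under that assumption the two arrays together come to roughly 1.7 GiB, which fits comfortably in 16 GB of host RAM.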
I then get the following OOM error message:
ResourceExhaustedError: OOM when allocating tensor with shape[256,256,3,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node residual_unit_28/conv2d_64/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[GroupCrossDeviceControlEdges_0/training/Nadam/Nadam/Const/_287]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_keras_scratch_graph_30479]
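The tensor named in the error message is itself quite small, as a quick calculation shows. This sketch assumes that `type float` in the message means 4-byte float32; `tensor_bytes` is a helper name made up for this example.

```python
# Size of the tensor whose allocation failed: shape [256, 256, 3, 3].
# Assumption: "type float" in the message means 4-byte float32.
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


failed_bytes = tensor_bytes([256, 256, 3, 3])
failed_mib = failed_bytes / (1024 ** 2)
print(f"failed allocation: {failed_bytes} bytes = {failed_mib:.2f} MiB")
```

An allocation of only a couple of MiB failing suggests that the allocator was already nearly full when this request arrived. The message also names the allocator as `GPU_0_bfc` on `device:GPU:0`, so the memory in question appears to be the GPU's own memory rather than host RAM.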
Is this error caused by my computer's memory (it has 16 GB of RAM), or by some incorrect configuration?
[Discussion]:
Tags: tensorflow deep-learning gpu