How much should GPU performance be when training a Keras model? Answers

【Question title】: How much should GPU performance be when training a Keras model?
【Posted】: 2020-08-19 22:09:24
【Question】:

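I am using an RTX 2080 Ti on the MNIST dataset. I have installed tensorflow-gpu. Training is almost 12 times faster than in a separate environment that runs on the CPU only.

I watched CPU and GPU usage in Task Manager while training. These are the figures during training:

GPU environment: CPU = 20%, GPU = 10%, training time = 24 s

CPU environment: CPU = 100%, GPU = 10%, training time = 500 s

Is it normal for the GPU to run at 10%? Can I manually raise or lower the utilization?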

【Comments】:

If you are training on the CPU only, the GPU should not be used at all during training; your GPU is being used for something else.

Tags: python tensorflow keras gpu

【Solution 1】:

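It depends on your application. Low GPU utilization is not unusual. Try increasing the batch size to raise utilization.

That said, MNIST-sized networks are tiny, and it is hard to reach high GPU (or CPU) efficiency with them, so I think 10% GPU utilization is not unusual for your application. A larger batch size gives higher computational efficiency, meaning you can process more examples per second, but it also gives lower statistical efficiency, meaning you need to process more examples in total to reach the target accuracy. So it is a trade-off. For tiny character models, statistical efficiency drops off quickly beyond a batch size of about 100, so it is probably not worth growing the batch size for training. For inference, you should use the largest batch size you can.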
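To make the computational side of that trade-off concrete, here is a rough sketch of how the number of gradient updates per epoch shrinks as the batch size grows. It assumes the standard MNIST training split of 60,000 examples; the batch sizes are illustrative picks (32 is the Keras default for `model.fit`), and the helper name `steps_per_epoch` is made up for this example.

```python
import math

MNIST_TRAIN_EXAMPLES = 60_000  # size of the standard MNIST training split


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of gradient updates needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# Fewer, larger steps mean more work per GPU launch and less per-step overhead.
for batch_size in (32, 128, 512, 2048):
    steps = steps_per_epoch(MNIST_TRAIN_EXAMPLES, batch_size)
    print(f"batch_size={batch_size:5d} -> {steps:5d} steps per epoch")
```

With a tiny model, each of those steps does very little arithmetic, so at small batch sizes the GPU spends much of its time waiting between steps, which is consistent with a low utilization reading.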

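You can also set the device type your program uses. In your case, force the program to use only the GPU and then check GPU utilization.

For example, to use the GPU only for model.fit: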

# %tensorflow_version 2.x   (Colab-only magic; uncomment when running in Colab)
# MLP for the Pima Indians diabetes dataset
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# print the TensorFlow version (must come after the import)
print(tf.__version__)

# load pima indians dataset
dataset = np.loadtxt("/content/pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

# define model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Model Summary
model.summary()

# Fit the model
with tf.device("/device:GPU:0"):
  model.fit(X, Y, epochs=150, batch_size=10, verbose=0)

# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

Output:

2.2.0
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_18 (Dense)             (None, 12)                108       
_________________________________________________________________
dense_19 (Dense)             (None, 8)                 104       
_________________________________________________________________
dense_20 (Dense)             (None, 1)                 9         
=================================================================
Total params: 221
Trainable params: 221
Non-trainable params: 0
_________________________________________________________________
accuracy: 78.39%
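To verify that work really lands on the GPU, a minimal sketch along these lines may help. It assumes TensorFlow 2.1 or later; the matrix size is an arbitrary choice for illustration, and the CPU fallback is added only so the snippet also runs on machines without a GPU.

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means ops will run on the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# Log the device each op is placed on (enable this before creating any ops).
tf.debugging.set_log_device_placement(True)

# Run a small matmul on the first GPU if there is one, otherwise on the CPU.
device = "/device:GPU:0" if gpus else "/device:CPU:0"
with tf.device(device):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)

print("matmul ran on:", c.device)
```

If the printed device string ends in `GPU:0` while the system monitor still reports low utilization, the model is most likely just too small to keep the GPU busy.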

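Hope this answers your question. Happy learning.

【Comments】:

@osamub - Hope we have answered your question. If you are satisfied with the answer, please accept and upvote it.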