【Posted】: 2021-05-01 09:28:58
【Question】:
I installed CUDA and cuDNN following the instructions on the TF help page, and everything seems to work. If I print the available GPUs, I get:
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Out: Num GPUs Available: 1
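Also, when I start training a Sequential model, the output shows that all the necessary libraries are loaded correctly and that the GPU device is created successfully: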
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4733 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
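But I don't see any significant improvement in training performance. It is about the same as it was when training on the CPU, and I thought my RTX 3060 would give at least a small boost.
Should I expect to see an improvement when training a relatively simple Sequential model?
Edit: If I disable GPU training and train on the CPU only: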
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
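Training the model takes 21.14 seconds on the CPU and 57.59 (!!!) seconds on the GPU.
I also don't see the GPU load increase during training as I would expect.
And here is the code of the model I am training: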
import datetime as dt
# import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
from tensorflow import keras
import numpy as np
EPOCHS = 50
BATCH_SIZE = 128
VERBOSE = 1
NB_CLASSES = 10 # Number of outputs
N_HIDDEN = 128
VALIDATION_SPLIT = 0.2
DROPOUT = 0.3
mnist = keras.datasets.mnist
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# X_train is 60,000 rows of 28x28 values
# Reshape it to 60,000x784
RESHAPED = 784
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
X_train = X_train.reshape(60000, RESHAPED)
X_test = X_test.reshape(10000, RESHAPED)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize inputs between 0 and 1
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# One-hot encoding of labels
Y_train = tf.keras.utils.to_categorical(Y_train, NB_CLASSES)
Y_test = tf.keras.utils.to_categorical(Y_test, NB_CLASSES)
# Build the model
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(N_HIDDEN, input_shape=(RESHAPED,),
                             name='dense_layer', activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(N_HIDDEN, input_shape=(RESHAPED,),
                             name='dense_layer2', activation='relu'))
model.add(keras.layers.Dropout(DROPOUT))
model.add(keras.layers.Dense(NB_CLASSES, input_shape=(RESHAPED,),
                             name='dense_layer3', activation='softmax'))
# Print summary of the model
model.summary()
# Compiling the model
model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
t = dt.datetime.now()
# Training the model
model.fit(X_train, Y_train, batch_size=BATCH_SIZE,
          epochs=EPOCHS, verbose=VERBOSE,
          validation_split=VALIDATION_SPLIT)
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, Y_test)
print('Test accuracy: ', test_acc)
print(f'Training elapsed: {dt.datetime.now()-t}')
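As a rough sanity check on how small this workload is, the sketch below (not part of the original script) counts the trainable parameters and the number of optimizer steps implied by the constants above, then divides the two reported wall-clock totals by that step count. It assumes the 21.14 s and 57.59 s figures come from the script's own timer, which also covers validation and `model.evaluate`, so the per-step numbers are upper bounds. The names `dense_params`, `TRAIN_ROWS`, and `VALIDATION_PERCENT` are introduced here for illustration only.

```python
import math

# Constants copied from the script above.
EPOCHS = 50
BATCH_SIZE = 128
N_HIDDEN = 128
NB_CLASSES = 10
RESHAPED = 784
TRAIN_ROWS = 60000           # rows in X_train before the validation split
VALIDATION_PERCENT = 20      # VALIDATION_SPLIT = 0.2, kept as an integer

# Wall-clock totals reported in the question (assumed to come from the
# script's timer, so they include validation and model.evaluate).
CPU_SECONDS = 21.14
GPU_SECONDS = 57.59


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


total_params = (dense_params(RESHAPED, N_HIDDEN)
                + dense_params(N_HIDDEN, N_HIDDEN)
                + dense_params(N_HIDDEN, NB_CLASSES))

# Rows actually used for fitting once the validation split is held out.
fit_rows = TRAIN_ROWS - TRAIN_ROWS * VALIDATION_PERCENT // 100
steps_per_epoch = math.ceil(fit_rows / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

cpu_ms_per_step = CPU_SECONDS / total_steps * 1000
gpu_ms_per_step = GPU_SECONDS / total_steps * 1000

print("trainable parameters:", total_params)
print("steps per epoch:", steps_per_epoch, "total steps:", total_steps)
print(f"CPU: {cpu_ms_per_step:.2f} ms/step, GPU: {gpu_ms_per_step:.2f} ms/step")
```

With roughly 118 thousand parameters and 18,750 very short steps, each step takes on the order of milliseconds on either device, which is consistent with fixed per-step overhead mattering more than raw compute here.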
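【Comments】:
-
Could you share the code (model, training loop) and the timing results on CPU and GPU?
-
@LouisLac, I've edited the question. It is actually noticeably slower on the GPU. Not sure why.
-
Have you checked this? stackoverflow.com/questions/42097115/…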
-
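Regarding GPU load, it is important to understand that GPUs actually perform best when the model is deep and the data is complex (MNIST is just 28x28 images). So my guess is that because the MNIST example is so simple, it barely uses the GPU's resources at all. Also, to check whether the GPU is actually being used, I would recommend this:
tf.test.is_gpu_available()
-
@aSaffary, thanks, I hadn't seen that.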
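The point in the comments about the GPU being underused can be illustrated with a back-of-the-envelope estimate of the arithmetic in one training step of the question's model. This is only a sketch: `ASSUMED_GPU_FLOPS` is a hypothetical round figure, not a measured or vendor-quoted value for the RTX 3060 Laptop GPU, and `BACKWARD_FACTOR` is the common rule of thumb that the backward pass costs about twice the forward pass.

```python
# Layer widths of the model in the question: 784 -> 128 -> 128 -> 10.
LAYER_SIZES = [784, 128, 128, 10]
BATCH_SIZE = 128

# ASSUMPTION: hypothetical round number for sustained FP32 throughput.
ASSUMED_GPU_FLOPS = 5e12

# ASSUMPTION: backward pass costs roughly twice the forward pass.
BACKWARD_FACTOR = 2


def forward_flops_per_sample(sizes):
    """One multiply and one add per weight; biases and activations ignored."""
    return sum(2 * n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


forward = forward_flops_per_sample(LAYER_SIZES)
step_flops = forward * (1 + BACKWARD_FACTOR) * BATCH_SIZE
ideal_step_seconds = step_flops / ASSUMED_GPU_FLOPS

print("forward FLOPs per sample:", forward)
print("FLOPs per training step:", step_flops)
print(f"ideal compute time per step: {ideal_step_seconds * 1e6:.1f} microseconds")
```

Under these assumptions the pure compute per step comes out in the tens of microseconds, while the question's 57.59 s over 50 epochs of 375 steps works out to about 3 ms per step. If the estimate is anywhere near right, most of the GPU time would be going to per-step overhead (kernel launches, host-to-device copies, Python dispatch) rather than to arithmetic.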
Tags: python tensorflow