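【Posted】: 2022-01-10 21:56:05
【Question】:
I built a simple network with Keras on an M1 MacBook Air and installed the officially recommended tensorflow-metal, expecting faster training and prediction. However, prediction on the GPU is about 3.5 times slower than on the CPU, which confuses me. Here is my code, followed by the output with and without the GPU enabled: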
import time

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ModelCheckpoint


class CNNModel(object):
    def __init__(self, input_shape=(29, 1), num_classes=6, model_path=None):
        # Small 1D CNN: two conv/pool stages followed by two dense layers.
        self.model = keras.Sequential(
            [
                keras.Input(input_shape),
                layers.Conv1D(16, kernel_size=3, activation="relu"),
                layers.MaxPooling1D(pool_size=3),
                layers.Conv1D(32, kernel_size=3, activation="relu"),
                layers.MaxPooling1D(pool_size=3),
                layers.Flatten(),
                layers.Dropout(0.5),
                layers.Dense(32, activation="sigmoid"),
                layers.Dense(num_classes, activation='softmax')
            ]
        )
        self.model.compile(loss="categorical_crossentropy", optimizer='adam', metrics=['accuracy'])
        if model_path is not None:
            self.model.load_weights(model_path)

    def predict(self, x):
        # Return the index of the most probable class for each sample.
        preds = self.model.predict(x)
        preds = np.argmax(preds, axis=1)
        return preds

    def fit(self, x, y, model_save_path, batch_size=64, epochs=30):
        # Keep only the weights with the best validation accuracy.
        history = self.model.fit(x, y, batch_size=batch_size, epochs=epochs, validation_split=0.2,
                                 callbacks=[ModelCheckpoint(filepath=model_save_path, save_weights_only=True,
                                                            monitor='val_accuracy', mode='max', save_best_only=True)])
        return history


if __name__ == '__main__':
    model_path = "test.h5"
    sample_size = 20000
    # Random features and random labels: only the timing matters here.
    data_x, data_y = np.random.random((sample_size, 29)), np.random.randint(0, 12, size=(sample_size, 1))
    class_num = np.unique(data_y).shape[0]
    data_y = keras.utils.to_categorical(data_y, class_num)
    Xtrain, Xtest, Ytrain, Ytest = train_test_split(data_x, data_y, test_size=0.2)
    model = CNNModel(input_shape=(Xtrain.shape[1], 1), num_classes=class_num)
    model.fit(Xtrain, Ytrain, batch_size=512, epochs=10, model_save_path=model_path)
    # Reload the best checkpoint, then time prediction on the test split.
    model = CNNModel(input_shape=(Xtrain.shape[1], 1), num_classes=class_num, model_path=model_path)
    since = time.time()
    preds = model.predict(Xtest)
    end = time.time()
    print(f'Predict {Xtest.shape[0]} samples in {end - since : .9f}s, {(end - since) / Xtest.shape[0]: .9f}s on avg')
I get the following output when using the GPU:
Metal device set to: Apple M1
systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
2022-01-10 21:07:47.974952: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-01-10 21:07:47.975053: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
2022-01-10 21:07:48.039236: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/10
2022-01-10 21:07:48.206631: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
23/25 [==========================>...] - ETA: 0s - loss: 2.5483 - accuracy: 0.0828
2022-01-10 21:07:48.674379: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
25/25 [==============================] - 1s 18ms/step - loss: 2.5446 - accuracy: 0.0839 - val_loss: 2.4955 - val_accuracy: 0.0850
Epoch 2/10
25/25 [==============================] - 0s 15ms/step - loss: 2.4923 - accuracy: 0.0870 - val_loss: 2.4852 - val_accuracy: 0.0875
Epoch 3/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4864 - accuracy: 0.0863 - val_loss: 2.4851 - val_accuracy: 0.0866
Epoch 4/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4866 - accuracy: 0.0841 - val_loss: 2.4851 - val_accuracy: 0.0862
Epoch 5/10
25/25 [==============================] - 0s 14ms/step - loss: 2.4863 - accuracy: 0.0826 - val_loss: 2.4849 - val_accuracy: 0.0869
Epoch 6/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4855 - accuracy: 0.0909 - val_loss: 2.4850 - val_accuracy: 0.0800
Epoch 7/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4861 - accuracy: 0.0843 - val_loss: 2.4848 - val_accuracy: 0.0884
Epoch 8/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4852 - accuracy: 0.0848 - val_loss: 2.4852 - val_accuracy: 0.0803
Epoch 9/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4848 - accuracy: 0.0880 - val_loss: 2.4846 - val_accuracy: 0.0866
Epoch 10/10
25/25 [==============================] - 0s 13ms/step - loss: 2.4846 - accuracy: 0.0871 - val_loss: 2.4851 - val_accuracy: 0.0875
2022-01-10 21:07:51.840891: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
Predict 4000 samples in  0.259644985s,  0.000064911s on avg
After uninstalling tensorflow-metal with python -m pip uninstall tensorflow-metal, I get this:
Epoch 1/10
25/25 [==============================] - 0s 6ms/step - loss: 2.6182 - accuracy: 0.0824 - val_loss: 2.5252 - val_accuracy: 0.0878
Epoch 2/10
25/25 [==============================] - 0s 3ms/step - loss: 2.5025 - accuracy: 0.0863 - val_loss: 2.4898 - val_accuracy: 0.0791
Epoch 3/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4901 - accuracy: 0.0848 - val_loss: 2.4873 - val_accuracy: 0.0766
Epoch 4/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4894 - accuracy: 0.0844 - val_loss: 2.4865 - val_accuracy: 0.0847
Epoch 5/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4891 - accuracy: 0.0802 - val_loss: 2.4869 - val_accuracy: 0.0797
Epoch 6/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4876 - accuracy: 0.0811 - val_loss: 2.4876 - val_accuracy: 0.0828
Epoch 7/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4866 - accuracy: 0.0847 - val_loss: 2.4873 - val_accuracy: 0.0822
Epoch 8/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4867 - accuracy: 0.0841 - val_loss: 2.4867 - val_accuracy: 0.0838
Epoch 9/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4870 - accuracy: 0.0860 - val_loss: 2.4867 - val_accuracy: 0.0787
Epoch 10/10
25/25 [==============================] - 0s 3ms/step - loss: 2.4860 - accuracy: 0.0883 - val_loss: 2.4870 - val_accuracy: 0.0744
Predict 4000 samples in  0.073775768s,  0.000018444s on avg
【Comments】:
- Your model is too small to benefit from GPU processing; it simply does not involve enough computation or parallelism.
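To put a rough number on "too small", the sketch below counts the trainable parameters and multiply-accumulate operations (MACs) per sample for the network in the question. It is a back-of-the-envelope estimate, not a profiler result. It assumes the Keras defaults the question's code relies on ('valid' padding, convolution stride 1, pooling stride equal to the pool size) and the 12 classes produced by `np.random.randint(0, 12, ...)`. The helper names `conv1d_out`, `pool1d_out` and `model_cost` are hypothetical, and bias additions and activations are ignored in the MAC count.

```python
def conv1d_out(length, kernel):
    """Output length of a Conv1D layer with 'valid' padding and stride 1."""
    return length - kernel + 1


def pool1d_out(length, pool):
    """Output length of MaxPooling1D with 'valid' padding and stride == pool size."""
    return (length - pool) // pool + 1


def model_cost(steps=29, num_classes=12):
    """Return (trainable parameters, MACs per sample) for the question's CNN."""
    params = macs = 0
    length, channels = steps, 1
    # Two Conv1D(kernel_size=3) + MaxPooling1D(pool_size=3) stages.
    for filters in (16, 32):
        length = conv1d_out(length, 3)
        params += 3 * channels * filters + filters   # weights + biases
        macs += length * filters * 3 * channels      # one MAC per weight per output step
        length = pool1d_out(length, 3)
        channels = filters
    # Flatten, then Dense(32) and Dense(num_classes); Dropout has no parameters.
    features = length * channels
    for units in (32, num_classes):
        params += features * units + units
        macs += features * units
        features = units
    return params, macs


params, macs = model_cost()
print(f"{params} parameters, {macs} multiply-accumulates per sample")
# 4108 parameters, 14480 multiply-accumulates per sample
```

For the 4000 test samples this is roughly 58 million multiply-accumulates in total, so under these assumptions it is plausible that fixed per-batch costs of dispatching work to the GPU outweigh the arithmetic itself.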
Tags: macos keras deep-learning tensorflow2.0 apple-m1