【Question title】: I cannot get Tensorflow 2.0 to work on my GPU
【Posted】: 2021-04-26 21:32:37
【Question】:

I have been writing programs with TensorFlow on a computer running Linux Mint. For whatever reason, I cannot get TensorFlow to run on my GPU.

2021-04-26 15:46:11.462612: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 15:46:11.462650: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

I know I have CUDA installed, because the GPU works fine with PyTorch:

import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
mydevice = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(mydevice)

yields

cuda

Also, when I run a program with TensorFlow, I get:

START TIME:  Mon Apr 26 16:34:24 2021
2021-04-26 16:34:24.499178: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-26 16:34:24.499862: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-26 16:34:24.526372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-26 16:34:24.526781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
2021-04-26 16:34:24.526900: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.526986: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.528676: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-26 16:34:24.528994: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-26 16:34:24.530990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-26 16:34:24.531125: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531245: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-26 16:34:24.531641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-26 16:34:24.532140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-26 16:34:24.532178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-26 16:34:24.532192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-04-26 16:34:24.592917: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-04-26 16:34:24.593369: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2400000000 Hz

I believe I installed TensorFlow with conda under Anaconda, although the build came from PyPI. Please let me know what you suggest. Thanks.

【Comments】:

Tags: python tensorflow anaconda gpu


【Answer 1】:

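From your error log, it looks like TensorFlow detects your GPU (GTX 1650) but then skips registering it. The problem is that your cudatoolkit and cudnn versions are probably not compatible with your TensorFlow version. TF is quite specific about these requirements. The error lines to pay attention to are: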

2021-04-26 16:34:24.526900: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcudart.so.11.0'**; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.526986: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcublas.so.11'**; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcublasLt.so.11'**; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.528676: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10

2021-04-26 16:34:24.531125: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcusparse.so.11'**; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcudnn.so.8'**; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
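
In a long log these warnings are easy to miss. As a minimal sketch, the names of the libraries that failed to load can be collected with a regular expression. The function name `missing_libraries` is made up for illustration, the sample lines are shortened, and the pattern assumes the raw `Could not load dynamic library '...'` wording (without the `**` emphasis markers added above):

```python
import re

# Matches the dso_loader warning wording that appears in the raw log.
_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")

def missing_libraries(log_text):
    """Return the unique library names that failed to load, in first-seen order."""
    seen = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen

# Shortened sample lines in the same format as the log in the question.
sample = (
    "W dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: ...\n"
    "I dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10\n"
    "W dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: ...\n"
)
print(missing_libraries(sample))  # ['libcudart.so.11.0', 'libcudnn.so.8']
```

The version suffix on each name (`.so.11.0`, `.so.8`) is the hint for which CUDA and cuDNN releases this TensorFlow build is looking for.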

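The latest TensorFlow version, tensorflow-2.4.0 (see full table), only works with cuDNN 8.0 and CUDA 11.0. (Newer versions of these have been released, so you may want to check what you have installed; I suspect you are on CUDA 10.)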
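
One way to check whether those library versions are visible is to try opening them yourself. The sketch below uses only the standard library and roughly approximates the lookup TensorFlow performs; it is not TensorFlow's own loader, and the result can differ depending on the environment and `LD_LIBRARY_PATH` it runs under. The helper name `can_load` is hypothetical, and the library names are copied from the warnings in the log:

```python
import ctypes

# Library names taken from the "Could not load dynamic library" warnings.
REQUIRED = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]

def can_load(name):
    """Return True if the dynamic linker can find and open the library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

for lib in REQUIRED:
    print(f"{lib}: {'found' if can_load(lib) else 'MISSING'}")
```

Run it with the same Python interpreter (and the same conda environment) that you use for TensorFlow, otherwise the search paths may not match.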

I suggest taking a look at this post (it is older, but the commands and principles still apply).


With conda, to create a new environment for TensorFlow:

  1. Make a yaml file (example yaml file for tensorflow)
  2. Use the yaml file above to create a new environment for TensorFlow

conda env create -f environment.yml

  3. Activate your new environment

conda activate tensorflow_env_388

NB: a brand-new environment will avoid any conflicting packages.
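
For illustration only, here is a rough sketch of what such an environment file might contain, generated from Python so it can be adjusted easily. This is not the example file linked above: the channel, the Python version, and the choice to install TensorFlow through pip are all assumptions, and `render_environment` is a made-up helper name. Only the environment name and the CUDA, cuDNN, and TensorFlow versions come from this answer.

```python
def render_environment(name="tensorflow_env_388"):
    """Return the text of a minimal conda environment file (illustrative only)."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",          # assumption: channel choice
        "dependencies:",
        "  - python=3.8",           # assumption: Python version
        "  - cudatoolkit=11.0",     # CUDA version named in this answer
        "  - cudnn=8.0",            # cuDNN version named in this answer
        "  - pip",
        "  - pip:",
        "    - tensorflow==2.4.0",  # assumption: install TensorFlow via pip
    ]
    return "\n".join(lines) + "\n"

# Print the file contents; save them as environment.yml before running
# `conda env create -f environment.yml`.
print(render_environment(), end="")
```

If the solver cannot satisfy these pins on your platform, loosening the Python or cuDNN pin is a reasonable first thing to try.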


To troubleshoot, check what is currently installed:

conda list cudnn

# packages in environment at /rds/general/user/home/anaconda3/envs/tensorflow_env_388:
#
# Name                    Version                   Build  Channel
cudnn                     7.0.5.39             ha5ca753_1    conda-forge

conda list cudatoolkit
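
If you want to compare the listing against the required versions programmatically, the `conda list` output can be parsed with a few lines of Python. This is a sketch under the assumption that the output keeps the whitespace-separated column layout shown above; `parse_conda_list` and `matches` are hypothetical helper names.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` output, skipping '#' lines."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        packages[fields[0]] = fields[1]
    return packages

def matches(version, wanted):
    """True if version equals wanted or extends it with a dot (8.0 matches 8.0.5)."""
    return version == wanted or version.startswith(wanted + ".")

# Sample copied from the `conda list cudnn` output shown above.
listing = """\
# Name                    Version                   Build  Channel
cudnn                     7.0.5.39             ha5ca753_1    conda-forge
"""
installed = parse_conda_list(listing)
print(installed["cudnn"], matches(installed["cudnn"], "8.0"))  # 7.0.5.39 False
```

In the sample listing, cudnn is at 7.0.5.39, which does not satisfy the cuDNN 8.0 requirement, so it would need to be replaced.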

Then install cudnn/cuda as needed:

conda install cudatoolkit=11.0

conda install cudnn=8.0
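
After installing, it is worth confirming what TensorFlow itself reports. The sketch below assumes TensorFlow 2.3 or later, where, as far as I know, `tf.sysconfig.get_build_info()` is available; the CUDA and cuDNN keys may be absent on CPU-only builds, hence the `.get` calls. The function name `gpu_report` is made up for illustration.

```python
def gpu_report():
    """Return TensorFlow's view of CUDA/cuDNN and visible GPUs, or None if TF is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    build = tf.sysconfig.get_build_info()  # assumption: TF 2.3 or later
    return {
        "tf_version": tf.__version__,
        "cuda_version": build.get("cuda_version"),    # CUDA version TF was built against
        "cudnn_version": build.get("cudnn_version"),  # cuDNN version TF was built against
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

print(gpu_report())
```

An empty `gpus` list means TensorFlow still skipped registering the GPU, in which case the startup log should again name the libraries it could not load.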

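【Comments】:

  • Hi, thanks for the reply. I am running conda install cudatoolkit=11.0, but I keep getting "Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: - Found conflicts! Looking for incompatible packages." At this point the upgrade basically stops working.
  • There is probably a conflict with other packages, and ruling everything out can be a bit of a hassle. As in my post, I suggest creating a brand-new conda environment for TensorFlow. One easy way is by using a yml file. I will add full instructions to my post above.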
【Answer 2】:

Which channel did you install from? If you use the default channel, you have to request the GPU build of TensorFlow explicitly.

conda install "tensorflow=2.4.*=gpu*" -c anaconda
