【Question title】: Why is TensorFlow looking for CUDA 10.1 when CUDA 10.0 is installed?
【Posted】: 2020-12-04 05:55:01
【Question】:

I am on Ubuntu 18.04. Here is the output of the following commands:

nvidia-smi
Fri Dec  4 11:35:09 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40           On   | 00000000:00:08.0 Off |                  Off |
| N/A   59C    P0   146W / 250W |  11724MiB / 12215MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

dpkg -l | grep cuda
ii  cuda-command-line-tools-10-0                          10.0.130-1                        amd64        CUDA command-line tools
ii  cuda-compat-10-0                                      410.104-1                         amd64        CUDA Compatibility Platform
ii  cuda-cublas-10-0                                      10.0.130-1                        amd64        CUBLAS native runtime libraries
ii  cuda-cudart-10-0                                      10.0.130-1                        amd64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-10-0                                  10.0.130-1                        amd64        CUDA Runtime native dev links, headers
ii  cuda-cufft-10-0                                       10.0.130-1                        amd64        CUFFT native runtime libraries
ii  cuda-cuobjdump-10-0                                   10.0.130-1                        amd64        CUDA cuobjdump
ii  cuda-cupti-10-0                                       10.0.130-1                        amd64        CUDA profiling tools interface.
ii  cuda-curand-10-0                                      10.0.130-1                        amd64        CURAND native runtime libraries
ii  cuda-cusolver-10-0                                    10.0.130-1                        amd64        CUDA solver native runtime libraries
ii  cuda-cusparse-10-0                                    10.0.130-1                        amd64        CUSPARSE native runtime libraries
ii  cuda-driver-dev-10-0                                  10.0.130-1                        amd64        CUDA Driver native dev stub library
ii  cuda-gdb-10-0                                         10.0.130-1                        amd64        CUDA-GDB
ii  cuda-gpu-library-advisor-10-0                         10.0.130-1                        amd64        CUDA GPU Library Advisor.
ii  cuda-license-10-0                                     10.0.130-1                        amd64        CUDA licenses
ii  cuda-memcheck-10-0                                    10.0.130-1                        amd64        CUDA-MEMCHECK
ii  cuda-misc-headers-10-0                                10.0.130-1                        amd64        CUDA miscellaneous headers
ii  cuda-nvcc-10-0                                        10.0.130-1                        amd64        CUDA nvcc
ii  cuda-nvdisasm-10-0                                    10.0.130-1                        amd64        CUDA disassembler
ii  cuda-nvprof-10-0                                      10.0.130-1                        amd64        CUDA Profiler tools
ii  cuda-nvtx-10-0                                        10.0.130-1                        amd64        NVIDIA Tools Extension
ii  cuda-repo-ubuntu1804                                  10.1.243-1                        amd64        cuda repository configuration files
ii  libcudnn7                                             7.4.1.5-1+cuda10.0                amd64        cuDNN runtime libraries
ii  libnvinfer5                                           5.0.2-1+cuda10.0                  amd64        TensorRT runtime libraries
ii  nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 1-1                               amd64        nvinfer-runtime-trt repository configuration files

So I have CUDA 10.0 installed. I have also set the paths:

export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

But why do I get this error? Why is it looking for CUDA 10.1 rather than CUDA 10.0?

python3 -c 'import tensorflow as tf; print(tf.__version__)'
2020-12-04 11:37:43.929779: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/hadoop/lib/native:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-12-04 11:37:43.929830: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2.3.1
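
The warning above already names both the library TensorFlow wants and the directories it searched. As a minimal sketch, assuming the message keeps the format seen in this log (the regular expression and the helper names `parse_dso_warning` and `required_version` are illustrative, not part of TensorFlow), the two pieces can be pulled out like this:

```python
import re

# Warning text copied from the log above (timestamp and source location omitted).
SAMPLE = (
    "Could not load dynamic library 'libcudart.so.10.1'; dlerror: "
    "libcudart.so.10.1: cannot open shared object file: No such file or directory; "
    "LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/hadoop/lib/native:"
    "/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server:"
    "/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
)

# Assumed message format, derived only from the single log line shown above.
_DSO_WARNING = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'.*?LD_LIBRARY_PATH: (?P<paths>\S+)"
)


def parse_dso_warning(line):
    """Return (library_name, searched_dirs), or None if the line does not match."""
    match = _DSO_WARNING.search(line)
    if match is None:
        return None
    return match.group("lib"), match.group("paths").split(":")


def required_version(library_name):
    """Return the version suffix after '.so.', e.g. '10.1' for 'libcudart.so.10.1'."""
    return library_name.split(".so.", 1)[1]


if __name__ == "__main__":
    lib, dirs = parse_dso_warning(SAMPLE)
    print(lib, required_version(lib))
    print(dirs[0])
```

The version suffix in the library name (10.1) is the CUDA runtime this TensorFlow build asks for, regardless of what `nvcc --version` reports.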

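【Comments】:

  • You must use the CUDA version that TensorFlow was built against. This is a non-negotiable requirement. In this case, that is 10.1.
  • @talonmies Does the error message mean that TensorFlow was built against CUDA 10.1? So if I downgrade TensorFlow, will the error message go away?
  • As I said in my comment, yes, it means your TensorFlow was built against 10.1 and requires 10.1. As for downgrading, I don't know. If you have CUDA 10.0 installed, you need to use a version built against CUDA 10.0. That may not be a downgrade. That is for you to determine.
  • I would also suggest upgrading CUDA to 10.1 rather than downgrading TF. Beyond the current problem, it will spare you other incompatibilities.
  • Alternatively, if you cannot update CUDA, you can build the TF version you need against your CUDA version. But @PoeDator's suggestion is still preferable.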
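
To see which CUDA runtime versions are actually visible on the library path, one option is to scan each directory for `libcudart.so.*` files. This is a rough sketch using only the standard library. The helper name `find_cudart_versions` is hypothetical, and it only looks at the directories it is given (here, `LD_LIBRARY_PATH`), not at the system linker cache:

```python
import glob
import os


def find_cudart_versions(search_dirs):
    """Map each directory to the sorted libcudart version suffixes found in it."""
    found = {}
    for directory in search_dirs:
        pattern = os.path.join(directory, "libcudart.so.*")
        versions = sorted(
            os.path.basename(path).split(".so.", 1)[1]
            for path in glob.glob(pattern)
        )
        if versions:
            found[directory] = versions
    return found


if __name__ == "__main__":
    # Directories come from the environment; empty entries are skipped.
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    print(find_cudart_versions(dirs))
```

If no directory lists a `10.1` entry, the `libcudart.so.10.1` lookup in the question cannot succeed, whichever toolkit `nvcc` belongs to.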

Tags: tensorflow nvidia


【Solution 1】:

According to the TensorFlow tested build configurations, TF 2.1 to TF 2.3 require CUDA 10.1, which is why you get the error above.

If you want to use CUDA 10.0, the compatible versions are TF_GPU 1.15 to TF 2.0.

As Poe Dator rightly suggested, you can upgrade to CUDA 10.1 instead of downgrading TensorFlow, because the latest release resolves many performance issues.
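
The version pairs mentioned in this answer can be written down as a small lookup table. This is only an illustrative sketch: the table covers the releases named above (with 2.2 filled in as part of the 2.1 to 2.3 range), the names `TF_TO_CUDA`, `required_cuda`, and `compatible_tf_versions` are made up for the example, and the tested build configurations page remains the authoritative source.

```python
# CUDA version each TensorFlow release line was built against,
# limited to the versions named in the answer above.
TF_TO_CUDA = {
    "1.15": "10.0",
    "2.0": "10.0",
    "2.1": "10.1",
    "2.2": "10.1",
    "2.3": "10.1",
}


def required_cuda(tf_version):
    """Return the CUDA version for a TF version such as '2.3.1', or None if unlisted."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TF_TO_CUDA.get(major_minor)


def compatible_tf_versions(cuda_version):
    """Return the listed TF release lines built against the given CUDA version."""
    return [tf for tf, cuda in TF_TO_CUDA.items() if cuda == cuda_version]


if __name__ == "__main__":
    print(required_cuda("2.3.1"))          # 10.1
    print(compatible_tf_versions("10.0"))  # ['1.15', '2.0']
```

For the setup in the question, TF 2.3.1 maps to CUDA 10.1, while the installed CUDA 10.0 toolkit matches only the 1.15 and 2.0 lines.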
