【Question Title】: TensorFlow libdevice not found. Why is it not found in the searched path?
【Posted】: 2021-08-01 21:37:27
【Question】:

Win 10 64-bit 21H1; TF 2.5 and CUDA 11 installed in the environment (Python 3.9.5, Xeus)

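I'm not the only one seeing this error; see also (unanswered) here and here. The problem is poorly defined, and the proposed solutions are unclear or do not seem to work (see, for example, here).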

Problem: when running the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the "warm up stage" and then throws this error:

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

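The console contains this output, which shows that TF finds the GPU but that XLA initialization fails to find libdevice in the specified path, even though the file exists there: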

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

What is interesting is that the searched paths include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin".

That folder contains all the DLLs (successfully loaded at TF startup), including cudart64_110.dll, cudnn64_8.dll... and, of course, libdevice.10.bc.

Question: since TF says it is searching this location for the file, and the file exists there, what is wrong and how do I fix it?

(NB: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist... CUDA is installed in the environment; this path must be a best guess at an OS-level installation.)

Info: I am setting the path with

import os

# Point XLA at the directory it should treat as the CUDA data dir
aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath

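But I have also set the OS environment variable XLA_FLAGS to the same string value... I don't know which one is actually taking effect, but the fact that the console output shows it searching the expected path is good enough.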
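As a minimal sketch of the in-script approach, the helper below builds an XLA_FLAGS value that keeps any other flags already present and replaces only the data-dir entry. The name `with_cuda_data_dir` is hypothetical. The sketch assumes the flags are separated by whitespace and that the path contains no spaces. A value assigned to `os.environ` replaces whatever the process inherited from the OS-level variable, so within that process the in-script assignment is the one that is visible.

```python
import os

FLAG = "--xla_gpu_cuda_data_dir"


def with_cuda_data_dir(existing_flags: str, data_dir: str) -> str:
    """Return an XLA_FLAGS string whose CUDA data dir is data_dir.

    Hypothetical helper: assumes whitespace-separated flags and a
    data_dir without spaces. Any earlier data-dir entry is dropped;
    all other flags are kept in their original order.
    """
    kept = [f for f in existing_flags.split() if not f.startswith(FLAG + "=")]
    kept.append(f"{FLAG}={data_dir}")
    return " ".join(kept)


# Example use (assumed to run before TensorFlow is imported):
# os.environ["XLA_FLAGS"] = with_cuda_data_dir(
#     os.environ.get("XLA_FLAGS", ""),
#     "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin",
# )
```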

【Comments】:

    Tags: python tensorflow configuration


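    【Solution 1】:

    The diagnostic information is unclear and therefore unhelpful; there is, however, a solution.

    The problem was resolved by providing the file (as a copy) at this path: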

    C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\

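    Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin is the path given to XLA_FLAGS, but XLA does not seem to look for the libdevice file directly in that folder; it looks for the \nvvm\libdevice\ path beneath it. This means I can't just set a different value in XLA_FLAGS to point at the actual location of the libdevice file because, put simply, it is not (just) the file that it is looking for.

    The earlier debug info: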
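The copy step described above could be scripted roughly as follows. This is only a sketch: the function name `ensure_libdevice_layout` is hypothetical, and it assumes the only thing missing is the `nvvm/libdevice` subfolder under the directory passed to XLA_FLAGS.

```python
import shutil
from pathlib import Path


def ensure_libdevice_layout(data_dir: str, libdevice_file: str) -> Path:
    """Copy libdevice_file into <data_dir>/nvvm/libdevice/ if it is not there yet.

    Hypothetical helper; returns the path of the copy inside the nvvm tree.
    """
    target_dir = Path(data_dir) / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)  # create nvvm/libdevice as needed
    target = target_dir / Path(libdevice_file).name
    if not target.exists():
        shutil.copy2(libdevice_file, target)  # copy, leaving the original in place
    return target


# Example use with the paths from this answer:
# env_bin = "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin"
# ensure_libdevice_layout(env_bin, env_bin + "/libdevice.10.bc")
```

Since the file is copied rather than moved, the original in Library\bin is left untouched, matching the "as a copy" approach described in this answer.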

    2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
    2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
    2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
    2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
    2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
    

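    This is incorrect, since there is no "CUDA" in the search path; and FWIW, I think searching in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 should give a different error, since there is no such folder (there is an old V10.0 folder there, but no OS-level installation of CUDA 11).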

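    Until/unless TensorFlow improves its path handling, this kind of file-structure manipulation is needed in every new (Anaconda) Python environment.

    Full thread on the TensorFlow forum here.

    【Comments】: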

      【Solution 2】:

      For Linux users with tensorflow==2.8, add the following environment variable.

      XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4
      

      【Comments】:

        【Solution 3】:

        The following worked for me, given the error message:

        error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
        

        First I searched for the nvvm directory, then verified that the libdevice directory exists:

        $ find / -type d -name nvvm 2>/dev/null
        /usr/lib/cuda/nvvm
        $ cd /usr/lib/cuda/nvvm
        /usr/lib/cuda/nvvm$ ls
        libdevice
        /usr/lib/cuda/nvvm$ cd libdevice
        /usr/lib/cuda/nvvm/libdevice$ ls
        libdevice.10.bc
        

        Then I exported the environment variable:

        export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda
        

        as shown by @Insectatorious above. This resolved the error, and I was able to run the code.

        【Comments】:
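The same check can be sketched in Python: given a few candidate CUDA roots, pick the first one that actually contains `nvvm/libdevice/libdevice.10.bc`. The function name `find_cuda_data_dir` is hypothetical, and the candidate list is an assumption to adjust for the local machine.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_cuda_data_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate containing nvvm/libdevice/libdevice.10.bc.

    Hypothetical helper; returns None when no candidate has that layout.
    """
    for root in candidates:
        libdevice = Path(root) / "nvvm" / "libdevice" / "libdevice.10.bc"
        if libdevice.is_file():
            return root
    return None


# Example use with the directories mentioned in the answers above:
# root = find_cuda_data_dir(["/usr/lib/cuda", "/usr/local/cuda-11.4"])
# if root is not None:
#     print(f"export XLA_FLAGS=--xla_gpu_cuda_data_dir={root}")
```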
