【Posted】: 2018-07-09 14:31:05
【Question】:
My machine details:
-
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
-
CUDA driver version:
- Version - 9.2
- File - nvidia-diag-driver-local-repo-rhel7-396.26-1.0-1.x86_64.rpm
cat /etc/redhat-release: CentOS Linux release 7.5.1804 (Core)
My .bashrc contains the following:
PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
CUDA_HOME=$CUDA_HOME:/usr/local/cuda
With this in place, importing torch or torchvision works fine. Importing tensorflow, however, fails.
My TensorFlow versions are as follows:
- tensorboard==1.8.0
- tensorflow-gpu==1.8.0
I get the following error:
>>> import tensorflow
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/local/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/local/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
However, /usr/local/cuda/lib64 does contain the following:
- libcublas_device.a
- libcublas.so
- libcublas.so.9.0
- libcublas.so.9.0.176
- libcublas_static.a
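I cannot figure out what is wrong. Could this be a permissions issue? I later changed the owner of the files above to the current user and set their permissions to 755, but I still get the same error.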
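One way to narrow this down is to ask the dynamic loader directly, without going through TensorFlow. The following is a minimal sketch rather than part of the original post: the helper names `search_library_path` and `can_load` are hypothetical, and it assumes a Linux system where `LD_LIBRARY_PATH` is the relevant search variable. It lists which `LD_LIBRARY_PATH` entries actually contain the library and then tries to load it with `ctypes`.

```python
import ctypes
import os


def search_library_path(lib_name, env=None):
    """Return every LD_LIBRARY_PATH directory entry that contains lib_name.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to try out with other values.
    """
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in raw.split(":"):
        if not directory:
            continue  # skip empty entries such as the one before a leading ':'
        candidate = os.path.join(directory, lib_name)
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


def can_load(lib_name):
    """Try to load lib_name by name; return (success, loader error message)."""
    try:
        ctypes.CDLL(lib_name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    name = "libcublas.so.9.0"  # the library named in the ImportError above
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
    print("found in:", search_library_path(name))
    print("load attempt:", can_load(name))
```

If the file is found on disk but the load attempt still fails, the loader's message may point at something other than the search path, for example a missing dependency of the library itself.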
【Comments】:
-
Does LD_LIBRARY_PATH point to /usr/local/cuda/lib64, or not?
-
@Patwie /usr/local/cuda is a symlink to /datadrive/abhisek/cuda-9.0/. I changed it anyway. echo $LD_LIBRARY_PATH prints :/usr/local/cuda/lib64. I still get the same error.
-
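And libcublas.so is a symlink to libcublas.so.9.0, and libcublas.so.9.0 is a symlink to libcublas.so.9.0.176? You did not copy, paste, or overwrite any of these "*.so" files? And does running LD_LIBRARY_PATH=datadrive/abhisek/cuda-9.0/lib64 python reproduce the same TensorFlow error (in other words: is .bashrc really being executed)?
-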
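The symlink question in the comment above can be checked hop by hop. This is a small sketch under stated assumptions: `symlink_chain` is a hypothetical helper, not something from the thread, and the path in the usage section simply reuses the directory named in the question.

```python
import os


def symlink_chain(path):
    """Follow `path` one link at a time and return every hop.

    The last element is the first path that is not a symlink
    (a regular file, or a missing target if the chain is broken).
    """
    chain = [path]
    seen = set()
    while os.path.islink(path):
        if path in seen:
            raise OSError("symlink loop at " + path)
        seen.add(path)
        target = os.readlink(path)
        # A relative target is resolved against the directory holding the link;
        # os.path.join returns `target` unchanged when it is already absolute.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain


if __name__ == "__main__":
    start = "/usr/local/cuda/lib64/libcublas.so"  # path taken from the question
    hops = symlink_chain(start)
    print(" -> ".join(hops))
    print("final target exists:", os.path.exists(hops[-1]))
```

A chain that ends in a path for which `os.path.exists` is false would indicate a dangling link, which the loader reports with the same "No such file or directory" text.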
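I did not overwrite or copy and paste anything. During installation, CUDA asks where to put the installed files and where to create the link. The only link is from /usr/local/cuda to datadrive/abhisek/cuda-9.0. .bashrc is being executed, because I restarted the session after the change. In other words, echo $LD_LIBRARY_PATH prints :/usr/local/cuda/lib64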
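Printing the variable with echo shows the shell's value; the dynamic loader reads LD_LIBRARY_PATH when a process starts, so a more direct check is to launch a fresh interpreter with the variable set explicitly, as the earlier comment suggests. The sketch below is one possible way to do that from Python. `try_import` is a hypothetical helper, and the directory in the usage section is an assumption based on the paths mentioned in the thread.

```python
import os
import subprocess
import sys


def try_import(module, lib_dir):
    """Import `module` in a new interpreter with lib_dir first in LD_LIBRARY_PATH.

    Returns (succeeded, stderr_text). The variable is set in the child's
    environment before it starts, which is when the loader reads it.
    """
    env = dict(os.environ)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + old if old else "")
    result = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; works on Python 3.6
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Assumed location, based on the symlink target described in the comments.
    ok, err = try_import("tensorflow", "/usr/local/cuda/lib64")
    print("import succeeded:", ok)
    if not ok:
        print(err)
```

If the import succeeds this way but fails in a normal session, the session's environment is the likely difference; if it fails both ways, the search path is probably not the cause.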
Tags: tensorflow