以Ubuntu16.04系统为例(x86架构,64bit),安装Docker社区版和NVIDIA-Docker并下载启动TensorFlow镜像,跑起hello_tensorflow的ipynb。
准备工作
首先需要安装一些基本的组件,否则后面安装cuda会失败(比方会因为缺失gcc/g++/cc等编译工具安装cuda失败等等),所以先执行下面的命令:
|
|
sudo
apt-get
update
sudo apt-get
install -y
build-essential
git
|
以上安装完成后,再次确认显卡是否支持cuda,打开NVIDIA官方查看显卡是否支持CUDA的连接:https://developer.nvidia.com/cuda-gpus
里面列举了不同系列下支持CUDA的显卡列表,因为需要翻墙,这里只列出GeForce系列的部分显卡(左边是卡的名字,右边是CUDA计算力,数值越大越好):
虽然只是部分列表,但如果GPU不在以上GTX系列卡的名单里,还是建议装CPU版本(跳过安装CUDA的部分),只看安装Docker(不是NVIDIA-Docker)和安装TensorFlow的CPU镜像。
安装CUDA9.0
从官网下载cuda:CUDA Toolkit Download | NVIDIA Developer
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
点击进入链接看到如下图,根据咱们的系统版本(Linux的Ubuntu发行版的16.04,x86架构),选择对应的选项,Installer Type建议选择第一个。

下载完成后,根据官网给出的命令提示,对下载好的文件执行如下命令安装cuda:
|
|
sudo
sh cuda_9.0.176_384.81_linux.run
|
以下问题在安装的时候会出现(我没有输入的,就是直接回车,选默认值):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
|
Do
you accept the
previously read
EULA?
accept/decline/quit:
accept
Install
NVIDIA Accelerated
Graphics Driver for
Linux-x86_64
384.81?
(y)es/(n)o/(q)uit:
y
Do you
want to
install the OpenGL
libraries?
(y)es/(n)o/(q)uit
[ default
is yes
]:
y
Do
you want to
run nvidia-xconfig?
This will
update the system
X configuration
file so
that the NVIDIA
X driver
is
used.
The pre-existing
X configuration
file will
be backed up.
This option
should not
be used on
systems that require
a custom
X
configuration,
such as
systems with multiple
GPU vendors.
(y)es/(n)o/(q)uit
[ default
is no
]:
Install the CUDA
9.0 Toolkit?
(y)es/(n)o/(q)uit:
yes
Enter
Toolkit Location
[
default is
/usr/local/cuda-9.0
]:
Do you
want to
install a
symbolic link
at /usr/local/cuda?
(y)es/(n)o/(q)uit:
y
Install
the CUDA
9.0 Samples?
(y)es/(n)o/(q)uit:
y
Enter CUDA Samples
Location
[
default is
/home/yuens
]:
|
安装成功则会显示以下信息:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
|
Installing
the NVIDIA display
driver...
Installing the CUDA
Toolkit in
/usr/local/cuda-9.0
...
Missing
recommended library:
libGLU.so
Missing recommended library:
libXmu.so
Installing the CUDA
Samples in
/home/yuanshuai
...
Copying
samples to
/home/yuanshuai/NVIDIA_CUDA-9.0_Samples
now...
Finished copying samples.
===========
=
Summary =
===========
Driver:
Installed
Toolkit: Installed
in /usr/local/cuda-9.0
Samples: Installed
in /home/yuanshuai,
but missing recommended
libraries
Please make
sure that
-
PATH includes
/usr/local/cuda-9.0/bin
-
LD_LIBRARY_PATH includes
/usr/local/cuda-9.0/lib64,
or,
add /usr/local/cuda-9.0/lib64
to /etc/ld.so.conf
and run
ldconfig as
root
To uninstall
the CUDA Toolkit,
run the uninstall
script in
/usr/local/cuda-9.0/bin
To
uninstall the
NVIDIA Driver,
run nvidia-uninstall
Please
see CUDA_Installation_Guide_Linux.pdf
in /usr/local/cuda-9.0/doc/pdf
for detailed
information on
setting up CUDA.
Logfile
is /tmp/cuda_install_5544.log
|
安装失败也会给出Logfile的地址,只需要看看log的内容是啥,并解决就OK(像我上面一开始提到的如果没有安装gcc/cc/icc/g++等编译器,就会导致安装cuda失败,这都会在log里面写的很清楚并给出解决方案)。
检查是否安装成功,在命令行输入:nvidia-smi,如果得到以下类似的内容,说明安装成功:

安装Docker
打开官网关于docker installation for Linux的文档页(我建议还是打开下面的链接,看看):
- Get Docker CE for Ubuntu | Docker Documentation
https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#uninstall-old-versions
我们用的系统是Xenial 16.04 (LTS),所以查看Ubuntu这个版本的部分,首先若先前有安装docker需要先卸载(若没有安装过则无需执行),执行命令:
|
|
sudo
apt-get
remove docker
docker-engine
docker.io
|
Docker的安装有多个方式,这里以最常见的方式(文档第一种)为例:
首先依次执行以下命令(反斜杠\代表一行,只是换行写更清晰),把docker仓库加进到apt里:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
|
sudo
apt-get
update
sudo
apt-get
install \
apt-transport-https
\
ca-certificates
\
curl \
software-properties-common
curl
-fsSL https://download.docker.com/linux/ubuntu/gpg
| sudo
apt-key
add -
sudo
apt-key
fingerprint 0EBFCD88
sudo
add-apt-repository
\
"deb [arch=amd64]
https://download.docker.com/linux/ubuntu
\
$(lsb_release
-cs)
\
stable"
|
正式安装docker:
|
|
sudo
apt-get
update
sudo
apt-get
install docker-ce
apt-cache
madison docker-ce
sudo
docker run hello-world
|
最后一个命令是验证docker是否安装成功,它会下载并执行hello-world镜像。如果安装正确,执行后的结果应该类似下面这样:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
|
Unable
to find
image 'hello-world:latest'
locally
latest:
Pulling from library/hello-world
5b0f327be733:
Pull complete
Digest:
sha256:07d5f7800dfe37b8c2196c7b1c524c33808ce2e0f74e7aa00e603295ca9a0972
Status:
Downloaded newer
image for
hello-world:latest
Hello
from Docker!
This message
shows that your
installation appears
to be
working correctly.
To generate
this message,
Docker took the
following steps:
1.
The Docker client
contacted the
Docker daemon.
2.
The Docker daemon
pulled the
"hello-world"
image from the
Docker Hub.
3.
The Docker daemon
created a
new container
from that image
which runs the
executable that
produces the output
you are currently
reading.
4.
The Docker daemon
streamed that
output to
the Docker client,
which sent it
to
your terminal.
To try
something more
ambitious,
you can run
an Ubuntu container
with:
$
docker run
-it ubuntu
bash
Share
images,
automate workflows,
and more
with a
free Docker ID:
https://cloud.docker.com/
For more
examples and
ideas,
visit:
https://docs.docker.com/engine/userguide/
|
安装NVIDIA-Docker
安装完成docker并检查安装正确(能跑出来hello-world)后,如果需要docker容器中有gpu支持,需要再安装NVIDIA-Docker,同样找到并打开该项目的主页:
- NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs
https://github.com/NVIDIA/nvidia-docker
可以看到在Quick start小节,选择我们Ubuntu的发行版,依次执行命令:
|
|
# Install nvidia-docker and nvidia-docker-plugin
wget
-P
/tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo
dpkg -i
/tmp/nvidia-docker*.deb
&& rm
/tmp/nvidia-docker*.deb
# Test nvidia-smi 验证是否安装成功
nvidia-docker
run --rm
nvidia/cuda
nvidia-smi
|
上面最后一条命令是检查是否安装成功,安装成功,则会显示关于GPU的信息,类似前面的一个截图:

然后在执行下面这句,默认用nvdia-docker替代docker命令:
|
|
echo
'alias docker=nvidia-docker'
>> ~/.bashrc
bash
|
下载使用TensorFlow镜像
打开dockerhub关于tensorflow的页面:
- tensorflow/tensorflow – Docker Hub
https://hub.docker.com/r/tensorflow/tensorflow/
根据需要的版本下载tensorflow镜像并开启tensorflow容器:
CPU版本
|
|
docker
run -it
-p
8888:8888
tensorflow/tensorflow
|
GPU版本
|
|
nvidia-docker
run -it
-p
8888:8888
tensorflow/tensorflow:latest-gpu
|
如何使用
执行以上命令的结果类似如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
$
nvidia-docker
run -it
-p
8888:8888
tensorflow/tensorflow:latest-gpu
[I
02:51:21.230
NotebookApp]
Writing notebook
server cookie secret
to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W
02:51:21.242
NotebookApp]
WARNING:
The notebook server
is listening
on all IP
addresses and
not using
encryption.
This is
not recommended.
[I
02:51:21.249
NotebookApp]
Serving notebooks
from local directory:
/notebooks
[I
02:51:21.249
NotebookApp]
0 active
kernels
[I
02:51:21.249
NotebookApp]
The Jupyter Notebook
is running
at:
http://[all
ip addresses on
your system]:8888/?token=8f90cc7b9ad6ccc4f36f53f347c7a314220bbcb82dd416ea
[I
02:51:21.249
NotebookApp]
Use Control-C
to stop
this server
and shut
down all kernels
(twice to
skip confirmation).
[C
02:51:21.249
NotebookApp]
Copy/paste
this URL
into your browser
when you connect
for the
first time,
to
login with
a token:
http://localhost:8888/?token=8f90cc7b9ad6ccc4f36f53f347c7a314220bbcb82dd416ea
[I
02:51:31.832
NotebookApp]
302 GET
/ (172.17.0.1)
0.74ms
[I
02:51:31.943
NotebookApp]
302 GET
/tree?
(172.17.0.1)
1.44ms
|
其中看到有个网址:
|
|
http://localhost:8888/?token=8f90cc7b9ad6ccc4f36f53f347c7a314220bbcb82dd416ea
|
每个人的网址在token=后面的内容是不一样的,现在我们打开浏览器,输入网址:
这时会出现如下画面:

输入刚刚token后面的值后,点击login会看到一下画面:

点击第一个1_hello_tensorflow.ipynb,然后可以选择执行所有代码(见下图):

下载使用Kaggle镜像
其实kaggle官方提供了好些个镜像,这里以python的为例,即kaggle/python:latest,这个镜像地址在:https://hub.docker.com/r/kaggle/python/
下载该镜像使用命令:
|
|
docker
pull kaggle/python:latest
|
这个镜像包含了tensorflow和xgboost。下面请看:
|
|
In
[1]:
import xgboost
as xgb
In
[2]:
xgb.__version__
Out[2]:
'0.6'
In [3]:
import tensorflow
as tf
In [4]:
tf.__version__
Out[4]:
'1.3.0'
|
This entry was posted in