Installing NVIDIA Driver 418, CUDA 10, Miniconda, and TensorFlow 1.13 on Ubuntu 18.04

Mar 11, 2019

Operating system: Kubuntu 18.04 (the KDE flavor of Ubuntu)
GPU: NVIDIA GeForce GTX 1080 Ti
The goal is to get the GPU build of TensorFlow 1.13 running.

Reference: https://www.tensorflow.org/install/gpu#software_requirements

The following NVIDIA® software must be installed on your system:
NVIDIA® GPU drivers — CUDA 10.0 requires 410.x or higher.
CUDA® Toolkit — TensorFlow supports CUDA 10.0 (TensorFlow >= 1.13.0)
CUPTI ships with the CUDA Toolkit.
cuDNN SDK (>= 7.4.1)
(Optional) TensorRT 5.0 to improve latency and throughput for inference on some models.

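Key points:

Install NVIDIA driver 410 or later
Install CUDA 10.0
Install CUPTI
Install cuDNN 7.4.1 or later
(Optional) Install TensorRT 5.0

Pay close attention to the version numbers: in my testing, TensorFlow 1.13 would not run with the newest CUDA 10.1.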
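The version rules above can be written down as a small check. This is only a sketch: the function names `parse_version` and `check_versions` are my own, and the thresholds are the ones quoted from the TensorFlow requirements list (driver 410 or later, CUDA exactly 10.0, cuDNN 7.4.1 or later).

```python
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.4.1' into (7, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_versions(driver: str, cuda: str, cudnn: str) -> Dict[str, bool]:
    """Compare version strings against the TensorFlow 1.13 requirements."""
    return {
        "driver >= 410": parse_version(driver) >= (410,),
        # Only major.minor matters here: 10.0.130 is fine, 10.1 is not.
        "cuda == 10.0": parse_version(cuda)[:2] == (10, 0),
        "cudnn >= 7.4.1": parse_version(cudnn) >= (7, 4, 1),
    }


if __name__ == "__main__":
    for rule, ok in check_versions("418.39", "10.0.130", "7.4.1.5").items():
        print(f"{rule}: {'ok' if ok else 'FAIL'}")
```

Comparing tuples of integers avoids the classic string-comparison trap where "10.0" sorts before "9.2".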

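The TensorFlow website also provides what is probably the most convenient and general installation method: https://www.tensorflow.org/install/gpu
Following the official instructions is basically enough. The commands I record below are lightly modified to suit my own preferences. If you don't quite trust me, feel free to close this page and follow the official guide instead XD.

Installation steps:

0. Add the NVIDIA repositories to apt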

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt update

1. Install the NVIDIA driver

First, check which drivers are available to install:

ubuntu-drivers devices

Output:

== /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0 ==
modalias : pci:v000010DEd00001B06sv00003842sd00006696bc03sc00i00
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-410 - third-party free
driver   : nvidia-driver-418 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin

This shows that the NVIDIA repository was added successfully. If it had failed, only `nvidia-driver-390` would be listed, and CUDA 10 cannot be installed with that driver.
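To make that check less error-prone, the listing can be parsed instead of read by eye. The sketch below works on a copy of the output shown above; `nvidia_drivers` is a name I made up, and the parsing assumes the `driver : nvidia-driver-NNN - ...` line format seen in this particular output.

```python
import re
from typing import List, Optional, Tuple

SAMPLE = """\
vendor   : NVIDIA Corporation
model    : GP102 [GeForce GTX 1080 Ti]
driver   : nvidia-driver-390 - distro non-free
driver   : nvidia-driver-410 - third-party free
driver   : nvidia-driver-418 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""


def nvidia_drivers(output: str) -> Tuple[List[int], Optional[int]]:
    """Return (all nvidia-driver versions, the one marked 'recommended')."""
    versions: List[int] = []
    recommended: Optional[int] = None
    for line in output.splitlines():
        match = re.match(r"driver\s*:\s*nvidia-driver-(\d+)\b(.*)", line.strip())
        if not match:
            continue  # skips vendor/model lines and the nouveau driver
        version = int(match.group(1))
        versions.append(version)
        if "recommended" in match.group(2).split():
            recommended = version
    return versions, recommended


if __name__ == "__main__":
    versions, recommended = nvidia_drivers(SAMPLE)
    print("available:", versions, "recommended:", recommended)
    print("CUDA 10 capable driver offered:", max(versions) >= 410)
```

If the highest version offered is below 410, the repository step most likely did not take effect.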

Start the installation:

sudo ubuntu-drivers autoinstall

Output:

(earlier output omitted)
The following packages have unmet dependencies:
 nvidia-driver-418 : Depends: xserver-xorg-video-nvidia-418 (= 418.39-0ubuntu1) but it is not going to be installed
                     Recommends: libnvidia-compute-418:i386 (= 418.39-0ubuntu1) but it is not installable
                     Recommends: libnvidia-decode-418:i386 (= 418.39-0ubuntu1) but it is not installable
                     Recommends: libnvidia-encode-418:i386 (= 418.39-0ubuntu1) but it is not installable
                     Recommends: libnvidia-ifr1-418:i386 (= 418.39-0ubuntu1) but it is not installable
                     Recommends: libnvidia-fbc1-418:i386 (= 418.39-0ubuntu1) but it is not installable
                     Recommends: libnvidia-gl-418:i386 (= 418.39-0ubuntu1) but it is not installable
E: Unable to correct problems, you have held broken packages.

This means xserver-xorg-video-nvidia-418 is missing, so install it:

sudo apt install xserver-xorg-video-nvidia-418

Output:

The following packages have unmet dependencies:
 xserver-xorg-video-nvidia-418 : Depends: xserver-xorg-core (>= 2:1.19.6-1ubuntu2~)
E: Unable to correct problems, you have held broken packages.

This means xserver-xorg-core is missing, so install it:

sudo apt install xserver-xorg-core
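The two rounds above follow the same pattern: read the error, find the package on the "Depends:" line, install it. Only the "Depends:" entries block the installation; "Recommends:" entries are optional. A rough sketch of pulling those names out of the apt message follows, where `hard_dependencies` is a hypothetical helper and the sample is a shortened copy of the output above.

```python
import re
from typing import List

SAMPLE = """\
The following packages have unmet dependencies:
 nvidia-driver-418 : Depends: xserver-xorg-video-nvidia-418 (= 418.39-0ubuntu1) but it is not going to be installed
                     Recommends: libnvidia-compute-418:i386 (= 418.39-0ubuntu1) but it is not installable
E: Unable to correct problems, you have held broken packages.
"""


def hard_dependencies(apt_output: str) -> List[str]:
    """Collect package names from 'Depends:' entries, ignoring 'Recommends:'."""
    return re.findall(r"\bDepends:\s+(\S+)", apt_output)


if __name__ == "__main__":
    for package in hard_dependencies(SAMPLE):
        print("try: sudo apt install", package)
```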

After it finishes, run the installer again:

sudo ubuntu-drivers autoinstall

Once installation completes, reboot:

sudo reboot

After rebooting, test the driver:

nvidia-smi

Output:

Mon Mar 11 21:20:52 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   38C    P8    17W / 280W |      1MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   33C    P8    11W / 280W |      1MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Everything displays normally, so the driver looks fine.
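One detail in that header is easy to misread: "CUDA Version: 10.1" reports the newest CUDA release this driver can support, not the toolkit installed on the machine, so it does not conflict with installing CUDA 10.0 next. The sketch below extracts both numbers from the header line. `parse_smi_header` and the key name `cuda_max` are my own choices, and the regular expression assumes the header layout shown above.

```python
import re
from typing import Dict

HEADER = "| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |"


def parse_smi_header(line: str) -> Dict[str, str]:
    """Extract the driver version and the driver's maximum supported CUDA version."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    return {"driver": match.group(1), "cuda_max": match.group(2)}


if __name__ == "__main__":
    info = parse_smi_header(HEADER)
    print(f"driver {info['driver']}, supports CUDA up to {info['cuda_max']}")
```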

2. Install CUDA 10 and cuDNN 7.4

This step is copied almost verbatim from the official site.

sudo apt install --no-install-recommends cuda-10-0 libcudnn7=7.4.1.5-1+cuda10.0

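That is all it takes. cuDNN is the easier of the two to install, so the command can also be split into two steps:

2.1. Install CUDA 10

sudo apt install --no-install-recommends cuda-10-0

2.2. Install cuDNN

Download it from the official site, https://developer.nvidia.com/rdp/cudnn-download,
and pick the runtime library "cuDNN Runtime Library for Ubuntu18.04 (Deb)".
Any 7.x release from 7.4.1 onward that is built for CUDA 10.0 should work, because cuDNN is backward compatible, so you can simply install the newest one.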

sudo dpkg -i <your_cudnn.deb>

If you hit missing dependencies, run:

sudo apt install -f

Then install the package again:

sudo dpkg -i <your_cudnn.deb>

That usually does it.
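When picking a cuDNN package by hand, both halves of the version string matter: the cuDNN release itself and the CUDA build it targets. The sketch below splits a string of the form used in the apt command earlier (`7.4.1.5-1+cuda10.0`). The helper names are hypothetical, and restricting the major version to 7 is my assumption, not something the TensorFlow requirements state explicitly.

```python
from typing import Tuple


def split_cudnn_package_version(pkg_version: str) -> Tuple[Tuple[int, ...], str]:
    """Split '7.4.1.5-1+cuda10.0' into ((7, 4, 1, 5), '10.0')."""
    upstream, _, cuda_tag = pkg_version.partition("+cuda")
    numbers = upstream.split("-")[0]  # drop the '-1' package revision
    return tuple(int(n) for n in numbers.split(".")), cuda_tag


def is_acceptable(pkg_version: str) -> bool:
    """True for cuDNN 7.x at or above 7.4.1 that is built against CUDA 10.0."""
    cudnn, cuda = split_cudnn_package_version(pkg_version)
    return cudnn[0] == 7 and cudnn >= (7, 4, 1) and cuda == "10.0"


if __name__ == "__main__":
    # The second and third strings are made-up examples, not real package versions.
    for candidate in ("7.4.1.5-1+cuda10.0", "7.9.9.9-1+cuda10.1", "7.3.0.0-1+cuda10.0"):
        print(candidate, "->", is_acceptable(candidate))
```

On a Debian-based system the installed string can usually be read with `dpkg-query -W -f='${Version}' libcudnn7`.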

3. Install TensorRT

sudo apt update && sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 && sudo apt update && sudo apt install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0

Nothing remarkable happened; it simply installed successfully.

4. Install TensorFlow 1.13

I plan to install it with Miniconda, so Miniconda comes first.

4.1 Install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh

When the license prompt appears, answer yes.

Miniconda3 will now be installed into this location:
/home/<username>/miniconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/home/<username>/miniconda3] >>> /opt/miniconda3

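The installer is asking whether to install under /home/<username>. I want Miniconda available to every user, so I changed the location to /opt/miniconda3.

Finally, this prompt appears: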

Do you wish the installer to initialize Miniconda3
in your /home/<username>/.bashrc ? [yes|no]
[no] >>> no

Make Miniconda available to all users:

sudo ln -s /opt/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh

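This step is simple: conda's environment setup all lives in miniconda3/etc/profile.d/conda.sh, so all we need to do is create a symbolic link to it in /etc/profile.d. At login, every shell script in /etc/profile.d is loaded, so each user picks up conda automatically.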
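If conda is not found after logging back in, the first thing worth checking is whether the link really points where it should. Here is a small sketch using only the standard library. `links_to` is a name I chose, and the two paths in the example are the ones used in the command above.

```python
import os


def links_to(link_path: str, expected_target: str) -> bool:
    """True if link_path is a symlink that resolves to expected_target."""
    return os.path.islink(link_path) and (
        os.path.realpath(link_path) == os.path.realpath(expected_target)
    )


if __name__ == "__main__":
    ok = links_to("/etc/profile.d/conda.sh", "/opt/miniconda3/etc/profile.d/conda.sh")
    print("conda.sh link is in place" if ok else "conda.sh link is missing or wrong")
```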

Log out and log back in, then run:

conda

Output:

usage: conda [-h] [-V] command ...
... (omitted)

Looks fine.

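4.2 Create a virtual environment

conda create -n tf python=3

4.3 Activate the virtual environment

conda activate tf

4.4 Next, install TensorFlow following the official instructions:

pip install --upgrade tensorflow-gpu

Everything went smoothly for me up to this point. Now test whether it works: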

python
...
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()

If this runs without errors and returns a device name such as '/device:GPU:0', the installation succeeded.
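A missing error alone is not quite enough, because `tf.test.gpu_device_name()` returns an empty string when TensorFlow cannot see a GPU. The sketch below makes that distinction explicit. `interpret_device_name` and `report` are hypothetical helper names, and `report` has to be called inside the activated `tf` environment because it imports TensorFlow.

```python
def interpret_device_name(name: str) -> str:
    """Explain the string returned by tf.test.gpu_device_name()."""
    if name:
        return f"GPU available: {name}"
    return "No GPU found: TensorFlow will fall back to the CPU"


def report() -> str:
    """Query TensorFlow and describe the result."""
    # Imported lazily so interpret_device_name() works even without TensorFlow.
    import tensorflow as tf

    return interpret_device_name(tf.test.gpu_device_name())
```

Usage, from a Python prompt inside the environment: `print(report())`.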

If you have any questions, feel free to leave a comment below.

Written by Maniac.tw