Installing NVIDIA Driver 418, CUDA 10, Miniconda, and TensorFlow 1.13 on Ubuntu 18.04

Maniac.tw
Mar 11, 2019

OS: Kubuntu 18.04 (the KDE flavor of Ubuntu)
GPU: NVIDIA GeForce GTX 1080 Ti
Goal: get the GPU build of TensorFlow 1.13 running.

Reference: https://www.tensorflow.org/install/gpu#software_requirements

The following NVIDIA® software must be installed on your system:
NVIDIA® GPU drivers — CUDA 10.0 requires 410.x or higher.
CUDA® Toolkit — TensorFlow supports CUDA 10.0 (TensorFlow >= 1.13.0)
CUPTI ships with the CUDA Toolkit.
cuDNN SDK (>= 7.4.1)
(Optional) TensorRT 5.0 to improve latency and throughput for inference on some models.

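Key points:

  1. Install NVIDIA driver version 410 or later
  2. Install CUDA 10.0
  3. Install CUPTI (it ships with the CUDA Toolkit)
  4. Install cuDNN 7.4.1 or later
  5. (Optional) Install TensorRT 5.0

Pay close attention to the version numbers: in my testing, TensorFlow 1.13 would not run with the latest CUDA 10.1.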
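The version rules above can be written down as a small check. This is only a sketch: the function names `parse_version` and `check_stack` are made up for illustration, the inputs are assumed to be plain dotted version strings (no package suffix such as `-1+cuda10.0`), and the second example call uses invented input values.

```python
def parse_version(text):
    """Turn a dotted version string such as '7.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_stack(driver, cuda, cudnn):
    """Return a list of problems; an empty list means the versions fit.

    The thresholds mirror the requirements quoted above for TensorFlow 1.13.
    """
    problems = []
    if parse_version(driver) < (410,):
        problems.append("driver %s is older than 410" % driver)
    # CUDA has to be in the 10.0 series; 10.1 did not work in the test above.
    if parse_version(cuda)[:2] != (10, 0):
        problems.append("CUDA %s is not 10.0" % cuda)
    if parse_version(cudnn) < (7, 4, 1):
        problems.append("cuDNN %s is older than 7.4.1" % cudnn)
    return problems


if __name__ == "__main__":
    # Versions used in this post.
    print(check_stack("418.39", "10.0.130", "7.4.1.5"))
    # Hypothetical mismatched setup, for comparison.
    print(check_stack("390.116", "10.1", "7.3.0"))
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically, where "7.10" would sort before "7.4".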

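The TensorFlow website also provides what is probably the most convenient and general installation method; see https://www.tensorflow.org/install/gpu
So in principle you can just follow the official instructions. The commands I record below are slightly modified to suit my own preferences. If you don't quite trust me, feel free to close this page and follow the official guide instead XD.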

Installation steps:

0. Add the NVIDIA repositories to apt

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt update

1. Install the NVIDIA driver

First, check which drivers are available to install:

ubuntu-drivers devices

Output:

== /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0 ==
modalias : pci:v000010DEd00001B06sv00003842sd00006696bc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-418 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin

This shows that the NVIDIA repository was added successfully. Had it failed, only `nvidia-driver-390` would be listed, and CUDA 10 cannot be installed on top of that driver.
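The same check can be done programmatically. The sketch below scans text shaped like the output above for `nvidia-driver-NNN` entries and tests whether any of them reaches 410; the helper names are hypothetical, and the sample text is copied from this post rather than read from a live system.

```python
import re

# Output of `ubuntu-drivers devices` as shown in this post.
SAMPLE = """\
== /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0 ==
modalias : pci:v000010DEd00001B06sv00003842sd00006696bc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-418 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
"""


def nvidia_driver_versions(text):
    """Return the major version of every nvidia-driver-NNN line, sorted."""
    found = re.findall(r"^driver\s*:\s*nvidia-driver-(\d+)\b", text, re.MULTILINE)
    return sorted(int(version) for version in found)


def supports_cuda_10(text, minimum=410):
    """True if at least one listed driver meets the CUDA 10.0 minimum."""
    return any(version >= minimum for version in nvidia_driver_versions(text))


if __name__ == "__main__":
    print(nvidia_driver_versions(SAMPLE))
    print(supports_cuda_10(SAMPLE))
```

On a real machine the text could come from running `ubuntu-drivers devices` through `subprocess`, but the exact spacing of that output may differ from the sample here.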

Start the installation:

sudo ubuntu-drivers autoinstall

Output:

(beginning omitted)
The following packages have unmet dependencies:
nvidia-driver-418 : Depends: xserver-xorg-video-nvidia-418 (= 418.39-0ubuntu1) but it is not going to be installed
Recommends: libnvidia-compute-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-decode-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-encode-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-ifr1-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-fbc1-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-gl-418:i386 (= 418.39-0ubuntu1) but it is not installable
E: Unable to correct problems, you have held broken packages.

This means xserver-xorg-video-nvidia-418 is missing, so install it:

sudo apt install xserver-xorg-video-nvidia-418

Output:

The following packages have unmet dependencies:
xserver-xorg-video-nvidia-418 : Depends: xserver-xorg-core (>= 2:1.19.6-1ubuntu2~)
E: Unable to correct problems, you have held broken packages.

This means xserver-xorg-core is missing, so install it:

sudo apt install xserver-xorg-core
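Both failures were resolved the same way: read the package named after `Depends:` in apt's message and install it. A minimal sketch of that reading step follows, assuming the error text looks like the excerpts in this post; `unmet_depends` is a hypothetical helper, and the `Recommends:` lines are deliberately ignored because they are optional.

```python
import re

# Shortened excerpt of the apt error shown earlier in this post.
SAMPLE = """\
The following packages have unmet dependencies:
nvidia-driver-418 : Depends: xserver-xorg-video-nvidia-418 (= 418.39-0ubuntu1) but it is not going to be installed
Recommends: libnvidia-compute-418:i386 (= 418.39-0ubuntu1) but it is not installable
E: Unable to correct problems, you have held broken packages.
"""


def unmet_depends(text):
    """Return the package names that follow 'Depends:' in apt's error text."""
    # Capture everything up to the first whitespace or the opening
    # parenthesis of the version constraint.
    return re.findall(r"\bDepends:\s*([^\s(]+)", text)


if __name__ == "__main__":
    print(unmet_depends(SAMPLE))
```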

Once that finishes, run the driver installation again:

sudo ubuntu-drivers autoinstall

When the installation completes, reboot:

sudo reboot

After rebooting, test the driver:

nvidia-smi

Output:

Mon Mar 11 21:20:52 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:01:00.0 Off | N/A |
| 0% 38C P8 17W / 280W | 1MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:02:00.0 Off | N/A |
| 0% 33C P8 11W / 280W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Both cards are listed correctly, so the driver looks fine.
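One detail in the header is easy to misread: the "CUDA Version: 10.1" field reports the highest CUDA version the driver supports, not the toolkit installed on the machine, so it does not conflict with installing CUDA 10.0 in the next step. The sketch below pulls both numbers out of the header line; `parse_smi_header` and the `cuda_max` key are names invented for this example.

```python
import re

# Header line copied from the nvidia-smi output in this post.
HEADER = "| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |"


def parse_smi_header(line):
    """Extract the driver version and the driver's maximum CUDA version."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        return None
    return {"driver": match.group(1), "cuda_max": match.group(2)}


if __name__ == "__main__":
    print(parse_smi_header(HEADER))
```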

2. Install CUDA 10 & cuDNN 7.4

This step is copied almost verbatim from the official site.

sudo apt install --no-install-recommends cuda-10-0 libcudnn7=7.4.1.5-1+cuda10.0

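That is all it takes. cuDNN is the easier of the two to install, and the step can also be split in two:

2.1. Install CUDA 10

sudo apt install --no-install-recommends cuda-10-0

2.2. Install cuDNN

Download it from the official site, https://developer.nvidia.com/rdp/cudnn-download,
and choose the runtime library "cuDNN Runtime Library for Ubuntu18.04 (Deb)".
Any version from 7.4.1 up works: because cuDNN is backward compatible, you can simply install the latest one.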

sudo dpkg -i <your_cudnn.deb>

If dependencies turn out to be missing, run:

sudo apt install -f

Then install the package again:

sudo dpkg -i <your_cudnn.deb>

That usually succeeds.

3. Install TensorRT

sudo apt update && sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 && sudo apt update && sudo apt install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0

Nothing remarkable happened; it simply installed successfully.

4. Install TensorFlow 1.13

I plan to install it with Miniconda, so Miniconda goes in first.

4.1 Install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh

When the license prompt appears, answer yes.

Miniconda3 will now be installed into this location:
/home/<username>/miniconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/home/<username>/miniconda3] >>> /opt/miniconda3

This asks whether to install under /home/<username>. I want Miniconda available to every user, so I changed the location to /opt/miniconda3.

Finally, this prompt appears:

Do you wish the installer to initialize Miniconda3
in your /home/<username>/.bashrc ? [yes|no]
[no] >>> no

Make Miniconda available to all users:

sudo ln -s /opt/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh

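This step is simple. All of conda's environment setup lives in miniconda3/etc/profile.d/conda.sh, so the only thing to do is create a symbolic link to it in /etc/profile.d. Every shell script in /etc/profile.d is loaded when a user logs in.

Log out and back in, then run:

conda

Output:

usage: conda [-h] [-V] command ...... (rest omitted)

That looks fine.

4.2 Create a virtual environment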

conda create -n tf python=3

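4.3 Activate the virtual environment

conda activate tf

4.4 Next, install TensorFlow as the official site instructs:

pip install --upgrade tensorflow-gpu

Everything went smoothly for me up to this point. Next, test that it works: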

python
...
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()

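If this runs without errors and returns a device name such as '/device:GPU:0', the installation succeeded. An empty string means TensorFlow loaded but did not find a GPU.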
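For a check that can be rerun later, the interactive test can be wrapped in a short script. This is a sketch rather than an official procedure: `gpu_report` is a hypothetical helper, and it treats a failed import the same as TensorFlow not being installed.

```python
def gpu_report():
    """Return the GPU device name TensorFlow sees, '' if it sees none,
    or None if TensorFlow cannot be imported in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.test.gpu_device_name()


if __name__ == "__main__":
    name = gpu_report()
    if name is None:
        print("TensorFlow could not be imported in this environment")
    elif name:
        print("GPU found:", name)
    else:
        print("TensorFlow imported, but no GPU device was found")
```

Run it inside the activated `tf` environment; running it outside that environment would be expected to report that TensorFlow could not be imported.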

If you have any questions, feel free to leave a comment below.
