Installing NVIDIA Driver 418, CUDA 10, Miniconda, and TensorFlow 1.13 on Ubuntu 18.04
OS: Kubuntu 18.04 (the KDE flavor of Ubuntu)
GPU: NVIDIA GeForce GTX 1080 Ti
The goal is to get the GPU build of TensorFlow 1.13 running.
Reference: https://www.tensorflow.org/install/gpu#software_requirements
The following NVIDIA® software must be installed on your system:
NVIDIA® GPU drivers — CUDA 10.0 requires 410.x or higher.
CUDA® Toolkit — TensorFlow supports CUDA 10.0 (TensorFlow >= 1.13.0)
CUPTI ships with the CUDA Toolkit.
cuDNN SDK (>= 7.4.1)
(Optional) TensorRT 5.0 to improve latency and throughput for inference on some models.
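Key points:
- Install NVIDIA driver version 410 or later
- Install CUDA 10.0
- Install CUPTI
- Install cuDNN 7.4.1 or later
- (Optional) Install TensorRT 5.0
Pay close attention to the version numbers: in my testing, TensorFlow 1.13 would not run with the latest CUDA 10.1.
Also, the TensorFlow site provides what is probably the most convenient, general-purpose way to install everything: https://www.tensorflow.org/install/gpu
So following the official instructions is basically enough. The commands I record below are lightly modified to my own taste. If you don't quite trust me, feel free to close this page and follow the official guide instead XD.
Installation steps:
0. Add the NVIDIA repositories to apt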
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt update
1. Install the NVIDIA driver
First, check which drivers are available to install:
ubuntu-drivers devices
Output:
== /sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0 ==
modalias : pci:v000010DEd00001B06sv00003842sd00006696bc03sc00i00
vendor : NVIDIA Corporation
model : GP102 [GeForce GTX 1080 Ti]
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-418 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
This shows that the NVIDIA repository was added successfully. If it had failed, only `nvidia-driver-390` would be listed, and that driver is too old for CUDA 10.
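The check above is done by eye, but it can also be scripted. The following is a minimal sketch, assuming the `ubuntu-drivers devices` output keeps the `driver : <package> - ...` layout shown above; the function names `recommended_driver` and `lists_driver` are my own and not part of any tool.

```shell
#!/usr/bin/env bash
# recommended_driver: read `ubuntu-drivers devices` output on stdin and
# print the package name from the line tagged "recommended".
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# lists_driver NAME: read the same output on stdin and succeed
# only if a driver line for package NAME is present.
lists_driver() {
    awk -v want="$1" '$1 == "driver" && $3 == want { found = 1 } END { exit !found }'
}

# Only query the real tool when it exists on this machine.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    ubuntu-drivers devices | recommended_driver
fi
```

If `recommended_driver` prints nothing newer than `nvidia-driver-390`, the repository from step 0 is probably not active yet.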
Start the installation:
sudo ubuntu-drivers autoinstall
Output:
(earlier output omitted)
The following packages have unmet dependencies:
nvidia-driver-418 : Depends: xserver-xorg-video-nvidia-418 (= 418.39-0ubuntu1) but it is not going to be installed
Recommends: libnvidia-compute-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-decode-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-encode-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-ifr1-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-fbc1-418:i386 (= 418.39-0ubuntu1) but it is not installable
Recommends: libnvidia-gl-418:i386 (= 418.39-0ubuntu1) but it is not installable
E: Unable to correct problems, you have held broken packages.
The message points to xserver-xorg-video-nvidia-418 as the missing dependency, so install it:
sudo apt install xserver-xorg-video-nvidia-418
Output:
The following packages have unmet dependencies:
xserver-xorg-video-nvidia-418 : Depends: xserver-xorg-core (>= 2:1.19.6-1ubuntu2~)
E: Unable to correct problems, you have held broken packages.
Now xserver-xorg-core is the missing dependency, so install that:
sudo apt install xserver-xorg-core
Once that finishes, run the driver installation again:
sudo ubuntu-drivers autoinstall
When the installation completes, reboot:
sudo reboot
After the reboot, test the driver:
nvidia-smi
Output:
Mon Mar 11 21:20:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:01:00.0 Off | N/A |
| 0% 38C P8 17W / 280W | 1MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:02:00.0 Off | N/A |
| 0% 33C P8 11W / 280W | 1MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
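Everything displays normally, so the driver looks fine. Note that the "CUDA Version: 10.1" in the header is the highest CUDA version this driver supports, not an installed toolkit.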
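Rather than reading the version off the table, the driver requirement (410.x or higher for CUDA 10.0) can be checked in a script. This is only a sketch: `version_ge` and `MIN_DRIVER` are names I made up, and the comparison relies on GNU `sort -V`, which Ubuntu ships.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A >= version B.
# Uses GNU sort's version ordering: if B sorts first (or ties), A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum driver for CUDA 10.0, taken from the TensorFlow requirements quoted earlier.
MIN_DRIVER="410"

# Only query the driver when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    # One line per GPU; the driver version is the same for all, so take the first.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "driver $installed is new enough for CUDA 10.0"
    else
        echo "driver $installed is too old; need >= $MIN_DRIVER"
    fi
fi
```

With the 418.39 driver shown above, `version_ge 418.39 410` succeeds, while a 390-series driver would fail the check.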
2. Install CUDA 10 & cuDNN 7.4
This step is copied almost verbatim from the official site.
sudo apt install --no-install-recommends cuda-10-0 libcudnn7=7.4.1.5-1+cuda10.0
That is all that is needed. The cuDNN part is the easier of the two, so the step can also be split in two:
2.1. Install CUDA 10
sudo apt install --no-install-recommends cuda-10-0
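2.2. Install cuDNN
Download it from the official site, https://developer.nvidia.com/rdp/cudnn-download,
and choose the runtime library "cuDNN Runtime Library for Ubuntu18.04 (Deb)".
Any version from 7.4.1 up works: cuDNN is backward compatible, so you can simply install the newest 7.x release built for CUDA 10.0.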
sudo dpkg -i <your_cudnn.deb>
If it complains about missing dependencies, run one more command:
sudo apt install -f
Then install the package again:
sudo dpkg -i <your_cudnn.deb>
That usually does it.
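Since the package version string (for example `7.4.1.5-1+cuda10.0` in the apt command above) carries both the cuDNN version and the CUDA build it targets, it is easy to confirm what actually got installed. A small sketch follows; `cudnn_version` and `cudnn_cuda_tag` are hypothetical helper names, and I am assuming the package is called `libcudnn7` as in the apt command.

```shell
#!/usr/bin/env bash
# cudnn_version STRING: print the upstream cuDNN version,
# i.e. everything before the first "-" (7.4.1.5-1+cuda10.0 -> 7.4.1.5).
cudnn_version() {
    printf '%s\n' "${1%%-*}"
}

# cudnn_cuda_tag STRING: print the CUDA build tag,
# i.e. everything after the last "+" (7.4.1.5-1+cuda10.0 -> cuda10.0).
cudnn_cuda_tag() {
    printf '%s\n' "${1##*+}"
}

# Ask dpkg for the installed version, if dpkg is available on this system.
if command -v dpkg-query >/dev/null 2>&1; then
    installed="$(dpkg-query -W -f='${Version}' libcudnn7 2>/dev/null || true)"
    if [ -n "$installed" ]; then
        echo "cuDNN $(cudnn_version "$installed") built for $(cudnn_cuda_tag "$installed")"
    else
        echo "libcudnn7 is not installed"
    fi
fi
```

For this setup the tag should read `cuda10.0`; a `cuda10.1` build would not match the CUDA 10.0 toolkit installed in step 2.1.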
3. Install TensorRT
sudo apt update && sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 && sudo apt update && sudo apt install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
Nothing remarkable happened; it simply installed.
4. Install TensorFlow 1.13
I plan to install it through Miniconda, so Miniconda goes in first.
4.1 Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sudo bash Miniconda3-latest-Linux-x86_64.sh
Answer yes at the license prompt.
Miniconda3 will now be installed into this location:
/home/<username>/miniconda3
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below
[/home/<username>/miniconda3] >>> /opt/miniconda3
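This prompt asks whether to install under /home/<username>. I want Miniconda available to all users, so I changed the location to /opt/miniconda3.
Finally, this prompt appears: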
Do you wish the installer to initialize Miniconda3
in your /home/<username>/.bashrc ? [yes|no]
[no] >>> no
Make Miniconda available to all users:
sudo ln -s /opt/miniconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh
This step is simple. All of conda's shell setup lives in miniconda3/etc/profile.d/conda.sh, so all we need to do is create a symbolic link to it in /etc/profile.d. Every shell script in /etc/profile.d is loaded when a user logs in.
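To make the mechanism concrete, here is a rough imitation of the loop that Ubuntu's /etc/profile runs over /etc/profile.d at login. It is a sketch, not the system file itself: `load_profile_dir`, the temporary directory, and the `DEMO_CONDA_LOADED` variable are all invented for the demonstration, and a plain text file stands in for conda.sh.

```shell
#!/usr/bin/env bash
# load_profile_dir DIR: source every readable *.sh entry in DIR,
# mirroring what /etc/profile does for /etc/profile.d on login.
load_profile_dir() {
    for _profile_script in "$1"/*.sh; do
        if [ -r "$_profile_script" ]; then
            . "$_profile_script"
        fi
    done
    unset _profile_script
}

# Demonstration in a throwaway directory instead of the real /etc/profile.d.
demo_dir="$(mktemp -d)"

# Stand-in for /opt/miniconda3/etc/profile.d/conda.sh (hypothetical content).
printf 'DEMO_CONDA_LOADED=yes\n' > "$demo_dir/real_conda.txt"

# The symlink plays the role of /etc/profile.d/conda.sh.
ln -s "$demo_dir/real_conda.txt" "$demo_dir/conda.sh"

load_profile_dir "$demo_dir"
echo "DEMO_CONDA_LOADED=$DEMO_CONDA_LOADED"   # prints DEMO_CONDA_LOADED=yes

rm -rf "$demo_dir"
```

Only the `conda.sh` symlink matches `*.sh`, so the target is picked up through the link, which is why a single `ln -s` is enough to expose conda to every login shell.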
Log out, log back in, and run:
conda
Output:
usage: conda [-h] [-V] command ...... (rest omitted)
Looks fine.
4.2 Create a virtual environment
conda create -n tf python=3
4.3 Activate the virtual environment
conda activate tf
4.4 Then install TensorFlow as the official site instructs:
pip install --upgrade tensorflow-gpu
Everything went smoothly for me up to this point. Next, test that it works:
python
...
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
If this raises no error and returns a device name such as '/device:GPU:0', the installation succeeded.
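The same test can be run non-interactively. Because `pip install --upgrade tensorflow-gpu` installs whatever release is newest at the time, it is also worth confirming that the version really is 1.13. This is a sketch under a few assumptions: it should be run inside the activated `tf` environment, and `is_tf_113` is a helper name of my own.

```shell
#!/usr/bin/env bash
# is_tf_113 VERSION: succeed when VERSION is a 1.13 release (1.13 or 1.13.x).
is_tf_113() {
    case "$1" in
        1.13|1.13.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the check only when python can actually import TensorFlow.
if command -v python >/dev/null 2>&1 && python -c 'import tensorflow' >/dev/null 2>&1; then
    # TensorFlow logs to stderr, so stdout carries only the printed values.
    tf_version="$(python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null)"
    gpu_name="$(python -c 'import tensorflow as tf; print(tf.test.gpu_device_name())' 2>/dev/null | tail -n 1)"

    if is_tf_113 "$tf_version"; then
        echo "TensorFlow $tf_version"
    else
        echo "unexpected TensorFlow version: $tf_version"
    fi

    # gpu_device_name() returns an empty string when no GPU is usable.
    if [ -n "$gpu_name" ]; then
        echo "GPU device: $gpu_name"
    else
        echo "no GPU device visible to TensorFlow"
    fi
fi
```

An empty device name without any exception usually means TensorFlow fell back to the CPU, which points back at the driver, CUDA, or cuDNN steps.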
If you run into any problems, feel free to leave a comment below.