Python + TensorFlow + GPU + CUDA + CUDNN setup with Ubuntu
Every time I setup Python + TensorFlow on a new machine with a fresh Ubuntu install, I have to spend some time again and again on this topic, and do some trial and error (yes I'm speaking about such issues). So here is a little HOWTO, once for all.
Important fact: we need to install the specific version number of CUDA and CUDNN relative to a particular version of TensorFlow, otherwise it will fail, with errors like libcudnn.so.7: cannot open shared object file: No such file or directory
.
For example, for TensorFlow 2.3, we have to use CUDA 10.1 and CUDNN 7.6 (see here).
Here is how to install on a Ubuntu 18.04:
pip3 install --upgrade pip # it was mandatory to upgrade for me
pip3 install keras tensorflow==2.3.0
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install cuda-10-1 nvidia-driver-430
To test if the NVIDIA driver is properly installed, you can run nvidia-smi
(I noticed a reboot was necessary).
Then download "Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1" on https://developer.nvidia.com/rdp/cudnn-archive (you need to create an account there), and then:
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
That's it! Reboot the computer, launch Python 3 and do:
import tensorflow
tensorflow.test.gpu_device_name() # also, tensorflow.test.is_gpu_available() should give True
The last line should display the right GPU device name. If you get an empty string instead, it means your GPU isn't used by TensorFlow!
Notes:
-
Initially the installation of CUDA 10.1 failed with errors like:
The following packages have unmet dependencies: cuda-10-1 : Depends: cuda-toolkit-10-1 (>= 10.1.243) but it is not going to be installed
on a fresh Xubuntu 18.04.5 install. Trying to install
cuda-toolkit-10-1
manually led to other similar errors. Using asources.list
from Xubuntu 18.04 like this one helped. -
I also once had errors like
Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
. After searching this file on the filesystem, I noticed it was found in/usr/local/cuda-10.2/...
whereas I never installed the 10.2 version, strange! Solution given in this post:sudo apt install --reinstall libcublas10=10.2.1.243-1 libcublas-dev=10.2.1.243-1
. IIRC, these 2 issues weren't present when I used a Xubuntu 18.04, could the fact I used 18.04.5 be the reason? - Nothing really related, but when installing Xubuntu 20.04.2.0, memtest86, which is quite useful to test the integrity of the hardware before launching long computations, did not work.