A few thingz
Every time I setup Python + TensorFlow on a new machine with a fresh Ubuntu install, I have to spend some time again and again on this topic, and do some trial and error (yes I'm speaking about such issues). So here is a little HOWTO, once for all.
Important fact: we need to install the specific version number of CUDA and CUDNN relative to a particular version of TensorFlow, otherwise it will fail, with errors like
libcudnn.so.7: cannot open shared object file: No such file or directory.
For example, for TensorFlow 2.3, we have to use CUDA 10.1 and CUDNN 7.6 (see here).
Here is how to install on a Ubuntu 18.04:
pip3 install --upgrade pip # it was mandatory to upgrade for me pip3 install keras tensorflow==2.3.0 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /" sudo add-apt-repository ppa:graphics-drivers/ppa sudo apt-get update sudo apt install cuda-10-1 nvidia-driver-430
To test if the NVIDIA driver is properly installed, you can run
nvidia-smi (I noticed a reboot was necessary).
Then download "Download cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1" on https://developer.nvidia.com/rdp/cudnn-archive (you need to create an account there), and then:
sudo dpkg -i libcudnn7_126.96.36.199-1+cuda10.1_amd64.deb
That's it! Reboot the computer, launch Python 3 and do:
import tensorflow tensorflow.test.gpu_device_name() # also, tensorflow.test.is_gpu_available() should give True
The last line should display the right GPU device name. If you get an empty string instead, it means your GPU isn't used by TensorFlow!
Initially the installation of CUDA 10.1 failed with errors like:
The following packages have unmet dependencies: cuda-10-1 : Depends: cuda-toolkit-10-1 (>= 10.1.243) but it is not going to be installed
on a fresh Xubuntu 18.04.5 install. Trying to install
cuda-toolkit-10-1manually led to other similar errors. Using a
sources.listfrom Xubuntu 18.04 like this one helped.
I also once had errors like
Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory. After searching this file on the filesystem, I noticed it was found in
/usr/local/cuda-10.2/...whereas I never installed the 10.2 version, strange! Solution given in this post:
sudo apt install --reinstall libcublas10=10.2.1.243-1 libcublas-dev=10.2.1.243-1. IIRC, these 2 issues weren't present when I used a Xubuntu 18.04, could the fact I used 18.04.5 be the reason?
- Nothing really related, but when installing Xubuntu 20.04.2.0, memtest86, which is quite useful to test the integrity of the hardware before launching long computations, did not work.