How to install NVIDIA drivers for machine learning on Ubuntu

A common pain point when setting up servers to run AI models is getting the NVIDIA drivers to work correctly with PyTorch and other machine-learning libraries.

In this guide, I will walk you through the installation steps needed to get your GPU working correctly with your AI models.

I am running Ubuntu 22.04. If you are running a different version, you may need to tweak the CUDA toolkit version to suit your distro.

Install Docker and some essential apt packages

sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
# apt-key is deprecated on Ubuntu 22.04 but still works; expect a warning
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" -y
sudo apt install docker-ce -y

# Now add your user to the docker group.
# You will need to log out and back in again for this to take effect.
# -f makes groupadd succeed even if the docker group already exists
sudo groupadd -f docker
sudo usermod -aG docker yourusername
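
To confirm that the group change has taken effect, a quick check like the one below may help. It is only a sketch: has_group is a hypothetical helper name (not part of Docker), and it simply looks for a group name in the output of id -nG.

```shell
# has_group is a hypothetical helper, not a Docker command.
# Succeeds if the space-separated group list in $2 contains the group named in $1.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the names of all groups the current session belongs to.
if has_group docker "$(id -nG 2>/dev/null)"; then
    echo "docker group is active in this session"
else
    echo "docker group is not active yet: log out and back in, then re-run this check"
fi
```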

Set up NVIDIA GPU drivers

sudo add-apt-repository ppa:graphics-drivers/ppa --yes
sudo apt update -y
sudo apt-get install -y linux-headers-$(uname -r)
sudo ubuntu-drivers install --gpgpu
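
A reboot is usually needed before the new kernel module is loaded. After that, something like the following can confirm that the driver is working. This is a minimal sketch: gpu_status is a hypothetical helper name, and it only wraps nvidia-smi, the tool that ships with the driver.

```shell
# gpu_status is a hypothetical helper that wraps nvidia-smi.
# It prints the GPU name and driver version, or a hint if the driver is not usable.
gpu_status() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver is not installed (or not on PATH)"
        return 0
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        || echo "nvidia-smi failed: try rebooting so the kernel module can load"
}

gpu_status
```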

Set up CUDA

sudo apt-get update -y

# You can find the right key to use for your distro here:
# https://developer.download.nvidia.com/compute/cuda/repos/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
# Refresh the package index so apt can see the newly added CUDA repository
sudo apt-get update -y
sudo apt-get -y install cuda-toolkit-12-
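
The toolkit packages do not necessarily put nvcc on your PATH. The sketch below assumes the default install prefix /usr/local/cuda (a symlink that NVIDIA's packages normally create); find_nvcc is a hypothetical helper name, and you should adjust the path if your install differs.

```shell
# find_nvcc is a hypothetical helper.
# Prints the path to nvcc if it exists under the given CUDA prefix, else fails.
find_nvcc() {
    prefix="${1:-/usr/local/cuda}"   # assumed default install location
    if [ -x "$prefix/bin/nvcc" ]; then
        echo "$prefix/bin/nvcc"
    else
        return 1
    fi
}

if nvcc_path="$(find_nvcc /usr/local/cuda)"; then
    # Make the CUDA tools available in the current shell session.
    export PATH="$(dirname "$nvcc_path"):$PATH"
    "$nvcc_path" --version || echo "nvcc is present but failed to run"
else
    echo "nvcc not found under /usr/local/cuda/bin"
fi
```

To make the PATH change permanent, the same export line can go in your ~/.bashrc.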

Configure Docker to use the GPU and CUDA


curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update -y

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
# Restart Docker so it picks up the new runtime configuration
sudo systemctl restart docker
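
To check that Docker knows about the NVIDIA runtime, you can look for it in Docker's daemon configuration. This sketch assumes nvidia-ctk wrote its changes to /etc/docker/daemon.json (its usual target for the docker runtime); daemon_has_nvidia is a hypothetical helper name.

```shell
# daemon_has_nvidia is a hypothetical helper.
# Succeeds if the given Docker daemon config file mentions an "nvidia" runtime.
daemon_has_nvidia() {
    [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

# /etc/docker/daemon.json is assumed to be the file nvidia-ctk updated.
if daemon_has_nvidia /etc/docker/daemon.json; then
    echo "nvidia runtime is configured for Docker"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json"
fi
```

For an end-to-end test, running nvidia-smi inside a container (for example: sudo docker run --rm --gpus all ubuntu nvidia-smi) should print the same GPU table you see on the host.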

Conclusion

GPU setups can be tricky and painful. Hopefully, this guide goes a long way toward getting you up and running.

Now you should be able to run any of your PyTorch or other machine-learning models on the GPU, either natively on the machine or using Docker.
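
As a final sanity check, a one-liner like the following can tell you whether PyTorch sees the GPU. It is a sketch that assumes PyTorch is already installed in whichever Python environment python3 points to; torch_cuda_status is a hypothetical helper name.

```shell
# torch_cuda_status is a hypothetical helper.
# Reports whether PyTorch can see a CUDA device, or why the check could not run.
torch_cuda_status() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 0
    fi
    # torch.cuda.is_available() returns True when PyTorch can use a CUDA GPU.
    python3 -c 'import torch; print("CUDA available:", torch.cuda.is_available())' 2>/dev/null \
        || echo "PyTorch is not importable in this Python environment"
}

torch_cuda_status
```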