
Driver Installation

For GPU-accelerated deep learning, you need three key components:

  1. NVIDIA Driver - Controls the GPU hardware
  2. CUDA Toolkit - GPU computing platform
  3. cuDNN - Deep learning primitives library

Time required: 30-45 minutes
Difficulty: Moderate

For PyTorch 2.2+:

NVIDIA Driver: ≥535.x
CUDA: 12.1 or 11.8
cuDNN: 8.9.0+

Verify on: pytorch.org/get-started
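
The minimums above can also be checked in a script. The following is a minimal sketch, assuming version strings in the usual dotted form. The `MINIMUMS` table and the `meets_minimum` helper are hypothetical names that simply mirror the list above; they are not an official compatibility matrix.

```python
# Hypothetical helper: compare dotted version strings against the minimums above.
MINIMUMS = {"driver": "535", "cudnn": "8.9.0"}  # copied from the list above


def parse_version(text):
    """Turn '535.104.05' into (535, 104, 5); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`, component by component."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '535' compares like '535.0.0'
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("535.104.05", MINIMUMS["driver"]))  # True
print(meets_minimum("8.7.0", MINIMUMS["cudnn"]))        # False
```

The version strings passed in here are illustrative; substitute the values reported on your own system.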

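This installs the driver and Ubuntu's packaged CUDA toolkit in one go. The distribution's nvidia-cuda-toolkit package can be older than the CUDA version listed above, and it does not include cuDNN: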

# Update package list
sudo apt update

# Install NVIDIA driver and CUDA via apt
sudo apt install -y nvidia-driver-535 nvidia-cuda-toolkit

# Reboot
sudo reboot

Verify installation:

# Check driver
nvidia-smi

# Check CUDA
nvcc --version
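
The driver check can also be scripted, which is handy in setup or CI scripts. This is a sketch, assuming `nvidia-smi` is on the PATH when the driver is installed; the function name `query_driver_version` is a hypothetical helper, not part of any NVIDIA tool.

```python
import shutil
import subprocess


def query_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # nvidia-smi exists but cannot talk to the driver
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


version = query_driver_version()
print(f"Driver version: {version or 'not detected'}")
```

Returning `None` instead of raising keeps the check usable on machines where the driver is not installed yet.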

Step 1: Install NVIDIA Driver

# Add graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# Install driver (check latest version)
sudo apt install -y nvidia-driver-535

# Reboot
sudo reboot

# Verify
nvidia-smi

Step 2: Install CUDA Toolkit

Get the installer commands for your distribution and CUDA version from the NVIDIA CUDA Downloads page.

# Example for CUDA 12.1 on Ubuntu 22.04
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
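# Install only the toolkit; the "cuda" metapackage also pulls in a driver,
# which can conflict with the driver installed in Step 1
sudo apt-get -y install cuda-toolkit-12-1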

Step 3: Configure Environment

# Add to ~/.bashrc
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

# Reload
source ~/.bashrc

# Verify
nvcc --version
nvidia-smi
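
To confirm that the two exports took effect in the current shell, the environment can be inspected directly. A minimal sketch, assuming the default `/usr/local/cuda` location used above; `cuda_env_report` is a hypothetical helper name.

```python
import os


def cuda_env_report(environ=os.environ, cuda_home="/usr/local/cuda"):
    """Report whether the CUDA bin and lib64 directories are on the search paths."""
    path_dirs = environ.get("PATH", "").split(os.pathsep)
    lib_dirs = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "bin_on_path": f"{cuda_home}/bin" in path_dirs,
        "lib64_on_ld_library_path": f"{cuda_home}/lib64" in lib_dirs,
    }


for check, ok in cuda_env_report().items():
    print(f"{check}: {'OK' if ok else 'missing'}")
```

If either entry prints "missing", open a new terminal or re-run `source ~/.bashrc` before checking again.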

Step 4: Install cuDNN

Download from NVIDIA cuDNN (requires free account)

# Extract downloaded archive
tar -xvf cudnn-linux-x86_64-8.9.x.x_cuda12-archive.tar.xz

# Copy files (the .tar.xz archive extracts to a cudnn-*-archive directory)
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
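
After copying, the installed cuDNN version can be read back from the header. This is a sketch, assuming cuDNN 8.x, where the version macros live in `cudnn_version.h`, and the `/usr/local/cuda` location used above; `parse_cudnn_version` is a hypothetical helper name.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None  # macro missing: not a cuDNN version header
        parts.append(int(match.group(1)))
    return tuple(parts)


# Assumed install location; adjust if CUDA lives elsewhere.
header = Path("/usr/local/cuda/include/cudnn_version.h")
if header.exists():
    print("cuDNN version:", parse_cudnn_version(header.read_text()))
else:
    print("cudnn_version.h not found")
```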

Installing through Conda is best for avoiding system-wide installations. The NVIDIA driver must still be installed on the system:

# Install Miniconda first
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Create environment with CUDA
conda create -n ml python=3.11 cuda -c nvidia

# Activate
conda activate ml

# Install PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

Verify the driver:

# Check driver
nvidia-smi

Expected output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx       Driver Version: 535.xx       CUDA Version: 12.2   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 4090     Off  | 00000000:01:00.0  On |                  Off |
| 30%   45C    P8    25W / 450W |    500MiB / 24564MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

# Check CUDA version
nvcc --version

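# Compile and run sample
# (since CUDA 11.6 the samples are no longer shipped under
# /usr/local/cuda/samples; they are published on GitHub)
git clone --depth 1 --branch v12.1 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery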

Expected: “Result = PASS”
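
When the toolkit version needs to be checked in a script, the release number can be pulled out of the `nvcc --version` text. A minimal sketch; `parse_nvcc_release` is a hypothetical helper, and the sample line is illustrative rather than captured from a specific machine.

```python
import re


def parse_nvcc_release(output):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative line in the format nvcc prints.
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(parse_nvcc_release(sample))  # (12, 1)
```

To use it on a live system, pass in the captured output of `nvcc --version`, for example from `subprocess.run`.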

import torch

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")

# Check CUDA version
print(f"CUDA version: {torch.version.cuda}")

# Check number of GPUs
print(f"GPU count: {torch.cuda.device_count()}")

# Get GPU name and test GPU computation (skipped when no GPU is usable)
if torch.cuda.is_available():
    print(f"GPU 0: {torch.cuda.get_device_name(0)}")
    x = torch.rand(5, 3).cuda()
    print(f"Tensor on GPU: {x.is_cuda}")

Expected output:

CUDA available: True
CUDA version: 12.1
GPU count: 1
GPU 0: NVIDIA GeForce RTX 4090
Tensor on GPU: True
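
In training scripts it is common to fall back to the CPU when the check above fails instead of crashing. A minimal sketch of that pattern; `pick_device` is a hypothetical helper name, and the import guard is only there so the snippet also runs where PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda' when PyTorch can use a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to `torch.device(...)` or to `.to(device)` on tensors and models.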

Problem: nvidia-smi fails or cannot find the driver

Solution:

# Check if the driver kernel module is loaded
lsmod | grep nvidia

# If empty, reinstall driver
sudo apt install --reinstall nvidia-driver-535
sudo reboot

Problem: PyTorch shows different CUDA version than nvcc

Explanation:

  • nvidia-smi shows max CUDA version supported by driver
  • nvcc --version shows installed CUDA toolkit version
  • PyTorch bundles its own CUDA runtime

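Solution: This is usually fine. PyTorch uses its own bundled CUDA libraries, so the system toolkit version does not need to match; the driver only has to be new enough to support the bundled version.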
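
The relationship between the three version numbers can be expressed as a simple comparison. This is a sketch of a simplified rule, assuming the driver's maximum supported CUDA version (from nvidia-smi) should be at least the version PyTorch bundles (`torch.version.cuda`); CUDA's minor-version compatibility can relax this in practice. `driver_supports_runtime` is a hypothetical helper name.

```python
def driver_supports_runtime(driver_max_cuda, runtime_cuda):
    """Return True if the driver's max CUDA version covers the bundled runtime."""

    def to_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return to_tuple(driver_max_cuda) >= to_tuple(runtime_cuda)


# Values taken from the expected outputs shown earlier on this page.
print(driver_supports_runtime("12.2", "12.1"))  # True
# An older driver that only supports up to CUDA 11.8 would not cover 12.1.
print(driver_supports_runtime("11.8", "12.1"))  # False
```

Note that the nvcc version does not appear in this check at all, which is why a mismatch with PyTorch is normally harmless.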

Out-of-memory errors are not a driver issue; see GPU Memory Management.

Check for multiple CUDA installations:

# Find all CUDA installations
ls /usr/local/ | grep cuda

# See current version
ls -l /usr/local/cuda
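
The same inspection can be done from Python, which also resolves where the `cuda` symlink points. A minimal sketch, assuming installations live under `/usr/local` as above; `list_cuda_installs` is a hypothetical helper name.

```python
import glob
import os


def list_cuda_installs(prefix="/usr/local"):
    """Return (path, resolved_path) pairs for every cuda* entry under prefix."""
    installs = []
    for path in sorted(glob.glob(os.path.join(prefix, "cuda*"))):
        installs.append((path, os.path.realpath(path)))
    return installs


for path, target in list_cuda_installs():
    marker = " (symlink)" if os.path.islink(path) else ""
    print(f"{path} -> {target}{marker}")
```

An entry marked as a symlink shows which versioned directory is currently active.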

Clean up:

# Remove old versions
sudo apt remove --purge 'cuda-*'
sudo apt autoremove

# Reinstall desired version
sudo apt install cuda-12-1

Check if nouveau is loaded:

lsmod | grep nouveau

If yes, blacklist it:

# Create blacklist file
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"

# Update initramfs
sudo update-initramfs -u

# Reboot
sudo reboot
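
After the reboot, it is worth confirming that nouveau is gone and the NVIDIA module is loaded. A sketch, assuming a Linux system where `/proc/modules` lists loaded kernel modules (the same data `lsmod` prints); `loaded_modules` is a hypothetical helper name.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


proc = Path("/proc/modules")
if proc.exists():
    modules = loaded_modules(proc.read_text())
    print("nouveau loaded:", "nouveau" in modules)
    print("nvidia loaded:", "nvidia" in modules)
else:
    print("/proc/modules not available on this system")
```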

For systems with multiple GPUs:

# Check all GPUs detected
nvidia-smi -L

# Should show:
# GPU 0: NVIDIA GeForce RTX 4090
# GPU 1: NVIDIA GeForce RTX 4090
# etc.

# Restrict which GPUs CUDA applications can see (optional)
export CUDA_VISIBLE_DEVICES=0,1  # Use only GPUs 0 and 1

See Multi-GPU Training for training setup.
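
For scripts that need the GPU list, the `nvidia-smi -L` output can be parsed into a mapping of index to name. A minimal sketch; `parse_gpu_list` is a hypothetical helper, and the sample text, including the UUID placeholder, is illustrative.

```python
import re


def parse_gpu_list(output):
    """Map GPU index to name from `nvidia-smi -L` style output."""
    gpus = {}
    for line in output.splitlines():
        # The trailing "(UUID: ...)" part is optional and dropped from the name.
        match = re.match(r"GPU (\d+): (.+?)(?: \(UUID: .*\))?$", line.strip())
        if match:
            gpus[int(match.group(1))] = match.group(2)
    return gpus


sample = (
    "GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-xxxx)\n"
    "GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-yyyy)\n"
)
print(parse_gpu_list(sample))
```

The resulting indices are the same ones used in `CUDA_VISIBLE_DEVICES`.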

Update if:

  • New CUDA version required for framework
  • Bug fixes for your GPU model
  • Performance improvements listed

Don’t update if:

  • Everything works fine
  • Mid-training on important project

# Check current version
nvidia-smi

# Update driver (upgrades only this package, not the whole system)
sudo apt update
sudo apt install --only-upgrade nvidia-driver-535

# Reboot
sudo reboot

After driver installation:

  1. Set up Python environments
  2. Run benchmark tests (see Training Optimization)
  3. Configure multi-GPU (if applicable)