Driver Installation
Overview
For GPU-accelerated deep learning, you need three key components:
- NVIDIA Driver - Controls the GPU hardware
- CUDA Toolkit - GPU computing platform
- cuDNN - Deep learning primitives library
Time required: 30-45 minutes
Difficulty: Moderate
Version Compatibility Guide
Current Recommendations (2025)
For PyTorch 2.2+:
NVIDIA Driver: ≥535.x
CUDA: 12.1 or 11.8
cuDNN: 8.9.0+
Verify on: pytorch.org/get-started
For TensorFlow 2.15+:
NVIDIA Driver: ≥525.x
CUDA: 12.2 (11.8 applies to TensorFlow 2.14 and earlier)
cuDNN: 8.9.0+
Verify on: tensorflow.org/install/gpu
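The driver minimums above can be checked programmatically. The following is a minimal sketch, not an official tool: the `MIN_DRIVER` table simply copies the branch numbers listed above, and the helper names (`driver_major`, `meets_minimum`, `installed_driver_version`) are hypothetical. It assumes `nvidia-smi` is on the PATH and returns `None` when it is not.

```python
import subprocess
from typing import Optional

# Minimum driver branches copied from the recommendations above (assumption:
# comparing only the major branch number is precise enough for a quick check).
MIN_DRIVER = {"pytorch": 535, "tensorflow": 525}


def driver_major(version: str) -> int:
    """Return the major branch of a driver version string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def meets_minimum(version: str, framework: str) -> bool:
    """True if the driver's major branch is at least the listed minimum."""
    return driver_major(version) >= MIN_DRIVER[framework]


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it cannot be queried."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not available; is the driver installed?")
    else:
        for framework in MIN_DRIVER:
            status = "OK" if meets_minimum(version, framework) else "too old"
            print(f"{framework}: driver {version} is {status}")
```

The pure functions can be used without a GPU, for example `meets_minimum("530.30.02", "pytorch")` evaluates to `False` against the table above.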
Installation Methods
Method 1: Recommended (Easy)
This installs the driver and Ubuntu's packaged CUDA toolkit in one step. The packaged toolkit can be older than the versions listed above (Ubuntu 22.04 ships CUDA 11.5), and cuDNN is not included:
# Update package list
sudo apt update
# Install NVIDIA driver and CUDA via apt
sudo apt install -y nvidia-driver-535 nvidia-cuda-toolkit
# Reboot
sudo reboot
Verify installation:
# Check driver
nvidia-smi
# Check CUDA
nvcc --version
Method 2: CUDA from NVIDIA (More Control)
Step 1: Install NVIDIA Driver
# Add graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
# Install driver (check latest version)
sudo apt install -y nvidia-driver-535
# Reboot
sudo reboot
# Verify
nvidia-smi
Step 2: Install CUDA Toolkit
Visit NVIDIA CUDA Downloads
# Example for CUDA 12.1 on Ubuntu 22.04
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
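# The cuda meta-package also pulls in a driver; install cuda-toolkit-12-1 instead to keep the driver from Step 1
sudo apt-get -y install cuda
Step 3: Configure Environment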
# Add to ~/.bashrc
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Reload
source ~/.bashrc
# Verify
nvcc --version
nvidia-smi
Step 4: Install cuDNN
Download from NVIDIA cuDNN (requires free account)
# Extract downloaded archive
tar -xvf cudnn-linux-x86_64-8.9.x.x_cuda12-archive.tar.xz
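# Copy files (the archive extracts to a directory named after itself, not cuda/)
sudo cp cudnn-linux-x86_64-8.9.x.x_cuda12-archive/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cudnn-linux-x86_64-8.9.x.x_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
Method 3: Conda (Isolated)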
Best for avoiding system-wide installations:
# Install Miniconda first
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Create environment with CUDA
conda create -n ml python=3.11 cuda -c nvidia
# Activate
conda activate ml
# Install PyTorch with CUDA
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Windows Installation
Step 1: Install NVIDIA Driver
Visit NVIDIA Drivers
- Select your GPU model
- Download latest driver
- Run installer
- Restart computer
Verify:
nvidia-smi
Step 2: Install CUDA Toolkit
Visit NVIDIA CUDA Downloads
- Select Windows → x86_64 → version
- Download installer (network or local)
- Run installer
- Choose “Custom” and select:
  - CUDA Toolkit
  - CUDA Documentation (optional)
  - CUDA Samples (optional)
Step 3: Install cuDNN
Download from NVIDIA cuDNN
# Extract zip file
# Copy files to CUDA installation directory:
# From cudnn-windows-x86_64-8.x.x.x_cuda12-archive\bin
# To C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
# From cudnn-windows-x86_64-8.x.x.x_cuda12-archive\include
# To C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include
# From cudnn-windows-x86_64-8.x.x.x_cuda12-archive\lib
# To C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64
Step 4: Set Environment Variables
- Open “Edit system environment variables”
- Click “Environment Variables”
- Add to PATH:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp
Verify:
nvcc --version
WSL2 Installation
If using WSL2 (Windows Subsystem for Linux):
# In Windows, install NVIDIA driver (already done above)
# In WSL2 Ubuntu:
# DON'T install drivers in WSL2, use Windows drivers
# Just install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Verification
Test NVIDIA Driver
nvidia-smi
Expected output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.xx Driver Version: 535.xx CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 NVIDIA RTX 4090 Off | 00000000:01:00.0 On | Off |
| 30% 45C P8 25W / 450W | 500MiB / 24564MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Test CUDA
# Check CUDA version
nvcc --version
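# Compile and run sample
# (since CUDA 11.6 the samples are no longer installed under /usr/local/cuda; get them from GitHub)
git clone --branch v12.1 --depth 1 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery
Expected: “Result = PASS”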
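The toolkit check can also be scripted. The sketch below extracts the release number from the text that `nvcc --version` prints, which includes a line of the form `Cuda compilation tools, release 12.1, V12.1.105`. The function names `parse_nvcc_release` and `toolkit_release` are hypothetical helpers for illustration, not part of any NVIDIA tooling.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract 'major.minor' from nvcc --version output, or None if absent."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def toolkit_release() -> Optional[str]:
    """Run nvcc --version if nvcc is on the PATH and return its release string."""
    if shutil.which("nvcc") is None:
        return None  # toolkit missing or PATH not configured
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = toolkit_release()
    print(f"CUDA toolkit release: {release or 'nvcc not found'}")
```

A `None` result usually means either the toolkit is not installed or `/usr/local/cuda/bin` is missing from PATH.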
Test with Python
import torch
# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
# Check CUDA version
print(f"CUDA version: {torch.version.cuda}")
# Check number of GPUs
print(f"GPU count: {torch.cuda.device_count()}")
# Get GPU name
if torch.cuda.is_available():
    print(f"GPU 0: {torch.cuda.get_device_name(0)}")
    # Test GPU computation
    x = torch.rand(5, 3).cuda()
    print(f"Tensor on GPU: {x.is_cuda}")
Expected output:
CUDA available: True
CUDA version: 12.1
GPU count: 1
GPU 0: NVIDIA GeForce RTX 4090
Tensor on GPU: True
import tensorflow as tf
# Check if GPU is available
print(f"GPUs available: {len(tf.config.list_physical_devices('GPU'))}")
# List GPU devices
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print(f"GPU: {gpu}")
# Check CUDA/cuDNN versions
print(f"CUDA version: {tf.sysconfig.get_build_info()['cuda_version']}")
print(f"cuDNN version: {tf.sysconfig.get_build_info()['cudnn_version']}")
# Test GPU computation
with tf.device('/GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)
print(f"Result on GPU: {c}")
Common Issues
Issue: “nvidia-smi” not found
Solution:
# Check if driver is installed
lsmod | grep nvidia
# If empty, reinstall driver
sudo apt install --reinstall nvidia-driver-535
sudo reboot
Issue: CUDA version mismatch
Problem: PyTorch shows a different CUDA version than nvcc
Explanation:
- nvidia-smi shows the maximum CUDA version supported by the driver
- nvcc --version shows the installed CUDA toolkit version
- PyTorch bundles its own CUDA runtime
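Solution: This is usually fine. PyTorch uses its bundled CUDA libraries rather than the system toolkit, so it only needs a driver that supports the bundled CUDA version.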
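To make the rule concrete, the sketch below compares the CUDA version a framework was built with (for PyTorch, the value of `torch.version.cuda`) against the maximum version reported in the `nvidia-smi` header. The function names are hypothetical, and the comparison is deliberately simplified: it treats "runtime version no newer than the driver's maximum" as compatible and ignores CUDA's minor-version compatibility, which can allow some newer runtimes within the same major release.

```python
from typing import Tuple


def parse_cuda_version(text: str) -> Tuple[int, int]:
    """Turn a 'major.minor' string such as '12.1' into a comparable tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def runtime_fits_driver(runtime: str, driver_max: str) -> bool:
    """Simplified check: the bundled runtime must not exceed the driver's maximum."""
    return parse_cuda_version(runtime) <= parse_cuda_version(driver_max)


# Runtime 12.1 with a driver that reports CUDA 12.2 in nvidia-smi
print(runtime_fits_driver("12.1", "12.2"))  # True
# Runtime 12.1 with an older driver that reports CUDA 11.8
print(runtime_fits_driver("12.1", "11.8"))  # False
```

The version reported by `nvcc` does not enter this check at all, which is why a mismatch between `nvcc` and PyTorch is normally harmless.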
Issue: “CUDA out of memory”
Not a driver issue - see GPU Memory Management
Issue: Multiple CUDA versions
Check installations:
# Find all CUDA installations
ls /usr/local/ | grep cuda
# See current version
ls -l /usr/local/cuda
Clean up:
# Remove all apt-installed CUDA packages (quote the pattern so the shell does not expand it)
sudo apt remove --purge 'cuda-*'
sudo apt autoremove
# Reinstall desired version
sudo apt install cuda-12-1
Issue: Nouveau driver conflict (Linux)
Check if nouveau is loaded:
lsmod | grep nouveau
If yes, blacklist it:
# Create blacklist file
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
# Update initramfs
sudo update-initramfs -u
# Reboot
sudo reboot
Multi-GPU Considerations
For systems with multiple GPUs:
# Check all GPUs detected
nvidia-smi -L
# Should show:
# GPU 0: NVIDIA GeForce RTX 4090
# GPU 1: NVIDIA GeForce RTX 4090
# etc.
# Restrict which GPUs CUDA applications can see (optional)
export CUDA_VISIBLE_DEVICES=0,1 # Use only GPUs 0 and 1
See Multi-GPU Training for training setup.
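The same restriction can be applied from inside a Python script, as long as the variable is set before the framework initializes CUDA. The following is a small sketch; `visible_devices` is a hypothetical helper that only interprets the variable's value and does not query the GPUs themselves.

```python
import os
from typing import List, Optional


def visible_devices(value: Optional[str]) -> Optional[List[str]]:
    """Interpret a CUDA_VISIBLE_DEVICES value.

    None means the variable is unset (all GPUs visible), an empty string
    hides every GPU, and otherwise the comma-separated entries are returned.
    """
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


# Set the variable before importing torch or tensorflow; changing it after
# CUDA has been initialized has no effect on the running process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(visible_devices(os.environ.get("CUDA_VISIBLE_DEVICES")))  # ['0', '1']
```

Inside the process the selected GPUs are renumbered from zero, so with `CUDA_VISIBLE_DEVICES=1` the framework sees physical GPU 1 as device 0.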
Driver Updates
When to Update
Update if:
- New CUDA version required for framework
- Bug fixes for your GPU model
- Performance improvements listed
Don’t update if:
- Everything works fine
- Mid-training on important project
How to Update (Ubuntu)
# Check current version
nvidia-smi
# Update driver
sudo apt update
sudo apt install --only-upgrade nvidia-driver-535
# Reboot
sudo reboot
Next Steps
After driver installation:
- Set up Python environments
- Run benchmark tests (see Training Optimization)
- Configure multi-GPU (if applicable)