
GPU (NVIDIA)

Fixing numa_node
# 1) Identify the PCI ID (including the domain) of your GPU
# For example: PCI_ID="0000:81:00.0"
lspci -D | grep NVIDIA
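
# Optional: you can first confirm the symptom. On affected systems the
# numa_node file reads -1, meaning no NUMA affinity is assigned. A minimal
# Python sketch; the PCI ID below is a placeholder, substitute your own:
from pathlib import Path
PCI_ID = "0000:81:00.0"  # placeholder; use the value found via lspci -D
print(Path(f"/sys/bus/pci/devices/{PCI_ID}/numa_node").read_text().strip())
# -1 means no affinity is set; 0 or higher means it already is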
# 2) Add a crontab for root
sudo crontab -e
# Add the following line.
# This ensures that the NUMA affinity for the GPU device is set to 0 on every reboot.
@reboot (echo 0 | tee -a "/sys/bus/pci/devices/<PCI_ID>/numa_node")

# Keep in mind that this is only a "shallow" fix, as the NVIDIA driver is unaware of it.
# Your PCI ID will differ locally, so replace the placeholder with your own,
# e.g. 0000:0b:00.0:
@reboot (echo 0 | tee -a "/sys/bus/pci/devices/0000:0b:00.0/numa_node")
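
# If you want to apply the fix immediately without waiting for a reboot, the
# same write can be done from Python (run as root). A minimal sketch, again
# with a placeholder PCI ID:
from pathlib import Path
PCI_ID = "0000:0b:00.0"  # placeholder; substitute your own PCI ID
Path(f"/sys/bus/pci/devices/{PCI_ID}/numa_node").write_text("0\n")  # same effect as the echo/tee line above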

# You can verify the result with:
nvidia-smi topo -m

Discussion of the problem on Stack Overflow.

Checking if you have a GPU available for PyTorch

import torch
torch.cuda.is_available()
# If you have a GPU available, the output should be True

torch.cuda.get_device_name(0)
# If you have a GPU available, the output should be the name of your GPU (for example, RTX 3080)
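
# A common follow-up idiom: select the GPU when one is available, otherwise
# fall back to the CPU. A minimal sketch; the tensor here is just illustrative:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(3, 3, device=device)  # allocated on the GPU if one is available
print(x.device)  # e.g. cuda:0 with a GPU, cpu without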