GPU
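- A brief look at what a GPU really is (Part 1): different types of computation (133369) - Cool3c
- A brief look at what a GPU really is (Part 2): SIMT, combining the advantages of SIMD and MIMD (133370) - Cool3c
- Hard Tech: A brief look at what a GPU really is (Part 3): GPGPU moving toward general-purpose use (134057) - Cool3c
- Hard Tech: Why GPU virtualization is so hard to get right (Part 1) #CPU (157525) - Cool3c
- Hard Tech: Why GPU virtualization is so hard to get right (Part 2) #api (157526) - Cool3c
- Hard Tech: Why GPU virtualization is so hard to get right (Part 3) #nvidia (157527) - Cool3c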
- PCI devices
Is it a PCIe GPU or HGX?
flow
GPUDirect enabled: GPU1 <=> IB card <=> IB card <=> GPU2 (shown in green)
GPUDirect disabled: GPU1 <=> CPU <=> IB card <=> IB card <=> CPU <=> GPU2 (shown in purple)

Confirm the CPU slot location
GPU and NIC mapping
Look at the PIX entries: GPU0 maps to NIC0, GPU2 maps to NIC1, GPU4 maps to NIC4, and GPU6 maps to NIC5.
nvidia-smi topo -m
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  NIC4  NIC5  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV18  NV18  NV18  NV18  NV18  NV18  NV18  PIX   SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU1  NV18  X     NV18  NV18  NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU2  NV18  NV18  X     NV18  NV18  NV18  NV18  NV18  SYS   PIX   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU3  NV18  NV18  NV18  X     NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   SYS   0-31,64-95    0              N/A
GPU4  NV18  NV18  NV18  NV18  X     NV18  NV18  NV18  SYS   SYS   SYS   SYS   PIX   SYS   32-63,96-127  1              N/A
GPU5  NV18  NV18  NV18  NV18  NV18  X     NV18  NV18  SYS   SYS   SYS   SYS   SYS   SYS   32-63,96-127  1              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18  X     NV18  SYS   SYS   SYS   SYS   SYS   PIX   32-63,96-127  1              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18  X     SYS   SYS   SYS   SYS   SYS   SYS   32-63,96-127  1              N/A
NIC0  PIX   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     SYS   SYS   SYS   SYS   SYS
NIC1  SYS   SYS   PIX   SYS   SYS   SYS   SYS   SYS   SYS   X     SYS   SYS   SYS   SYS
NIC2  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     PIX   SYS   SYS
NIC3  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   PIX   X     SYS   SYS
NIC4  SYS   SYS   SYS   SYS   PIX   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     SYS
NIC5  SYS   SYS   SYS   SYS   SYS   SYS   PIX   SYS   SYS   SYS   SYS   SYS   SYS   X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
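The GPU-to-NIC pairing can also be pulled out of the matrix mechanically instead of by eye. The following is a minimal sketch, assuming the output is plain whitespace-separated text like the listing above (some `nvidia-smi` versions may decorate the header with terminal escape codes, which this does not handle). `pix_pairs` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Sketch: list GPU/NIC pairs that the topology matrix reports as PIX.
# pix_pairs is a hypothetical helper; it reads the matrix on stdin.
pix_pairs() {
  awk '
    # Header line: a GPU name followed directly by another device name.
    $1 ~ /^GPU[0-9]+$/ && $2 ~ /^(GPU|NIC)[0-9]+$/ {
      for (i = 1; i <= NF; i++) col[i] = $i
      have_header = 1
      next
    }
    # GPU rows: field i holds the link type to the device in header column i-1.
    have_header && $1 ~ /^GPU[0-9]+$/ {
      for (i = 2; i <= NF; i++)
        if ($i == "PIX" && col[i - 1] ~ /^NIC[0-9]+$/) print $1, col[i - 1]
    }
  '
}
```

On a GPU host this would be used as `nvidia-smi topo -m | pix_pairs`. The NIC names it prints can then be translated to `mlx5_*` devices through the NIC Legend.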
root@tester:~/tools/perftest-cuda/bin# mst status -v
MST modules:
------------
MST PCI module is not loaded
MST PCI configuration module is not loaded
PCI devices:
------------
DEVICE_TYPE        MST  PCI      RDMA    NET            NUMA
ConnectX6(rev:0)   NA   5d:00.0  mlx5_2  net-ibp93s0f0  0
ConnectX7(rev:0)   NA   c0:00.0  mlx5_5  net-ibp192s0   1
ConnectX7(rev:0)   NA   9c:00.0  mlx5_4  net-ibp156s0   1
ConnectX7(rev:0)   NA   40:00.0  mlx5_1  net-ibp64s0    0
ConnectX7(rev:0)   NA   1a:00.0  mlx5_0  net-ibp26s0    0
ConnectX6(rev:0)   NA   5d:00.1  mlx5_3  net-ibp93s0f1  0
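To line the `mst status -v` rows up with the NIC Legend from `nvidia-smi topo -m`, it can help to reduce the table to one sorted line per RDMA device. This is a rough sketch that assumes the six-column layout shown above (DEVICE_TYPE, MST, PCI, RDMA, NET, NUMA); `rdma_numa_map` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "<rdma device> <pci address> numa=<node>" sorted by device name.
# Assumes device rows start with the adapter type, e.g. "ConnectX7(rev:0)".
rdma_numa_map() {
  awk '$1 ~ /^ConnectX/ && NF >= 6 { print $4, $3, "numa=" $6 }' | sort
}
```

Usage on the host would be `mst status -v | rdma_numa_map`. Comparing the NUMA column with the NUMA Affinity column of the GPU rows shows whether a GPU and its PIX-attached NIC sit on the same socket.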
Sharing NVIDIA GPU resources
Multi-Instance GPU (MIG)
Multi-Instance GPU (MIG) is similar to running multiple processes: each instance gets its own slice of the GPU.
single: the GPU is partitioned into multiple instances of the same size. For example, an NVIDIA A100 GPU can be divided into seven instances, each with equal resources.
mixed: the GPU is partitioned into instances of different sizes. This allows for a more flexible allocation of resources based on the specific needs of each workload.
Time-Slicing GPUs
Time-slicing GPUs is similar to a single thread with an event loop: workloads take turns on the same GPU.
GPU time-slicing can be used with bare-metal applications, virtual machines with GPU passthrough, and virtual machines with NVIDIA vGPU.
Nvidia License Server
Nvidia CUDA
- CUDA Toolkit 12.4 Update 1 Downloads | NVIDIA Developer
- https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/
- 1. Introduction — Installation Guide for Linux 12.3 documentation
- 1. Introduction — Quick Start Guide 12.4 documentation
- CUDA Compatibility
- Driver and Runtime
- CUDA Driver VS CUDA Runtime - Lei Mao's Log Book
- CUDA has two APIs: 1. The runtime api (libcudart.so) 2. The driver api (libcuda.... | Hacker News
- CUDA's driver API, runtime API, and libraries - Zhihu
- CUDA C++ Programming Guide - 3.3. Versioning and Compatibility
- Cuda toolkit — Cuda driver. Before using Nvidia’s profiling tools… | by Gia Huy ( CisMine) | Medium
- CUDA Installation Guide for Linux - 18. Removing CUDA Toolkit and Driver
- Runfile
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
runfile
sudo apt-get install build-essential gcc-12
sudo ln -s -f /usr/bin/gcc-12 /usr/bin/gcc
sudo sh cuda_12.6.2_560.35.03_linux.run --silent --driver
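After installing a toolkit and a driver from different releases, as the two command sets above do, it is worth checking that the driver is new enough for the toolkit. The sketch below applies a conservative rule of thumb: the CUDA version reported by the driver should be at least the toolkit release. CUDA minor-version compatibility can relax this, so treat a failure as a prompt to read the compatibility links above, not as a hard error. The helper names are hypothetical, and `version_ge` relies on GNU `sort -V`.

```shell
#!/bin/sh
# Sketch: compare the driver's CUDA version with the toolkit release.
# All three function names are hypothetical helpers.

# Reads `nvidia-smi` output on stdin, prints e.g. "12.4".
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Reads `nvcc --version` output on stdin, prints e.g. "12.4".
toolkit_cuda_version() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Succeeds when version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use on the host: `version_ge "$(nvidia-smi | driver_cuda_version)" "$(nvcc --version | toolkit_cuda_version)" && echo "driver is new enough"`.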
NCCL
- GitHub - NVIDIA/cloud-native-stack: Run cloud native workloads on NVIDIA GPUs
- GitHub - NVIDIA/nccl: Optimized primitives for collective multi-GPU communication
docker
- Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.15.0 documentation
- Hands-on: using a GPU in a Docker environment - IT Bunny Lee
Driver
Debug
- NVIDIA GPU Debug Guidelines :: GPU Deployment and Management Documentation
- Bug #1915413 “Milan Delta A100 GPU fails to detect on Ubuntu 18....” : Bugs : Ubuntu
NIM
NVIDIA Inference Microservice - What is the NIM that NVIDIA CEO Jensen Huang talked about at Computex 2024? - CAVEDU Education Team Tech Blog
NVIDIA GPU passthrough for KVM with a Windows guest OS
- All you need for PCI passthrough on Ubuntu 22.04 + Windows11
- Check whether the vfio module is loaded
- Check that the kernel driver in use is vfio-pci
- Check whether all devices in IOMMU group 26 (61:00.0 and 61:00.1) are added to the KVM PCIe config
- Index of /groups/virt/virtio-win/direct-downloads/archive-virtio
- :star:PCI Passthrough - Proxmox VE
- :star:PCI-E passthrough on Proxmox - Zen's Blog
- :star:[Blog] My Journey migrating from windows to Linux - Operating Systems & Open Source / Virtualization - Level1Techs Forums
- PCI passthrough via OVMF - ArchWiki
- Code 43 while Resizable Bar is turned on in the bios
- [SOLVED] - GPU Passthrough Issues After Upgrade to 7.2 | Page 4 | Proxmox Support Forum
- :star:VM GPU passthrough resizable BAR support in 6.1 kernel
- VFIO: How to enable Resizeable BAR (ReBAR) in your VFIO Virtual Machine - Angry Sysadmins
- Can Anyone Else Confirm that VFIO Doesn't Work w/ Nvidia GPUs if Resizable BAR is Enabled? : r/VFIO
- qemu:commandline about extend the available mmio space for the guest to 64GB
- Apparently the RTX A6000 supports resizable BAR as well on stock vBIOS, despite Nvidia not advertising the capability anywhere. Neat! : r/nvidia
- :star:PCI passthrough via OVMF - ArchWiki
- Passing an Nvidia GPU through to a Windows VM on Ubuntu + Looking Glass installation tutorial · Ivon's Blog
- QEMU/KVM GPU Passthrough: Attaching PCI host devices causes guest to not boot - English / Virtualization - openSUSE Forums
- A6000 ADA device not found on kvm guest system - Graphics / Linux / Linux NVIDIA Developer Forums
- Could not install Nvidia driver in GPU-passthrough VM - Graphics / Linux / Linux - NVIDIA Developer Forums
- nvidia - QEMU-KVM, GPU Passthrough, Windows 10 - Crashing - Ask Ubuntu
- BSOD in Windows 11 with GPU passthorugh & nested virtualisation | Proxmox Support Forum
- fresh Ubuntu 24 and KVM/Libvirt and copied the same Windows image to it with virtio and NVIDIA GPU passthough
Things to check: bootloader, dmesg, kernel version, CPU (Intel/AMD), driver, OVMF firmware, QEMU version
IOMMU, Resizable BAR (ReBAR)
IOMMU Group 26 61:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
IOMMU Group 26 61:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU Group 89 e1:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
IOMMU Group 89 e1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU Group 100 81:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
IOMMU Group 100 81:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
IOMMU Group 116 91:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
IOMMU Group 116 91:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
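A listing like the one above, and the "kernel driver in use" check from the checklist, can both be read straight from sysfs. This is a minimal sketch: `list_iommu_groups` and `pci_driver` are hypothetical helper names, and the optional directory argument exists only so the functions can be pointed at a different tree; by default they use the standard `/sys/kernel/iommu_groups` and `/sys/bus/pci/devices` paths. Unlike the listing above, it prints only the PCI address, not the device description.

```shell
#!/bin/sh
# Sketch: enumerate IOMMU groups and report the bound driver of a PCI device.
# Both function names are hypothetical helpers.

# Prints "IOMMU Group <n> <pci address>" for every device in every group.
list_iommu_groups() {
  root="${1:-/sys/kernel/iommu_groups}"
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue          # glob did not match anything
    group="${dev%/devices/*}"          # strip "/devices/<addr>"
    group="${group##*/}"               # keep only the group number
    printf 'IOMMU Group %s %s\n' "$group" "${dev##*/}"
  done
}

# Prints the driver bound to a device (e.g. vfio-pci), or "none".
# $1: PCI address such as 0000:61:00.0
pci_driver() {
  link="${2:-/sys/bus/pci/devices}/$1/driver"
  [ -L "$link" ] || { echo "none"; return 1; }
  basename "$(readlink "$link")"
}
```

For passthrough, every device in the GPU's group (for example both 61:00.0 and 61:00.1 in group 26) would be expected to report `vfio-pci` from `pci_driver`.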
sudo dmesg | grep -i iommu
# [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-79-generic root=UUID=2163fb96-af53-452a-a0fc-c84b2c3f3e44 ro quiet splash amd_iommu=on iommu=pt vt.handoff=7
# [ 1.020652] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.0-79-generic root=UUID=2163fb96-af53-452a-a0fc-c84b2c3f3e44 ro quiet splash amd_iommu=on iommu=pt vt.handoff=7
# Verify module is loaded
lsmod | grep vfio
# check the Kernel driver in use: vfio-pci
sudo lspci -vvv -s "0000:61:00.0"
sudo lspci -vvv -s "0000:81:00.0"
sudo lspci -vvv -s "0000:91:00.0"
sudo lspci -vvv -s "0000:e1:00.0"
ls -la /sys/bus/pci/devices/0000\:61\:00.0/iommu_group
ls -la /sys/bus/pci/devices/0000\:81\:00.0/iommu_group
ls -la /sys/bus/pci/devices/0000\:91\:00.0/iommu_group
ls -la /sys/bus/pci/devices/0000\:e1\:00.0/iommu_group
# log
/var/log/libvirt/qemu/Window11-clone1.log
# config xml
/etc/libvirt/qemu/
# Runtime configurations and snapshots
/var/lib/libvirt/qemu/
EDITOR=vim virsh edit Window11-clone1
# Restart services
sudo systemctl restart libvirtd
sudo systemctl restart qemu-kvm
# Verify new version
qemu-system-x86_64 --version
Re-Size BAR Support: Enable
the qemu log
2025-09-16T03:49:14.496883Z qemu-system-x86_64: -device vfio-pci,host=0000:61:00.0,id=hostdev0,bus=pci.6,addr=0x0: vfio_listener_region_add received unaligned region
Script to unbind the GPU from vfio-pci, set the BAR size to 8 GB, and rebind it to vfio-pci
GPU=0e:00
GPU_ID="1002 744c"
echo -n "0000:${GPU}.0" > /sys/bus/pci/drivers/vfio-pci/unbind || echo "Failed to unbind gpu from vfio-pci"
cd /sys/bus/pci/devices/0000\:0e\:00.0/
echo 13 > resource0_resize
echo -n "$GPU_ID" > /sys/bus/pci/drivers/vfio-pci/new_id
Script to set the BAR size to 16 GB
#!/bin/bash
echo -n "0000:0d:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind
echo 14 > /sys/bus/pci/devices/0000\:0d\:00.0/resource1_resize
echo -n "10de 2782" > /sys/bus/pci/drivers/vfio-pci/new_id || echo -n "0000:0d:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
# Bit Sizes
# 1 = 2MB
# 2 = 4MB
# 3 = 8MB
# 4 = 16MB
# 5 = 32MB
# 6 = 64MB
# 7 = 128MB
# 8 = 256MB
# 9 = 512MB
# 10 = 1GB
# 11 = 2GB
# 12 = 4GB
# 13 = 8GB
# 14 = 16GB
# 15 = 32GB
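The table above follows a simple rule: writing the value n selects a BAR size of 2^n MB. A small sketch of that arithmetic, with hypothetical helper names, avoids looking the value up by hand:

```shell
#!/bin/sh
# Sketch: convert between resourceN_resize values and BAR sizes in MB.
# Both function names are hypothetical helpers.

# $1: resize value (bit position); prints the BAR size in MB.
bar_resize_to_mb() {
  echo $((1 << $1))
}

# $1: desired size in MB; prints the smallest value whose size is >= $1.
mb_to_bar_resize() {
  mb=$1
  n=0
  while [ $((1 << n)) -lt "$mb" ]; do
    n=$((n + 1))
  done
  echo "$n"
}
```

For example, `bar_resize_to_mb 13` prints 8192 (8 GB), matching the value 13 used in the first script, and `mb_to_bar_resize 16384` prints 14, matching the second.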
Re-Size BAR Support: Disable
Stress Test on windows
- 10 Best Tools To Stress Test Your GPU on Windows
- Best GPU Stress Test Software In 2025 - PC is Awesome
Training and certification
- Self-Paced Training and Courses | NVIDIA
- AI Learning Essentials | NVIDIA
- Generative AI and LLM Learning Paths | NVIDIA
- NVIDIA Learning Paths by Role and Topic
AMD ROCm
- ROCm quick start install guide for Linux — ROCm installation (Linux)
- New ROCm Documentation Site : r/ROCm
- System requirements (Linux) — ROCm installation (Linux)
- Compatibility matrix — ROCm Documentation
- GPU-enabled Message Passing Interface — GPU cluster networking documentation
ROCm Validation Suite(RVS)
- ROCm Validation Suite documentation — RVS 1.1.0 Documentation
- ROCmValidationSuite/docs/ug1main.md at master · ROCm/ROCmValidationSuite · GitHub
- example: /opt/rocm/share/rocm-validation-suite/conf/
- example: https://github.com/ROCm/ROCmValidationSuite/tree/master/rvs/conf