Description of errors
The GPU UUID Support section recommends using the ASIC_SERIAL value as the UUID when injecting a GPU by UUID. This is incorrect.
The UUID that container-toolkit uses is the one listed in /var/log/gpu-tracker.json, which is collected from .../kfd/topology/nodes/*/properties
Instead of recommending these commands (ASIC serial):
rocm-smi --showuniqueid
amd-smi static -a
the section should recommend this command (KFD UUID):
amd-ctk gpu-tracker status
For example, on the system below the ASIC_SERIAL value is rejected as invalid, while the UUID reported by gpu-tracker is accepted:
$ amd-smi static -a
GPU: 0
ASIC:
MARKET_NAME: AMD Instinct MI300X VF
VENDOR_ID: 0x1002
VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
SUBVENDOR_ID: 0x1002
DEVICE_ID: 0x74b5
SUBSYSTEM_ID: 0x74a1
REV_ID: 0x00
ASIC_SERIAL: 0xDA20E05348498592
OAM_ID: 5
NUM_COMPUTE_UNITS: 304
TARGET_GRAPHICS_VERSION: gfx942
$ docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0xDA20E05348498592 rocm/dev-ubuntu-24.04 bash
Ignoring [0xDA20E05348498592] GPUs as they are invalid
$ amd-ctk gpu-tracker status
-----------------------------------------------------------------------------------------------------
GPU Id UUID Accessibility Container Ids
-----------------------------------------------------------------------------------------------------
0 0x582A730CC55D98F7 Shared -
$ docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0x582A730CC55D98F7 rocm/dev-ubuntu-24.04
GPUs [0] allocated
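To avoid copying the UUID by hand, the lookup can be scripted. The following is a minimal sketch, not part of container-toolkit: `kfd_uuid_for_gpu` is a hypothetical helper name, and it assumes the `amd-ctk gpu-tracker status` table is whitespace-separated with the GPU Id in the first column and the UUID in the second, as in the output shown above.

```shell
# Hypothetical helper (not shipped with amd-ctk): read the table printed by
# `amd-ctk gpu-tracker status` on stdin and print the UUID for one GPU Id.
# Assumes column 1 is the GPU Id and column 2 is the UUID, as shown above.
kfd_uuid_for_gpu() {
    # Header and separator rows are skipped because their first field is
    # not the requested id and their second field is not a hex value.
    awk -v id="$1" '$1 == id && $2 ~ /^0x[0-9A-Fa-f]+$/ { print $2; exit }'
}

# Intended usage on a host with amd-ctk installed:
#   uuid=$(amd-ctk gpu-tracker status | kfd_uuid_for_gpu 0)
#   docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES="$uuid" rocm/dev-ubuntu-24.04
```

The helper prints nothing when the requested GPU Id is not in the table, so the caller should check for an empty result before passing it to AMD_VISIBLE_DEVICES.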