
[Documentation]: wrong UUID for injecting GPUs into containers #116

@katya-turba

Description of errors

The GPU UUID Support section of the documentation recommends using the ASIC_SERIAL value as the UUID when injecting a GPU by UUID. This is incorrect.

The UUID used by the container toolkit is the one listed in /var/log/gpu-tracker.json, which is collected from the KFD topology node properties (.../kfd/topology/nodes/*/properties).
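A minimal sketch of where that value comes from: the shell function below reads one KFD topology node `properties` file (one of the `.../kfd/topology/nodes/*/properties` files mentioned above) and prints its unique ID in the `0x`-prefixed hex form that `amd-ctk gpu-tracker status` shows. It rests on two assumptions not confirmed by this issue: that the file holds a decimal `unique_id <value>` line, and that the toolkit UUID is that value rendered as uppercase hex. The name `kfd_uuid` is hypothetical and not part of any AMD tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the KFD unique ID of one topology node as hex.
# ASSUMPTION: the properties file contains a line "unique_id <decimal value>".
# ASSUMPTION: the container toolkit shows this value as uppercase hex with a 0x prefix.
kfd_uuid() {
  local props="$1" dec
  # Take the second field of the first "unique_id" line, if any.
  dec=$(awk '$1 == "unique_id" { print $2; exit }' "$props")
  if [ -z "$dec" ]; then
    echo "no unique_id in $props" >&2
    return 1
  fi
  # Render the decimal value as uppercase hex with a 0x prefix.
  printf '0x%X\n' "$dec"
}
```

Running it against each GPU node's properties file and comparing the result with the gpu-tracker output is one way to check these assumptions on a given system.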

Instead of recommending the ASIC serial number (ASIC_SERIAL), as reported by:
rocm-smi --showuniqueid
amd-smi static -a

This section should recommend the KFD UUID, as reported by:
amd-ctk gpu-tracker status
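To script this instead of copying the value by hand, the UUID for a given GPU Id can be pulled out of the `amd-ctk gpu-tracker status` table. The sketch below assumes the column layout shown in the sample output in this issue (GPU Id first, UUID second) stays stable; `uuid_for_gpu` is a hypothetical helper name, not part of amd-ctk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `amd-ctk gpu-tracker status` output on stdin and
# print the UUID column for the requested GPU Id.
# ASSUMPTION: rows look like "<GPU Id>  <UUID>  <Accessibility>  <Container Ids>",
# as in the sample output in this issue. Header and separator rows are skipped
# because their first field never equals the requested Id.
# Exit status is 0 if the Id was found, 1 otherwise.
uuid_for_gpu() {
  local gpu_id="$1"
  awk -v id="$gpu_id" '$1 == id { print $2; found = 1; exit } END { exit !found }'
}

# Example (requires amd-ctk and Docker with the AMD runtime configured):
#   docker run --rm --runtime=amd \
#     -e AMD_VISIBLE_DEVICES="$(amd-ctk gpu-tracker status | uuid_for_gpu 0)" \
#     rocm/dev-ubuntu-24.04
```

Parsing the human-readable table is fragile if the output format changes; reading /var/log/gpu-tracker.json directly may be sturdier, but its schema is not shown here.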

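For example, on an MI300X VF the ASIC_SERIAL value is rejected, while the UUID reported by gpu-tracker is accepted:

$ amd-smi static -a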
GPU: 0
    ASIC:
        MARKET_NAME: AMD Instinct MI300X VF
        VENDOR_ID: 0x1002
        VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
        SUBVENDOR_ID: 0x1002
        DEVICE_ID: 0x74b5
        SUBSYSTEM_ID: 0x74a1
        REV_ID: 0x00
        ASIC_SERIAL: 0xDA20E05348498592
        OAM_ID: 5
        NUM_COMPUTE_UNITS: 304
        TARGET_GRAPHICS_VERSION: gfx942

$ docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0xDA20E05348498592 rocm/dev-ubuntu-24.04 bash
Ignoring [0xDA20E05348498592] GPUs as they are invalid
$ amd-ctk gpu-tracker status
-----------------------------------------------------------------------------------------------------
GPU Id    UUID                     Accessibility       Container Ids
-----------------------------------------------------------------------------------------------------
0         0x582A730CC55D98F7       Shared              -   

$ docker run --rm --runtime=amd -e AMD_VISIBLE_DEVICES=0x582A730CC55D98F7 rocm/dev-ubuntu-24.04
GPUs [0] allocated
