Description
Problem Description
I am trying to run DeepSeek-V3.2 on an MI300X machine via SGLang, using the NSA + AITER (tilelang / flashMLA) pipeline on ROCm. That pipeline depends on AITER’s ROCm kernels, including its custom all-reduce, layernorm, and the FlashMLA module (flash_mla). On this machine those pieces are either missing or broken.
Here is what I am seeing.
AITER custom all-reduce segfaults
As soon as workers start running the NSA + AITER path, the process segfaults in aiter/aiter/jit/core.py.
This happens even when SGLang is started with --disable-custom-all-reduce, so it looks like AITER’s own ROCm all-reduce kernel is still being used somewhere in the pipeline.
The Python side never sees a PyTorch RuntimeError. It is a hard segmentation fault.
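Because the crash is a hard segfault rather than a Python exception, the standard-library `faulthandler` module is one way to at least recover the Python-level stack at the moment of the fault. The following is a minimal sketch, not something taken from SGLang or AITER: `run_with_faulthandler` and `CRASHING_CHILD` are hypothetical names, and the crashing child is a stand-in that reads address 0 through `ctypes` instead of running the real server. It assumes that worker processes inherit the parent's environment.

```python
import os
import subprocess
import sys


def run_with_faulthandler(cmd):
    """Run cmd with PYTHONFAULTHANDLER=1 and capture its output (hypothetical helper)."""
    # A non-empty PYTHONFAULTHANDLER makes CPython call faulthandler.enable()
    # at startup; child processes pick it up through the inherited environment.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in for the crashing worker (assumption, not the real AITER path):
# reading address 0 through ctypes triggers an invalid memory access in native code.
CRASHING_CHILD = [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"]

if __name__ == "__main__":
    result = run_with_faulthandler(CRASHING_CHILD)
    # A non-zero return code plus a traceback on stderr indicates the fault
    # was caught by faulthandler before the process died.
    print("return code:", result.returncode)
    print(result.stderr)
```

For the real reproduction, the same effect should be available by exporting `PYTHONFAULTHANDLER=1` in the container before running the launch command under Steps to Reproduce. The resulting traceback would show which Python frame was active when the native code faulted, which may help confirm whether the crash is reached through the custom all-reduce path.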
Operating System
Ubuntu 22.04.5 LTS
CPU
Intel(R) Xeon(R) Platinum 8480C
GPU
8x AMD MI300X
ROCm Version
6.10.5
ROCm Component
No response
Steps to Reproduce
docker image: lmsysorg/sglang:dsv32-rocm
command to run: python -m sglang.launch_server --model-path /models/DeepSeek-V3.2/ --tp 8 --disable-cuda-graph --disable-custom-all-reduce
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response