[Issue]: AITER all reduce kernel segmentation fault in MI300x

### Problem Description

I am trying to run DeepSeek-V3.2 on an MI300X machine via SGLang, using the NSA + AITER (tilelang / flashMLA) pipeline on ROCm. That pipeline depends on AITER’s ROCm kernels, including its custom all-reduce, layernorm, and the FlashMLA module (flash_mla). On this machine those pieces are either missing or broken.

Here is what I am seeing.

AITER custom all-reduce segfaults

As soon as workers start running the NSA + AITER path, the process segfaults in `aiter/aiter/jit/core.py`.

This happens even when SGLang is started with `--disable-custom-all-reduce`, so it looks like AITER’s own ROCm all-reduce kernel is still being used somewhere in the pipeline.

The Python side never sees a PyTorch RuntimeError. It is a hard segmentation fault.
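Because the crash is a signal rather than a Python exception, the usual traceback is never printed. The sketch below shows one way to confirm that a process died from SIGSEGV and to get a Python-level stack dump, using the standard-library `faulthandler` (enabled here with `-X faulthandler`). The helper names `describe_exit` and `run_with_faulthandler` are hypothetical, and the crashing child is a stand-in that sends SIGSEGV to itself; it does not touch AITER or SGLang. POSIX is assumed.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    With -X faulthandler, the child dumps the Python stack of each thread
    to stderr when it receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


# Stand-in for the real crash: the child sends SIGSEGV to itself.
CRASH = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    proc = run_with_faulthandler(CRASH)
    print(describe_exit(proc.returncode))
    print(proc.stderr)  # faulthandler's stack dump, if any
```

For a real segfault inside a native extension, the dump would show which Python frame called into the kernel, which may help narrow down where the all-reduce path is entered.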


### Operating System

Ubuntu 22.04.5 LTS

### CPU

Intel(R) Xeon(R) Platinum 8480C

### GPU

8x AMD MI300X

### ROCm Version

6.10.5

### ROCm Component

_No response_

### Steps to Reproduce

Docker image: `lmsysorg/sglang:dsv32-rocm`

Command to run: `python -m sglang.launch_server --model-path /models/DeepSeek-V3.2/ --tp 8 --disable-cuda-graph --disable-custom-all-reduce`
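For anyone reproducing this, a minimal sketch of a wrapper that runs the same command with `PYTHONFAULTHANDLER=1` set, so that Python processes inheriting the environment dump a stack on SIGSEGV. The function names and the log path are hypothetical; the command string is the one from this report. Whether SGLang's worker processes inherit the variable is an assumption I have not verified.

```python
import os
import shlex
import subprocess

# The reproduction command from this report, unchanged.
REPRO = (
    "python -m sglang.launch_server --model-path /models/DeepSeek-V3.2/ "
    "--tp 8 --disable-cuda-graph --disable-custom-all-reduce"
)


def build_repro(command: str = REPRO):
    """Split the command into argv and build an environment with faulthandler on."""
    argv = shlex.split(command)
    # PYTHONFAULTHANDLER=1 is the environment-variable form of `-X faulthandler`.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    return argv, env


def launch(log_path: str = "sglang_segfault.log") -> int:
    """Run the reproduction and write stderr (including any stack dump) to a log.

    log_path is a hypothetical file name chosen for this example.
    """
    argv, env = build_repro()
    with open(log_path, "w") as log:
        proc = subprocess.run(argv, env=env, stderr=log)
    # A negative return code on POSIX means the launcher itself was killed by a signal.
    return proc.returncode
```

Calling `launch()` inside the container above would leave the stderr output in the log file. If the crash happens in a worker rather than in the launcher process, the return code may not be negative even though a stack dump was written.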


### (Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

_No response_

### Additional Information

_No response_

[Issue]: AITER all reduce kernel segmentation fault in MI300x #1542

Problem Description

Operating System

CPU

GPU

ROCm Version

ROCm Component

Steps to Reproduce

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

Additional Information

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Issue]: AITER all reduce kernel segmentation fault in MI300x #1542

Description

Problem Description

Operating System

CPU

GPU

ROCm Version

ROCm Component

Steps to Reproduce

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

Additional Information

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions