Skip to content

Conversation

@yzhou103
Copy link
Contributor

Motivation

move a8w8_bpreshuffle ck and ck tile implementation together to tune

Technical Details

  1. add --libtype in ck_gemm_a8w8_bpreshuffle_tune to choose which type to tune
  2. cktile also support fp16 data type

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings November 28, 2025 03:17
Copilot finished reviewing on behalf of yzhou103 November 28, 2025 03:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR consolidates the CK and CKTile GEMM A8W8 implementations into a unified tuning framework. It replaces separate configuration files with a single shared config that includes a libtype field to distinguish between implementations.

Key Changes

  • Added --libtype parameter to the tuning script allowing selection of "ck", "cktile", "asm", or "all" implementations
  • Merged separate configuration files (a8w8_bpreshuffle_cktile_*) into a unified a8w8_bpreshuffle_* configuration
  • Updated runtime selection logic to use libtype field from config instead of comparing execution times

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
csrc/cktile_gemm_a8w8_bpreshuffle/include/gemm_a8w8_bpreshuffle_cktile_common.cuh Fixed include order and removed hardcoded bf16 output type
csrc/cktile_gemm_a8w8_bpreshuffle/gen_instances.py Added libtype filtering to only load cktile configs
csrc/cktile_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_cktile_tune.py Updated to use unified config path
csrc/cktile_gemm_a8w8_bpreshuffle/README.md Updated documentation to reflect unified tuning approach
csrc/ck_gemm_a8w8_bpreshuffle/gen_instances.py Added libtype filtering and commented out bounds check
csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py Added libtype parameter, combined ck/cktile/asm tuning tasks
aiter/utility/base_tuner.py Fixed column selection order in preprocessing
aiter/ops/gemm_op_a8w8.py Simplified dispatch logic using libtype field
aiter/jit/optCompilerConfig.json Updated blob generation path for unified config
aiter/jit/core.py Commented out separate cktile config constant
aiter/configs/a8w8_bpreshuffle_cktile_untuned_gemm.csv Deleted (consolidated)
aiter/configs/a8w8_bpreshuffle_cktile_tuned_gemm.csv Deleted (consolidated)
Comments suppressed due to low confidence (1)

csrc/ck_gemm_a8w8_bpreshuffle/gen_instances.py:248

  • This comment appears to contain commented-out code.
            # if kid < 0 or kid > len(kernels_list):
            #    continue

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 247 to 248
# if kid < 0 or kid > len(kernels_list):
# continue
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out bounds checking code should either be removed or re-enabled. If the bounds check is intentionally removed, it could lead to index-out-of-bounds errors when accessing kernels_list[kid] on line 249. Consider either uncommenting this validation or documenting why it's safe to remove.

Suggested change
# if kid < 0 or kid > len(kernels_list):
# continue
if not isinstance(kid, int) or kid < 0 or kid >= len(kernels_list):
continue

Copilot uses AI. Check for mistakes.
First add GEMM shapes in `aiter/configs/a8w8_bpreshuffle_cktile_untuned_gemm.csv`, then run the following cmd to start tuning, please wait a few minutes as it will build gemm_a8w8_bpreshuffle_cktile_tune via jit:
`FLATMM_HIP_CLANG_PATH=/data/llvm-project/build/bin/ python3 csrc/cktile_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_cktile_tune.py -i aiter/configs/a8w8_bpreshuffle_cktile_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_cktile_tuned_gemm.csv`
First add GEMM shapes in `aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv`, then run the following cmd to start tuning, please wait a few minutes as it will build gemm_a8w8_bpreshuffle_tune via jit:
`FLATMM_HIP_CLANG_PATH=/data/llvm-project/build/bin/ python3 csrc/cktile_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py -i aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv`
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command references the wrong script path. Since the tuning is now unified, this should point to csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py with --libtype cktile to tune CKTile specifically, not csrc/cktile_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py.

Suggested change
`FLATMM_HIP_CLANG_PATH=/data/llvm-project/build/bin/ python3 csrc/cktile_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py -i aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv`
`FLATMM_HIP_CLANG_PATH=/data/llvm-project/build/bin/ python3 csrc/ck_gemm_a8w8_bpreshuffle/gemm_a8w8_bpreshuffle_tune.py -i aiter/configs/a8w8_bpreshuffle_untuned_gemm.csv -o aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv --libtype cktile`

Copilot uses AI. Check for mistakes.
## More
If you want to re-install gemm_a8w8_bpreshuffle_cktile, you should remove `aiter/jit/module_gemm_a8w8_bpreshuffle_cktile.so` and `aiter/jit/build/module_gemm_a8w8_bpreshuffle_cktile` first.
If you use flag `PREBUILD_KERNELS=1` when you install aiter, it will build gemm a8w8 kernels in `aiter/configs/a8w8_bpreshuffle_cktile_tuned_gemm.csv` by default. If you want to use the new result of gemm_a8w8_bpreshuffle_cktile_tune, please remove `build` and `*.so` first, then re-intall aiter after finishing tune.
If you use flag `PREBUILD_KERNELS=1` when you install aiter, it will build gemm a8w8 kernels in `aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv` by default. If you want to use the new result of gemm_a8w8_bpreshuffle_cktile_tune, please remove `build` and `*.so` first, then re-intall aiter after finishing tune.
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 're-intall' to 're-install'.

Suggested change
If you use flag `PREBUILD_KERNELS=1` when you install aiter, it will build gemm a8w8 kernels in `aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv` by default. If you want to use the new result of gemm_a8w8_bpreshuffle_cktile_tune, please remove `build` and `*.so` first, then re-intall aiter after finishing tune.
If you use flag `PREBUILD_KERNELS=1` when you install aiter, it will build gemm a8w8 kernels in `aiter/configs/a8w8_bpreshuffle_tuned_gemm.csv` by default. If you want to use the new result of gemm_a8w8_bpreshuffle_cktile_tune, please remove `build` and `*.so` first, then re-install aiter after finishing tune.

Copilot uses AI. Check for mistakes.
valarLip
valarLip previously approved these changes Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants