
@kareemshaik80 commented Oct 30, 2025

  • Restructure the MoE kernels folder
  • Add MoE prepare-input kernels:
    • compute_problem_sizes
    • compute_expert_offsets
    • compute_expert_blockscale_offsets
    • compute_arg_sorts
    • ShuffleRows
    • ApplyShuffleMulSum
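For readers following the thread, here is a minimal host-side sketch of what the routing kernels listed above are expected to compute. It is a hypothetical C++ reference, not the PR's SYCL code. It assumes `topk_ids` is a row-major `[num_tokens x top_k]` array of expert ids and that expanded row `r = t * top_k + k`. The helper names (`build_routing_meta`, `shuffle_rows`, `apply_shuffle_mul_sum`) and the two-permutation layout are illustrative assumptions. `compute_expert_blockscale_offsets` is left out of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference; layout assumptions are described in the lead-in.
struct RoutingMeta {
  std::vector<int32_t> tokens_per_expert;   // M of each expert GEMM
  std::vector<int32_t> expert_offsets;      // num_experts + 1 entries
  std::vector<int32_t> input_permutation;   // sorted slot -> source token row
  std::vector<int32_t> output_permutation;  // expanded row -> sorted slot
};

RoutingMeta build_routing_meta(const std::vector<int32_t>& topk_ids,
                               int num_tokens, int top_k, int num_experts) {
  RoutingMeta m;
  // "compute_problem_sizes": count the (token, k) pairs routed to each expert.
  m.tokens_per_expert.assign(num_experts, 0);
  for (int32_t e : topk_ids) ++m.tokens_per_expert[e];

  // "compute_expert_offsets": exclusive prefix sum of the counts.
  m.expert_offsets.assign(num_experts + 1, 0);
  for (int e = 0; e < num_experts; ++e)
    m.expert_offsets[e + 1] = m.expert_offsets[e] + m.tokens_per_expert[e];

  // "compute_arg_sorts": give each expanded row a slot in its expert's segment.
  std::vector<int32_t> cursor(m.expert_offsets.begin(), m.expert_offsets.end() - 1);
  const int total = num_tokens * top_k;
  m.input_permutation.assign(total, 0);
  m.output_permutation.assign(total, 0);
  for (int r = 0; r < total; ++r) {
    int32_t slot = cursor[topk_ids[r]]++;
    m.input_permutation[slot] = r / top_k;  // source token row
    m.output_permutation[r] = slot;
  }
  return m;
}

// "ShuffleRows": gather rows of a row-major [rows x hidden] matrix by perm.
std::vector<float> shuffle_rows(const std::vector<float>& src,
                                const std::vector<int32_t>& perm, int hidden) {
  std::vector<float> dst(perm.size() * hidden);
  for (std::size_t i = 0; i < perm.size(); ++i)
    for (int h = 0; h < hidden; ++h)
      dst[i * hidden + h] = src[perm[i] * hidden + h];
  return dst;
}

// "ApplyShuffleMulSum": un-permute expert outputs and take the weighted
// sum of each token's top_k results.
std::vector<float> apply_shuffle_mul_sum(const std::vector<float>& expert_out,
                                         const std::vector<int32_t>& output_permutation,
                                         const std::vector<float>& topk_weights,
                                         int num_tokens, int top_k, int hidden) {
  std::vector<float> out(static_cast<std::size_t>(num_tokens) * hidden, 0.0f);
  for (int t = 0; t < num_tokens; ++t)
    for (int k = 0; k < top_k; ++k) {
      int32_t slot = output_permutation[t * top_k + k];
      float w = topk_weights[t * top_k + k];
      for (int h = 0; h < hidden; ++h)
        out[t * hidden + h] += w * expert_out[slot * hidden + h];
    }
  return out;
}
```

The serial slot assignment gives a stable, deterministic order inside each expert's segment. A device version that claims slots with atomics would still produce a valid permutation, but the order within a segment could vary between runs, so tests should compare results after un-permuting rather than the raw permutation.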

Signed-off-by: kareem <[email protected]>
@kareemshaik80 changed the title from "Restructure MoE and add prepare inputs/meta kernel" to "Restructure MoE and add prepare inputs/meta kernel [wip]" Oct 30, 2025
@kareemshaik80 changed the title from "Restructure MoE and add prepare inputs/meta kernel [wip]" to "Restructure MoE and add routing kernel [wip]" Oct 30, 2025
kareemshaik80 and others added 5 commits November 3, 2025 08:11
@kareemshaik80 changed the title from "Restructure MoE and add routing kernel [wip]" to "Restructure MoE and Add prepare input kernels" Nov 10, 2025
@kareemshaik80 changed the title from "Restructure MoE and Add prepare input kernels" to "Restructure MoE and Add MoE prepare input kernels" Nov 10, 2025

@adityachatter left a comment

LGTM.

Collaborator

Activations are not used only by MoE, so it is better to leave them unchanged.
Besides, I'd prefer to put only customized CUTLASS code under src/sycl/kernels/ and keep pure SYCL code outside it.

Comment on lines +6 to +10

@pytest.mark.parametrize("num_tokens", [5, 16, 128])
@pytest.mark.parametrize("num_experts", [4, 8, 32])
@pytest.mark.parametrize("top_k", [2])
@pytest.mark.parametrize("hidden_dims", [16, 32, 64])
Collaborator

@airMeng added the run-ci label Nov 11, 2025
Comment on lines +173 to +187
void operator()(sycl::nd_item<1> item) const {
  int32_t tot_offset = 0;          // running sum of token counts
  int32_t tot_rounded_offset = 0;  // running sum of counts rounded up to block_size
  expert_offsets_[0] = 0;
  blockscale_offsets_[0] = 0;
  for (int i = 0; i < num_experts_; ++i) {  // serial loop over experts; `item` is unused
    atomic_buffer_[i] = tot_offset;  // start of expert i's segment
    int num_tokens = problem_sizes1_[i * 3];  // first entry of expert i's size triple
    int rounded_num_tokens = (num_tokens + (block_size - 1)) / block_size * block_size;
    tot_offset += num_tokens;
    tot_rounded_offset += rounded_num_tokens;
    expert_offsets_[i + 1] = tot_offset;
    blockscale_offsets_[i + 1] = tot_rounded_offset;
  }
}
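The kernel above computes two serial exclusive prefix sums over experts. The first sums the per-expert token counts. The second sums the same counts after rounding each one up to a multiple of `block_size`. A host reference that mirrors the loop is handy for unit tests. This sketch assumes `problem_sizes1` holds one `(M, N, K)` triple per expert with the token count first. `block_size` is passed as a parameter because its value is not shown in the snippet. `Offsets` and `compute_offsets_ref` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host reference for the offsets kernel shown above.
struct Offsets {
  std::vector<int32_t> expert_offsets;      // num_experts + 1 entries
  std::vector<int32_t> blockscale_offsets;  // num_experts + 1 entries
  std::vector<int32_t> atomic_buffer;       // per-expert starting write position
};

Offsets compute_offsets_ref(const std::vector<int32_t>& problem_sizes1,
                            int num_experts, int block_size) {
  Offsets o;
  o.expert_offsets.assign(num_experts + 1, 0);
  o.blockscale_offsets.assign(num_experts + 1, 0);
  o.atomic_buffer.assign(num_experts, 0);
  for (int i = 0; i < num_experts; ++i) {
    int32_t m = problem_sizes1[i * 3];  // assumed: token count of expert i
    // Round up to the next multiple of block_size.
    int32_t rounded = (m + block_size - 1) / block_size * block_size;
    o.atomic_buffer[i] = o.expert_offsets[i];
    o.expert_offsets[i + 1] = o.expert_offsets[i] + m;
    o.blockscale_offsets[i + 1] = o.blockscale_offsets[i] + rounded;
  }
  return o;
}
```

With token counts 5, 0, and 130 and `block_size = 128`, the token offsets are 0, 5, 5, 135 and the block-scale offsets are 0, 128, 128, 384. Every expert's block-scale segment therefore starts on a block boundary. Because the device loop is serial, a SYCL 2020 group scan such as `sycl::exclusive_scan_over_group` is one option if `num_experts` grows large. For the 4 to 32 experts exercised by the tests, the serial loop is likely cheap.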
Collaborator

3 participants