[Feature] Per Expert Overlap (PEO) #492
Background
The key insight of this work is to group the experts so that the communication of some experts can overlap with the computation of other experts. We call this approach Per Expert Overlap (PEO). Compared to existing methods, PEO has the following advantages:
1. Performance:
   - Compared to Non-overlap, PEO performs better at all batch sizes.
   - Compared to PR #390 ([Feat] Single Batch Overlap (SBO): Overlaping of Down GEMM with Combine Send #390), PEO also performs better.
2. Usability:
In short, during the dispatch phase we change the order of communication (by modifying DeepEP) so that some experts receive their tokens first. During the GEMM phase we change the order of computation (by modifying how the inference engine calls DeepGEMM) so that some experts compute first. In the combine phase, we let some experts send their tokens first. Overall, this allows the communication of some experts to overlap with the computation of others.
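The resulting pipeline can be sketched with CUDA streams. This is a rough illustration only: `dispatch_round`, `expert_gemms`, and `combine_round` are hypothetical stand-ins, since the actual PR achieves the reordering inside DeepEP and via the engine's DeepGEMM calls rather than with extra streams:

```python
import torch

num_rounds = 4                        # assumed number of expert groups
tokens_per_round, hidden = 256, 1024  # illustrative sizes

# Hypothetical stand-ins for one round of dispatch recv, expert GEMMs,
# and combine send.
def dispatch_round(r):
    return torch.randn(tokens_per_round, hidden, device="cuda")

def expert_gemms(r, x):
    w = torch.randn(hidden, hidden, device="cuda")
    return x @ w                      # stands in for UP GEMM + DOWN GEMM

def combine_round(r, y):
    pass                              # stands in for the combine send

recv_stream = torch.cuda.Stream()     # dispatch traffic
comp_stream = torch.cuda.Stream()     # expert GEMMs
send_stream = torch.cuda.Stream()     # combine traffic

for r in range(num_rounds):
    recv_done, gemm_done = torch.cuda.Event(), torch.cuda.Event()
    with torch.cuda.stream(recv_stream):
        x = dispatch_round(r)              # round r's tokens arrive
        recv_done.record()
    with torch.cuda.stream(comp_stream):
        comp_stream.wait_event(recv_done)  # round r computes as soon as its
        y = expert_gemms(r, x)             # tokens land, overlapping with
        gemm_done.record()                 # round r+1's dispatch
    with torch.cuda.stream(send_stream):
        send_stream.wait_event(gemm_done)
        combine_round(r, y)                # round r's combine overlaps with
                                           # round r+1's computation
```

Streams only express ordering; for true concurrency the communication and GEMM kernels also need disjoint SM budgets, which is what the SM-count parameters listed in the Design section below control.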
Design
In the original DeepEP, each communication unit consists of `num_experts` or `num_local_experts` experts. That is, during the dispatch phase, each rank sends tokens to `num_experts` experts. During the combine phase, each rank sends tokens from `num_local_experts` experts to `num_ranks` ranks.

This solution modifies DeepEP by dividing the experts into `num_rounds` groups, so the communication is divided into `num_rounds` rounds. In each dispatch round, each rank sends tokens to `num_experts // num_rounds` experts. In each combine round, each rank sends tokens from `num_local_experts // num_rounds` local experts to `num_ranks` ranks.

The per-round partitioning is sketched below:
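This is a minimal sketch of the partitioning, assuming experts are grouped into contiguous, equally sized slices (the actual grouping inside DeepEP may differ):

```python
num_experts, num_ranks = 256, 32  # illustrative EP deployment
num_rounds = 4

num_local_experts = num_experts // num_ranks       # experts hosted per rank
experts_per_round = num_experts // num_rounds      # dispatch granularity
local_per_round = num_local_experts // num_rounds  # combine granularity

for r in range(num_rounds):
    # Dispatch round r targets this slice of global experts ...
    d_lo, d_hi = r * experts_per_round, (r + 1) * experts_per_round
    # ... and combine round r sends from this slice of local experts.
    c_lo, c_hi = r * local_per_round, (r + 1) * local_per_round
    print(f"round {r}: dispatch experts [{d_lo}, {d_hi}), "
          f"combine local experts [{c_lo}, {c_hi})")
```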
Because model parameters, deployment scale, and batch size vary across scenarios, this solution exposes the following tunable parameters so that the best overlap can be achieved in each case:
Parameters for Overlap:
- Overlap method
- `num_rounds`: Number of rounds for splitting dispatch/combine.
- `deepep_send_num_sms`: Number of SMs used for dispatch/combine send.
- `deepep_recv_num_sms`: Number of SMs used for dispatch/combine recv.
- `up_deepgemm_num_sms`: Number of SMs used for the UP GEMM.
- `down_deepgemm_num_sms`: Number of SMs used for the DOWN GEMM.
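For illustration, a configuration could look like the following. The parameter names mirror the list above, but the values and the exact integration point (engine config, CLI flags, etc.) are assumptions, not part of this PR:

```python
# Illustrative values for a 132-SM GPU (e.g., H800); tune per scenario.
peo_overlap_config = {
    "num_rounds": 4,              # more rounds -> finer-grained overlap,
                                  # but smaller, less efficient GEMMs
    "deepep_send_num_sms": 8,     # SMs for dispatch/combine send kernels
    "deepep_recv_num_sms": 8,     # SMs for dispatch/combine recv kernels
    "up_deepgemm_num_sms": 58,    # SMs for the UP GEMM
    "down_deepgemm_num_sms": 58,  # SMs for the DOWN GEMM
}
```

The trade-off is between overlap granularity and raw GEMM efficiency: more rounds and more communication SMs hide more latency but shrink each GEMM, so the best split depends on the model, deployment scale, and batch size, as noted above.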
Performance
Configuration:
Comparison Methods:
Conclusion:
For both DPSK and QWEN, PEO delivers the best overlap performance at almost all batch sizes. For DPSK, PEO achieves a maximum improvement of 31% at batch size 128; for QWEN, a maximum improvement of 50% at batch size 16.

