[EPLB] Weight rearrangement optimization #28562
In this PR we refactor the EPLB `rearrange_expert_weights` phase. Instead of issuing many small p2p operations per layer, we take a group of layers and pack all of their p2p ops together. This requires additional send and receive buffers, but it reduces communication costs.
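The packing idea can be sketched as follows. This is a minimal illustration of the technique, not the actual vLLM implementation: the helper names (`pack_layers`, `unpack_layers`) and the flat-buffer layout are assumptions, and NumPy stands in for the real device tensors and p2p sends.

```python
import numpy as np

def pack_layers(weights_per_layer):
    """Pack per-layer expert weight slices into one contiguous send buffer.

    Returns the flat buffer plus the original shapes needed to unpack it.
    """
    buf = np.concatenate([w.ravel() for w in weights_per_layer])
    shapes = [w.shape for w in weights_per_layer]
    return buf, shapes

def unpack_layers(buf, shapes):
    """Split a received flat buffer back into per-layer weight tensors."""
    out, offset = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        out.append(buf[offset:offset + n].reshape(shape))
        offset += n
    return out

# One simulated "send" covering a group of 3 layers,
# instead of 3 separate per-layer sends.
layers = [np.random.rand(4, 8).astype(np.float32) for _ in range(3)]
buf, shapes = pack_layers(layers)
received = unpack_layers(buf, shapes)  # what the peer would reconstruct
assert all(np.array_equal(a, b) for a, b in zip(layers, received))
```

In the real distributed setting, the packed buffer would be transferred with a single send/recv pair per peer rather than one pair per layer, which is where the kernel-count reduction comes from.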
Benchmarking

Isolated `rearrange_expert_weights` microbenchmark: QwenNext 80B with 128 redundant experts on 4 H100 GPUs.

Total kernel duration for `ncclSendRecv`:
- Before: 1.7 s
- After: 0.16 s

This is a more than 10x reduction in communication kernel time.
Purpose
EPLB weights distribution optimization
Test Plan

This PR also refactors the tests, giving nicer output and making test failures surface without hangs.

tests/distributed/test_eplb_execute.py

Test Result

Tests passed.