@jianingyu-ustc
Contributor

@jianingyu-ustc jianingyu-ustc commented Nov 11, 2025

PR Category

Custom Device

PR Types

New features

Description

[XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu

@CLAassistant

CLAassistant commented Nov 11, 2025

CLA assistant check
All committers have signed the CLA.

@paddle-bot

paddle-bot bot commented Nov 11, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Nov 11, 2025
@jianingyu-ustc jianingyu-ustc changed the title from "[XPU] [DEEP_EP 2/4] DeepEP support xpu" to "[XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu" Nov 19, 2025
@jianingyu-ustc jianingyu-ustc force-pushed the develop branch 4 times, most recently from 6ff38bb to 21fb80c November 20, 2025 15:21
@jianingyu-ustc jianingyu-ustc marked this pull request as draft November 21, 2025 02:49
@jianingyu-ustc jianingyu-ustc marked this pull request as ready for review November 21, 2025 02:50
@jianingyu-ustc jianingyu-ustc changed the title [XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu [XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu Nov 21, 2025
@jianingyu-ustc jianingyu-ustc changed the title [XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu [XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu Nov 21, 2025
@jianingyu-ustc jianingyu-ustc marked this pull request as draft November 21, 2025 02:59
@jianingyu-ustc jianingyu-ustc marked this pull request as ready for review November 21, 2025 03:00

  // Message sizes
- EP_HOST_ASSERT(num_scales * sizeof(float) <= hidden);
+ EP_HOST_ASSERT(num_scales * sizeof(float) <= static_cast<size_t>(hidden));
Contributor

Would comparing directly here cause a problem?

Contributor Author

It shouldn't, I think... both sides get converted to size_t anyway.
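
For readers following this exchange, here is a minimal sketch of the conversion being discussed. It assumes `num_scales` and `hidden` are plain `int` values, which the diff does not show. `ScalesFitInHidden` and `NegativeWrapsToLarge` are hypothetical helpers written for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Comparing std::size_t with int uses the usual arithmetic conversions.
// On mainstream platforms the common type is std::size_t, so the int
// operand is converted to an unsigned value before the comparison.
static_assert(
    std::is_same<std::common_type<std::size_t, int>::type, std::size_t>::value,
    "size_t vs int is compared as size_t on this platform");

// Hypothetical stand-in for the check in the diff. The parameter names are
// borrowed from the snippet; their int types are an assumption.
inline bool ScalesFitInHidden(int num_scales, int hidden) {
  assert(num_scales >= 0);
  const std::size_t scale_bytes =
      static_cast<std::size_t>(num_scales) * sizeof(float);
  // The explicit cast performs the same conversion the compiler would apply
  // implicitly, and avoids sign-compare diagnostics.
  return scale_bytes <= static_cast<std::size_t>(hidden);
}

// A negative int wraps to a very large size_t, with or without the cast.
inline bool NegativeWrapsToLarge(int value) {
  return static_cast<std::size_t>(value) > (static_cast<std::size_t>(1) << 16);
}
```

Under these assumptions the cast does not change the result of the comparison. It only makes the conversion visible. A negative `hidden` would still pass the check in either form, so a separate sign check would be needed if that case can occur.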

Member

@ForFishes ForFishes left a comment

LGTM

Contributor

@qingqing01 qingqing01 left a comment

LGTM for library size, since this PR is unrelated to GPU.

Contributor

@ZibinGuo ZibinGuo left a comment

LGTM

@ZibinGuo
Contributor

The deep_ep functionality is implemented; unit tests are still missing.

@ZibinGuo
Contributor

/re-run all-failed

@jianingyu-ustc
Contributor Author

/re-run all-failed

calc_event_ = std::make_shared<XPUEventManager>();
auto* calc_ctx = static_cast<phi::XPUContext*>(
    phi::DeviceContextPool::Instance().Get(place));
calc_ctx->CreateStream();
Contributor

calc_ctx already seems to create four streams by default. What is the purpose of creating a new stream for calc_ctx here?

Contributor

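For historical reasons, XPU has always used the default stream, and a new stream is created only when CreateStream() is called manually. On GPU, by contrast, a new stream is created whenever a CUDAStream is constructed. deep_ep needs separate communication and compute streams and should match the GPU behavior, hence this change.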

Contributor

void CreateStream() {
  if (context_->xpu_stream) {
    VLOG(3) << "xpu stream is already created for current context";
    return;
  }
  PADDLE_ENFORCE_XPU_SUCCESS(xpu_stream_create(&context_->xpu_stream));
  stream_owned_ = true;
}

If CDNN_CLUSTER_PARALLEL has already created a stream, calling this interface has no effect.
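
The early-return behavior described in this comment can be sketched in isolation. The snippet below is a self-contained model, not Paddle or XPU runtime code: `FakeStream` and `FakeContext` are hypothetical stand-ins, and the `bool` return value is added only to make the no-op case observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a device stream handle.
struct FakeStream {};

// Hypothetical stand-in for a device context with lazy stream creation.
class FakeContext {
 public:
  // Models a context that starts out on the default stream (null handle).
  FakeContext() = default;
  // Models a context whose stream was already set up by something else.
  explicit FakeContext(FakeStream* external) : stream_(external) {}

  // Creates a stream only if none exists yet.
  // Returns true if this call created one, false if it was a no-op.
  bool CreateStream() {
    if (stream_ != nullptr) {
      return false;  // a stream already exists, so nothing changes
    }
    owned_.reset(new FakeStream());
    stream_ = owned_.get();
    assert(stream_ != nullptr);
    return true;
  }

  const FakeStream* stream() const { return stream_; }
  bool owns_stream() const { return owned_ != nullptr; }

 private:
  FakeStream* stream_ = nullptr;       // current handle, owned or external
  std::unique_ptr<FakeStream> owned_;  // set only when this context created it
};
```

In this model a second `CreateStream()` call, or a call on a context constructed with an external stream, leaves the handle and its ownership unchanged. That is the case the reviewer is pointing at: callers cannot assume the call produced a fresh, dedicated stream.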

if (CommContextManager::device_id != -1) {
  std::unique_ptr<phi::XPUContext> dev_ctx(new phi::XPUContext(
      phi::XPUPlace(CommContextManager::device_id), true));
  dev_ctx->CreateStream();
Contributor

This changes the communication library's stream handling; please ask @lj970926 to review it.

Contributor

We already went over this with lj970926 and XiaociZhang when making the change.

Contributor

@dynamicheart dynamicheart left a comment

LGTM

@dynamicheart dynamicheart merged commit 15301e0 into PaddlePaddle:develop Nov 24, 2025
121 of 135 checks passed
Contributor

@lj970926 lj970926 left a comment

LGTM for comm stream

Labels

contributor External developers

8 participants