[XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu #76362
Your PR has been submitted successfully. Thank you for contributing to this open-source project!
bb944ce to 1a46747
6ff38bb to 21fb80c
21fb80c to c8eedd4
  // Message sizes
- EP_HOST_ASSERT(num_scales * sizeof(float) <= hidden);
+ EP_HOST_ASSERT(num_scales * sizeof(float) <= static_cast<size_t>(hidden));
Would comparing directly here cause a problem?
Probably not; both sides are converted to size_t.
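To make the point about the conversion concrete, here is a minimal sketch. The helper names `scales_fit_cast` and `scales_fit_checked` are hypothetical and not part of DeepEP. It assumes a mainstream platform where `int * sizeof(float)` has type `size_t`, so the explicit cast produces the same value as the implicit conversion and mainly silences a sign-compare warning. One caveat it illustrates: a negative `hidden` wraps to a very large unsigned value in both forms, so only a separate sign check would reject it.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// On mainstream platforms int * size_t yields size_t, so the left-hand side
// of the assertion is already unsigned before the comparison happens.
static_assert(std::is_same<decltype(0 * sizeof(float)), std::size_t>::value,
              "int * sizeof(float) is expected to have type size_t");

// Hypothetical helper: the same comparison as the updated assertion.
inline bool scales_fit_cast(int num_scales, int hidden) {
  return num_scales * sizeof(float) <= static_cast<std::size_t>(hidden);
}

// Hypothetical stricter variant: reject a negative hidden before casting.
inline bool scales_fit_checked(int num_scales, int hidden) {
  return hidden >= 0 &&
         num_scales * sizeof(float) <= static_cast<std::size_t>(hidden);
}

// Small self-check exercising both helpers.
inline void run_size_examples() {
  assert(scales_fit_cast(4, 16));      // 4 * sizeof(float) fits in 16 bytes
  assert(!scales_fit_cast(5, 16));     // 5 * sizeof(float) does not
  assert(scales_fit_cast(4, -1));      // -1 wraps to the largest size_t
  assert(!scales_fit_checked(4, -1));  // the sign check catches it
}
```

Calling `run_size_examples()` from a test or from `main` exercises all four cases. Whether a negative `hidden` can actually reach this assertion is not stated in the thread.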
ForFishes
left a comment
LGTM
qingqing01
left a comment
LGTM for library size, since this PR is unrelated to GPU.
ZibinGuo
left a comment
LGTM
The deep_ep functionality is implemented; unit tests are still missing.
/re-run all-failed
calc_event_ = std::make_shared<XPUEventManager>();
auto* calc_ctx = static_cast<phi::XPUContext*>(
    phi::DeviceContextPool::Instance().Get(place));
calc_ctx->CreateStream();
calc_ctx already seems to create four streams by default. What is the purpose of creating a new stream for calc_ctx here?
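For historical reasons, XPU has always used the default stream; a new stream is created only when CreateStream() is called manually. On GPU, by contrast, a new stream is created every time a CUDAStream is created. deep_ep needs separate communication and compute streams and should match the GPU behavior, hence this change.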
Paddle/paddle/phi/backends/xpu/xpu_context.cc
Lines 250 to 257 in f9062d5
void CreateStream() {
  if (context_->xpu_stream) {
    VLOG(3) << "xpu stream is already created for current context";
    return;
  }
  PADDLE_ENFORCE_XPU_SUCCESS(xpu_stream_create(&context_->xpu_stream));
  stream_owned_ = true;
}
If CDNN_CLUSTER_PARALLEL has already created a stream, calling this interface has no effect.
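The early-return guard quoted above can be illustrated with a small stand-alone mock. Everything here is hypothetical: `FakeStream`, `FakeContext`, and `AdoptExistingStream` are stand-ins invented for this sketch, not Paddle or XPU runtime APIs, and `CreateStream()` returns a `bool` only so the no-op case is observable (the quoted method returns `void`). It shows that once a context already holds a stream, however it was installed, a later `CreateStream()` call creates nothing.

```cpp
#include <cassert>

// Hypothetical stand-in for an XPU stream handle; not the real runtime type.
struct FakeStream {
  int id;
};

// Hypothetical stand-in for phi::XPUContext, reduced to the stream guard.
class FakeContext {
 public:
  FakeContext() = default;
  FakeContext(const FakeContext&) = delete;
  FakeContext& operator=(const FakeContext&) = delete;
  ~FakeContext() {
    if (stream_owned_) delete stream_;
  }

  // Simulates a stream that an earlier setup step already installed.
  void AdoptExistingStream(FakeStream* stream) {
    stream_ = stream;
    stream_owned_ = false;
  }

  // Same shape as the quoted guard: return early if a stream exists.
  // Returns true only when a new stream was actually created.
  bool CreateStream() {
    if (stream_ != nullptr) return false;
    stream_ = new FakeStream{1};
    stream_owned_ = true;
    return true;
  }

  const FakeStream* stream() const { return stream_; }

 private:
  FakeStream* stream_ = nullptr;
  bool stream_owned_ = false;
};

// Small self-check covering the fresh and the pre-populated case.
inline void run_stream_examples() {
  FakeContext fresh;
  assert(fresh.CreateStream());   // first call creates a stream
  assert(!fresh.CreateStream());  // second call is a no-op

  FakeStream preexisting{42};
  FakeContext preset;
  preset.AdoptExistingStream(&preexisting);
  assert(!preset.CreateStream());  // guard fires; nothing new is created
  assert(preset.stream() == &preexisting);
}
```

In this mock, a caller that needs a dedicated stream cannot tell from a `void` return whether one was created, which is the concern the reviewer raises.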
if (CommContextManager::device_id != -1) {
  std::unique_ptr<phi::XPUContext> dev_ctx(new phi::XPUContext(
      phi::XPUPlace(CommContextManager::device_id), true));
  dev_ctx->CreateStream();
This changes the communication library's stream handling; please ask @lj970926 to review it.
I already went over this with lj970926 and XiaociZhang when making the change.
dynamicheart
left a comment
LGTM
lj970926
left a comment
LGTM for comm stream
PR Category
Custom Device
PR Types
New features
Description
[XPU] [DEEP_EP 2/4] DeepEP normal intranode / internode support xpu