Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro
#7049
miscco left a comment:
Looks good to me but would like some thoughts from @bernhardmgruber and @elstehle
// Control whether device runtime APIs can be used, because they require libcudadevrt to be linked. Defaults to true
// when RDC or EWP are enabled. Can be disabled by defining CCCL_DISABLE_DEVICE_RUNTIME.
#if (_CCCL_HAS_RDC() || _CCCL_HAS_EWP()) && !defined(CCCL_DISABLE_DEVICE_RUNTIME)
# define _CCCL_HAS_DEVICE_RUNTIME() 1
Relocatable device code does not strictly require libcudadevrt; only CDP does.
I don't understand what you mean. Using CUDA device runtime APIs requires either RDC or EWP, plus linking libcudadevrt. This macro returns 1 if the device runtime (stuff from <cuda_device_runtime_api.h>) can be used by CCCL, which means RDC or EWP is enabled. The user has the option to explicitly disable the device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME. CDP calls cudaDeviceLaunch, which is one of the device runtime APIs, thus CDP depends on _CCCL_HAS_DEVICE_RUNTIME().
Maybe I'm reading it the wrong way.
- The user has the option to explicitly disable device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME.
- CDP depends on _CCCL_HAS_DEVICE_RUNTIME()
These are perfectly fine.
My issue is about RDC, relocatable device code. Having RDC doesn't automatically translate into having _CCCL_HAS_DEVICE_RUNTIME. Am I missing something?
It does not, because the user need not link libcudadevrt.
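To make the opt-out discussed in this thread concrete, here is a minimal sketch of the gating logic. All DEMO_* names are hypothetical stand-ins, not CCCL macros, and the RDC value is hard-coded to simulate a build with relocatable device code in which the user does not link libcudadevrt.

```cpp
// Hypothetical stand-ins for _CCCL_HAS_RDC() / _CCCL_HAS_EWP().
// The values are hard-coded here purely for illustration.
#define DEMO_HAS_RDC() 1
#define DEMO_HAS_EWP() 0

// Simulated user opt-out: RDC is on, but libcudadevrt is not linked.
#define DEMO_DISABLE_DEVICE_RUNTIME

// Same shape as the condition in the diff hunk above.
#if (DEMO_HAS_RDC() || DEMO_HAS_EWP()) && !defined(DEMO_DISABLE_DEVICE_RUNTIME)
#  define DEMO_HAS_DEVICE_RUNTIME() 1
#else
#  define DEMO_HAS_DEVICE_RUNTIME() 0
#endif

// Small helper so the result can be checked from ordinary C++ code.
constexpr bool demo_can_use_device_runtime()
{
  return DEMO_HAS_DEVICE_RUNTIME() != 0;
}

// RDC is still reported, but the device runtime is considered unavailable.
static_assert(DEMO_HAS_RDC() == 1, "RDC detection is independent of the opt-out");
static_assert(!demo_can_use_device_runtime(), "opt-out disables device runtime use");
```

In this sketch RDC alone only makes the device runtime available by default; the opt-out is what lets an RDC build avoid the libcudadevrt dependency.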
# if _CCCL_HOST_COMPILATION()
// Conditionally inserts a NVTX range starting here until the end of the current function scope in host code. Does
// nothing in device code.
// The optional is needed to defer the construction of an NVTX range (host-only code) and message string registration
// into a dispatch region running only on the host, while preserving the semantic scope where the range is declared.
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name) \
_CCCL_BEFORE_NVTX_RANGE_SCOPE(name) \
::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range; \
NV_IF_TARGET( \
NV_IS_HOST, \
static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name}; \
if (condition) __cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr); \
(void) __cuda_nvtx3_range;)
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name) \
_CCCL_BEFORE_NVTX_RANGE_SCOPE(name) \
::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range; \
NV_IF_TARGET( \
NV_IS_HOST, ({ \
static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name}; \
if (condition) \
{ \
__cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr); \
} \
}))
# else // ^^^ _CCCL_HOST_COMPILATION() ^^^ / vvv !_CCCL_HOST_COMPILATION() vvv
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name)
# endif // ^^^ !_CCCL_HOST_COMPILATION() ^^^
Remark: this looks a bit weird, but I think it's correct. For nvc++, _CCCL_HOST_COMPILATION is defined for the combined host/device path, so we still need the NV_IF_TARGET inside.
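The deferred-construction pattern described in the macro comment can be illustrated in plain C++17 without NVTX. This is only a sketch under stated assumptions: DemoScopedRange is a hypothetical stand-in for the NVTX scoped range, std::optional replaces ::cuda::std::optional, and a plain `if` block stands in for the host-only NV_IF_TARGET region.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an RAII range: logs "push" on construction
// and "pop" on destruction.
struct DemoScopedRange
{
  std::vector<std::string>& log;

  explicit DemoScopedRange(std::vector<std::string>& l)
      : log(l)
  {
    log.push_back("push");
  }
  ~DemoScopedRange()
  {
    log.push_back("pop");
  }

  DemoScopedRange(const DemoScopedRange&)            = delete;
  DemoScopedRange& operator=(const DemoScopedRange&) = delete;
};

// Returns the sequence of events recorded while running the scoped region.
inline std::vector<std::string> demo_run(bool enabled)
{
  std::vector<std::string> log;
  {
    // Declared empty in the outer scope, so its lifetime covers the whole region.
    std::optional<DemoScopedRange> range;
    if (enabled)
    {
      // Construction is deferred into a nested block; the range still lives
      // until the end of the outer scope because the optional owns it.
      range.emplace(log);
    }
    log.push_back("work");
  } // the optional is destroyed here; "pop" is logged only if it was engaged
  return log;
}

// Usage: when disabled, no range is ever constructed.
inline void demo_usage()
{
  assert(demo_run(false).size() == 1);
}
```

The point of the optional is that the range is constructed inside a nested block yet is destroyed at the end of the enclosing scope, which a plain local variable inside the nested block could not do.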
🥳 CI Workflow Results
🟩 Finished in 8h 56m: Pass: 100%/136 | Total: 6d 08h | Max: 5h 24m | Hits: 68%/271961
Currently, RDC detection, device runtime availability, and CDP support are mixed together. This PR puts them in order and extends RDC support.
Changes:
- _CCCL_HAS_RDC() macro just reflects whether RDC is being generated. This macro will no longer be disabled by defining CCCL_DISABLE_CDP or CUB_DISABLE_CDP.
- _CCCL_HAS_EWP() macro expands to 1 if the -ewp (extensible whole program) option was passed to the compiler.
- _CCCL_HAS_DEVICE_RUNTIME() macro that expands to 1 if the device runtime (cuda_device_runtime_api.h) can be used by CCCL. This option requires RDC or EWP and can be disabled by the user by defining CCCL_DISABLE_DEVICE_RUNTIME.
- _CCCL_HAS_CDP() macro that expands to 1 if CUDA dynamic parallelism is enabled. Requires _CCCL_HAS_DEVICE_RUNTIME() to be 1.
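The dependency chain in the list above can be sketched as a layered set of feature macros. This is not the PR's implementation: the SKETCH_* names are hypothetical, detecting RDC and EWP through nvcc's predefined `__CUDACC_RDC__` and `__CUDACC_EWP__` macros is an assumption about how such detection could be done, and SKETCH_DISABLE_CDP is an assumed opt-out that only affects the CDP layer.

```cpp
// Layer 1: pure detection (assumption: based on nvcc's predefined macros).
#if defined(__CUDACC_RDC__)
#  define SKETCH_HAS_RDC() 1
#else
#  define SKETCH_HAS_RDC() 0
#endif

#if defined(__CUDACC_EWP__)
#  define SKETCH_HAS_EWP() 1
#else
#  define SKETCH_HAS_EWP() 0
#endif

// Layer 2: device runtime needs RDC or EWP and honors a user opt-out.
#if (SKETCH_HAS_RDC() || SKETCH_HAS_EWP()) && !defined(SKETCH_DISABLE_DEVICE_RUNTIME)
#  define SKETCH_HAS_DEVICE_RUNTIME() 1
#else
#  define SKETCH_HAS_DEVICE_RUNTIME() 0
#endif

// Layer 3: CDP needs the device runtime (hypothetical CDP-only opt-out).
#if SKETCH_HAS_DEVICE_RUNTIME() && !defined(SKETCH_DISABLE_CDP)
#  define SKETCH_HAS_CDP() 1
#else
#  define SKETCH_HAS_CDP() 0
#endif

// Invariants that hold for every combination of compiler flags and opt-outs.
static_assert(!SKETCH_HAS_CDP() || SKETCH_HAS_DEVICE_RUNTIME(), "CDP implies device runtime");
static_assert(!SKETCH_HAS_DEVICE_RUNTIME() || SKETCH_HAS_RDC() || SKETCH_HAS_EWP(),
              "device runtime implies RDC or EWP");
```

Keeping detection separate from the opt-outs means disabling CDP no longer changes what the RDC macro reports, which is the decoupling the first bullet describes.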