@davebayer
Contributor

@davebayer davebayer commented Dec 26, 2025

Currently, we are mixing several unrelated things together. This PR puts them in order and extends RDC support.

Changes:

  1. The _CCCL_HAS_RDC() macro now only reflects whether RDC (relocatable device code) is being generated. It is no longer disabled by defining CCCL_DISABLE_CDP or CUB_DISABLE_CDP.
  2. The _CCCL_HAS_EWP() macro expands to 1 if the -ewp (extensible whole program) option was passed to the compiler.
  3. New _CCCL_HAS_DEVICE_RUNTIME() macro that expands to 1 if the device runtime APIs (declared in cuda_device_runtime_api.h) can be used by CCCL. This requires RDC or EWP, and the user can disable it by defining CCCL_DISABLE_DEVICE_RUNTIME.
  4. New _CCCL_HAS_CDP() macro that expands to 1 if CUDA dynamic parallelism is enabled. Requires _CCCL_HAS_DEVICE_RUNTIME() to be 1.
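
The layering described in the list above can be sketched roughly as follows. This is a minimal illustration, not CCCL's actual definitions: every EXAMPLE_* name is hypothetical, and the EXAMPLE_DISABLE_CDP opt-out is an assumption. The only real names used are __CUDACC_RDC__ and __CUDACC_EWP__, which nvcc predefines under -rdc=true and -ewp respectively.

```cpp
// Sketch only: EXAMPLE_* macros are hypothetical stand-ins for the CCCL macros.

// Layer 1: pure detection of the compilation mode, with no user opt-outs mixed in.
#if defined(__CUDACC_RDC__) // predefined by nvcc in relocatable device code mode
#  define EXAMPLE_HAS_RDC() 1
#else
#  define EXAMPLE_HAS_RDC() 0
#endif

#if defined(__CUDACC_EWP__) // predefined by nvcc in extensible whole program mode
#  define EXAMPLE_HAS_EWP() 1
#else
#  define EXAMPLE_HAS_EWP() 0
#endif

// Layer 2: device runtime usability. Needs RDC or EWP and honors a user opt-out.
#if (EXAMPLE_HAS_RDC() || EXAMPLE_HAS_EWP()) && !defined(EXAMPLE_DISABLE_DEVICE_RUNTIME)
#  define EXAMPLE_HAS_DEVICE_RUNTIME() 1
#else
#  define EXAMPLE_HAS_DEVICE_RUNTIME() 0
#endif

// Layer 3: CDP sits on top of the device runtime.
// Assumption: a separate CDP opt-out exists; it is named EXAMPLE_DISABLE_CDP here.
#if EXAMPLE_HAS_DEVICE_RUNTIME() && !defined(EXAMPLE_DISABLE_CDP)
#  define EXAMPLE_HAS_CDP() 1
#else
#  define EXAMPLE_HAS_CDP() 0
#endif

// The invariants stated in the list: CDP implies device runtime,
// and device runtime implies RDC or EWP.
static_assert(!EXAMPLE_HAS_CDP() || EXAMPLE_HAS_DEVICE_RUNTIME(),
              "CDP requires the device runtime");
static_assert(!EXAMPLE_HAS_DEVICE_RUNTIME() || EXAMPLE_HAS_RDC() || EXAMPLE_HAS_EWP(),
              "device runtime requires RDC or EWP");
```

Compiled with a plain host compiler, where neither nvcc macro is defined, all four sketch macros expand to 0.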

@davebayer davebayer requested a review from a team as a code owner December 26, 2025 16:00
@davebayer davebayer requested a review from miscco December 26, 2025 16:00
@github-project-automation github-project-automation bot moved this to Todo in CCCL Dec 26, 2025
@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 26, 2025
@davebayer
Contributor Author

/ok to test 5c0cc34

@davebayer davebayer force-pushed the allow_rdc_with_clang branch from 5c0cc34 to 39fd084 Compare December 26, 2025 16:14
@davebayer davebayer requested a review from a team as a code owner December 26, 2025 16:14
@davebayer davebayer changed the title Enable RDC detection with clang-cuda Enable RDC with clang-cuda Dec 26, 2025
@davebayer davebayer force-pushed the allow_rdc_with_clang branch from 39fd084 to cbeaaf1 Compare December 26, 2025 17:50
@davebayer davebayer requested a review from a team as a code owner December 26, 2025 17:50
@davebayer davebayer changed the title Enable RDC with clang-cuda Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro Dec 26, 2025
@davebayer davebayer changed the title Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro Dec 26, 2025
@davebayer davebayer force-pushed the allow_rdc_with_clang branch from cbeaaf1 to e863934 Compare December 26, 2025 18:01
@davebayer davebayer force-pushed the allow_rdc_with_clang branch 3 times, most recently from d7696fc to c8019bf Compare January 5, 2026 11:31
@davebayer davebayer force-pushed the allow_rdc_with_clang branch from c8019bf to 3780d10 Compare January 5, 2026 12:03
@davebayer
Contributor Author

/ok to test 3780d10

Contributor

@miscco miscco left a comment

Looks good to me but would like some thoughts from @bernhardmgruber and @elstehle

// Control whether device runtime APIs can be used, because they require libcudadevrt to be linked. Defaults to true
// when RDC or EWP are enabled. Can be disabled by defining CCCL_DISABLE_DEVICE_RUNTIME.
#if (_CCCL_HAS_RDC() || _CCCL_HAS_EWP()) && !defined(CCCL_DISABLE_DEVICE_RUNTIME)
# define _CCCL_HAS_DEVICE_RUNTIME() 1
Contributor

Relocatable device code does not strictly require libcudadevrt; only CDP does.

Contributor Author

I don't understand what you mean. Using the CUDA device runtime APIs requires either RDC or EWP, plus linking libcudadevrt. This macro returns 1 if the device runtime (the declarations from <cuda_device_runtime_api.h>) can be used by CCCL, that is, if RDC or EWP is enabled. The user has the option to explicitly disable the device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME. CDP calls cudaLaunchDevice, which is one of the device runtime APIs, so CDP depends on _CCCL_HAS_DEVICE_RUNTIME().

Contributor

Maybe I'm reading it the wrong way.

  • The user has the option to explicitly disable device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME.
  • CDP depends on _CCCL_HAS_DEVICE_RUNTIME()

These are perfectly fine.

My issue is about RDC (relocatable device code). Having RDC doesn't automatically translate into having _CCCL_HAS_DEVICE_RUNTIME. Am I missing something?

Contributor Author

It does not, because the user need not link libcudadevrt.
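
The outcome of this thread can be restated as a small truth table. The sketch below models the condition from the quoted #if as a constexpr function; the struct and function names are hypothetical, and treating CDP as simply equal to device runtime availability is a simplification of "requires".

```cpp
// Hypothetical model of the condition discussed above; not CCCL code.
struct ExampleBuildConfig
{
  bool rdc;                     // compiled as relocatable device code
  bool ewp;                     // compiled with -ewp
  bool device_runtime_disabled; // user defined the opt-out macro
};

// Mirrors: (RDC || EWP) && !defined(opt-out)
constexpr bool example_has_device_runtime(ExampleBuildConfig c)
{
  return (c.rdc || c.ewp) && !c.device_runtime_disabled;
}

// Simplification: CDP is available whenever the device runtime is.
constexpr bool example_has_cdp(ExampleBuildConfig c)
{
  return example_has_device_runtime(c);
}

// RDC enables the device runtime by default ...
static_assert(example_has_device_runtime({true, false, false}), "RDC, no opt-out");
// ... but RDC alone does not guarantee it, because the user can opt out.
static_assert(!example_has_device_runtime({true, false, true}), "RDC with opt-out");
// Without RDC or EWP the device runtime is never available.
static_assert(!example_has_device_runtime({false, false, false}), "neither RDC nor EWP");
// CDP follows the device runtime.
static_assert(!example_has_cdp({true, false, true}), "no device runtime, no CDP");
```

The second assertion is the case the reviewer asked about: a build with RDC where the user does not link libcudadevrt and therefore defines the opt-out.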

Comment on lines +97 to +116
# if _CCCL_HOST_COMPILATION()
// Conditionally inserts a NVTX range starting here until the end of the current function scope in host code. Does
// nothing in device code.
// The optional is needed to defer the construction of an NVTX range (host-only code) and message string registration
// into a dispatch region running only on the host, while preserving the semantic scope where the range is declared.
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name) \
_CCCL_BEFORE_NVTX_RANGE_SCOPE(name) \
::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range; \
NV_IF_TARGET( \
NV_IS_HOST, \
static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name}; \
if (condition) __cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr); \
(void) __cuda_nvtx3_range;)
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name) \
_CCCL_BEFORE_NVTX_RANGE_SCOPE(name) \
::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range; \
NV_IF_TARGET( \
NV_IS_HOST, ({ \
static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name}; \
if (condition) \
{ \
__cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr); \
} \
}))
# else // ^^^ _CCCL_HOST_COMPILATION() ^^^ / vvv !_CCCL_HOST_COMPILATION() vvv
# define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name)
# endif // ^^^ !_CCCL_HOST_COMPILATION() ^^^
Contributor

Remark: this looks a bit weird, but I think it's correct. For nvc++, _CCCL_HOST_COMPILATION is defined for the combined host/device path, so we still need the NV_IF_TARGET inside.
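
The deferred-construction idea described in the macro's comment can be illustrated in plain C++. This is a sketch under stated assumptions: ExampleScopedRange and example_events are hypothetical stand-ins for an NVTX scoped range and its side effects, and std::optional is used in place of cuda::std::optional.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical event log standing in for the profiler's push/pop calls.
inline std::vector<std::string>& example_events()
{
  static std::vector<std::string> events;
  return events;
}

// Hypothetical RAII range: opens on construction, closes on destruction.
struct ExampleScopedRange
{
  explicit ExampleScopedRange(const char* name)
  {
    example_events().push_back(std::string("push:") + name);
  }
  ~ExampleScopedRange()
  {
    example_events().push_back("pop");
  }
  ExampleScopedRange(const ExampleScopedRange&)            = delete;
  ExampleScopedRange& operator=(const ExampleScopedRange&) = delete;
};

inline void example_algorithm(bool enable_range)
{
  // The optional is declared at function scope, so a range that gets opened
  // stays open until the function returns, even though it is constructed
  // inside a nested conditional block.
  std::optional<ExampleScopedRange> range;
  if (enable_range)
  {
    range.emplace("example_algorithm"); // constructed in place, only when requested
  }
  example_events().push_back("work");
}
```

Calling example_algorithm(true) records the push before the work and the pop after it, while example_algorithm(false) records only the work.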

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

🥳 CI Workflow Results

🟩 Finished in 8h 56m: Pass: 100%/136 | Total: 6d 08h | Max: 5h 24m | Hits: 68%/271961

See results here.

@davebayer davebayer merged commit 8764a02 into NVIDIA:main Jan 5, 2026
298 of 303 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Jan 5, 2026
@davebayer davebayer self-assigned this Jan 6, 2026
@davebayer davebayer deleted the allow_rdc_with_clang branch January 9, 2026 07:57
4 participants