Enhance RDC detection and add `_CCCL_HAS_DEVICE_RUNTIME()` macro #7049

davebayer · 2025-12-26T16:00:12Z

Currently, RDC detection, device runtime availability, and CDP support are mixed together. This PR separates them and extends RDC support.

Changes:

- The `_CCCL_HAS_RDC()` macro now only reflects whether relocatable device code (RDC) is being generated. Defining `CCCL_DISABLE_CDP` or `CUB_DISABLE_CDP` no longer disables it.
- The `_CCCL_HAS_EWP()` macro expands to 1 if the `-ewp` (extensible whole program) option was passed to the compiler.
- New `_CCCL_HAS_DEVICE_RUNTIME()` macro that expands to 1 if the device runtime APIs (`cuda_device_runtime_api.h`) can be used by CCCL. This requires RDC or EWP, and the user can disable it by defining `CCCL_DISABLE_DEVICE_RUNTIME`.
- New `_CCCL_HAS_CDP()` macro that expands to 1 if CUDA dynamic parallelism is enabled. Requires `_CCCL_HAS_DEVICE_RUNTIME()` to be 1.
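The dependency chain described in the list above can be sketched with plain preprocessor logic. This is only an illustration, not the code from this PR: every `DEMO_*` name is hypothetical, the detection via `__CUDACC_RDC__` and `__CUDACC_EWP__` is an assumption based on the macros nvcc predefines (detection for other compilers such as clang-cuda may differ), and the opt-out for CDP is assumed to follow the same pattern as the one for the device runtime.

```cpp
// Hypothetical DEMO_* macros mirroring the layering described in this PR.
// The real definitions live in cuda_capabilities.h and may differ.

// Assumption: nvcc defines __CUDACC_RDC__ when relocatable device code is generated.
#if defined(__CUDACC_RDC__)
#  define DEMO_HAS_RDC() 1
#else
#  define DEMO_HAS_RDC() 0
#endif

// Assumption: nvcc defines __CUDACC_EWP__ when -ewp is passed.
#if defined(__CUDACC_EWP__)
#  define DEMO_HAS_EWP() 1
#else
#  define DEMO_HAS_EWP() 0
#endif

// Device runtime needs RDC or EWP and can be switched off by the user.
#if (DEMO_HAS_RDC() || DEMO_HAS_EWP()) && !defined(DEMO_DISABLE_DEVICE_RUNTIME)
#  define DEMO_HAS_DEVICE_RUNTIME() 1
#else
#  define DEMO_HAS_DEVICE_RUNTIME() 0
#endif

// CDP sits on top of the device runtime (the opt-out macro here is an assumption).
#if DEMO_HAS_DEVICE_RUNTIME() && !defined(DEMO_DISABLE_CDP)
#  define DEMO_HAS_CDP() 1
#else
#  define DEMO_HAS_CDP() 0
#endif
```

With this layering, disabling the device runtime also disables CDP, while the RDC and EWP macros stay pure reflections of the compiler flags.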

copy-pr-bot · 2025-12-26T16:00:16Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

davebayer · 2025-12-26T16:03:52Z

/ok to test 5c0cc34

davebayer · 2026-01-05T12:05:09Z

/ok to test 3780d10

miscco

Looks good to me, but I would like some thoughts from @bernhardmgruber and @elstehle.

fbusato · 2026-01-05T18:12:55Z

libcudacxx/include/cuda/std/__cccl/cuda_capabilities.h

+// Control whether device runtime APIs can be used, because they require libcudadevrt to be linked. Defaults to true
+// when RDC or EWP are enabled. Can be disabled by defining CCCL_DISABLE_DEVICE_RUNTIME.
+#if (_CCCL_HAS_RDC() || _CCCL_HAS_EWP()) && !defined(CCCL_DISABLE_DEVICE_RUNTIME)
+#  define _CCCL_HAS_DEVICE_RUNTIME() 1


Relocatable device code does not strictly require libcudadevrt; only CDP does.

I don't understand what you mean. Using CUDA device runtime APIs requires either RDC or EWP, plus linking libcudadevrt. This macro returns 1 if the device runtime (the APIs from <cuda_device_runtime_api.h>) can be used by CCCL, that is, if RDC or EWP is enabled. The user has the option to explicitly disable device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME. CDP calls cudaLaunchDevice, which is one of the device runtime APIs, thus CDP depends on _CCCL_HAS_DEVICE_RUNTIME().

Maybe I'm reading it the wrong way.

> The user has the option to explicitly disable device runtime (in CCCL) by defining CCCL_DISABLE_DEVICE_RUNTIME.

> CDP depends on _CCCL_HAS_DEVICE_RUNTIME()

These are perfectly fine.

My issue is about RDC, relocatable device code. Having RDC doesn't automatically translate into having _CCCL_HAS_DEVICE_RUNTIME. Am I missing something?

It does not, because the user need not link libcudadevrt.
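The point settled in this thread can be written down as a small truth table. The sketch below models the macro logic as `constexpr` functions so the cases are easy to check; `DemoBuildConfig` and both function names are hypothetical, and the CDP opt-out flag is an assumption rather than something confirmed in this discussion.

```cpp
// Hypothetical model of the build configuration discussed in this thread.
struct DemoBuildConfig
{
  bool rdc;                    // relocatable device code is being generated
  bool ewp;                    // -ewp was passed to the compiler
  bool disable_device_runtime; // user defined the device runtime opt-out macro
  bool disable_cdp;            // user defined a CDP opt-out macro (assumption)
};

// Device runtime defaults to "on" whenever RDC or EWP is enabled,
// unless the user explicitly opts out.
constexpr bool demo_has_device_runtime(DemoBuildConfig c)
{
  return (c.rdc || c.ewp) && !c.disable_device_runtime;
}

// CDP requires the device runtime.
constexpr bool demo_has_cdp(DemoBuildConfig c)
{
  return demo_has_device_runtime(c) && !c.disable_cdp;
}
```

Under this model, a user who compiles with RDC but does not link libcudadevrt would set the opt-out, which turns off both the device runtime and CDP while RDC itself stays enabled.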

bernhardmgruber · 2026-01-05T20:14:48Z

libcudacxx/include/cuda/__nvtx/nvtx.h

+#  if _CCCL_HOST_COMPILATION()
 // Conditionally inserts a NVTX range starting here until the end of the current function scope in host code. Does
 // nothing in device code.
 // The optional is needed to defer the construction of an NVTX range (host-only code) and message string registration
 // into a dispatch region running only on the host, while preserving the semantic scope where the range is declared.
-#  define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name)                                                               \
-    _CCCL_BEFORE_NVTX_RANGE_SCOPE(name)                                                                            \
-    ::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range;        \
-    NV_IF_TARGET(                                                                                                  \
-      NV_IS_HOST,                                                                                                  \
-      static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
-      static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name};                   \
-      if (condition) __cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr);                                           \
-      (void) __cuda_nvtx3_range;)
+#    define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name)                                                                 \
+      _CCCL_BEFORE_NVTX_RANGE_SCOPE(name)                                                                              \
+      ::cuda::std::optional<::nvtx3::v1::scoped_range_in<::cuda::detail::NVTXCCCLDomain>> __cuda_nvtx3_range;          \
+      NV_IF_TARGET(                                                                                                    \
+        NV_IS_HOST, ({                                                                                                 \
+          static const ::nvtx3::v1::registered_string_in<::cuda::detail::NVTXCCCLDomain> __cuda_nvtx3_func_name{name}; \
+          static const ::nvtx3::v1::event_attributes __cuda_nvtx3_func_attr{__cuda_nvtx3_func_name};                   \
+          if (condition)                                                                                               \
+          {                                                                                                            \
+            __cuda_nvtx3_range.emplace(__cuda_nvtx3_func_attr);                                                        \
+          }                                                                                                            \
+        }))
+#  else // ^^^ _CCCL_HOST_COMPILATION() ^^^ / vvv !_CCCL_HOST_COMPILATION() vvv
+#    define _CCCL_NVTX_RANGE_SCOPE_IF(condition, name)
+#  endif // ^^^ !_CCCL_HOST_COMPILATION() ^^^


Remark: this looks a bit weird, but I think it's correct. For nvc++, _CCCL_HOST_COMPILATION is defined for the combined host/device path, so we still need the NV_IF_TARGET inside.
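The pattern the macro comment describes, declaring an empty optional in the enclosing scope and constructing the range later inside a nested block, can be illustrated in plain C++17. This is a minimal sketch with `std::optional` and a hypothetical `DemoScopedRange` type standing in for the NVTX scoped range; it does not use NVTX or `NV_IF_TARGET`.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an NVTX scoped range: logs when it begins and ends.
struct DemoScopedRange
{
  std::vector<std::string>& log;

  explicit DemoScopedRange(std::vector<std::string>& l)
      : log(l)
  {
    log.push_back("begin");
  }
  ~DemoScopedRange()
  {
    log.push_back("end");
  }

  DemoScopedRange(const DemoScopedRange&)            = delete;
  DemoScopedRange& operator=(const DemoScopedRange&) = delete;
};

// Returns the sequence of events recorded while the function body ran.
inline std::vector<std::string> demo_algorithm(bool condition)
{
  std::vector<std::string> log;
  {
    // Declared in the enclosing scope, initially empty: nothing is constructed yet.
    std::optional<DemoScopedRange> range;
    {
      // Nested block playing the role of the host-only dispatch region.
      if (condition)
      {
        range.emplace(log);
      }
    }
    // The range, if constructed, is still alive here.
    log.push_back("work");
  } // The optional is destroyed here, ending the range after the work.
  return log;
}
```

Because the optional outlives the nested block, the range covers the rest of the enclosing scope even though it was constructed inside the block.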

github-actions · 2026-01-05T21:06:36Z

🥳 CI Workflow Results

🟩 Finished in 8h 56m: Pass: 100%/136 | Total: 6d 08h | Max: 5h 24m | Hits: 68%/271961

See results here.

davebayer requested a review from a team as a code owner December 26, 2025 16:00

davebayer requested a review from miscco December 26, 2025 16:00

github-project-automation bot added this to CCCL Dec 26, 2025

github-project-automation bot moved this to Todo in CCCL Dec 26, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 26, 2025

davebayer force-pushed the allow_rdc_with_clang branch from 5c0cc34 to 39fd084 Compare December 26, 2025 16:14

davebayer requested a review from a team as a code owner December 26, 2025 16:14

davebayer changed the title ~~Enable RDC detection with clang-cuda~~ Enable RDC with clang-cuda Dec 26, 2025

davebayer force-pushed the allow_rdc_with_clang branch from 39fd084 to cbeaaf1 Compare December 26, 2025 17:50

davebayer requested a review from a team as a code owner December 26, 2025 17:50

davebayer changed the title ~~Enable RDC with clang-cuda~~ Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro Dec 26, 2025

davebayer changed the title ~~Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro~~ Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro Dec 26, 2025

davebayer force-pushed the allow_rdc_with_clang branch from cbeaaf1 to e863934 Compare December 26, 2025 18:01

davebayer force-pushed the allow_rdc_with_clang branch 3 times, most recently from d7696fc to c8019bf Compare January 5, 2026 11:31

Enhance RDC detection and add _CCCL_HAS_DEVICE_RUNTIME() macro

3780d10

davebayer force-pushed the allow_rdc_with_clang branch from c8019bf to 3780d10 Compare January 5, 2026 12:03

miscco reviewed Jan 5, 2026

fbusato reviewed Jan 5, 2026

bernhardmgruber reviewed Jan 5, 2026

bernhardmgruber approved these changes Jan 5, 2026

fbusato approved these changes Jan 5, 2026

davebayer merged commit 8764a02 into NVIDIA:main Jan 5, 2026
298 of 303 checks passed

github-project-automation bot moved this from In Review to Done in CCCL Jan 5, 2026

davebayer self-assigned this Jan 6, 2026

davebayer deleted the allow_rdc_with_clang branch January 9, 2026 07:57
