[cudax] CUDA shared memory abstractions proposal #7043

davebayer · 2025-12-22T14:07:53Z

Today, if you want to create an object that lives in shared memory, you can write something like:

__global__ void kernel()
{
  __shared__ int my_shared_obj;
  
  if (threadIdx.x == 0)
  {
    my_shared_obj = 0xbad;
  }
  __syncthreads();

  // ...
}

That works fine for trivially constructible and destructible types. For a type with a non-trivial constructor or destructor, you must implement construction and destruction yourself. That makes sense: objects in shared memory are shared by the whole CTA (block), and there is no simple built-in mechanism to select which thread performs the construction and destruction.
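For reference, the manual approach described above can be sketched as follows. This is only an illustration of the status quo, not code from this PR: `Counter`, `manual_lifetime_kernel`, and `run_manual_lifetime_demo` are hypothetical names, and the sketch assumes a 1D block in which thread 0 is the one that constructs and destroys the object.

```cuda
#include <new>
#include <cuda_runtime.h>

// Hypothetical type with a non-trivial constructor and destructor.
struct Counter
{
  int value;
  __device__ explicit Counter(int start) : value(start) {}
  __device__ ~Counter() { value = -1; }
};

__global__ void manual_lifetime_kernel(int* out)
{
  // Raw, suitably aligned storage. No constructor runs at this point.
  __shared__ alignas(Counter) unsigned char storage[sizeof(Counter)];
  Counter* obj = reinterpret_cast<Counter*>(storage);

  // Exactly one thread constructs the object (assumption: 1D block).
  if (threadIdx.x == 0)
  {
    ::new (static_cast<void*>(storage)) Counter(42);
  }
  __syncthreads(); // make the constructed object visible to the whole block

  out[threadIdx.x] = obj->value;

  __syncthreads(); // wait until every thread is done with the object
  if (threadIdx.x == 0)
  {
    obj->~Counter();
  }
}

// Host helper: returns true if every thread observed the constructed value.
bool run_manual_lifetime_demo()
{
  constexpr int threads = 64;
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, threads * sizeof(int)) != cudaSuccess)
  {
    return false;
  }
  manual_lifetime_kernel<<<1, threads>>>(d_out);
  int h_out[threads];
  bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_out);
  for (int i = 0; ok && i < threads; ++i)
  {
    ok = (h_out[i] == 42);
  }
  return ok;
}
```

Both barriers matter here: the first publishes the constructed object to the block, and the second keeps thread 0 from destroying it while other threads still read it.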

Until now.

This PR introduces several APIs for managing the lifetime of objects in shared memory. It introduces the cudax::static_shared<T> type, which statically allocates storage in shared memory and provides methods for constructing, destroying, and accessing the stored object.

__global__ void kernel()
{
  cudax::static_shared<int> my_shared_obj{0xbed};
  __syncthreads();
}

Construction and destruction can happen either automatically or manually; the manual mode gives low-level control over when, and by which thread, they are executed. See the example to get a better idea of what the API can do. Note that construction and destruction are collective operations that must be called by all threads in the block.
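To make the "collective operation" requirement concrete, here is a minimal sketch of what a collective construct/destroy pair could look like. It is not the cudax::static_shared implementation or its interface: `collective_slot`, `Accumulator`, `collective_kernel`, and `run_collective_demo` are hypothetical names, the storage is supplied by the caller, and the sketch assumes a 1D block with thread 0 doing the actual work.

```cuda
#include <new>
#include <cuda_runtime.h>

// Hypothetical helper, NOT the cudax::static_shared API from this PR.
template <class T>
struct collective_slot
{
  T* ptr; // points at raw shared-memory storage owned by the caller

  // Collective: every thread in the block must call this, because it
  // contains a block-wide barrier.
  template <class... Args>
  __device__ void construct(Args&&... args)
  {
    if (threadIdx.x == 0) // assumption: 1D block, thread 0 constructs
    {
      ::new (static_cast<void*>(ptr)) T(static_cast<Args&&>(args)...);
    }
    __syncthreads();
  }

  // Collective: waits for all threads to finish using the object first.
  __device__ void destroy()
  {
    __syncthreads();
    if (threadIdx.x == 0)
    {
      ptr->~T();
    }
  }

  __device__ T& get() const { return *ptr; }
};

// Hypothetical payload type with a user-provided constructor and destructor.
struct Accumulator
{
  int total;
  __device__ explicit Accumulator(int start) : total(start) {}
  __device__ ~Accumulator() {}
};

__global__ void collective_kernel(int* out)
{
  __shared__ alignas(Accumulator) unsigned char storage[sizeof(Accumulator)];
  collective_slot<Accumulator> slot{reinterpret_cast<Accumulator*>(storage)};

  slot.construct(100);             // all threads call; thread 0 constructs
  atomicAdd(&slot.get().total, 1); // every thread contributes 1
  __syncthreads();
  if (threadIdx.x == 0)
  {
    *out = slot.get().total;
  }
  slot.destroy();                  // all threads call; thread 0 destroys
}

// Host helper: returns the accumulated total, or -1 on a CUDA error.
int run_collective_demo()
{
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
  {
    return -1;
  }
  collective_kernel<<<1, 64>>>(d_out);
  int h_out = -1;
  if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
  {
    h_out = -1;
  }
  cudaFree(d_out);
  return h_out;
}
```

If only some threads called `construct` or `destroy`, the `__syncthreads()` inside would be reached by a subset of the block, which is undefined behavior. That is the usual reason such operations have to be collective.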

To work with shared memory objects, this PR implements cudax::shared_memory_ptr, a non-owning pointer type similar to observer_ptr. It could be used in our APIs that expect pointers to shared memory.
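As a rough illustration of the idea of a non-owning pointer restricted to shared memory, the sketch below wraps a raw pointer and checks its address space with the CUDA `__isShared` intrinsic. It is not cudax::shared_memory_ptr: `shared_view_ptr`, `view_kernel`, and `run_view_demo` are hypothetical names, and the debug-only assertion is an assumption about what such a type might verify.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical non-owning pointer, NOT cudax::shared_memory_ptr.
template <class T>
class shared_view_ptr
{
  T* ptr_;

public:
  __device__ explicit shared_view_ptr(T* p) : ptr_(p)
  {
    // Assumption: reject pointers outside the shared address space.
    // The assert is compiled out when NDEBUG is defined.
    assert(p == nullptr || __isShared(p));
  }

  __device__ T& operator*() const { return *ptr_; }
  __device__ T* operator->() const { return ptr_; }
  __device__ T* get() const { return ptr_; }
};

__global__ void view_kernel(int* out)
{
  __shared__ int tile;
  shared_view_ptr<int> p{&tile}; // observes the object, never owns it

  if (threadIdx.x == 0)
  {
    *p = 7;
  }
  __syncthreads();

  out[threadIdx.x] = *p + static_cast<int>(threadIdx.x);
}

// Host helper: returns true if thread i wrote 7 + i.
bool run_view_demo()
{
  constexpr int threads = 32;
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, threads * sizeof(int)) != cudaSuccess)
  {
    return false;
  }
  view_kernel<<<1, threads>>>(d_out);
  int h_out[threads];
  bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_out);
  for (int i = 0; ok && i < threads; ++i)
  {
    ok = (h_out[i] == 7 + i);
  }
  return ok;
}
```

A function that takes such a wrapper instead of a raw `T*` documents in its signature that it expects shared memory, which is the use the paragraph above hints at.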

This PR is just a prototype. Arrays are not implemented yet :)

copy-pr-bot · 2025-12-22T14:08:34Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Implement shared memory abstractions

13fb33b

davebayer requested review from a team as code owners December 22, 2025 14:07

davebayer requested review from elstehle and ericniebler December 22, 2025 14:07

github-project-automation bot added this to CCCL Dec 22, 2025

github-project-automation bot moved this to Todo in CCCL Dec 22, 2025

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Dec 22, 2025

davebayer marked this pull request as draft December 22, 2025 14:08

cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Dec 22, 2025

fix missing synchronization

e86de7d

davebayer self-assigned this Jan 6, 2026
