[cudax] CUDA shared memory abstractions proposal #7043
Draft
+1,184
−0
Today, if you want to create an object living in shared memory, you declare it as a `__shared__` variable inside the kernel.
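
A minimal sketch of that pattern, assuming a trivially constructible type and a one-dimensional block. The names `Counter`, `count_threads` and `run_count_threads` are made up for illustration and are not part of this PR:

```cuda
#include <cuda_runtime.h>

// Trivially constructible and destructible, so a plain __shared__ declaration is allowed.
struct Counter
{
  int value;
};

__global__ void count_threads(int* out)
{
  __shared__ Counter counter; // storage only, no constructor runs here

  if (threadIdx.x == 0)
  {
    counter.value = 0; // one thread initializes the object by hand
  }
  __syncthreads(); // make the initialization visible to the whole block

  atomicAdd(&counter.value, 1); // every thread in the block contributes once
  __syncthreads();              // wait until all contributions have landed

  if (threadIdx.x == 0)
  {
    *out = counter.value;
  }
}

// Host helper (illustrative): launches one block and returns the counted value.
int run_count_threads(int block_size)
{
  int* d_out = nullptr;
  cudaMalloc(&d_out, sizeof(int));
  count_threads<<<1, block_size>>>(d_out);
  int h_out = -1;
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost); // synchronizes with the kernel
  cudaFree(d_out);
  return h_out;
}
```

Even in this simple case the initialization and the barrier are written by hand; the declaration itself only reserves storage.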
That works fine for trivially constructible and destructible types. For a type with a non-trivial constructor or destructor, you have to implement the construction and destruction yourself. That makes sense: objects in shared memory are shared by the whole CTA (thread block), and there is no simple mechanism for selecting which thread performs the construction and destruction.
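
As a rough sketch of what that manual handling tends to look like, assuming a one-dimensional block and that thread 0 is chosen to construct and destroy the object. `Accumulator`, `accumulate` and `run_accumulate` are hypothetical names used only for this illustration:

```cuda
#include <cuda_runtime.h>
#include <new>

// Non-trivial constructor and destructor, so this type cannot be a plain __shared__ variable.
struct Accumulator
{
  int sum;
  int* sink;

  __device__ Accumulator(int start, int* out) : sum(start), sink(out) {}
  __device__ ~Accumulator() { *sink = sum; } // writes the result out on destruction
};

__global__ void accumulate(int* out)
{
  // Raw, suitably aligned bytes: no constructor runs for a byte array.
  __shared__ alignas(Accumulator) unsigned char storage[sizeof(Accumulator)];
  Accumulator* acc = reinterpret_cast<Accumulator*>(storage);

  if (threadIdx.x == 0)
  {
    new (storage) Accumulator(0, out); // exactly one thread constructs
  }
  __syncthreads(); // nobody may touch the object before it exists

  atomicAdd(&acc->sum, static_cast<int>(threadIdx.x));
  __syncthreads(); // nobody may still be using the object when it is destroyed

  if (threadIdx.x == 0)
  {
    acc->~Accumulator(); // exactly one thread destroys
  }
}

// Host helper (illustrative): launches one block and returns what the destructor wrote.
int run_accumulate(int block_size)
{
  int* d_out = nullptr;
  cudaMalloc(&d_out, sizeof(int));
  accumulate<<<1, block_size>>>(d_out);
  int h_out = -1;
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

The thread selection and both barriers are easy to get wrong or forget, which is the boilerplate the paragraph above refers to.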
Until now.
This PR introduces several APIs for managing the lifetime of objects in shared memory. It introduces the `cudax::static_shared<T>` type, which statically allocates storage in shared memory and provides methods for constructing, destroying, and accessing the stored object.
Construction and destruction can be done either automatically or manually, which gives low-level control over when, and by which thread, they are executed. See the example to get a better idea of what the API is capable of. Note that construction and destruction are collective operations that must be called by all threads in the block.
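
To make the "collective operation" requirement concrete, here is a small hypothetical wrapper. It is not the `cudax::static_shared<T>` API from this PR; `demo_shared_slot`, its member names, `BlockMax` and the choice of thread 0 are all assumptions made for this sketch. It also assumes a one-dimensional block:

```cuda
#include <cuda_runtime.h>
#include <new>

// Hypothetical helper, NOT the cudax::static_shared API: raw storage plus collective
// construct/destroy. Every thread in the block must call both member functions,
// because each of them contains a block-wide barrier.
template <class T>
class demo_shared_slot
{
  alignas(T) unsigned char bytes_[sizeof(T)];

public:
  template <class... Args>
  __device__ void construct(Args&&... args)
  {
    if (threadIdx.x == 0)
    {
      new (bytes_) T(static_cast<Args&&>(args)...); // one thread constructs
    }
    __syncthreads(); // the object is visible to everyone after this point
  }

  __device__ void destroy()
  {
    __syncthreads(); // wait until all threads are done with the object
    if (threadIdx.x == 0)
    {
      get().~T(); // one thread destroys
    }
    __syncthreads(); // the storage may be reused after this point
  }

  __device__ T& get() { return *reinterpret_cast<T*>(bytes_); }
};

// Illustrative payload with a non-trivial constructor and destructor.
struct BlockMax
{
  int best;
  int* sink;

  __device__ BlockMax(int start, int* out) : best(start), sink(out) {}
  __device__ ~BlockMax() { *sink = best; }
};

__global__ void block_max(int* out)
{
  __shared__ demo_shared_slot<BlockMax> slot; // trivial default constructor, so this is allowed

  slot.construct(0, out); // called by all threads
  atomicMax(&slot.get().best, static_cast<int>(threadIdx.x) * 3);
  slot.destroy();         // called by all threads
}

// Host helper (illustrative): launches one block and returns what the destructor wrote.
int run_block_max(int block_size)
{
  int* d_out = nullptr;
  cudaMalloc(&d_out, sizeof(int));
  block_max<<<1, block_size>>>(d_out);
  int h_out = -1;
  cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return h_out;
}
```

Calling `construct` or `destroy` from only some threads in this sketch would leave the others waiting at `__syncthreads()`, which is why such operations have to be treated as collective.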
For working with shared memory objects, this PR implements `cudax::shared_memory_ptr`, a non-owning pointer type similar to `observer_ptr`. It could be used in our APIs that expect pointers to shared memory.
This PR is just a prototype. Arrays are not implemented yet :)