Hi,
I am trying to understand the thread remapping logic in the shared memory variants of load_strided / store_strided in block_io_generic_strided.hpp, in the mathdx\25.12\example\cufftdx\05_fft_Xd folder.
My current understanding of the natural thread assignment (used in the non-shared-memory variants and during FFT computation for 1D FFT in introduction_example.cu) is:
threadIdx.x → thread's position within its FFT
threadIdx.y → local_fft_id
blockIdx.x → block's batch offset
However, in the shared memory variants, the code performs the following remapping before doing the global memory I/O:
const unsigned int tid = threadIdx.x + FFT::working_group::block_dim().x * threadIdx.y;
const unsigned int tidx = tid / FFT::working_group::block_dim().y;
const unsigned int tidy = tid % FFT::working_group::block_dim().y;
unsigned int smem_index = tidx + tidy * FFT::working_group::block_dim().x;
I tried to work through it with AI, but I have yet to arrive at a clear explanation of this logical mapping.
What I still don't fully understand is:
- What is the concrete motivation for performing this logical mapping at all? In other words, what problem does it solve compared to just using
threadIdx.x / threadIdx.y directly for global memory indexing as in the non-smem variants?
- After the transpose, the smem index is computed as
tidx + tidy * block_dim().x. Then, after __syncthreads(), the read phase uses threadIdx.x + threadIdx.y * block_dim().x, i.e. the same formula. Could you walk through concretely what data each thread writes vs. reads, and why indexing the same smem layout with those two different identities (tidx/tidy vs. threadIdx.x/threadIdx.y) performs the intended transpose?
- Is my understanding of blockIdx.y correct — is it unused / always 0 in this kernel launch configuration, or does it carry meaning?
Thanks in advance for your time!