Skip to content

cuFFTDx 2d FFT explanation of logical remapping for shared memory based global memory to register memory IO #307

@Coll1ns-cult

Description

@Coll1ns-cult

Hi,

I am trying to understand the thread remapping logic in the shared memory variants of load_strided / store_strided in block_io_generic_strided.hpp of mathdx\25.12\example\cufftdx\05_fft_Xd folder.

My current understanding of the natural thread assignment (used in the non-shared-memory variants and during FFT computation for 1D FFT in introduction_example.cu) is:

  • threadIdx.x → thread's position within its FFT
  • threadIdx.ylocal_fft_id
  • blockIdx.x → block's batch offset

However, in the shared memory variants, the code performs the following remapping before doing the global memory I/O:

const unsigned int tid          = threadIdx.x + FFT::working_group::block_dim().x * threadIdx.y;
const unsigned int tidx         = tid / FFT::working_group::block_dim().y;
const unsigned int tidy         = tid % FFT::working_group::block_dim().y;
unsigned int       smem_index   = tidx + tidy * FFT::working_group::block_dim().x;

I tried to understand it through with AI, however, I yet to succeed with a clear explanation of this logical mapping.
What I still don't fully understand is:

  1. What is the concrete motivation for performing this logical mapping at all? In other words, what problem does it solve compared to just using threadIdx.x / threadIdx.y directly for global memory indexing as in the non-smem variants?
  2. After the transpose, the smem index is computed as tidx + tidy * block_dim().x. Then after __syncthreads(), the read phase uses threadIdx.x + threadIdx.y * block_dim().x with the same formula. Could you walk through concretely what data each thread writes vs. reads, and why those two different identities (tidx/tidy vs threadIdx.x/threadIdx.y) indexing the same smem layout performs the intended transpose?
  3. Is my understanding of blockIdx.y correct — is it unused / always 0 in this kernel launch configuration, or does it carry meaning?

Thanks in advance for your time!

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions