24 changes: 19 additions & 5 deletions include/PolarGrid/polargrid.inl
@@ -82,10 +82,24 @@ KOKKOS_INLINE_FUNCTION int PolarGrid::wrapThetaIndex(const int unwrapped_theta_i
// This effectively computes unwrapped_theta_index % ntheta(), because it discards all higher bits.
//
// If ntheta is not a power of two, we use the standard modulo approach to handle wrapping.
-    int theta_index = is_ntheta_PowerOfTwo_ ? unwrapped_theta_index & (ntheta() - 1)
-                                            : (unwrapped_theta_index % ntheta() + ntheta()) % ntheta();
-    assert(0 <= theta_index && theta_index < ntheta());
-    return theta_index;
+    if (is_ntheta_PowerOfTwo_) {
+        const int theta_index = unwrapped_theta_index & (ntheta() - 1);
+        assert(0 <= theta_index && theta_index < ntheta());
+        return theta_index;
+    }
+    else {
+        // For non-power-of-two ntheta, we use a simple iterative approach to wrap the index.
+        // This is efficient for small deviations from the valid range, which is common in practice.
+        int theta_index = unwrapped_theta_index;
+        while (theta_index >= ntheta()) {
+            theta_index -= ntheta();
+        }
+        while (theta_index < 0) {
+            theta_index += ntheta();
+        }
+        assert(0 <= theta_index && theta_index < ntheta());
+        return theta_index;
Comment on lines +94 to +101
Collaborator

Do you think this option will still optimise performance on GPU? I agree that a single -= is usually less costly than %, but it introduces more branching 🤔

Collaborator Author

I don't know.

Collaborator

Did you measure how much difference this PR makes? It would be interesting to compare the results with OpenMP and on GPU.

Collaborator Author

Probably none, as the bottleneck is in the double-precision computations. And I don't have an ntheta != 2^k test case.

+    }
}
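
To make the comparison discussed in this thread concrete, here is a minimal standalone sketch (not part of the PR) that checks the three wrapping strategies against each other. The free functions `wrapMask`, `wrapModulo`, `wrapIterative`, and `wrapsAgree` are hypothetical stand-ins that take `ntheta` as a parameter, whereas the real `PolarGrid::wrapThetaIndex` reads it from the grid. The sketch only covers correctness, including an ntheta that is not a power of two; it says nothing about performance with OpenMP or on GPU.

```cpp
#include <cassert>

// Hypothetical stand-ins for PolarGrid::wrapThetaIndex (assumption: ntheta is
// passed explicitly here instead of being queried from the grid object).

// Bitmask wrap: correct only when ntheta is a power of two.
// Relies on two's complement for negative i (guaranteed since C++20).
inline int wrapMask(int i, int ntheta) { return i & (ntheta - 1); }

// Double-modulo wrap (the code removed by the PR): correct for any ntheta > 0.
inline int wrapModulo(int i, int ntheta) { return (i % ntheta + ntheta) % ntheta; }

// Iterative wrap (the code added by the PR): correct for any ntheta > 0,
// but the number of loop iterations grows with |i| / ntheta.
inline int wrapIterative(int i, int ntheta)
{
    while (i >= ntheta) {
        i -= ntheta;
    }
    while (i < 0) {
        i += ntheta;
    }
    return i;
}

// Returns true if all applicable strategies agree on [-3 * ntheta, 3 * ntheta].
inline bool wrapsAgree(int ntheta)
{
    assert(ntheta > 0);
    const bool power_of_two = (ntheta & (ntheta - 1)) == 0;
    for (int i = -3 * ntheta; i <= 3 * ntheta; ++i) {
        const int expected = wrapModulo(i, ntheta);
        if (wrapIterative(i, ntheta) != expected) {
            return false;
        }
        if (power_of_two && wrapMask(i, ntheta) != expected) {
            return false;
        }
    }
    return true;
}
```

Calling `wrapsAgree(12)` exercises the non-power-of-two path that the thread notes has no test case yet, so a check along these lines could serve as a starting point for one.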

KOKKOS_INLINE_FUNCTION int PolarGrid::index(const int r_index, const int unwrapped_theta_index) const
@@ -106,7 +120,7 @@ KOKKOS_INLINE_FUNCTION void PolarGrid::multiIndex(const int node_index, int& r_i
assert(0 <= node_index && node_index < numberOfNodes());
if (node_index < numberCircularSmootherNodes()) {
r_index = node_index / ntheta();
-        theta_index = wrapThetaIndex(node_index);
+        theta_index = node_index % ntheta();
}
else {
theta_index = (node_index - numberCircularSmootherNodes()) / lengthRadialSmoother();