@niklebedenko
Contributor

CUDA C++ has #pragma unroll, which lets you force a loop to be unrolled. Rust has no equivalent; instead, LLVM decides whether to unroll each loop based on its own heuristics. Setting the LLVM option -unroll-threshold to a large value makes LLVM unroll loops more aggressively.

This made some of my Rust kernels 10x faster: once a loop is fully unrolled, array indices become compile-time constants, which removes the need for local memory, stack frames, and function calls.

N.B. the unroll crate exists, but it only supports unrolling loops with integer bounds. The LLVM approach can also unroll loops over iterators.
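
As a minimal sketch of the kind of loop this affects, consider a loop written over iterators with a trip count fixed at compile time. The helper `weighted_sum` and the constant `TAPS` are hypothetical names used only for illustration; they are not part of this PR. Whether LLVM fully unrolls the loop depends on its cost heuristics and the configured threshold. With plain rustc, LLVM options can be passed through `-C llvm-args`, for example `-C llvm-args=-unroll-threshold=1000` (the value 1000 is an arbitrary illustration); how this project's build tooling forwards the option is not shown here.

```rust
/// Hypothetical window size. It is a compile-time constant, so LLVM
/// knows the exact trip count of the loop below.
pub const TAPS: usize = 8;

/// Hypothetical per-thread helper: weighted sum over a small fixed-size window.
///
/// The loop iterates over zipped iterators rather than an integer range,
/// so it can only be unrolled by the compiler backend, not by a
/// source-level rewrite of a `for i in 0..N` loop.
#[inline]
pub fn weighted_sum(window: &[f32; TAPS], weights: &[f32; TAPS]) -> f32 {
    let mut acc = 0.0f32;
    for (x, w) in window.iter().zip(weights.iter()) {
        // After full unrolling, each element access uses a constant offset.
        acc += x * w;
    }
    acc
}
```

Calling `weighted_sum(&window, &weights)` with two `[f32; TAPS]` arrays returns the same result whether or not the loop is unrolled; the option only changes the generated code.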
