-
Notifications
You must be signed in to change notification settings - Fork 269
Add generate_identity_sequences helper and replace lambdas with named functors #3628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Add generate_identity_sequences helper and replace lambdas with named functors #3628
Conversation
This adds an optimized helper for the common generate_tuple pattern:
generate_tuple([](auto i) { return Sequence<i.value>{}; }, N)
The new generate_identity_sequences<N>() function creates
Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>> without
requiring lambda instantiation at each call site.
Updated 21 call sites across threadwise_tensor_slice_transfer,
wrapper utilities, and layout files to use the new helper.
Build time improvement: ~1.1% wall-clock (18.3s -> 18.1s)
Lambda expressions in transform_tensor_descriptor created unique template instantiations for each capture combination. This change replaces lambdas with named functor structs to reduce instantiation count: - Add merge_sequences_functor and unpack_and_merge_sequences helper - Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids - Add generate_arithmetic_sequence_from_scan Build analysis shows instantiation count dropped from 388 to 32 (92% reduction).
Detailed comments explain: - generate_identity_sequences: Replaces 21 lambda-based call sites - merge_sequences_functor and unpack_and_merge_sequences: Named functors vs lambdas - convert_visible_to_hidden_id and related functors: Eliminate nested lambda instantiations - Why named functors significantly reduce template instantiation count - Which specific lambda patterns each functor replaces This documentation helps maintainers understand how named functors reduce build-time overhead in tensor_descriptor operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.
Changes:
- Added
generate_identity_sequences<N>()helper to generateTuple<Sequence<0>, ..., Sequence<N-1>>without lambdas. - Added named sequence utilities (
merge_sequences_functor,unpack_and_merge_sequences) and replaced lambdas intransform_tensor_descriptor/TensorDescriptorlogic. - Updated multiple call sites to use the new helper(s) and added unit tests.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/util/unit_sequence_helper.cpp | Adds unit tests for generate_identity_sequences and unpack_and_merge_sequences. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/utility/tuple_helper.hpp | Introduces generate_identity_sequences helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and unpack_and_merge_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses unpack_and_merge_sequences. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| auto input = make_tuple(Sequence<10, 20, 30>{}); | ||
| auto result = unpack_and_merge_sequences{}(input); | ||
| auto expected = Sequence<10, 20, 30>{}; |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile here either. Call unpack_and_merge_sequences(input) instead.
| auto input = make_tuple(Sequence<100>{}, Sequence<200, 300>{}); | ||
| auto result = unpack_and_merge_sequences{}(input); | ||
| auto expected = Sequence<100, 200, 300>{}; |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue: unpack_and_merge_sequences is a function template, so unpack_and_merge_sequences{}(input) won’t compile. Call unpack_and_merge_sequences(input) instead.
| template <typename TupleOfSequences> | ||
| __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences) | ||
| { | ||
| return unpack(merge_sequences_functor{}, TupleOfSequences{}); | ||
| } |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unpack_and_merge_sequences calls unpack(...), but this header only includes ck/utility/tuple.hpp and does not include the header that defines unpack (ck/utility/functional4.hpp). This breaks includes that pull in sequence_helper.hpp before functional4.hpp (e.g. include/ck/utility/container_helper.hpp includes sequence_helper.hpp before tuple_helper.hpp). Add the proper include here (or otherwise ensure unpack is declared).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to be careful to explicitly include the templates we use and not rely on transitive inclusion.
| __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences) | ||
| { | ||
| return unpack(merge_sequences_functor{}, TupleOfSequences{}); |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unpack_and_merge_sequences ignores its argument and always default-constructs TupleOfSequences{}. That’s surprising and will fail for non-default-constructible tuple-like types. Use the passed tuple value (e.g., forward it into unpack) so the helper behaves like its name suggests.
| __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences) | |
| { | |
| return unpack(merge_sequences_functor{}, TupleOfSequences{}); | |
| __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences tuple_of_sequences) | |
| { | |
| return unpack(merge_sequences_functor{}, tuple_of_sequences); |
| EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value)); | ||
| EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value)); | ||
| EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value)); |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple::At(Number<I>) returns a reference; for a constexpr auto result this is typically const Sequence<I>&. These assertions currently compare against const Sequence<I> (non-reference), so they will fail. Update the expected types (or strip references in the comparison).
| EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>>::value)); | |
| EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>>::value)); | |
| EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>>::value)); | |
| EXPECT_TRUE((is_same<decltype(result.At(Number<0>{})), const Sequence<0>&>::value)); | |
| EXPECT_TRUE((is_same<decltype(result.At(Number<1>{})), const Sequence<1>&>::value)); | |
| EXPECT_TRUE((is_same<decltype(result.At(Number<2>{})), const Sequence<2>&>::value)); |
| auto input = make_tuple(Sequence<1, 2>{}, Sequence<3, 4>{}, Sequence<5, 6>{}); | ||
| auto result = unpack_and_merge_sequences{}(input); | ||
| auto expected = Sequence<1, 2, 3, 4, 5, 6>{}; |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unpack_and_merge_sequences is defined as a function template, not a callable object type. unpack_and_merge_sequences{}(input) will not compile; call it as unpack_and_merge_sequences(input) (or change the helper to a functor if that was the intent).
|
|
||
| // convert_visible_ids_to_hidden_ids - maps sequence of visible IDs to hidden IDs | ||
| // | ||
| // Replaces: [&](auto low_dim_visible_ids) { return transform_sequences(convert_fn, ids); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment about the lambda it replaces is obvious and not helpful going forward (since the lambda isn't around anymore).
|
|
||
| // generate_arithmetic_sequence_from_scan - generates arithmetic sequences for upper dimensions | ||
| // | ||
| // Replaces lambda: [&](auto i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly: this "Replaces ..." comment is an artifact of the refactoring and should be removed.
| template <typename TupleOfSequences> | ||
| __host__ __device__ constexpr auto unpack_and_merge_sequences(TupleOfSequences) | ||
| { | ||
| return unpack(merge_sequences_functor{}, TupleOfSequences{}); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to be careful to explicitly include the templates we use and not rely on transitive inclusion.
| // - Pack expansion: make_tuple(Sequence<Is>{}...) creates all sequences at once | ||
| // - No lambda closures or unique types per call site | ||
| // | ||
| // Impact: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an artifact of the refactoring and shouldn't be included as a file comment. (It's very useful for the PR description, but it will get confusing if it's left in the code base.)
Summary
generate_identity_sequences<N>()helper that returnsTuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>transform_tensor_descriptorunpack_and_merge_sequenceshelper functortransform_tensor_descriptorinstantiations from 388 to 32 (92% reduction)Motivation
Multiple call sites use
generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})pattern. A named helper reduces lambda instantiations.Additionally, each lambda in
transform_tensor_descriptorcreates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.Changes
Part 1: generate_identity_sequences helper
Part 2: Named functors in transform_tensor_descriptor
unpack_and_merge_sequenceshelper to replace lambda inGetNumOfHiddenDimensiongenerate_identity_sequencesinmatrix_padder.hppTest Plan
generate_identity_sequencesunpack_and_merge_sequencesRelated PRs
This PR merges the functionality from:
Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)
Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.