@tenpercent commented Jan 16, 2026

Summary

  • Replace lambdas with named functors in transform_tensor_descriptor
  • Reduces transform_tensor_descriptor instantiations from 388 to 32 (92% reduction)

Changes

  • Add unpack_and_merge_sequences helper to replace lambda in GetNumOfHiddenDimension
  • Use generate_identity_sequences in matrix_padder.hpp
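To illustrate the second item, here is a minimal sketch of what a helper like generate_identity_sequences could look like. It assumes the helper returns a tuple of single-element sequences `<0>, <1>, ..., <N-1>`, and it uses `std::index_sequence` and `std::tuple` as stand-ins for the library's own `Sequence` and `Tuple` types. The name `generate_identity_sequences_sketch` and the `detail` helper are hypothetical, not the code from this PR.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Stand-in for the library's Sequence<...> type (an assumption of this sketch).
template <std::size_t... Is>
using seq = std::index_sequence<Is...>;

namespace detail {
// Expands the pack 0..N-1 into one single-element sequence per index.
template <std::size_t... Is>
constexpr auto identity_sequences_impl(std::index_sequence<Is...>)
{
    return std::tuple<seq<Is>...>{};
}
} // namespace detail

// Hypothetical helper: yields tuple<seq<0>, seq<1>, ..., seq<N-1>>.
// No lambda is involved, so the result type depends only on N.
template <std::size_t N>
constexpr auto generate_identity_sequences_sketch()
{
    return detail::identity_sequences_impl(std::make_index_sequence<N>{});
}

static_assert(std::is_same_v<decltype(generate_identity_sequences_sketch<2>()),
                             std::tuple<seq<0>, seq<1>>>);
```

Because the result type is a function of `N` alone, two call sites that ask for the same `N` get the same type and share any downstream instantiations.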

Why It Works

Each lambda expression has its own unique closure type, so transform_tensor_descriptor is instantiated separately for every call site that passes one. A named functor has the same type wherever it is used, so the compiler reuses one instantiation whenever the remaining template arguments also match.
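The effect can be reproduced in isolation. The sketch below is not code from this PR: `transform_sketch`, `make_transform`, and `add_one` are hypothetical stand-ins chosen to show that two textually identical lambdas produce two different instantiations, while a named functor produces one.

```cpp
#include <type_traits>

// Hypothetical stand-in for a template parameterized on a callable type.
template <typename F>
struct transform_sketch
{
    F f;
};

template <typename F>
constexpr transform_sketch<F> make_transform(F f)
{
    return {f};
}

// Hypothetical named functor: one type, no matter where it is used.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two call sites with identical lambdas: each lambda has its own closure type.
constexpr auto t1 = make_transform([](int x) { return x + 1; });
constexpr auto t2 = make_transform([](int x) { return x + 1; });

// Two call sites with the named functor: both use transform_sketch<add_one>.
constexpr auto t3 = make_transform(add_one{});
constexpr auto t4 = make_transform(add_one{});

static_assert(!std::is_same_v<decltype(t1), decltype(t2)>,
              "identical lambdas still yield distinct instantiations");
static_assert(std::is_same_v<decltype(t3), decltype(t4)>,
              "the named functor yields a single shared instantiation");
```

The sketch requires C++17 for constexpr lambdas. The behavior of all four objects is the same; only the number of distinct types the compiler has to create differs.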

Test Plan

  • Waiting for full CI

PR Stack

| # | PR    | Description                                       |
|---|-------|---------------------------------------------------|
| 1 | #3585 | sequence_gen with __make_integer_seq              |
| 2 | #3588 | generate_identity_sequences helper                |
| 3 | #3589 | Named functors in transform_tensor_descriptor     |
| 4 | #3590 | container_concat optimization                     |
| 5 | #3596 | O(1) pack expansion rewrites                      |
| 6 | #3600 | TensorDescriptor/TensorAdaptor lambda elimination |

Tracking issue: #3575

@tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch 3 times, most recently from 748497a to 885b80f, January 16, 2026 06:31
@tenpercent marked this pull request as draft, January 16, 2026 16:31
@tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 885b80f to 0791bad, January 16, 2026 20:16
@tenpercent force-pushed the tenpercent/generate-identity-sequences branch from ef35913 to 7c37209, January 16, 2026 20:16
@tenpercent force-pushed the mpodkory/transform-tensor-descriptor-optimization branch from 0791bad to b26ed88, January 17, 2026 03:37
@tenpercent marked this pull request as ready for review, January 17, 2026 03:41
Each lambda expression in transform_tensor_descriptor has its own unique
closure type, which led to separate template instantiations. This change
replaces the lambdas with named functor structs to reduce the instantiation count:

- Add merge_sequences_functor and unpack_and_merge_sequences helper
- Add convert_visible_to_hidden_id and convert_visible_ids_to_hidden_ids
- Add generate_arithmetic_sequence_from_scan

Build analysis shows instantiation count dropped from 388 to 32 (92% reduction).
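As a rough illustration of the first item in the list above, the following sketch shows a named functor that merges sequences and a helper that unpacks a tuple into it. It uses `std::index_sequence`, `std::tuple`, and `std::apply` as stand-ins for the library's own types and `unpack`. All names ending in `_sketch`, and `merge_two`, are hypothetical and not taken from the PR.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Concatenates two index sequences.
template <std::size_t... As, std::size_t... Bs>
constexpr auto merge_two(std::index_sequence<As...>, std::index_sequence<Bs...>)
{
    return std::index_sequence<As..., Bs...>{};
}

// Hypothetical named functor: merges any number of sequences, left to right.
// Unlike a lambda written at each call site, this is one type everywhere.
struct merge_sequences_functor_sketch
{
    // Base case: nothing left to merge.
    constexpr auto operator()() const { return std::index_sequence<>{}; }

    // Recursive case: prepend the first sequence to the merged remainder.
    template <typename First, typename... Rest>
    constexpr auto operator()(First first, Rest... rest) const
    {
        return merge_two(first, (*this)(rest...));
    }
};

// Hypothetical helper: unpacks a tuple of sequences and merges them.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences_sketch(const std::tuple<Seqs...>& t)
{
    return std::apply(merge_sequences_functor_sketch{}, t);
}

using input_t  = std::tuple<std::index_sequence<0, 1>, std::index_sequence<2>>;
using merged_t = decltype(unpack_and_merge_sequences_sketch(input_t{}));
static_assert(std::is_same_v<merged_t, std::index_sequence<0, 1, 2>>);
```

The recursive merge here is chosen for brevity; it says nothing about how the real helper is implemented or how it performs at compile time.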
@tenpercent commented

Closing this PR because its changes have been combined with #3588 into the new PR #3628.

The combined PR includes all functionality from both PRs plus unit tests, and targets develop directly instead of being part of a stacked PR chain.

@tenpercent closed this Jan 22, 2026