Replace O(N) recursive sequence_map_inverse with O(1) pack expansion #3596
Summary
Replace the recursive `sequence_map_inverse` implementation, which has O(N) template instantiation depth, with one that has O(1) template depth, using pack expansion.

Approach

- Use a `constexpr` loop in `find_source_index` to locate permutation inverse indices

Why It Works
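Template recursion needs N template instantiations for N iterations, and each instantiation carries its own compile-time overhead and adds one level of instantiation depth. A `constexpr` loop executes during constant evaluation inside a single template instantiation, so it avoids that per-iteration instantiation overhead.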
Build Performance Impact
Template instantiation reduction (measured on the `device_grouped_conv3d_fwd_bias_bnorm_clamp_instance` target, 248 files):

This confirms that the optimization reduces template instantiation overhead by replacing recursive template patterns with pack expansion.
Test Plan
- `SequenceMapInverse.InverseMap` and `SequenceMapInverse.InverseIdentityMap` tests validate correctness

Notes
- `sequence_merge` optimization removed from this PR (handled in #3585, "Optimize `sequence_gen` and `uniform_sequence_gen` to reduce template instantiation depth")
- … `is_valid_sequence_map` before calling `sequence_map_inverse`