stats: optimize build topn and histogram #63285

Tristan1900 · 2025-08-29T16:16:20Z

What problem does this PR solve?

Issue Number: close #63286

Problem Summary:

What changed and how does it work?

Use heap to maintain the topN to reduce memory shuffle
Track topN index range while iterating samples
When building histogram, skip entries that falls into the topN range so don't need to remove topN values from samples.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

tiprow · 2025-08-29T16:16:38Z

Hi @Tristan1900. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov · 2025-08-29T16:48:16Z

Codecov Report

❌ Patch coverage is 93.54839% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.4747%. Comparing base (6ba9835) to head (3ff2e4d).
⚠️ Report is 319 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #63285        +/-   ##
================================================
+ Coverage   72.7810%   73.4747%   +0.6936%     
================================================
  Files          1835       1870        +35     
  Lines        496885     509348     +12463     
================================================
+ Hits         361638     374242     +12604     
+ Misses       113284     112317       -967     
- Partials      21963      22789       +826

Flag	Coverage Δ
integration	`44.7713% <87.7419%> (?)`
unit	`72.3484% <93.5483%> (+0.0314%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
dumpling	`52.8700% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`46.4913% <ø> (-0.0153%)`	⬇️

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

pkg/statistics/sort_utils.go

Tristan1900 · 2025-08-29T21:13:10Z

some preliminary result
go test -benchmem -run=^$ -bench ^BenchmarkBuildHistAndTopN$ github.com/pingcap/tidb/pkg/statistics -count=10

previous implementation

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │  old.txt   │
                   │   sec/op   │
BuildHistAndTopN-8   1.194 ± 1%

                   │   old.txt    │
                   │     B/op     │
BuildHistAndTopN-8   30.54Mi ± 0%

                   │   old.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   2.000M ± 0%

After improvement

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │   new.txt   │
                   │   sec/op    │
BuildHistAndTopN-8   63.99m ± 1%

                   │   new.txt    │
                   │     B/op     │
BuildHistAndTopN-8   15.36Mi ± 0%

                   │   new.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   1.001M ± 0%

Roughly 19x faster and 50% mem saved

Tristan1900 · 2025-09-04T14:44:16Z

run bench again after PR ready, confirmed the performance gain

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │   new.txt   │
                   │   sec/op    │
BuildHistAndTopN-8   62.44m ± 3%

                   │   new.txt    │
                   │     B/op     │
BuildHistAndTopN-8   15.36Mi ± 0%

                   │   new.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   1.001M ± 0%

0xPoe

Great Work!

Thank you!

pkg/util/generic/bounded_min_heap.go

0xPoe · 2025-09-09T08:35:58Z

pkg/util/generic/bounded_min_heap.go

+// is better than the worst item, it replaces the worst item.
+func (h *BoundedMinHeap[T]) Add(item T) {
+	// handle zero capacity case
+	if h.maxSize <= 0 {


Should we panic early if maxSize is less than or equal to 0?

0xPoe · 2025-09-09T08:39:24Z

pkg/statistics/builder.go


+const (
+	// defaultTopNValue is the default value for numTopN parameter
+	defaultTopNValue = 100


Thank you for extracting them as constants. However, I believe we placed these magic numbers somewhere else. Could you please try to reuse them?

// TopN reduced from 500 to 100 due to concerns over large number of TopN values collected for customers with many tables. // 100 is more inline with other databases. 100-256 is also common for NumBuckets with other databases. var analyzeOptionDefaultV2 = map[ast.AnalyzeOptionType]uint64{ ast.AnalyzeOptNumBuckets: 256, ast.AnalyzeOptNumTopN: 100, ast.AnalyzeOptCMSketchWidth: 2048, ast.AnalyzeOptCMSketchDepth: 5, ast.AnalyzeOptNumSamples: 0, ast.AnalyzeOptSampleRate: math.Float64bits(-1), }

In a separate PR - we should make them variables such that customers can adjust them if needed.

Created an issue to track #63432

0xPoe · 2025-09-09T08:50:12Z

pkg/statistics/builder.go

-	for i := int64(1); i < sampleNum; i++ {
+	processedCount := int64(1) // we've processed the first sample
+
+	// Note: Start from firstSampleIdx+1 because we have already processed the first non-skipped sample.


Suggested change

// Note: Start from firstSampleIdx+1 because we have already processed the first non-skipped sample.

// Start from firstSampleIdx + 1 since the first non-skipped sample has already been processed when the range checker is not null.

0xPoe · 2025-09-09T08:51:45Z

pkg/statistics/builder.go

+	rangeChecker *SequentialRangeChecker, // optional range checker for skipping TopN indices
 ) (corrXYSum float64, err error) {
 	sampleNum := int64(len(samples))
-	// As we use samples to build the histogram, the bucket number and repeat should multiply a factor.


I guess we don't need to delete this comment?

It seems this comment hasn’t been restored.

0xPoe · 2025-09-09T08:52:31Z

pkg/statistics/builder.go

+	// If a numTopn value other than default is passed in, we assume it's a value that the user wants us to honor
 	allowPruning := true
-	if numTopN != 100 {
+	if numTopN != defaultTopNValue {


👍🏼 Love this const, I hate magic numbers!

Tristan1900 · 2025-09-09T17:28:34Z

/retest

tiprow · 2025-09-09T17:28:58Z

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Tristan1900 · 2025-09-09T20:59:45Z

/retest

tiprow · 2025-09-09T21:00:11Z

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Tristan1900 · 2025-09-10T13:23:02Z

/hold

one test fails for legit reason

Tristan1900 · 2025-09-10T13:38:06Z

/retest

actually a flaky test that happened before this PR

tiprow · 2025-09-10T13:38:30Z

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

actually a flaky test that happened before this PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

terry1purcell · 2025-09-10T13:58:37Z

/ok-to-test

Signed-off-by: Wenqi Mou <[email protected]>

Tristan1900 · 2025-09-10T15:45:11Z

/retest

0xPoe

Thanks!

/hold

For the comment issue. Feel free to unhold it after you address it.

0xPoe · 2025-09-11T08:44:38Z

pkg/statistics/builder.go

+	rangeChecker *SequentialRangeChecker, // optional range checker for skipping TopN indices
 ) (corrXYSum float64, err error) {
 	sampleNum := int64(len(samples))
-	// As we use samples to build the histogram, the bucket number and repeat should multiply a factor.


It seems this comment hasn’t been restored.

ti-chi-bot · 2025-09-11T09:46:13Z

[LGTM Timeline notifier]

Timeline:

2025-09-09 14:02:35.546469291 +0000 UTC m=+370022.106350778: ☑️ agreed by terry1purcell.
2025-09-11 09:46:12.237980283 +0000 UTC m=+527438.797861800: ☑️ agreed by 0xPoe.

Add comment to clarify sample factor calculation.

Tristan1900 · 2025-09-11T16:44:49Z

/unhold

Tristan1900 · 2025-09-11T17:41:21Z

/retest

Tristan1900 · 2025-09-11T20:15:46Z

/retest

time-and-fate · 2025-09-11T20:35:36Z

pkg/util/generic/bounded_min_heap.go

+	sort.Slice(result, func(i, j int) bool {
+		return h.cmpFunc(result[i], result[j]) > 0
+	})


slices.SortFunc is a newer replacement for it. In most cases, we should use the slices package instead of the sort package.

I think using -h.cmpFunc(result[i], result[j]) would be more explicit that the order here is reversed.

time-and-fate · 2025-09-11T20:54:53Z

pkg/util/generic/bounded_min_heap.go

+// Less compares two items in the heap. We use a min-heap for efficient bounded operations.
+func (h *BoundedMinHeap[T]) Less(i, j int) bool {
+	return h.cmpFunc(h.items[i], h.items[j]) < 0
+}
+
+// Swap swaps two items in the heap.
+func (h *BoundedMinHeap[T]) Swap(i, j int) {
+	h.items[i], h.items[j] = h.items[j], h.items[i]
+}
+
+// Push adds an item to the heap.
+func (h *BoundedMinHeap[T]) Push(x any) {
+	h.items = append(h.items, x.(T))
+}
+
+// Pop removes and returns the smallest item from the heap.
+func (h *BoundedMinHeap[T]) Pop() any {
+	old := h.items
+	n := len(old)
+	item := old[n-1]
+	h.items = old[0 : n-1]
+	return item
+}


I think we need to reconsider the API design of this struct.
These methods are not expected to be used externally and should be unexported. I think the heap from the std lib should be an internal data type.

yes that's heap's interface we must comply. https://pkg.go.dev/container/heap
I think that's the standard lib heap, and I looked at other impl in the code base and they all follow the same pattern. Do you mind sharing which std lib you are referring to?

Yes. I understand that you implemented BoundedMinHeap based on the container/heap from the std lib.

I mean the container/heap here should be an internal data type and should not be accessed externally. For example, you don't want other people to do something like NewBoundedMinHeap().Push(val).

What I have in mind is that the implementation should be something like:

type internalHeap[T any] struct { .... } func (h *internalHeap[T]) Len() int {...} func (h *internalHeap[T]) Less(i, j int) bool {...} func (h *internalHeap[T]) Swap(i, j int) {...} func (h *internalHeap[T]) Push(x any) {...} func (h *internalHeap[T]) Pop() any {...} type BoundedMinHeap[T any] struct { data *internalHeap[T] ... } func (h *BoundedMinHeap[T]) Add(item T) {...} func (h *BoundedMinHeap[T]) ToSortedSlice() []T {...}

oh I see the idea. Yeah that's a great suggestion. Let me do that

time-and-fate · 2025-09-11T21:01:50Z

pkg/util/generic/bounded_min_heap.go

+// BoundedMinHeap implements a min-heap for maintaining the best N items efficiently.
+// It keeps the N items with the highest values according to the comparison function.
+// The root of the heap is always the smallest item, making it easy to remove when adding better items.


I think we need to unify the terminology: best/worst or high/low.

time-and-fate · 2025-09-11T21:17:10Z

pkg/statistics/builder.go

+		if a.Count < b.Count {
+			return -1 // min-heap: smaller counts at root
+		} else if a.Count > b.Count {
+			return 1
+		}
+		return 0


cmp.Compare(a.Count, b.Count)

Signed-off-by: Wenqi Mou <[email protected]>

Tristan1900 · 2025-09-12T17:02:09Z

/retest

Tristan1900 · 2025-09-12T17:06:18Z

/hold

Signed-off-by: Wenqi Mou <[email protected]>

ti-chi-bot · 2025-09-12T18:29:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xPoe, terry1purcell, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [0xPoe,terry1purcell,time-and-fate]
~~pkg/planner/OWNERS~~ [0xPoe,terry1purcell,time-and-fate]
~~pkg/statistics/OWNERS~~ [0xPoe,terry1purcell,time-and-fate]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Tristan1900 · 2025-09-12T18:36:26Z

/unhold

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot · 2025-11-19T15:55:32Z

In response to a cherrypick label: new pull request created to branch release-8.4: #64575.
But this PR has conflicts, please resolve them!

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot · 2025-11-19T16:46:03Z

In response to a cherrypick label: new pull request created to branch release-8.5: #64577.
But this PR has conflicts, please resolve them!

ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 29, 2025

ti-chi-bot bot removed do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked labels Aug 29, 2025

hawkingrei reviewed Aug 29, 2025

View reviewed changes

pkg/statistics/sort_utils.go Outdated Show resolved Hide resolved

ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 2, 2025

Tristan1900 force-pushed the stats-topn-calculation branch from 69b3885 to 175670c Compare September 2, 2025 19:56

0xPoe reviewed Sep 9, 2025

View reviewed changes

terry1purcell approved these changes Sep 9, 2025

View reviewed changes

ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 9, 2025

ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2025

ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Sep 10, 2025

stats: optimize build topn and histogram

09c1a77

Signed-off-by: Wenqi Mou <[email protected]>

0xPoe approved these changes Sep 11, 2025

View reviewed changes

ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 11, 2025

revert comment

21e1015

Add comment to clarify sample factor calculation.

ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2025

time-and-fate reviewed Sep 11, 2025

View reviewed changes

address comments

4a81e9f

Signed-off-by: Wenqi Mou <[email protected]>

ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2025

address comments

3ff2e4d

Signed-off-by: Wenqi Mou <[email protected]>

time-and-fate approved these changes Sep 12, 2025

View reviewed changes

ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2025

ti-chi-bot bot merged commit b75bdd1 into pingcap:master Sep 12, 2025
29 checks passed

0xPoe added the needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. label Nov 19, 2025

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Nov 19, 2025

This is an automated cherry-pick of pingcap#63285

4bf9f74

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this pull request Nov 19, 2025

stats: optimize build topn and histogram (#63285) #64575

Open

13 tasks

fixdb added needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. and removed needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. labels Nov 19, 2025

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Nov 19, 2025

This is an automated cherry-pick of pingcap#63285

1b79291

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this pull request Nov 19, 2025

stats: optimize build topn and histogram (#63285) #64577

Open

13 tasks

	// Note: Start from firstSampleIdx+1 because we have already processed the first non-skipped sample.
	// Start from firstSampleIdx + 1 since the first non-skipped sample has already been processed when the range checker is not null.

stats: optimize build topn and histogram #63285

stats: optimize build topn and histogram #63285

Uh oh!

Conversation

Tristan1900 commented Aug 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

Uh oh!

tiprow bot commented Aug 29, 2025

Uh oh!

codecov bot commented Aug 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Tristan1900 commented Aug 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tristan1900 commented Sep 4, 2025

Uh oh!

0xPoe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

0xPoe Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tristan1900 Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tristan1900 commented Sep 9, 2025

Uh oh!

tiprow bot commented Sep 9, 2025

Uh oh!

Tristan1900 commented Sep 9, 2025

Uh oh!

tiprow bot commented Sep 9, 2025

Uh oh!

Tristan1900 commented Sep 10, 2025

Uh oh!

Tristan1900 commented Sep 10, 2025

Uh oh!

tiprow bot commented Sep 10, 2025

Uh oh!

terry1purcell commented Sep 10, 2025

Uh oh!

Tristan1900 commented Sep 10, 2025

Uh oh!

0xPoe left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot bot commented Sep 11, 2025

[LGTM Timeline notifier]

Uh oh!

Tristan1900 commented Sep 11, 2025

Uh oh!

Tristan1900 commented Sep 11, 2025

Uh oh!

Tristan1900 commented Sep 11, 2025

Tristan1900 commented Aug 29, 2025 •

edited

Loading

codecov bot commented Aug 29, 2025 •

edited

Loading

Tristan1900 commented Aug 29, 2025 •

edited

Loading

0xPoe Sep 9, 2025 •

edited

Loading

Tristan1900 Sep 9, 2025 •

edited

Loading

Tristan1900 Sep 12, 2025 •

edited

Loading