Skip to content

Conversation

@Tristan1900
Copy link
Contributor

@Tristan1900 Tristan1900 commented Aug 29, 2025

What problem does this PR solve?

Issue Number: close #63286

Problem Summary:

What changed and how does it work?

  1. Use heap to maintain the topN to reduce memory shuffle
  2. Track topN index range while iterating samples
  3. When building histogram, skip entries that falls into the topN range so don't need to remove topN values from samples.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 29, 2025
@tiprow
Copy link

tiprow bot commented Aug 29, 2025

Hi @Tristan1900. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov
Copy link

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 93.54839% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.4747%. Comparing base (6ba9835) to head (3ff2e4d).
⚠️ Report is 319 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #63285        +/-   ##
================================================
+ Coverage   72.7810%   73.4747%   +0.6936%     
================================================
  Files          1835       1870        +35     
  Lines        496885     509348     +12463     
================================================
+ Hits         361638     374242     +12604     
+ Misses       113284     112317       -967     
- Partials      21963      22789       +826     
Flag Coverage Δ
integration 44.7713% <87.7419%> (?)
unit 72.3484% <93.5483%> (+0.0314%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.8700% <ø> (ø)
parser ∅ <ø> (∅)
br 46.4913% <ø> (-0.0153%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Tristan1900
Copy link
Contributor Author

Tristan1900 commented Aug 29, 2025

some preliminary result
go test -benchmem -run=^$ -bench ^BenchmarkBuildHistAndTopN$ github.com/pingcap/tidb/pkg/statistics -count=10

previous implementation

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │  old.txt   │
                   │   sec/op   │
BuildHistAndTopN-8   1.194 ± 1%

                   │   old.txt    │
                   │     B/op     │
BuildHistAndTopN-8   30.54Mi ± 0%

                   │   old.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   2.000M ± 0%

After improvement

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │   new.txt   │
                   │   sec/op    │
BuildHistAndTopN-8   63.99m ± 1%

                   │   new.txt    │
                   │     B/op     │
BuildHistAndTopN-8   15.36Mi ± 0%

                   │   new.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   1.001M ± 0%

Roughly 19x faster and 50% mem saved

@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 2, 2025
@Tristan1900 Tristan1900 force-pushed the stats-topn-calculation branch from 69b3885 to 175670c Compare September 2, 2025 19:56
@Tristan1900
Copy link
Contributor Author

run bench again after PR ready, confirmed the performance gain

goos: darwin
goarch: arm64
pkg: github.com/pingcap/tidb/pkg/statistics
cpu: Apple M3
                   │   new.txt   │
                   │   sec/op    │
BuildHistAndTopN-8   62.44m ± 3%

                   │   new.txt    │
                   │     B/op     │
BuildHistAndTopN-8   15.36Mi ± 0%

                   │   new.txt   │
                   │  allocs/op  │
BuildHistAndTopN-8   1.001M ± 0%

Copy link
Member

@0xPoe 0xPoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Work!

Thank you!

// is better than the worst item, it replaces the worst item.
func (h *BoundedMinHeap[T]) Add(item T) {
// handle zero capacity case
if h.maxSize <= 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we panic early if maxSize is less than or equal to 0?


const (
// defaultTopNValue is the default value for numTopN parameter
defaultTopNValue = 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for extracting them as constants. However, I believe we placed these magic numbers somewhere else. Could you please try to reuse them?

Copy link
Member

@0xPoe 0xPoe Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TopN reduced from 500 to 100 due to concerns over large number of TopN values collected for customers with many tables.
// 100 is more inline with other databases. 100-256 is also common for NumBuckets with other databases.
var analyzeOptionDefaultV2 = map[ast.AnalyzeOptionType]uint64{
	ast.AnalyzeOptNumBuckets:    256,
	ast.AnalyzeOptNumTopN:       100,
	ast.AnalyzeOptCMSketchWidth: 2048,
	ast.AnalyzeOptCMSketchDepth: 5,
	ast.AnalyzeOptNumSamples:    0,
	ast.AnalyzeOptSampleRate:    math.Float64bits(-1),
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate PR - we should make them variables such that customers can adjust them if needed.

Copy link
Contributor Author

@Tristan1900 Tristan1900 Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track #63432

for i := int64(1); i < sampleNum; i++ {
processedCount := int64(1) // we've processed the first sample

// Note: Start from firstSampleIdx+1 because we have already processed the first non-skipped sample.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Note: Start from firstSampleIdx+1 because we have already processed the first non-skipped sample.
// Start from firstSampleIdx + 1 since the first non-skipped sample has already been processed when the range checker is not null.

rangeChecker *SequentialRangeChecker, // optional range checker for skipping TopN indices
) (corrXYSum float64, err error) {
sampleNum := int64(len(samples))
// As we use samples to build the histogram, the bucket number and repeat should multiply a factor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't need to delete this comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this comment hasn’t been restored.

// If a numTopn value other than default is passed in, we assume it's a value that the user wants us to honor
allowPruning := true
if numTopN != 100 {
if numTopN != defaultTopNValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 Love this const, I hate magic numbers!

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 9, 2025
@Tristan1900
Copy link
Contributor Author

/retest

@tiprow
Copy link

tiprow bot commented Sep 9, 2025

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Tristan1900
Copy link
Contributor Author

/retest

@tiprow
Copy link

tiprow bot commented Sep 9, 2025

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Tristan1900
Copy link
Contributor Author

/hold

one test fails for legit reason

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 10, 2025
@Tristan1900
Copy link
Contributor Author

/retest

actually a flaky test that happened before this PR

@tiprow
Copy link

tiprow bot commented Sep 10, 2025

@Tristan1900: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

actually a flaky test that happened before this PR

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@terry1purcell
Copy link
Contributor

/ok-to-test

@ti-chi-bot ti-chi-bot bot added the ok-to-test Indicates a PR is ready to be tested. label Sep 10, 2025
@Tristan1900
Copy link
Contributor Author

/retest

Copy link
Member

@0xPoe 0xPoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! :shipit:

/hold

For the comment issue. Feel free to unhold it after you address it.

rangeChecker *SequentialRangeChecker, // optional range checker for skipping TopN indices
) (corrXYSum float64, err error) {
sampleNum := int64(len(samples))
// As we use samples to build the histogram, the bucket number and repeat should multiply a factor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this comment hasn’t been restored.

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Sep 11, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 11, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-09-09 14:02:35.546469291 +0000 UTC m=+370022.106350778: ☑️ agreed by terry1purcell.
  • 2025-09-11 09:46:12.237980283 +0000 UTC m=+527438.797861800: ☑️ agreed by 0xPoe.

Add comment to clarify sample factor calculation.
@Tristan1900
Copy link
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 11, 2025
@Tristan1900
Copy link
Contributor Author

/retest

1 similar comment
@Tristan1900
Copy link
Contributor Author

/retest

Comment on lines 106 to 108
sort.Slice(result, func(i, j int) bool {
return h.cmpFunc(result[i], result[j]) > 0
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. slices.SortFunc is a newer replacement for it. In most cases, we should use the slices package instead of the sort package.
  2. I think using -h.cmpFunc(result[i], result[j]) would be more explicit that the order here is reversed.

Comment on lines 50 to 72
// Less compares two items in the heap. We use a min-heap for efficient bounded operations.
func (h *BoundedMinHeap[T]) Less(i, j int) bool {
return h.cmpFunc(h.items[i], h.items[j]) < 0
}

// Swap swaps two items in the heap.
func (h *BoundedMinHeap[T]) Swap(i, j int) {
h.items[i], h.items[j] = h.items[j], h.items[i]
}

// Push adds an item to the heap.
func (h *BoundedMinHeap[T]) Push(x any) {
h.items = append(h.items, x.(T))
}

// Pop removes and returns the smallest item from the heap.
func (h *BoundedMinHeap[T]) Pop() any {
old := h.items
n := len(old)
item := old[n-1]
h.items = old[0 : n-1]
return item
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to reconsider the API design of this struct.
These methods are not expected to be used externally and should be unexported. I think the heap from the std lib should be an internal data type.

Copy link
Contributor Author

@Tristan1900 Tristan1900 Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's heap's interface we must comply. https://pkg.go.dev/container/heap
I think that's the standard lib heap, and I looked at other impl in the code base and they all follow the same pattern. Do you mind sharing which std lib you are referring to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I understand that you implemented BoundedMinHeap based on the container/heap from the std lib.

I mean the container/heap here should be an internal data type and should not be accessed externally. For example, you don't want other people to do something like NewBoundedMinHeap().Push(val).

What I have in mind is that the implementation should be something like:

type internalHeap[T any] struct {
	....
}

func (h *internalHeap[T]) Len() int {...}

func (h *internalHeap[T]) Less(i, j int) bool {...}

func (h *internalHeap[T]) Swap(i, j int) {...}

func (h *internalHeap[T]) Push(x any) {...}

func (h *internalHeap[T]) Pop() any {...}

type BoundedMinHeap[T any] struct {
	data *internalHeap[T]
	...
}

func (h *BoundedMinHeap[T]) Add(item T) {...}


func (h *BoundedMinHeap[T]) ToSortedSlice() []T {...}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I see the idea. Yeah that's a great suggestion. Let me do that

Comment on lines 22 to 24
// BoundedMinHeap implements a min-heap for maintaining the best N items efficiently.
// It keeps the N items with the highest values according to the comparison function.
// The root of the heap is always the smallest item, making it easy to remove when adding better items.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to unify the terminology: best/worst or high/low.

Comment on lines 433 to 438
if a.Count < b.Count {
return -1 // min-heap: smaller counts at root
} else if a.Count > b.Count {
return 1
}
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmp.Compare(a.Count, b.Count)

Signed-off-by: Wenqi Mou <[email protected]>
@Tristan1900
Copy link
Contributor Author

/retest

@Tristan1900
Copy link
Contributor Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2025
Signed-off-by: Wenqi Mou <[email protected]>
@ti-chi-bot
Copy link

ti-chi-bot bot commented Sep 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xPoe, terry1purcell, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Tristan1900
Copy link
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2025
@ti-chi-bot ti-chi-bot bot merged commit b75bdd1 into pingcap:master Sep 12, 2025
29 checks passed
@0xPoe 0xPoe added the needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. label Nov 19, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Nov 19, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.4: #64575.
But this PR has conflicts, please resolve them!

@fixdb fixdb added needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. and removed needs-cherry-pick-release-8.4 Should cherry pick this PR to release-8.4 branch. labels Nov 19, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Nov 19, 2025
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #64577.
But this PR has conflicts, please resolve them!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stats: build topN and histogram can be optimized

7 participants