
Conversation

@dashpole
Contributor

@dashpole dashpole commented Oct 8, 2025

Implement a lockless histogram using atomics, and use a sync.Map for attribute access. This improves performance by ~2x.

The design is very similar to #7427, but with one additional change to make the histogram data point itself atomic:

  • For cumulative histograms, which do not use a hot/cold limitedSyncMap, we use a hot/cold data point. This way, we maintain the keys in the sync map, but still ensure that collection gets a consistent view of measure() calls.

Parallel benchmarks:

                                                                       │  main.txt   │              hist.txt              │
                                                                       │   sec/op    │   sec/op     vs base               │
SyncMeasure/NoView/ExemplarsDisabled/Int64Histogram/Attributes/10-24     274.5n ± 2%   125.2n ± 5%  -54.42% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsDisabled/Float64Histogram/Attributes/10-24   274.1n ± 2%   132.5n ± 2%  -51.65% (p=0.002 n=6)
geomean                                                                  274.3n        128.8n       -53.05%

Zero memory allocations before and after this change; allocation results omitted for brevity.

@codecov

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 94.85714% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.1%. Comparing base (5616ce4) to head (0de2d30).

Files with missing lines Patch % Lines
sdk/metric/internal/aggregate/atomic.go 83.6% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #7474    +/-   ##
======================================
  Coverage   86.1%   86.1%            
======================================
  Files        296     296            
  Lines      21594   21697   +103     
======================================
+ Hits       18601   18693    +92     
- Misses      2620    2631    +11     
  Partials     373     373            
Files with missing lines Coverage Δ
sdk/metric/internal/aggregate/aggregate.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/histogram.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/sum.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/atomic.go 88.1% <83.6%> (-4.6%) ⬇️

... and 1 file with indirect coverage changes


@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 4 times, most recently from ac02811 to 389d0bd Compare October 8, 2025 18:22
@dashpole dashpole marked this pull request as ready for review October 8, 2025 18:23
@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 3 times, most recently from 34d7502 to f0b28ca Compare October 8, 2025 18:55
@pellared pellared mentioned this pull request Oct 10, 2025
@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 3 times, most recently from 3e89e36 to 45effea Compare October 15, 2025 00:52

@bwplotka bwplotka left a comment


Trying to help a bit with review of this.

Generally looks good, just small things. Note that I'm not a maintainer and am a bit new to this codebase (I maintain the Prometheus client_golang SDK, though).

Great work, great to see amazing results! Do you mind benchmarking with allocs too?

assert.Equal(t, int64(15), aSum.load())
}

func BenchmarkAtomicCounter(b *testing.B) {


You added a benchmark, but I didn't see any results posted on the PR or anywhere else.

Is this a useful benchmark, then? Or is there some policy to add benchmarks for all low-level things just in case? (Reminds me of YAGNI.)



OK, I see the rationale: #7474 (comment)

Tests are great.

Just 2c, but adding benchmarks without ever (realistically) planning to use them is much the same as adding dead code. They could be added when we want to execute and measure. Are they used anywhere now?

Contributor Author


We run benchmarks as part of CI, and also do a diff on push: https://github.com/open-telemetry/opentelemetry-go/blob/main/.github/workflows/benchmark.yml. Example: https://github.com/open-telemetry/opentelemetry-go/actions/runs/18314631364 from my PR for improving sum measure performance.

So if someone did make a change, we would at least be able to tell if it made performance significantly worse afterwards...

b.count++
func (b *histogramPointCounters[N]) loadCountsInto(into *[]uint64) uint64 {
// TODO (#3047): Making copies for bounds and counts incurs a large
// memory allocation footprint. Alternatives should be explored.


First, it would be nice to understand that memory footprint. Your benchmark results on the PR don't report allocs. Can we add those?




Also not sure if this comment is useful. You could always invest more time to improve memory use, and here there's no obvious way to do better: you already do all you can by only allocating when you need to resize. Maybe it's good enough?

Contributor Author


There are no allocs before or after. Added to the PR description.

Contributor Author


There are allocs for Collect. I can try to publish the collect benchmarks as well, but I suspect those get worse.

b.counts[idx]++
b.count++
func (b *histogramPointCounters[N]) loadCountsInto(into *[]uint64) uint64 {
// TODO (#3047): Making copies for bounds and counts incurs a large


Suggested change
// TODO (#3047): Making copies for bounds and counts incurs a large
// TODO (#3047): Making copies for counts incurs a large

Only for counts?

// and resets the state of this set of counters. This is used by
// hotColdHistogramPoint to ensure that the cumulative counters continue to
// accumulate after being read.
func (b *histogramPointCounters[N]) mergeIntoAndReset( // nolint:revive // Intentional internal control flag


What is it complaining about? What does "control flag" mean?

Contributor Author


revive doesn't like boolean arguments to functions. It considers them "control flags". See https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#flag-parameter. I'm not sure I agree that they should always be avoided, as it would cause a lot of code duplication...

//
// Then,
//
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)


Suggested change
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)
// counts = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)

Perhaps, to match the code?

// newCumulativeHistogram returns a histogram that accumulates measurements
// into a histogram data structure. It is never reset.
func newCumulativeHistogram[N int64 | float64](
boundaries []float64,


How often (if ever) do histograms change bucketing? Never during runtime?

Contributor Author


Never during runtime.

//
// Then,
//
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)


Suggested change
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)
// count = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)

// of s.bounds. This aligns with the histogramPoint in that the length of histogramPoint
// is len(s.bounds)+1, with the last bucket representing:
// (s.bounds[len(s.bounds)-1], +∞).
idx := sort.SearchFloat64s(s.bounds, float64(value))


Do you have to do it inside the lock? s.bounds never changes, right?

@dashpole
Contributor Author

dashpole commented Nov 7, 2025

Great work, great to see amazing results! Do you mind benchmarking with allocs too?

Yeah, zero allocs on the Measure() path before and after, so I didn't post it.
