
Conversation

@dashpole
Contributor

@dashpole dashpole commented Oct 8, 2025

Implement a lockless histogram using atomics, and use a sync.Map for attribute access. This improves performance by ~2x.

The design is very similar to #7427, but with one additional change to make the histogram data point itself atomic:

  • For cumulative histograms, which do not use a hot/cold limitedSyncMap, we use a hot/cold data point. This way, we maintain the keys in the sync map, but still ensure that collection gets a consistent view of measure() calls.

Parallel benchmarks:

                                                                       │  main.txt   │              hist.txt              │
                                                                       │   sec/op    │   sec/op     vs base               │
SyncMeasure/NoView/ExemplarsDisabled/Int64Histogram/Attributes/10-24     274.5n ± 2%   125.2n ± 5%  -54.42% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsDisabled/Float64Histogram/Attributes/10-24   274.1n ± 2%   132.5n ± 2%  -51.65% (p=0.002 n=6)
geomean                                                                  274.3n        128.8n       -53.05%

Zero memory allocations before and after this change; allocation results omitted for brevity.

@codecov

codecov bot commented Oct 8, 2025

Codecov Report

❌ Patch coverage is 94.85714% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.1%. Comparing base (5616ce4) to head (0de2d30).

Files with missing lines Patch % Lines
sdk/metric/internal/aggregate/atomic.go 83.6% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #7474    +/-   ##
======================================
  Coverage   86.1%   86.1%            
======================================
  Files        296     296            
  Lines      21594   21697   +103     
======================================
+ Hits       18601   18693    +92     
- Misses      2620    2631    +11     
  Partials     373     373            
Files with missing lines Coverage Δ
sdk/metric/internal/aggregate/aggregate.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/histogram.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/sum.go 100.0% <100.0%> (ø)
sdk/metric/internal/aggregate/atomic.go 88.1% <83.6%> (-4.6%) ⬇️

... and 1 file with indirect coverage changes


@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 4 times, most recently from ac02811 to 389d0bd Compare October 8, 2025 18:22
@dashpole dashpole marked this pull request as ready for review October 8, 2025 18:23
@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 3 times, most recently from 34d7502 to f0b28ca Compare October 8, 2025 18:55
@pellared pellared mentioned this pull request Oct 10, 2025
@dashpole dashpole force-pushed the optimize_syncmap_histogram branch 3 times, most recently from 3e89e36 to 45effea Compare October 15, 2025 00:52

@bwplotka bwplotka left a comment


Trying to help a bit with review of this.

Generally looks good, just small things. Note that I'm not a maintainer and am a bit new to this codebase (I maintain the Prometheus client_golang SDK, though).

Great work, great to see amazing results! Do you mind benchmarking with allocs too?

assert.Equal(t, int64(15), aSum.load())
}

func BenchmarkAtomicCounter(b *testing.B) {


You added a benchmark, but I didn't see any results posted on the PR or anywhere else.

Is this a useful benchmark, then? Or is there some policy to add benchmarks for all low-level things just in case? (Reminds me of YAGNI.)



OK, I see the rationale: #7474 (comment)

Tests are great.

Just 2c, but adding benchmarks without ever (realistically) planning to use them is much the same as adding dead code. They could be added when we want to execute and measure. Are they used anywhere now?

Contributor Author


We run benchmarks as part of CI, and also do a diff on push: https://github.com/open-telemetry/opentelemetry-go/blob/main/.github/workflows/benchmark.yml. Example: https://github.com/open-telemetry/opentelemetry-go/actions/runs/18314631364 from my PR for improving sum measure performance.

So if someone did make a change, we would at least be able to tell if it made performance significantly worse afterwards...

b.count++
func (b *histogramPointCounters[N]) loadCountsInto(into *[]uint64) uint64 {
// TODO (#3047): Making copies for bounds and counts incurs a large
// memory allocation footprint. Alternatives should be explored.


First, it would be nice to understand that memory footprint. Your benchmark results on the PR don't report allocs. Can we add those?




Also not sure if this comment is useful. You could always invest more time to improve memory use, and here there's no obvious way to do better: you already do all you can by only allocating when you need to resize. Maybe it's good enough?

Contributor Author


There are no allocs before or after. Added to the PR description.

Contributor Author


There are allocs for Collect. I can try to publish the collect benchmarks as well, but I suspect those get worse.

b.counts[idx]++
b.count++
func (b *histogramPointCounters[N]) loadCountsInto(into *[]uint64) uint64 {
// TODO (#3047): Making copies for bounds and counts incurs a large


Suggested change
// TODO (#3047): Making copies for bounds and counts incurs a large
// TODO (#3047): Making copies for counts incurs a large

Only for counts?

// and resets the state of this set of counters. This is used by
// hotColdHistogramPoint to ensure that the cumulative counters continue to
// accumulate after being read.
func (b *histogramPointCounters[N]) mergeIntoAndReset( // nolint:revive // Intentional internal control flag


What is it complaining about? What does "control flag" mean?

Contributor Author


revive doesn't like boolean arguments to functions. It considers them "control flags". See https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#flag-parameter. I'm not sure I agree that they should always be avoided, as it would cause a lot of code duplication...

//
// Then,
//
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)


Suggested change
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)
// counts = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)

Perhaps, to match the code?

// newCumulativeHistogram returns a histogram that accumulates measurements
// into a histogram data structure. It is never reset.
func newCumulativeHistogram[N int64 | float64](
boundaries []float64,


How often (if ever) do histograms change bucketing? Never during runtime?

Contributor Author


Never during runtime.

//
// Then,
//
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)


Suggested change
// buckets = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)
// count = (-∞, 0], (0, 5.0], (5.0, 10.0], (10.0, +∞)

// of s.bounds. This aligns with the histogramPoint in that the length of histogramPoint
// is len(s.bounds)+1, with the last bucket representing:
// (s.bounds[len(s.bounds)-1], +∞).
idx := sort.SearchFloat64s(s.bounds, float64(value))


Do you have to do it inside the lock? s.bounds never changes, right?

@dashpole
Contributor Author

dashpole commented Nov 7, 2025

Great work, great to see amazing results! Do you mind benchmarking with allocs too?

Yeah, zero allocs on the Measure() path before and after, so I didn't post it.
