Description
We recently tried adding some application metrics with the OTel metrics SDK to one of our Kafka consumers. When we rolled out the change to our staging environment, the consumers immediately fell behind and couldn't keep up with the throughput of the Kafka partition.
Some profiling revealed that we were spending a lot of time contended on this mutex:
s.Lock()
We were also using delta metrics for this; I haven't evaluated whether cumulative sums perform better. It is possible that what we were actually seeing is the cost of recreating the values every minute with newRes. I'm not 100% convinced of that, though, because we aren't emitting the counters very often and the slowdown is very substantial.
Environment
- OS: linux arm64
- Go Version: 1.24.4
- opentelemetry-go version: 1.37.0
Steps To Reproduce
I suspect that calling sum.Add(1) from a large number of goroutines in a hot loop will reproduce this issue.
I'll note that for our case, every goroutine is writing to an independent set of attributes - if the map used a rwlock & the values in the map had their own mutex, I suspect we would see little to no contention.