Handle changes to MutableSettings and ExporterSettings without rebuilding
#7724
base: andrew/settings/5-move-mutable-settings-off-tracer-settings
💡 Codex Review
Here are some automated review suggestions for this pull request.
tracer/src/Datadog.Trace/LibDatadog/DataPipeline/ManagedTraceExporter.cs (outdated, resolved)
Execution-Time Benchmarks Report ⏱️
Execution-time results for samples comparing This PR (7724) and master.
✅ No regressions detected - check the details below
Comparison explanation
Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. The duration charts show the mean value of each run together with the p99 interval based on the mean and StdDev of the test run.
Duration charts: execution time in ms, mean (p99 interval)
| Sample | Runtime | Scenario | This PR (7724) | master |
|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.8 | Baseline | 76 (71-80) | 75 (71-79) |
| FakeDbCommand | .NET Framework 4.8 | Bailout | 80 (74-86) | 79 (75-84) |
| FakeDbCommand | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,140 (1,075-1,205) | 1,128 (1,061-1,194) |
| FakeDbCommand | .NET Core 3.1 | Baseline | 118 (110-127) | 119 (110-128) |
| FakeDbCommand | .NET Core 3.1 | Bailout | 120 (115-126) | 120 (114-126) |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 815 (777-853) | 819 (764-874) |
| FakeDbCommand | .NET 6 | Baseline | 106 (99-113) | 105 (97-113) |
| FakeDbCommand | .NET 6 | Bailout | 107 (101-112) | 106 (99-114) |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 763 (734-793) | 756 (720-793) |
| FakeDbCommand | .NET 8 | Baseline | 104 (97-111) | 105 (98-111) |
| FakeDbCommand | .NET 8 | Bailout | 105 (99-112) | 104 (98-109) |
| FakeDbCommand | .NET 8 | CallTarget+Inlining+NGEN | 729 (694-763) | 729 (695-764) |
| HttpMessageHandler | .NET Framework 4.8 | Baseline | 192 (189-196) | 192 (188-197) |
| HttpMessageHandler | .NET Framework 4.8 | Bailout | 197 (192-202) | 196 (193-200) |
| HttpMessageHandler | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,172 (1,122-1,222) | 1,165 (1,113-1,216) |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 279 (273-285) | 277 (271-282) |
| HttpMessageHandler | .NET Core 3.1 | Bailout | 280 (272-289) | 277 (273-281) |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 955 (915-994) | 956 (909-1,003) |
| HttpMessageHandler | .NET 6 | Baseline | 271 (267-274) | 271 (265-276) |
| HttpMessageHandler | .NET 6 | Bailout | 270 (267-274) | 269 (266-273) |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 939 (878-999) | 931 (884-978) |
| HttpMessageHandler | .NET 8 | Baseline | 277 (269-285) | 269 (266-272) |
| HttpMessageHandler | .NET 8 | Bailout | 273 (267-278) | 268 (265-271) |
| HttpMessageHandler | .NET 8 | CallTarget+Inlining+NGEN | 873 (824-921) | 855 (829-881) |
bouwkast left a comment
Looks good to me, but it is quite large 😅
The hardest part is the whole bit manipulation with the count / closing sign all packed into one int. I think it looks good, but it may be nice to have some additional tests around StatsdManager, if possible.
// Don't blame me, blame the fact we can't do Volatile.Read with a ulong in .NET FX...
var nodeHashBase = new NodeHashBase(unchecked((ulong)Volatile.Read(ref Unsafe.As<ulong, long>(ref _nodeHashBase))));
😅
So, uh, what exactly is going on here?
It is a struct, so we can't use Interlocked.CompareExchange, but we also need to convert it from ulong to long?
so, yeah, this is a mess 😅
- We want to cache the value of NodeHashBase.
- NodeHashBase depends on some mutable settings, e.g. service name, so we need to be able to replace NodeHashBase.
- We need to use Interlocked.Exchange() to make sure we do the swap atomically.
  - You can't do Interlocked.Exchange() with a struct (for technical reasons) 🙁
- Luckily, NodeHashBase is just a thin wrapper around a ulong, which is 64 bit and can be used with Interlocked.Exchange 🎉
  - Except, < .NET 5, you can't do Interlocked.Exchange(ulong) because the overloads don't exist 😭
- The "solution" for updating is (you can see this in the UpdateNodeHash method):
  - Calculate the new NodeHashBase
  - Grab the ulong from the NodeHashBase
  - Do an unchecked cast of the ulong as a long
  - Do an Interlocked.Exchange(ref long), treating the stored ulong _nodeHashBase as a long for the purposes of the exchange
- The solution for reading is:
  - Do a Volatile.Read(ref long), treating the stored ulong _nodeHashBase as a long for the purposes of the read
  - Do an unchecked cast of the long as a ulong
  - Create the new NodeHashBase
Yes, it's a PITA.
All that said, I just realised that if we store the _nodeHashBase as a long, we can avoid all the Unsafe.As reinterpretation, and just stick with the unchecked casts, which simplifies this a bit! 🙂
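A minimal sketch of the "store it as a long" simplification, assuming NodeHashBase is only a thin wrapper around a ulong. The NodeHashBase struct below is a hypothetical stand-in and NodeHashCache is an invented name, not the tracer's code. Because the field is declared as a long, the long overloads of Interlocked.Exchange and Volatile.Read (both available on .NET Framework 4.5+) can be called directly, and only the unchecked casts remain:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for NodeHashBase: a thin struct wrapper around a ulong.
public readonly struct NodeHashBase
{
    public NodeHashBase(ulong value) => Value = value;

    public ulong Value { get; }
}

// Hypothetical holder for the cached value; no Unsafe.As reinterpretation needed.
public sealed class NodeHashCache
{
    // Stored as a long so the long overloads of Volatile/Interlocked apply on .NET Framework.
    private long _nodeHashBase;

    public NodeHashCache(NodeHashBase initial)
        => _nodeHashBase = unchecked((long)initial.Value);

    // Read: volatile read of the long, unchecked cast back to ulong, rewrap in the struct.
    public NodeHashBase Read()
        => new NodeHashBase(unchecked((ulong)Volatile.Read(ref _nodeHashBase)));

    // Update: unchecked cast of the new ulong to long, then an atomic swap.
    public void Update(NodeHashBase newValue)
        => Interlocked.Exchange(ref _nodeHashBase, unchecked((long)newValue.Value));
}
```

The unchecked casts are pure bit reinterpretations, so values with the top bit set (for example ulong.MaxValue, which becomes -1 as a long) round-trip unchanged.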
 "key": "service.name",
 "value": {
-  "string_value": "unknown_service:dotnet"
+  "string_value": "Samples.OpenTelemetrySdk"
Hmm, I do remember running into this previously with unknown_service. I think I did something to override it a long time ago; unsure how/why this changed now though 🤷
Yeah, I'm going to defer to @link04 on this one 😅 My guess is that there were/are some race conditions here. I think this is now "correct" to be fair 😄
Also:
- slight refactor of LogFormatter to reduce some allocations
- ignore "previous" when creating DirectLogSubmissionManager (seeing as that won't be a thing soon)
…n't respond to changes. I left it like this because the debugger already doesn't respond to changes like other services do.
- Move statsd instance creation to a separate factory
- Create a StatsdManager to handle automatic updating in response to setting changes
- Always create a statsd instance, as it's hard to know if we're _ever_ going to need one, and doing so reduces some of the complexity
… reconfiguration is not allowed
…s though, and doesn't respond to changes
This isn't necessary with the current design, and it causes issues today
Make sure we can't dispose a stats consumer that's in use (as it will throw).
Rework to use a "lease" mechanism to track usages.
Make passing in a StatsdManager required.
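A lease mechanism like the one described above, with the count and the closing sign packed into one int (as mentioned in review), might look roughly like the sketch below. It is not the actual StatsdManager code: LeasedResource is a hypothetical name, and the layout (low 31 bits for the active lease count, sign bit as the "closing" flag) is an assumption. Consumers take a lease before using the resource; closing sets the flag so no new leases are granted, and whoever drops the count to zero after that disposes the resource exactly once.

```csharp
using System;
using System.Threading;

#nullable enable

// Hypothetical sketch: one int holds the active lease count (low 31 bits)
// and a "closing" flag (the sign bit). Not the tracer's implementation.
public sealed class LeasedResource<T> where T : class, IDisposable
{
    private const int ClosingFlag = int.MinValue; // 0x80000000, the sign bit
    private readonly T _resource;
    private int _state;

    public LeasedResource(T resource) => _resource = resource;

    // Takes a lease unless Close() has been called; callers must call Release() afterwards.
    public bool TryAcquire(out T? resource)
    {
        while (true)
        {
            int current = Volatile.Read(ref _state);
            if ((current & ClosingFlag) != 0)
            {
                resource = null;
                return false;
            }

            // Increment the count only if nobody changed the state in the meantime.
            if (Interlocked.CompareExchange(ref _state, current + 1, current) == current)
            {
                resource = _resource;
                return true;
            }
        }
    }

    // Drops a lease; if Close() was already called, the last release disposes the resource.
    public void Release()
    {
        if (Interlocked.Decrement(ref _state) == ClosingFlag)
        {
            _resource.Dispose();
        }
    }

    // Sets the closing flag; disposes immediately if no leases are outstanding.
    public void Close()
    {
        int current;
        do
        {
            current = Volatile.Read(ref _state);
            if ((current & ClosingFlag) != 0)
            {
                return; // already closing
            }
        }
        while (Interlocked.CompareExchange(ref _state, current | ClosingFlag, current) != current);

        if (current == 0)
        {
            _resource.Dispose();
        }
    }
}
```

Packing both pieces of state into one int means a single CompareExchange decides "is it closing?" and "how many users are there?" together, so a lease can never be granted after the dispose decision has been made. The sketch assumes the count never approaches 2^31 leases.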
The statsd client does sync-over-async in the flush and dispose paths, which can lead to deadlocks and thread exhaustion. To work around that, we push the dispose to happen on a thread-pool thread instead, in the background
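Pushing the dispose onto the thread pool can be sketched as below. BackgroundDisposer is a hypothetical helper name, not the tracer's API; the point is that the potentially blocking Dispose() runs inside Task.Run, so the thread that triggered the reconfiguration never blocks on the client's sync-over-async flush.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs a potentially blocking Dispose() on a thread-pool thread.
public static class BackgroundDisposer
{
    public static Task DisposeInBackground(IDisposable disposable)
    {
        return Task.Run(() =>
        {
            try
            {
                disposable.Dispose();
            }
            catch (Exception ex)
            {
                // A background dispose must not crash the process, so report and swallow.
                Console.Error.WriteLine($"Error disposing {disposable.GetType().Name}: {ex.Message}");
            }
        });
    }
}
```

Callers can discard the returned Task (fire-and-forget) during reconfiguration, or wait on it at shutdown. The blocking work still occupies one pool thread while it runs; it just no longer happens on the caller's thread.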
Summary of changes
Reason for change
This is the "endpoint" that we've been heading for: services are only disposed at the end of the app, and otherwise we only rebuild the necessary parts. For example, we don't need to tear down all the API factories when a customer changes a global tag via remote config; they only need to change if the ExporterSettings change.
The hope is that overall this reduces the overhead of using configuration in code and/or remote configuration, while also reducing the number of issues caused by managing the disposal of services.
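To illustrate "only rebuild when ExporterSettings change", here is a minimal sketch of a Managed* style wrapper. It assumes ExporterSettings can be compared by value (modelled here as a C# record). All type names (ExporterSettingsSketch, ApiFactorySketch, ManagedApiFactorySketch) are hypothetical, and this is not the tracer's actual implementation. The wrapper publishes replacements with Interlocked.Exchange, and readers take one Volatile.Read snapshot into a local, since the field may be swapped in the background.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for ExporterSettings: a record, so == compares values.
public sealed record ExporterSettingsSketch(string AgentUri, int TimeoutMs);

// Hypothetical stand-in for an expensive-to-build service, e.g. an API factory.
public sealed class ApiFactorySketch
{
    public ApiFactorySketch(ExporterSettingsSketch settings) => Settings = settings;

    public ExporterSettingsSketch Settings { get; }
}

// Hypothetical "Managed*" wrapper that rebuilds its inner service only when needed.
public sealed class ManagedApiFactorySketch
{
    private ApiFactorySketch _current;

    public ManagedApiFactorySketch(ExporterSettingsSketch settings)
        => _current = new ApiFactorySketch(settings);

    // Callers should read this once into a local and use that snapshot.
    public ApiFactorySketch Current => Volatile.Read(ref _current);

    // Called on a settings change; returns true if the inner service was rebuilt.
    // Concurrent calls are "last writer wins" in this sketch.
    public bool OnSettingsChanged(ExporterSettingsSketch newSettings)
    {
        var existing = Volatile.Read(ref _current);
        if (existing.Settings == newSettings)
        {
            // e.g. only a global tag changed: the exporter settings are equal, so keep the instance
            return false;
        }

        Interlocked.Exchange(ref _current, new ApiFactorySketch(newSettings));
        return true;
    }
}
```

A caller would write `var factory = managed.Current;` once and use `factory` for the rest of the operation, rather than reading `managed.Current` repeatedly and possibly seeing two different instances.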
Implementation details
Overall, this PR is kind of a pain. Moving from the "rebuild everything" to "reconfigure each service" couldn't be done piecemeal, so this is the one-shot PR. What's more, different services need different patterns (though we can probably consolidate some of them, this has taken a lot of work and I likely changed patterns unnecessarily in some places).
In general, there are a couple of patterns:
- Managed* versions of some services
- … Volatile.Read() (to ensure changes are visible) and are generally cached to a local variable (as the underlying field may be updated in the background).
Test coverage
In the vast majority of places, this should be covered by existing tests.
I plan to add some additional integration tests around reconfiguring, and to do a bunch of manual testing to make sure I'm confident.
Other details
I strongly recommend reviewing commit-by-commit. They're generally self-contained, and hopefully simple enough to understand one commit at a time.
https://datadoghq.atlassian.net/browse/LANGPLAT-819
Part of a config stack
- … MutableSettings from TracerSettings #7522
- … MutableSettings on dynamic config changes #7525
- … DefaultServiceName to MutableSettings #7530
- … PerTraceSettings.GetServiceName() #7532
- … TracerSettings to use MutableSettings where appropriate #7543
- … IsIntegrationEnabled(), IsErrorStatusCode(), and GetIntegrationAnalyticsSampleRate() #7544
- … DictionaryExtensions.SequenceEqual #7722
- … SettingsManager for managing mutable settings and ExporterSettings #7695
- … TracerSettings which can change at runtime #7723
- Handle changes to MutableSettings and ExporterSettings without rebuilding #7724 👈
This isn't the final PR in the stack, as there will be a bunch of cleaning up to do, but it's the final "implementation" PR.