-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Refactor lock upgrade logic #122571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor lock upgrade logic #122571
Conversation
|
/azp run runtime-coreclr gcstress-extra |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Tagging subscribers to this area: @mangod9 |
|
/azp run runtime-coreclr gcstress-extra |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
When trying to get a repro of the lock failure, I got a semi-reliable repro for #122482. This repro would always hit a GC hole in |
|
/azp run runtime-coreclr gcstress-extra |
|
Azure Pipelines successfully started running 1 pipeline(s). |
You need to be careful, but it is not fundamentally unreliable. It would be useful to understand where the problem actually is. NativeAOT variant is still doing that. Does it have the same problem? |
|
I'm not 100% positive on the issue, but here's my guess as to why this isn't a problem on NativeAOT: On CoreCLR, we don't have a field for the MethodTable in System.Object. As a result, I was pinning the ref returned by On NativeAOT, we have a field for the MethodTable in System.Object, so pinning a ref to that definitely pins something in the middle of the object. It's also possible that there's a bug in the managed implementation I'm deleting here. I couldn't figure it out and realized that the reasons I had for moving this particular branch to managed didn't seem to still be concerns by the time I finished that PR, so I thought that moving it back would be fine (especially as I was able to get the code for the one-shot implementation relatively clean). |
If you can get it to crash near the failure with a big enough stress log (#45557 (comment)), we should be able to figure out what went wrong.
Right, this should work fine. |
|
I'll give it a try and see if I can get a repro with a larger stresslog. Every time it's crashed it's been either in execution of or very soon after (like in the return) of the spinning function, so I should be able to get useful info. |
|
I still need to do the repro to find the issue in the original code, but this fix seems to work (all the failures of both varieties have disappeared from the linux-arm runs). |
|
I took a look at the stress log and here's what I found: The runtime crashes, not because of a GC hole, but because the JIT is reporting the slot with the address of the object header. This is prone to crashes like this and is why we don't allow forming It ends up reporting the slot only on linux-arm because the call to runtime/src/coreclr/System.Private.CoreLib/src/System/Threading/Interlocked.CoreCLR.cs Lines 114 to 125 in 431c097
Because it's not must-expand, we end up having a fully-formed ref as we call into the FCall, and as a result a slot to report on the frame. We don't see this on other platforms because we never see the fully-formed ref as it gets optimized away as part of expanding the intrinsic. We don't see this on NativeAOT because NativeAOT has a pointer overload of CompareExchange for this exact use case: runtime/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Interlocked.cs Lines 27 to 43 in 431c097
Now that we've traced down the issue, I'll take this PR out of draft. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request refactors the thin lock acquisition and upgrade logic to address potential GC holes (issues #122275 and #122482). The key changes move thin lock spinning logic from managed C# code to native C++ code, improving GC safety and control.
Key Changes:
- Moved thin lock spinning and acquisition logic from managed code (
ObjectHeader.CoreCLR.cs) to native code (syncblk.inl) - Refactored lock upgrade logic in
SyncBlock::GetOrCreateLockto use explicit handle management instead ofOBJECTHANDLEHolder - Renamed FCall from
AcquireInternaltoAcquireand added newIsAcquiredFCall to replace managed implementation
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/syncblk.inl | Added native implementation of thin lock spinning logic in AcquireHeaderThinLockWithSpin with detailed comments explaining backoff strategy; added IsHeaderThinLockOwnedByThread to check lock ownership |
| src/coreclr/vm/syncblk.h | Added declarations for new private method TryUpgradeThinLockToFullLock and AcquireHeaderThinLockWithSpin, plus public IsHeaderThinLockOwnedByThread |
| src/coreclr/vm/syncblk.cpp | Refactored GetOrCreateLock to extract upgrade logic into TryUpgradeThinLockToFullLock, using explicit handle management with manual cleanup instead of RAII holder |
| src/coreclr/vm/ecalllist.h | Renamed FCall AcquireInternal to Acquire and added new IsAcquired FCall |
| src/coreclr/vm/comsynchronizable.h | Added declaration for ObjHeader_IsThinLockOwnedByThread FCall |
| src/coreclr/vm/comsynchronizable.cpp | Implemented ObjHeader_IsThinLockOwnedByThread FCall that delegates to native lock ownership check |
| src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs | Removed entire managed implementation of thin lock spinning logic, constants, and helper methods; replaced with direct FCalls to native implementations |
| src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs | Updated all call sites to use ObjectHeader.Acquire instead of TryAcquireThinLock, added null checks, removed NoInlining attribute from IsEntered |
Ah, I remember tracing this down on NativeAOT. |
An alternative is to apply the NAOT fix to CoreCLR. Which one has better perf - what does the change in this PR do to Monitor perf? |
| { | ||
| // Someone else owns the lock. | ||
| // Try again. | ||
| YieldProcessorNormalized(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The managed equivalent polled for GC. This one is less friendly to GC suspension.
Based on my prior investigations, I believe that this PR will have better performance than the managed implementation. Forcing the stack frame generation from the pinning was quite a significant perf hit for the initial no-spinning implementation, which is one of the reasons I moved it back into native code. I also had to refactor the implementation into a very particular shape to get good codegen, and I had not applied those rules to this spinwait path. |
Some numbers would be nice |
|
On a local run of the following benchmark (in my devbox, so technically in a VM), they came out about equal:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Toolchains.CoreRun;
BenchmarkRunner.Run<Benchmarks>(
DefaultConfig.Instance
.AddJob(
Job.Default.WithToolchain(
new CoreRunToolchain(new FileInfo(@"E:\runtime3\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe")))
.AsBaseline())
.AddJob(
Job.Default.WithToolchain(
new CoreRunToolchain(new FileInfo(@"E:\runtime3\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe")))));
public class Benchmarks
{
[Benchmark]
public async Task LockSpinning()
{
object obj = new();
using ManualResetEvent evt = new(false);
List<Task> tasks = [];
for (int i = 0; i < Environment.ProcessorCount; i++)
{
tasks.Add(Task.Run(() =>
{
evt.WaitOne();
lock (obj)
{
}
}));
}
evt.Set();
await Task.WhenAll(tasks);
}
} |
Yes. This is subtle, but safe.
I remember now we had problems with that on NativeAOT. It is a quite subtle and unusual scenario. |
I think if everything else is roughly the same, there is value in having similar implementations where possible. |
Testing out if this fixes a GC hole leading to #122275 and #122482