Skip to content

Conversation

@jkoritzinsky
Copy link
Member

Testing out if this fixes a GC hole leading to #122275 and #122482

@jkoritzinsky jkoritzinsky added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) area-VM-coreclr labels Dec 16, 2025
@jkoritzinsky
Copy link
Member Author

/azp run runtime-coreclr gcstress-extra

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@jkoritzinsky
Copy link
Member Author

/azp run runtime-coreclr gcstress-extra

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkoritzinsky
Copy link
Member Author

When trying to get a repro of the lock failure, I got a semi-reliable repro for #122482. This repro would always hit a GC hole in TryAcquireThinLockSpin. As manipulating the object header from managed isn't particularly known to be reliable and the perf hit for the fast path was too much, I decided to just move the spinning logic back into native and use an implementation similar to the one-shot code we already have in native for the fast no-contention path.

@jkoritzinsky
Copy link
Member Author

/azp run runtime-coreclr gcstress-extra

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Dec 16, 2025

As manipulating the object header from managed isn't particularly known to be reliable

You need to be careful, but it is not fundamentally unreliable. It would be useful to understand where the problem actually is.

NativeAOT variant is still doing that. Does it have the same problem?

@jkoritzinsky
Copy link
Member Author

I'm not 100% positive on the issue, but here's my guess as to why this isn't a problem on NativeAOT:

On CoreCLR, we don't have a field for the MethodTable in System.Object. As a result, I was pinning the ref returned by GetRawData(). This should be fine for a System.Object instance, but it's at the end of the object, not in the middle. My understanding is that this should be legal, but it's also my only guess as to why this would fail.

On NativeAOT, we have a field for the MethodTable in System.Object, so pinning a ref to that definitely pins something in the middle of the object.

It's also possible that there's a bug in the managed implementation I'm deleting here. I couldn't figure it out and realized that the reasons I had for moving this particular branch to managed didn't seem to still be concerns by the time I finished that PR, so I thought that moving it back would be fine (especially as I was able to get the code for the one-shot implementation relatively clean).

@jkotas
Copy link
Member

jkotas commented Dec 17, 2025

I'm not 100% positive on the issue

If you can get it to crash near the failure with a big enough stress log (#45557 (comment)), we should be able to figure out what went wrong.

My understanding is that this should be legal

Right, this should work fine.

@jkoritzinsky
Copy link
Member Author

jkoritzinsky commented Dec 17, 2025

I'll give it a try and see if I can get a repro with a larger stresslog.

Every time it's crashed it's been either in execution of or very soon after (like in the return) of the spinning function, so I should be able to get useful info.

@jkoritzinsky
Copy link
Member Author

I still need to do the repro to find the issue in the original code, but this fix seems to work (all the failures of both varieties have disappeared from the linux-arm runs).

@jkoritzinsky
Copy link
Member Author

I took a look at the stress log and here's what I found:

The runtime crashes, not because of a GC hole, but because the JIT is reporting the slot with the address of the object header. This is prone to crashes like this and is why we don't allow forming refs to invalid heap pointers, as you know.

It ends up reporting the slot only on linux-arm because the call to Interlocked.CompareExchange is not a must-expand intrinsic on ARM32:

[Intrinsic]
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int CompareExchange(ref int location1, int value, int comparand)
{
#if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64
return CompareExchange(ref location1, value, comparand); // Must expand intrinsic
#else
if (Unsafe.IsNullRef(ref location1))
ThrowHelper.ThrowNullReferenceException();
return CompareExchange32(ref location1, value, comparand);
#endif
}

Because it's not must-expand, we end up having a fully-formed ref as we call into the FCall, and as a result a slot to report on the frame.

We don't see this on other platforms because we never see the fully-formed ref as it gets optimized away as part of expanding the intrinsic.

We don't see this on NativeAOT because NativeAOT has a pointer overload of CompareExchange for this exact use case:

// This is used internally by NativeAOT runtime in cases where having a managed
// ref to the location is unsafe (Ex: it is the syncblock of a pinned object).
// The intrinsic expansion for this overload is exactly the same as for the `ref int`
// variant and will go on the same path since expansion is triggered by the name and
// return type of the method.
// The important part is avoiding `ref *location` in the unexpanded scenario, like
// in a case when compiling the Debug flavor of the app.
[Intrinsic]
internal static unsafe int CompareExchange(int* location1, int value, int comparand)
{
#if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64
return CompareExchange(location1, value, comparand); // Must expand intrinsic
#else
Debug.Assert(location1 != null);
return RuntimeImports.InterlockedCompareExchange(location1, value, comparand);
#endif
}

Now that we've traced down the issue, I'll take this PR out of draft.

@jkoritzinsky jkoritzinsky removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Dec 17, 2025
@jkoritzinsky jkoritzinsky marked this pull request as ready for review December 17, 2025 20:48
Copilot AI review requested due to automatic review settings December 17, 2025 20:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the thin lock acquisition and upgrade logic to address potential GC holes (issues #122275 and #122482). The key changes move thin lock spinning logic from managed C# code to native C++ code, improving GC safety and control.

Key Changes:

  • Moved thin lock spinning and acquisition logic from managed code (ObjectHeader.CoreCLR.cs) to native code (syncblk.inl)
  • Refactored lock upgrade logic in SyncBlock::GetOrCreateLock to use explicit handle management instead of OBJECTHANDLEHolder
  • Renamed FCall from AcquireInternal to Acquire and added new IsAcquired FCall to replace managed implementation

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/coreclr/vm/syncblk.inl Added native implementation of thin lock spinning logic in AcquireHeaderThinLockWithSpin with detailed comments explaining backoff strategy; added IsHeaderThinLockOwnedByThread to check lock ownership
src/coreclr/vm/syncblk.h Added declarations for new private method TryUpgradeThinLockToFullLock and AcquireHeaderThinLockWithSpin, plus public IsHeaderThinLockOwnedByThread
src/coreclr/vm/syncblk.cpp Refactored GetOrCreateLock to extract upgrade logic into TryUpgradeThinLockToFullLock, using explicit handle management with manual cleanup instead of RAII holder
src/coreclr/vm/ecalllist.h Renamed FCall AcquireInternal to Acquire and added new IsAcquired FCall
src/coreclr/vm/comsynchronizable.h Added declaration for ObjHeader_IsThinLockOwnedByThread FCall
src/coreclr/vm/comsynchronizable.cpp Implemented ObjHeader_IsThinLockOwnedByThread FCall that delegates to native lock ownership check
src/coreclr/System.Private.CoreLib/src/System/Threading/ObjectHeader.CoreCLR.cs Removed entire managed implementation of thin lock spinning logic, constants, and helper methods; replaced with direct FCalls to native implementations
src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs Updated all call sites to use ObjectHeader.Acquire instead of TryAcquireThinLock, added null checks, removed NoInlining attribute from IsEntered

@jkotas
Copy link
Member

jkotas commented Dec 17, 2025

We don't see this on NativeAOT because NativeAOT has a pointer overload of CompareExchange for this exact use case:

Ah, I remember tracing this down on NativeAOT.

@jkotas
Copy link
Member

jkotas commented Dec 17, 2025

Now that we've traced down the issue, I'll take this PR out of draft.

An alternative is to apply the NAOT fix to CoreCLR. Which one has better perf - what does the change in this PR do to Monitor perf?

{
// Someone else owns the lock.
// Try again.
YieldProcessorNormalized(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managed equivalent polled for GC. This one is less friendly to GC suspension.

@jkoritzinsky
Copy link
Member Author

Now that we've traced down the issue, I'll take this PR out of draft.

An alternative is to apply the NAOT fix to CoreCLR. Which one has better perf - what does the change in this PR do to Monitor perf?

Based on my prior investigations, I believe that this PR will have better performance than the managed implementation. Forcing the stack frame generation from the pinning was quite a significant perf hit for the initial no-spinning implementation, which is one of the reasons I moved it back into native code. I also had to refactor the implementation into a very particular shape to get good codegen, and I had not applied those rules to this spinwait path.

@jkotas
Copy link
Member

jkotas commented Dec 17, 2025

I believe that this PR will have better performance than the managed implementation.

Some numbers would be nice

@jkoritzinsky
Copy link
Member Author

jkoritzinsky commented Dec 18, 2025

On a local run of the following benchmark (in my devbox, so technically in a VM), they came out about equal:

Method Mean Error StdDev Ratio RatioSD
LockSpinning (PR) 37.33 us 0.710 us 0.899 us 1.01 0.03
LockSpinning (main) 36.97 us 0.665 us 0.841 us 1.00 0.03
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Toolchains.CoreRun;

BenchmarkRunner.Run<Benchmarks>(
    DefaultConfig.Instance
    .AddJob(
        Job.Default.WithToolchain(
            new CoreRunToolchain(new FileInfo(@"E:\runtime3\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe")))
            .AsBaseline())
    .AddJob(            
        Job.Default.WithToolchain(
            new CoreRunToolchain(new FileInfo(@"E:\runtime3\artifacts\bin\testhost\net11.0-windows-Release-x64\shared\Microsoft.NETCore.App\11.0.0\corerun.exe")))));

public class Benchmarks
{
    [Benchmark]
    public async Task LockSpinning()
    {
        object obj = new();
        using ManualResetEvent evt = new(false);
        List<Task> tasks = [];
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            tasks.Add(Task.Run(() =>
            {
                evt.WaitOne();
                lock (obj)
                {
                }
            }));
        }

        evt.Set();

        await Task.WhenAll(tasks);
    }
}

@VSadov
Copy link
Member

VSadov commented Dec 18, 2025

I was pinning the ref returned by GetRawData()

Yes. This is subtle, but safe.
In case of an object with no fields this would be the location "just after the object", where typically the syncblock of the next object is located. However, by the spec and implementation the location of syncblock "belongs" to the preceding object. This is actually the reason why we cannot pin the location of the syncblock that we work with. Pinning that location would pin the previous object, if such even exists.

The runtime crashes, not because of a GC hole, but because the JIT is reporting the slot with the address of the object header. This is prone to crashes like this and is why we don't allow forming refs to invalid heap pointers, as you know.

I remember now we had problems with that on NativeAOT. It is a quite subtle and unusual scenario.

@VSadov
Copy link
Member

VSadov commented Dec 18, 2025

Now that we've traced down the issue, I'll take this PR out of draft.

An alternative is to apply the NAOT fix to CoreCLR. Which one has better perf - what does the change in this PR do to Monitor perf?

I think if everything else is roughly the same, there is value in having similar implementations where possible.
I assume "the NAOT fix" here basically means - use the same CompareExchange(int*, ...) intrinsic as in NativeAOT, which should be available to CoreCLR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants