Skip to content

Conversation

@TedHartMS
Copy link
Collaborator

@TedHartMS TedHartMS commented May 6, 2025

This PR converts Garnet to use a new ObjectAllocator with a revised in-memory and on-disk format that will use only a single log file, change ISessionFunctions to be (Disk)LogRecord-based and propagate this to operations such as Compaction and Migration, remove TKey and TValue from TsavoriteKV, and many related refactors.

Highlights:

  • ArgSlice and SpanByte have been merged to a PinnedSpanByte struct with much more limited use. SpanByte remains as a static class providing (ReadOnly)Span<byte> extensions.
  • All keys are now either PinnedSpanByte (mostly at and above the GarnetApi layer) or ReadOnlySpan<byte> at the StorageSession and below, including Tsavorite. There are no longer any byte[] keys. The TKey type argument is gone from Tsavorite and Garnet.
    • Non-SpanByte KeyComparer implementations have been removed.
  • Values are either (ReadOnly)Span<byte> or objects implementing IHeapObject. TValue has been removed from TsavoriteKV.
  • "ref TKey", "ref TValue", and "ref RecordInfo" on ISessionFunctions methods are replaced by one of the LogRecord variants, where they are properties.
  • ETag and Expiration are now properties on LogRecord rather than embedded within Value or IGarnetObject. Internally, they are after the Value and *LogRecord manages shifting as Value shrinks and grows; all "record extra length" management is done by LogRecord and is opaque to ISessionFunctions.
  • GenericAllocator has been replaced by ObjectAllocator; for details, see https://github.com/microsoft/garnet/tree/tedhar/object-allocator/website/docs/dev/tsavorite/logrecord.md.
  • GenericScanIterator has been replaced by ObjectScanIterator
  • For ObjectAllocator, keys and string values that are past a configurable size can "overflow" to out-of-line byte[], keeping the record size manageable. Tsavorite implements this threshold automatically, and LogRecord manages it internally when returning the Key or ValueSpan.
  • Some names have been changed for clarity:
    • "Lockable*" and "Manual*" have become "Transactional*"
    • "Transient" locking has become "Ephemeral"
  • ObjectAllocator still writes objects (Overflow and serialized Objects) to a separate log file

Major remaining tasks to be addressed in follow-on PRs:

  • Truncate
  • Change object-size tracking from page-level granularity to record-level.
  • Performance evaluation and tuning
  • Clean up error handling related to the VarLenInputMethods functions (make them consistent with returns from RMWMethods, etc.)
  • Finish converting and re-enabling all Tsavorite Unit Tests
  • Revise the current Garnet.common to be Garnet.core, then re-create Garnet.common as a common layer between Tsavorite and the Garnet processing layer, containing PinnedSpanByte, epochs, Utility functions (some of which are currently duplicated), and so on.
  • Investigate the performance impact of moving further along the (ReadOnly)Span<byte> path by replacing byte* usage (such as was done for Utility.HashBytes).

- Move CountdownWrapper to its own file
- Rename Lockable* to Transactional*
- Rename Transient* to Ephemeral*
@TalZaccai TalZaccai requested a review from Copilot November 11, 2025 19:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

var storeWrapper = clusterProvider.storeWrapper;
var transientObjectIdMap = storeWrapper.store.Log.TransientObjectIdMap;

// Use try/finally instead of "using" because we don't want the boxing that an interface call would entail. Double-Dispose() is OK for DiskLogRecord.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using does not box its argument, it'll do a constrained virtual call on a value type.

To illustrate:
{EA676DCB-0539-4BDF-86A6-2AE433E66A21}

S() will compile to something like:

IL_0000	ldloca.s	00 
IL_0002	initobj	FooStruct
IL_0008	leave.s	IL_0018
IL_000A	ldloca.s	00 
IL_000C	constrained.	FooStruct
IL_0012	callvirt	IDisposable.Dispose ()
IL_0017	endfinally	
IL_0018	ret

O() to:

IL_0000	newobj	FooObj..ctor
IL_0005	stloc.0	
IL_0006	leave.s	IL_0012
IL_0008	ldloc.0	
IL_0009	brfalse.s	IL_0011
IL_000B	ldloc.0	
IL_000C	callvirt	IDisposable.Dispose ()
IL_0011	endfinally	
IL_0012	ret

Note the absence of any box ops in S() - the only real difference is the constrained prefix.

internal readonly bool IsSet => SegmentSizeBits != 0;
internal readonly bool HasData => word != 0 && word != NotSet;

public ObjectLogFilePositionInfo()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty struct constructors are tricky - they aren't invoked for default, or when a struct is implicitly initialized (like when it's a field). That is, new ObjectLogFilePositionInfo().word != default(ObjectLogFilePositionInfo).word.

Makes me a bit nervous.

firstRead = true;
}

// Note: We may be sending to multiple replicas, so cannot serialize LogRecords directly to the network buffer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every connection has a distinct network buffer, so we can technically serialize it directly to the network buffer.
Is there another reason why we cannot write the key,value pair directly to the network buffer as previously?

/// </summary>
public PinnedSpanByte GetAvailableNetworkBufferSpan() => PinnedSpanByte.FromPinnedPointer(curr, (int)(end - curr));

public void IncrementRecordDirect(int size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is not used anywhere? Do we need it?

db.MainStore.Reset();
if (db.ObjectStore?.Log.TailAddress > 64)
db.ObjectStore?.Reset();
if (db.Store.Log.TailAddress > 64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be PageHeader.Size?

var valueSB = scratchBufferBuilder.FormatScratch(sizeof(long), value).SpanByte;
valueSB.ExtraMetadata = DateTimeOffset.UtcNow.Ticks + expiry.Ticks;
return SET(ref _key, ref valueSB, ref context);
var input = new RawStringInput(RespCommand.APPEND, ref parseState, arg1: DateTimeOffset.UtcNow.Ticks + expiry.Ticks);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why APPEND?

@badrishc badrishc requested a review from Copilot November 25, 2025 18:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants