@danipen danipen commented Dec 11, 2025

Summary

This PR introduces ReadOnlyMemory<char> support throughout the tokenization pipeline, enabling zero-allocation text handling. Combined with several allocation reduction optimizations, this delivers significant performance and memory improvements.

What's New

New LineText Type

A new LineText struct wraps ReadOnlyMemory<char>, providing a clean API for text handling without string allocations:

// Before: Always allocated strings
string lineText = model.GetLineText(lineIndex);
grammar.TokenizeLine(lineText, ruleStack, timeout);

// After: Zero-copy memory access
LineText lineText = model.GetLineText(lineIndex);
grammar.TokenizeLine(lineText, ruleStack, timeout);

Updated Public APIs

  • IGrammar.TokenizeLine() and TokenizeLine2() now accept LineText instead of string
  • IModelLines.GetLineText() returns LineText instead of string
  • Implicit conversions from string to LineText maintain backward compatibility
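To make the conversion behavior concrete, here is a minimal sketch of what a `LineText`-style wrapper could look like. Only the names `LineText` and `ReadOnlyMemory<char>` come from the PR; the fields and members below are illustrative assumptions, not the PR's exact implementation:

```csharp
using System;

// Minimal sketch of a LineText-style wrapper over ReadOnlyMemory<char>.
// Member names other than LineText itself are assumptions.
public readonly struct LineText
{
    private readonly ReadOnlyMemory<char> _text;

    public LineText(ReadOnlyMemory<char> text) => _text = text;

    public int Length => _text.Length;
    public ReadOnlyMemory<char> Memory => _text;

    // Implicit conversions keep existing string-based call sites compiling.
    public static implicit operator LineText(string s) => new LineText(s.AsMemory());
    public static implicit operator LineText(ReadOnlyMemory<char> m) => new LineText(m);

    public override string ToString() => _text.ToString();
}
```

With conversions like these, a call site such as `grammar.TokenizeLine("int x = 1;", ruleStack, timeout)` continues to compile even though the parameter type changed from `string` to `LineText`.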

Benchmark Results

Tested with a 133,439-line C# file (5.8 MB):

Metric             master     This PR    Improvement
Execution Time     4.752 s    2.681 s    44% faster
Memory Allocated   658.27 MB  496.36 MB  25% less
Gen0 Collections   82,000     62,000     24% fewer
Gen1 Collections   8,000      4,000      50% fewer

Optimizations Applied

  1. ArrayPool for line buffers - Reuse char arrays instead of allocating per line
  2. Allocation-free timing - Use Stopwatch.GetTimestamp() instead of new Stopwatch()
  3. List pooling - Reuse internal lists in hot paths (HandleCaptures, CheckWhileConditions)
  4. Scope name caching - Cache GetScopeNames() result to avoid repeated list creation
  5. Single-scope optimization - Avoid List<string> allocation for single scope pushes
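Optimization 2 (allocation-free timing) can be illustrated with raw timestamps. This is a sketch under assumptions: the helper name `TimeoutHelper.HasTimedOut` is invented here for illustration; the PR's actual code lives in `LineTokenizer.Scan()`:

```csharp
using System;
using System.Diagnostics;

// Allocation-free timeout check: compare raw timestamps instead of
// allocating a new Stopwatch for every tokenized line.
static class TimeoutHelper
{
    public static bool HasTimedOut(long startTimestamp, TimeSpan timeout)
    {
        // Stopwatch.GetTimestamp() returns ticks of Stopwatch.Frequency per second.
        long elapsedTicks = Stopwatch.GetTimestamp() - startTimestamp;
        double elapsedMs = elapsedTicks * 1000.0 / Stopwatch.Frequency;
        return elapsedMs > timeout.TotalMilliseconds;
    }
}
```

A hot loop then records `long start = Stopwatch.GetTimestamp();` once and calls `TimeoutHelper.HasTimedOut(start, timeout)` per iteration, with no per-line heap allocation.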

⚠️ Breaking Changes

  • IModelLines.GetLineText() now returns LineText instead of string
  • IGrammar.TokenizeLine() signature changed to accept LineText

Migration: Replace string with LineText in interface implementations. Thanks to the implicit conversion from string, most call sites compile unchanged.
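As a sketch of the migration, here is a hypothetical implementer of `IModelLines`. The class name `InMemoryModelLines` and the minimal `LineText` below are assumptions for illustration; only the declared return type of `GetLineText` needs to change:

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-in for the PR's LineText (an assumption for this sketch).
public readonly struct LineText
{
    private readonly ReadOnlyMemory<char> _text;
    public LineText(ReadOnlyMemory<char> text) => _text = text;
    public static implicit operator LineText(string s) => new LineText(s.AsMemory());
    public override string ToString() => _text.ToString();
}

// Hypothetical IModelLines implementer, before and after migration.
public class InMemoryModelLines
{
    private readonly List<string> _lines = new() { "using System;", "class C { }" };

    // Before: public string GetLineText(int lineIndex) => _lines[lineIndex];
    // After:  only the return type changes; the body compiles unchanged
    //         because string converts implicitly to LineText.
    public LineText GetLineText(int lineIndex) => _lines[lineIndex];
}
```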

Testing

  • New benchmark project for performance validation

LineText can be implicitly converted from string and from ReadOnlyMemory<char>.
Optimizations applied:

- Use ArrayPool<char> instead of allocating new char[] per line in Grammar.Tokenize()
- Replace new Stopwatch() with Stopwatch.GetTimestamp() in LineTokenizer.Scan()
- Pool List<LocalStackElement> and List<WhileStack> in LineTokenizer
- Cache GetScopeNames() result in AttributedScopeStack
- Avoid List<string> allocation for single-scope PushAtributed()

Benchmark results (133K line file):
- Execution time: 4.75s → 2.89s (39% faster)
- Memory allocated: 658 MB → 488 MB (26% less)
- Gen0 collections: 82K → 61K (26% fewer)
- Gen1 collections: 8K → 4K (50% fewer)
…ilures

The ArrayPool<char> optimization in Grammar.Tokenize() caused test
failures on x64 Linux/Windows CI while passing on ARM64 macOS locally.

Root cause: The rented buffer was returned to the pool in the finally
block while LineTokens still held a ReadOnlyMemory<char> reference to
it. On x64 platforms with aggressive buffer reuse, subsequent tokenize
calls would reuse and overwrite the buffer, corrupting previous results.
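The lifetime bug described above can be reproduced in isolation. This is a simplified sketch, not the PR's actual code: the shared pool often (but is not guaranteed to) hand back the same array on the next rent of a similar size, which is why the corruption surfaced only on some platforms:

```csharp
using System;
using System.Buffers;

// Sketch of the bug: a pooled buffer is returned to the pool while a
// ReadOnlyMemory<char> slice of it is still alive.
char[] buffer = ArrayPool<char>.Shared.Rent(16);
"first line".AsSpan().CopyTo(buffer);

// This slice "escapes" the tokenize call, like the PR's LineTokens did.
ReadOnlyMemory<char> tokens = buffer.AsMemory(0, 10);

ArrayPool<char>.Shared.Return(buffer);            // bug: tokens still refers to buffer

char[] reused = ArrayPool<char>.Shared.Rent(16);  // may hand back the same array
"overwrite!".AsSpan().CopyTo(reused);

// tokens may now read "overwrite!" instead of "first line" — whether it
// does depends on the pool's reuse behavior, hence the platform-specific
// test failures.
```

The safe patterns are either to copy the tokens out before returning the buffer, or to keep the buffer alive for as long as any `ReadOnlyMemory<char>` slice of it can be observed.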

The other performance optimizations from commit 0c1c0aa remain intact:
- Stopwatch.GetTimestamp() instead of new Stopwatch()
- Pooled List<LocalStackElement> and List<WhileStack> in LineTokenizer
- Cached GetScopeNames() in AttributedScopeStack
- Single-scope PushAtributed() optimization