Optimize Cursor: skip strrpos on tab-free documents #1119

Open
GromNaN wants to merge 2 commits into thephpleague:2.8 from GromNaN:perf/skip-strrpos-tab-detection


GromNaN commented May 13, 2026

Cursor::__construct() called strrpos($line, "\t") on every line to locate the last tab position. For tab-free documents this is a full backward scan with no benefit, costing 40–165 ns per line depending on line length.

MarkdownParser::parse() now scans the document once with str_contains() (~3 µs). When no tabs are found, new Cursor($line, false) sets lastTabPosition = false directly without calling strrpos. The $lineCouldHaveTabs parameter defaults to true, keeping the public API backward compatible.
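The idea can be illustrated with a minimal sketch. `SketchCursor` and `sketchCursorsFor()` are hypothetical stand-ins invented for this example, not the library's classes; only the `$lineCouldHaveTabs` flag and the `lastTabPosition` property mirror names from the description above.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the real Cursor class; only tab detection is shown.
final class SketchCursor
{
    /** @var int|false Byte offset of the last tab in the line, or false if none. */
    public readonly int|false $lastTabPosition;

    public function __construct(public readonly string $line, bool $lineCouldHaveTabs = true)
    {
        // Skip the backward scan when the caller already knows the document has no tabs.
        $this->lastTabPosition = $lineCouldHaveTabs ? strrpos($line, "\t") : false;
    }
}

/**
 * Hypothetical parse loop: one str_contains() pass decides the flag for every line.
 *
 * @return list<SketchCursor>
 */
function sketchCursorsFor(string $document): array
{
    $couldHaveTabs = str_contains($document, "\t");

    $cursors = [];
    foreach (explode("\n", $document) as $line) {
        $cursors[] = new SketchCursor($line, $couldHaveTabs);
    }

    return $cursors;
}
```

Because the flag defaults to true, a caller that passes only the line still gets the strrpos() result, which is what keeps existing callers working in this sketch.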

Benchmarks were run with tests/benchmark/benchmark_parse.php, a new warm script: the converter is initialized once, and 100 iterations are measured after 10 warmup runs, timed with hrtime(). Environment: PHP 8.5.2, OPcache on, Xdebug off.
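A warm benchmark loop of this kind might look roughly like the following. This is a sketch under assumptions, not the contents of benchmark_parse.php: `sketchWarmBenchmark()` is a hypothetical helper, and the workload is a placeholder rather than the converter.

```php
<?php

declare(strict_types=1);

/**
 * Times $task after a warmup phase and returns one duration per measured run.
 *
 * @param callable():mixed $task
 * @return list<int> durations in nanoseconds
 */
function sketchWarmBenchmark(callable $task, int $iterations = 100, int $warmup = 10): array
{
    // Warmup runs let OPcache and internal caches settle; their timings are discarded.
    for ($i = 0; $i < $warmup; $i++) {
        $task();
    }

    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true); // monotonic clock, integer nanoseconds
        $task();
        $samples[] = hrtime(true) - $start;
    }

    return $samples;
}

// Placeholder workload; a real script would call a converter created once outside the loop.
$input = str_repeat("Some *markdown* text\n", 1000);
$samples = sketchWarmBenchmark(static fn () => strlen(strtoupper($input)));

printf("%d samples, fastest %.3f ms\n", count($samples), min($samples) / 1e6);
```

Keeping setup outside the timed closure is what separates the parsing cost from the converter construction cost.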

sample.md (27 KB, 2.9% tabs):

| | median | p95 |
|---|---|---|
| Before | 4.38 ms | 6.26 ms |
| After | 3.92 ms | 4.94 ms |
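For reference, the relative change implied by these figures can be checked with a few lines of arithmetic. `sketchRelativeChange()` is a hypothetical helper used only for this illustration.

```php
<?php

declare(strict_types=1);

// Relative change in percent; negative means the "after" value is faster.
function sketchRelativeChange(float $before, float $after): float
{
    return ($after - $before) / $before * 100.0;
}

printf("median: %+.1f%%\n", sketchRelativeChange(4.38, 3.92)); // median: -10.5%
printf("p95:    %+.1f%%\n", sketchRelativeChange(6.26, 4.94)); // p95:    -21.1%
```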

Larger corpora from vendor (not committed): CHANGELOG.md 570 KB −5.2%, spec.txt 200 KB −2.8%.

GromNaN added 2 commits May 13, 2026 23:25
benchmark_parse.php measures only the parsing phase (converter
initialized once outside the loop) unlike benchmark.php which
re-creates the converter on every iteration.

Reports min/median/p95/mean over N iterations (default 100 + 10 warmup).

The Cursor constructor previously called strrpos() on every line to find
the last tab position, even for documents with no tabs. For tab-free
documents (the common case), this was pure overhead — strrpos scans the
string backwards and costs 40–165 ns per line depending on length.

MarkdownParser now scans the whole document once with str_contains()
before the parse loop. When no tabs are found, each Cursor is constructed
with $lineCouldHaveTabs=false, setting lastTabPosition=false directly
without calling strrpos.

Benchmarks (warm, 100 iterations, CommonMarkConverter, sample.md 27 KB):
  Before: median 4.38 ms
  After:  median 3.92 ms  (-10.5%)
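
The min/median/p95/mean summary mentioned in the first commit message could be computed along these lines. `sketchSummarize()` is a hypothetical helper, and the nearest-rank percentile is an assumption; the benchmark script may define p95 differently.

```php
<?php

declare(strict_types=1);

/**
 * Summarizes timing samples. The nearest-rank percentile is an assumption;
 * the benchmark script may use a different definition.
 *
 * @param non-empty-list<int|float> $samples
 * @return array{min: float, median: float, p95: float, mean: float}
 */
function sketchSummarize(array $samples): array
{
    sort($samples);
    $n = count($samples);

    // Nearest-rank percentile: smallest value with at least $p percent of samples at or below it.
    $percentile = static fn (float $p): float => (float) $samples[max(0, (int) ceil($p * $n / 100) - 1)];

    // Median averages the two middle values when the sample count is even.
    $mid = intdiv($n, 2);
    $median = $n % 2 === 1 ? $samples[$mid] : ($samples[$mid - 1] + $samples[$mid]) / 2;

    return [
        'min' => (float) $samples[0],
        'median' => (float) $median,
        'p95' => $percentile(95.0),
        'mean' => (float) (array_sum($samples) / $n),
    ];
}
```

For example, `sketchSummarize([1, 2, 3, 4])` yields a min of 1.0, a median of 2.5, a p95 of 4.0, and a mean of 2.5.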