Add Lark.scan() for finding grammar matches in text, and parsing them. by erezsh · Pull Request #1592 · lark-parser/lark

erezsh · 2026-05-10T13:30:21Z

It finds and parses each non-overlapping match, yielding the longest possible match.

Reimplementation of the long-standing #1429 PR by @MegaIng on top of the merged TextSlice support, with some modifications.

It works in two steps. First it parses without callbacks, which keeps cloning cheap. Then it replays the chosen match with the user callbacks.
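
The behaviour described above (non-overlapping matches, longest match wins, callbacks run only for the chosen match) can be illustrated without Lark at all. The following is a minimal sketch that uses regular expressions as a stand-in for a grammar. `scan_longest`, its parameters, and the left-to-right search order are assumptions made for this example; they are not the PR's implementation or the `Lark.scan()` signature.

```python
import re
from typing import Callable, Iterator, List, Tuple


def scan_longest(
    text: str,
    patterns: List[str],
    callback: Callable[[str], object],
) -> Iterator[Tuple[Tuple[int, int], object]]:
    """Yield ((start, end), result) for each non-overlapping match.

    Hypothetical helper: regexes stand in for a grammar here.
    """
    compiled = [re.compile(p) for p in patterns]
    pos = 0
    while pos < len(text):
        # Step 1: look for candidates at this position without running
        # the callback, and keep the longest one.
        best = None
        for rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best.end()):
                best = m
        if best is None:
            pos += 1  # nothing starts here; try the next character
            continue
        # Step 2: run the callback only on the chosen match.
        yield (best.start(), best.end()), callback(best.group())
        pos = best.end()  # resume after the match, so matches never overlap


calls = []


def to_number(s: str) -> float:
    calls.append(s)  # record which candidates actually reached the callback
    return float(s)


results = list(scan_longest("a 12 b 3.14 c", [r"\d+", r"\d+\.\d+"], to_number))
```

Here `results` holds the spans `(2, 4)` and `(7, 11)`, and `calls` contains only `"12"` and `"3.14"`: the shorter candidate `"3"` is discarded before any callback runs.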

codecov · 2026-05-10T14:27:52Z

Codecov Report

❌ Patch coverage is 98.78049% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.41%. Comparing base (c169b26) to head (0978b72).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lark/parser_frontends.py | 96.82% | 2 Missing ⚠️ |
| lark/lexer.py | 97.91% | 1 Missing ⚠️ |
| tests/test_scan.py | 99.47% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1592      +/-   ##
==========================================
+ Coverage   90.08%   90.41%   +0.33%     
==========================================
  Files          52       53       +1     
  Lines        8105     8422     +317     
==========================================
+ Hits         7301     7615     +314     
- Misses        804      807       +3

| Flag | Coverage Δ |
|---|---|
| unittests | `90.41% <98.78%> (+0.33%)` ⬆️ |


MegaIng

Looks very good. Thank you for taking over this implementation, and sorry for never getting around to it.

I genuinely don't have any detailed comments that aren't unrelated nits; this is very clean.

erezsh · 2026-05-10T17:28:50Z

No worries, thanks for looking it over!

- Split ParsingFrontend.scan() into _scan() so configuration errors raise on the call
- Lark.scan() now checks parser='lalr' up front too. Reject custom lexer
- Re-raise ConfigurationError instead of counting it as ValueError
- Check source positions on every replayed token, not just the last docstring.
- Improves docs
- Adds tests
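
The split into `_scan()` mentioned above relies on a general Python property: the body of a generator function does not run until the first item is requested, so a check placed inside it is deferred too. The sketch below shows the pattern with made-up names (`lazy_scan`, `eager_scan`, `ConfigError`); it assumes nothing about Lark's actual code beyond what the commit message says.

```python
from typing import Iterator


class ConfigError(ValueError):
    """Hypothetical stand-in for a configuration error type."""


def lazy_scan(text: str, parser: str) -> Iterator[str]:
    # Generator function: nothing here runs until the first next() call,
    # so the check below is deferred along with the rest of the body.
    if parser != "lalr":
        raise ConfigError("scan requires parser='lalr'")
    yield from text.split()


def _scan(text: str) -> Iterator[str]:
    # The actual (deferred) iteration lives in its own generator.
    yield from text.split()


def eager_scan(text: str, parser: str) -> Iterator[str]:
    # Plain function: the check runs at call time...
    if parser != "lalr":
        raise ConfigError("scan requires parser='lalr'")
    # ...and only the iteration itself is deferred.
    return _scan(text)


deferred = lazy_scan("a b", "earley")  # no error yet; it surfaces on next()

try:
    eager_scan("a b", "earley")
    raised_on_call = False
except ConfigError:
    raised_on_call = True  # the misconfiguration is reported immediately
```

With the wrapper, a misconfigured call fails where it is written, not later at the first iteration.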

erezsh · 2026-06-25T23:13:08Z

Added a few fixes.

I think it's now ready to merge.

The most important ones are the scan-related ones, especially the last two: they fix performance issues and a small bug around %ignore.

@MegaIng You're welcome to review the changes if you like.

erezsh requested a review from MegaIng May 10, 2026 14:29

MegaIng approved these changes May 10, 2026

View reviewed changes

erezsh mentioned this pull request Jun 8, 2026

pyparsing porting guide #1008

Open

erezsh added 3 commits June 25, 2026 17:11

Token: preserve end_line/end_column/end_pos in __deepcopy__ and __red…

efc2b1e

…uce__ Both methods only copied start_pos/line/column, silently dropping the end positions.
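
To show the kind of bug this commit describes, here is a small sketch of a `str` subclass that carries position fields through both copy paths. `PosToken` is a hypothetical class written for this example, not lark's `Token`; only the field names are taken from the commit message.

```python
import copy


class PosToken(str):
    """Hypothetical token type for illustration; not lark's Token class."""

    def __new__(cls, type_, value, start_pos=None, line=None, column=None,
                end_line=None, end_column=None, end_pos=None):
        inst = super().__new__(cls, value)
        inst.type = type_
        inst.start_pos, inst.line, inst.column = start_pos, line, column
        inst.end_line, inst.end_column, inst.end_pos = end_line, end_column, end_pos
        return inst

    def _ctor_args(self):
        # Every positional field, start *and* end, in constructor order.
        # Leaving the last three out here would silently drop them on copy.
        return (self.type, str(self), self.start_pos, self.line, self.column,
                self.end_line, self.end_column, self.end_pos)

    def __reduce__(self):
        # pickle rebuilds the object by calling PosToken(*args).
        return (self.__class__, self._ctor_args())

    def __deepcopy__(self, memo):
        return self.__class__(*self._ctor_args())


tok = PosToken("NAME", "foo", start_pos=4, line=1, column=5,
               end_line=1, end_column=8, end_pos=7)
clone = copy.deepcopy(tok)
rebuild, args = tok.__reduce__()
restored = rebuild(*args)  # what unpickling would do with this pair
```

Routing both methods through one helper means a field added later only has to be listed once.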

Add Lark.scan() for finding grammar matches in text, and parsing them.

dd53672

It finds and parses each non-overlapping match, yielding the longest possible match. Reimplementation of the long-standing #1429 PR on top of the merged TextSlice support, with some modifications.

Docs: Added recipe for using scan()

15cdcd8

erezsh force-pushed the scan_parse branch from 811ef1d to a868add Compare June 25, 2026 15:13

erezsh added 6 commits June 25, 2026 23:01

Test that Token end positions survive deepcopy and pickle

f7268f5

Export ScanMatch from the top-level lark package

ba20a7a

scan(): carry line/column counts across passes instead of recounting

49def87

scan(): search start terminals directly so ignored spans aren't skipped

f65f88e

Add more scan() tests

3f5de4c

erezsh force-pushed the scan_parse branch from b95cb01 to 3f5de4c Compare June 25, 2026 22:24

Scan: Added small warning (possibly TODO if a need arises)

16244a7

scan: Add LineCounter.advance_to(text, pos) for better performance + …

0978b72

…minor fixes
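
As a rough illustration of why an incremental `advance_to(text, pos)` helps, the sketch below keeps a running line and column and only inspects the text between the previous position and the new one. Only the method name and argument order come from the commit title; the class name `LineCounterSketch`, its attributes, and the 1-based numbering are assumptions, and the real `LineCounter` may differ.

```python
class LineCounterSketch:
    """Hypothetical incremental line/column tracker (1-based)."""

    def __init__(self) -> None:
        self.char_pos = 0
        self.line = 1
        self.column = 1

    def advance_to(self, text: str, pos: int) -> None:
        if pos < self.char_pos:
            raise ValueError("can only advance forwards")
        # Only the slice between the old and new position is examined.
        newlines = text.count("\n", self.char_pos, pos)
        if newlines:
            self.line += newlines
            last_nl = text.rfind("\n", self.char_pos, pos)
            self.column = pos - last_nl
        else:
            self.column += pos - self.char_pos
        self.char_pos = pos


def recount(text: str, pos: int):
    # Baseline for comparison: rescan from the start of the text each time.
    line = text.count("\n", 0, pos) + 1
    column = pos - text.rfind("\n", 0, pos)
    return line, column


text = "ab\ncd\nef"
counter = LineCounterSketch()
counter.advance_to(text, 4)
first = (counter.line, counter.column)
counter.advance_to(text, 7)
second = (counter.line, counter.column)
```

Both approaches agree on the result (`first` equals `recount(text, 4)`, `second` equals `recount(text, 7)`), but the incremental version does work proportional to the distance advanced, not to the absolute position.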

erezsh wants to merge 11 commits into master from scan_parse
