Add Lark.scan() for finding grammar matches in text, and parsing them. #1592
erezsh wants to merge 11 commits into
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1592      +/-   ##
==========================================
+ Coverage   90.08%   90.41%   +0.33%
==========================================
  Files          52       53       +1
  Lines        8105     8422     +317
==========================================
+ Hits         7301     7615     +314
- Misses        804      807       +3
Flags with carried forward coverage won't be shown.
MegaIng left a comment
Looks very good. Thank you for taking over this implementation, and sorry for never getting around to it.
I genuinely don't have any detailed comments that aren't unrelated nits; this is very clean.
No worries, thanks for looking it over!
…uce__
Both methods only copied start_pos/line/column, silently dropping the end positions.
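This bug is easy to reproduce: a custom __copy__ and __reduce__ each rebuild the object from a hand-picked subset of its fields. The sketch below shows one way to avoid it, by deriving both methods from a single tuple of field names so that none can be left out. PosToken and POSITION_FIELDS are hypothetical names, not Lark's Token class. The end_line/end_column/end_pos field names are also an assumption.

```python
import copy

# Every positional field the token carries. Deriving __copy__ and __reduce__
# from one tuple means a field cannot be silently dropped from one of them.
POSITION_FIELDS = ("start_pos", "line", "column", "end_line", "end_column", "end_pos")


class PosToken(str):
    """Hypothetical str subclass with a type and start/end source positions."""

    def __new__(cls, type_, value, start_pos=None, line=None, column=None,
                end_line=None, end_column=None, end_pos=None):
        inst = super().__new__(cls, value)
        inst.type = type_
        inst.start_pos, inst.line, inst.column = start_pos, line, column
        inst.end_line, inst.end_column, inst.end_pos = end_line, end_column, end_pos
        return inst

    def _ctor_args(self):
        # Constructor arguments covering *all* fields, end positions included,
        # in the same order as the __new__ parameters.
        return (self.type, str(self)) + tuple(getattr(self, f) for f in POSITION_FIELDS)

    def __reduce__(self):
        # Used by pickle and copy.deepcopy: rebuild via PosToken(*args).
        return (self.__class__, self._ctor_args())

    def __copy__(self):
        return self.__class__(*self._ctor_args())


tok = PosToken("NAME", "foo", start_pos=0, line=1, column=1,
               end_line=1, end_column=4, end_pos=3)
clone = copy.copy(tok)
print(clone, clone.end_line, clone.end_column, clone.end_pos)
```

Under these assumptions, adding a new position field means appending it to POSITION_FIELDS and to the __new__ signature. Both copy paths then pick it up automatically.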
- Split ParsingFrontend.scan() into _scan() so configuration errors are raised on the call itself
- Lark.scan() now checks parser='lalr' up front too, and rejects a custom lexer
- Re-raise ConfigurationError instead of counting it as a ValueError
- Check source positions on every replayed token, not just the last docstring.
- Improve docs
- Add tests
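Two of these bullets rely on general Python behaviour:

- A function containing `yield` runs none of its body until iteration starts.
- An exception class that subclasses ValueError is caught by an `except ValueError` clause. The bullet implies this is the case for ConfigurationError.

The sketch below illustrates both fixes. scan, _scan, lazy_scan and this ConfigurationError are hypothetical stand-ins, not Lark's code. A whitespace split with a per-word `parse` function stands in for real grammar matching.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


class ConfigurationError(ValueError):
    """Hypothetical stand-in: a configuration error that is also a ValueError."""


def _scan(text: str, parse: Callable[[str], T]) -> Iterator[T]:
    # Generator: none of this body runs until the caller starts iterating.
    for word in text.split():
        try:
            result = parse(word)
        except ConfigurationError:
            # Must come first: ConfigurationError is a ValueError, so the
            # clause below would otherwise swallow it as a failed candidate.
            raise
        except ValueError:
            continue  # ordinary parse failure: skip this candidate
        yield result


def lazy_scan(text: str, parse: Callable[[str], T], parser: str = "lalr") -> Iterator[T]:
    # Problematic version: `yield from` makes this a generator too, so the
    # check only fires on the first next(), not when lazy_scan() is called.
    if parser != "lalr":
        raise ConfigurationError("scan() requires parser='lalr'")
    yield from _scan(text, parse)


def scan(text: str, parse: Callable[[str], T], parser: str = "lalr") -> Iterator[T]:
    # Plain function (no `yield`): validate eagerly, then return the generator.
    if parser != "lalr":
        raise ConfigurationError("scan() requires parser='lalr'")
    return _scan(text, parse)


print(list(scan("1 x 22", int)))  # non-integers are skipped -> [1, 22]
```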
Added a few fixes. I think it's now ready to merge. The most important ones are the scan-related ones, especially the last two: they fix performance issues and a small bug around %ignore. @MegaIng, you're welcome to review the changes if you like.
It finds and parses each non-overlapping match, yielding the longest possible match.
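The matching policy can be illustrated independently of Lark: find non-overlapping matches, and prefer the longest one at each start position. This is a minimal sketch, assuming a hypothetical `accepts(s)` predicate that stands in for "this substring parses under the grammar". Here it is a regular expression, not a Lark grammar. The quadratic search is for clarity only and is not how the PR implements it.

```python
import re
from typing import Callable, Iterator, Tuple


def scan_longest(text: str, accepts: Callable[[str], bool]) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, substring) for non-overlapping, longest-first matches."""
    pos = 0
    n = len(text)
    while pos < n:
        best_end = None
        # Try end positions from the furthest inward; the first accepted
        # slice is therefore the longest match starting at `pos`.
        for end in range(n, pos, -1):
            if accepts(text[pos:end]):
                best_end = end
                break
        if best_end is None:
            pos += 1  # no match starts here; slide forward one character
        else:
            yield pos, best_end, text[pos:best_end]
            pos = best_end  # resume after the match, so matches never overlap


number = re.compile(r"\d+(\.\d+)?")
print(list(scan_longest("a 12.5 b 7", lambda s: number.fullmatch(s) is not None)))
# -> [(2, 6, '12.5'), (9, 10, '7')]
```

Note that "12" would also be accepted at position 2, but the longer "12.5" wins.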
Reimplementation of the long-standing #1429 PR by @MegaIng on top of the merged TextSlice support, with some modifications.
It works in two steps: first it parses without callbacks, so the parser state is cheap to clone; then it replays the chosen match with the user's callbacks.
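The two-step design can be sketched with a toy recognizer. Everything here is hypothetical: ParenRecognizer, longest_match_end and scan_with_replay are illustration names, not Lark APIs, and balanced parentheses stand in for a real LALR grammar. The two steps work like this:

- **Step 1** runs with no callbacks attached. Probing "could the input end here?" only needs a cheap shallow copy, which matters because feeding end-of-input finishes the parser.
- **Step 2** feeds just the chosen slice to a fresh parser that has the user's callback attached.

```python
import copy
from typing import Callable, List, Optional, Tuple


class ParenRecognizer:
    """Hypothetical toy parser: accepts non-empty balanced parentheses."""

    def __init__(self, on_token: Optional[Callable[[str, int], None]] = None):
        self.depth = 0
        self.failed = False
        self.finished = False
        self.on_token = on_token  # user callback; only attached during replay

    def feed(self, ch: str) -> None:
        if self.finished:
            raise RuntimeError("parser already received end-of-input")
        if ch == "(":
            self.depth += 1
        elif ch == ")" and self.depth > 0:
            self.depth -= 1
        else:
            self.failed = True  # stray ")" or any other character
            return
        if self.on_token is not None:
            self.on_token(ch, self.depth)

    def feed_end(self) -> bool:
        # Feeding end-of-input finishes this parser, so callers probe on a copy.
        self.finished = True
        return not self.failed and self.depth == 0


def longest_match_end(text: str, start: int) -> Optional[int]:
    """Step 1: parse without callbacks; return the end of the longest match."""
    parser = ParenRecognizer()  # no callbacks, so copying it is cheap
    best = None
    for i in range(start, len(text)):
        parser.feed(text[i])
        if parser.failed:
            break  # no longer prefix can match either
        if copy.copy(parser).feed_end():  # probe a clone, keep the original going
            best = i + 1
    return best


def scan_with_replay(text: str, on_token: Callable[[str, int], None]) -> List[Tuple[int, int]]:
    """Step 2: replay only the chosen, non-overlapping matches with callbacks."""
    matches = []
    pos = 0
    while pos < len(text):
        end = longest_match_end(text, pos)
        if end is None:
            pos += 1
            continue
        replayer = ParenRecognizer(on_token=on_token)
        for ch in text[pos:end]:
            replayer.feed(ch)
        matches.append((pos, end))
        pos = end
    return matches


events = []
print(scan_with_replay("x(()) )(", lambda ch, depth: events.append((ch, depth))))
print(events)
```

A side benefit of this split, under these assumptions, is that user callbacks never fire for rejected candidates. In the example, the unmatched "(" at the end is tried in step 1 but never reaches the callback.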