Bugfix: call __default_token__ from an embedded transformer (Issue #1582) #1612

Open
Labib-Bin-Salam wants to merge 1 commit into lark-parser:master from Labib-Bin-Salam:issue1582
@Labib-Bin-Salam

Fixes #1582.

Transformer.__default_token__ is called for tokens without a dedicated method when running transformer.transform(tree), but it was never invoked by the embedded transformer (Lark(transformer=...)). As confirmed by @erezsh in the issue, __default_token__ "was never implemented for the internal transformer".

_get_lexer_callbacks only registered a token callback when the transformer defined a method named after the terminal. It now falls back to an overridden __default_token__ for the remaining terminals, mirroring Transformer.transform(). The base no-op implementation is skipped, so transformers that don't override __default_token__ keep the existing fast path (no extra call per token), and specific token methods still take precedence.
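The callback selection described above can be sketched in plain Python. This is a simplified model, not Lark's actual code: `BaseTransformer` and `build_token_callbacks` are hypothetical stand-ins for `Transformer` and `_get_lexer_callbacks`, and tokens are modelled as plain strings.

```python
# Simplified stand-ins for illustration only; NOT Lark's real implementation.
class BaseTransformer:
    def __default_token__(self, token):
        # Base no-op: return the token unchanged.
        return token


def build_token_callbacks(transformer, terminal_names):
    """Map each terminal name to the callback that should handle its tokens."""
    callbacks = {}
    # An override is detected by comparing the class attribute with the base one.
    default = type(transformer).__default_token__
    overridden = default is not BaseTransformer.__default_token__
    for name in terminal_names:
        specific = getattr(transformer, name, None)
        if callable(specific):
            # A method named after the terminal takes precedence.
            callbacks[name] = specific
        elif overridden:
            # Fall back to the user's __default_token__.
            callbacks[name] = transformer.__default_token__
        # Otherwise no callback is registered, so the token passes through
        # with no extra call (the fast path).
    return callbacks


class Upper(BaseTransformer):
    def NUMBER(self, token):
        return int(token)

    def __default_token__(self, token):
        return token.upper()


class Plain(BaseTransformer):
    pass


upper_callbacks = build_token_callbacks(Upper(), ["NUMBER", "WORD"])
plain_callbacks = build_token_callbacks(Plain(), ["NUMBER", "WORD"])
```

In this model, `Upper` gets its dedicated `NUMBER` method for `NUMBER` and the overridden default for `WORD`, while `Plain` registers no callbacks at all, which corresponds to the unchanged fast path.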

Example

from lark import Lark, Transformer

class T(Transformer):
    def __default_token__(self, token):
        return token.update(value=token.upper())

parser = Lark("start: WORD+\n%import common.WORD\n%import common.WS\n%ignore WS",
              parser="lalr", transformer=T())
print(parser.parse("foo bar baz"))
# before: Tree('start', [Token('WORD', 'foo'), Token('WORD', 'bar'), Token('WORD', 'baz')])
# after:  Tree('start', [Token('WORD', 'FOO'), Token('WORD', 'BAR'), Token('WORD', 'BAZ')])

Notes

  • Added a regression test (test_default_token_in_treeless_mode) asserting the embedded transformer matches transform().
  • Scoped to the lalr parser, the only one that applies embedded token callbacks. The cyk parser does not apply embedded token callbacks at all (not even specific token methods); that is a separate, pre-existing limitation.
  • python -m tests passes (1283 tests) and mypy is clean.

…rk-parser#1582)

When a transformer is applied during parsing via Lark(transformer=...),
tokens without a dedicated method were left untouched, whereas
Transformer.transform() falls back to __default_token__ for them.

_get_lexer_callbacks now wires up an overridden __default_token__ as the
fallback token callback, matching transform(). The base no-op is skipped,
so the common case keeps tokens untouched with no extra call per token.

Development

Successfully merging this pull request may close these issues.

__default_token__ and __default__ not called for Lark(transformer=transformer).
