feat: support VirtualTable Read with inline and verbose text formats by wackywendell · Pull Request #101 · DataDog/substrait-explain

wackywendell · 2026-04-14T17:45:49Z

Description

Add support for VirtualTable in ReadRel (closes #56), enabling inline literal data tables in Substrait plans — similar to SQL VALUES clauses.

Two text formats are supported:

Inline (default for small tables, rows * columns <= 8):

Read:Virtual[(1, 'alice'), (2, 'bob') => id:i64, name:string]

Verbose (default for larger tables):

Read:Virtual[_ => id:i64, name:string]
  + Row[1, 'alice']
  + Row[2, 'bob']

Both forms are always accepted by the parser; the textifier selects based on a configurable threshold. Mixed form (inline rows + + Row addenda) is accepted but never produced.

Design decisions

Read:Virtual uses the Type:Subtype colon pattern (consistent with ExtensionLeaf:Name)
+ Row uses the +-prefixed addendum pattern (same structural role as + Enh: / + Opt:)
Addendum enum unifies AdvExt and Row handling in the structural parser

Parser refactoring (included)

VirtualTable support exposed an opportunity to make the parser trait more self-contained:

RelationParsePair redesign: parse_pair_with_context now returns (Self, usize) — each relation type computes its own output field count during parsing rather than having it reconstructed afterward. into_rel takes RelationAddenda (pre-parsed rows + advanced extension) so each type handles its own addenda.
Identity emit: new make_emit helper produces EmitKind::Direct for identity output mappings instead of an explicit Emit, matching Substrait semantics. This also fixes a Fetch textifier regression where passthrough columns were emitted as empty when EmitKind::Direct was in effect.
Structured error messages: ParseError::ValidationError now carries ParseContext for line-precise diagnostics.
Extension relations now reject + Enh: / + Opt: addenda with an explicit error (previously silently applied).

Type of Change

New feature
Documentation update

Testing

Added 8 roundtrip tests (inline, verbose, empty table, single-column, large table uses verbose, VirtualTable in complex plan, error paths)
All existing tests pass (309 total)
GRAMMAR.md doc tests compile and pass

Checklist

Code follows Rust conventions
Documentation updated if needed
No breaking changes (or breaking changes documented)
Pre-commit hooks pass

Related Issues

Closes #56

wackywendell · 2026-04-14T18:41:40Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2d2cd16ee

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f52fe6e74b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-16T15:30:31Z

+    let expression_list = unwrap_single_pair(pair);
+    assert_eq!(expression_list.as_rule(), Rule::expression_list);
+    nested::Struct {
+        fields: parse_expression_list(extensions, expression_list),


Avoid panicking on invalid VirtualTable row expressions

parse_virtual_row now builds row fields via parse_expression_list, but that helper unwraps each Expression::parse_pair with expect(...) instead of returning a MessageParseError. If a Read:Virtual row contains a semantically invalid expression (for example, a function call with no matching extension declaration), parsing will panic and abort instead of returning a normal parse error. This makes malformed user input crash the parser path introduced for VirtualTable rows.

Useful? React with 👍 / 👎.

benbellick

Thanks for this! I didn't look too closely at the code because it may change based off of the discussion around the verbose form of the VirtualTables. One way we could move forwards though and make the PR smaller could be to open a new PR where we just handle the compact form, which I am fully in agreement with. Then after that we can tackle the verbose form separately. What do you think?

benbellick · 2026-04-17T13:47:59Z

+  ("+ Row" "[" expression ("," expression)* "]")*
+```
+
+The `+ Row` lines are addenda attached to the `Read:Virtual` relation (same structural role as `+ Enh:` / `+ Opt:` lines). They must appear before any child relations.


I think that the above short form is great, though I have some questions about the verbose form.

I want to ask why + Row has the same structural role as + Enh and + Opt. Both enhancements and optimizations are additions on top of the core substrait, and the + communicates that it is an addition on top of a core. On the other hand, virtual tables are a core part of the spec. It would make sense to me for them to be visually different.

One other thing, it is counterintuitive to me that the inline format would use parentheses, but the verbose form would use brackets. Parenthesis communicate tuple-like things with a fixed width, so I think they are more appropriate for representing relations in both cases.

What do you think about one of the following alternative representations?

same as inline but with new lines

Root[id, name] Read:Virtual[_ => id:i64, name:string] (1, 'alice') (2, 'bob')

using dashes to demarcate new rows

Root[id, name] Read:Virtual[_ => id:i64, name:string] - (1, 'alice') - (2, 'bob')

And then one more item, it looks like there is precedent in other relations for not having the => syntax in the brackets after the relation name. What do you think about just dropping the _ => after the verbose form?

E.g.

Root[id, name] Read:Virtual[id:i64, name:string] (1, 'alice') (2, 'bob')

Thanks for the thoughtful feedback! I agree on splitting this. The inline form is ready now, and the verbose form deserves its own design discussion. I'll strip the verbose form out of this PR and file issues for the design questions you raised.

On the design questions:

Why + for rows

In my mind, + means "this is not a new operator in the tree; it's additional data attached to the operator above." It's about tree structure — how lines relate to their parent — rather than core-vs-non-core Substrait. + Enh, + Opt, and + Row all share the property that they modify a relation without introducing a new child in the plan hierarchy.

Why [...] instead of (...)

The + <name>[...] pattern is context-independent — the parser can identify and parse a + Row[...] line from its first tokens alone, without knowing what relation it's attached to. Bare (1, 'alice') on an indented line would be the only indented content that isn't a relation or a + ... addendum, breaking the invariant that line type is identifiable from the first token.

On dropping _ =>:

I think => should stay for the moment. Currently, => is required on every relation that can have arguments; it's what disambiguates "these are arguments" from "these are output columns." _ => means "I could have arguments here, I just don't". that's structurally different from Root and Project, which never take arguments and so never use =>. Dropping it would make VirtualTable the first relation where name:type column definitions appear inside [...] with no separator, blurring that visual pattern. That said, making => optional more broadly is worth exploring: see #104; I rather like the idea.

Using - for rows

I actually quite like using - as a lighter-weight prefix for "additional arguments" more generally. It could work for extension nodes with verbose argument lists too, not just VirtualTable rows. Filed #105 for that.

I'll update this PR to inline-only and we can iterate on the verbose form separately.

benbellick · 2026-04-17T13:52:45Z

Not for this PR but future work: it is not clear to me from the name foundation alone what is meant to go in this file. I propose that we either:

break this up into a few files with clearer names, or

add a docstring here clarfying what is intended to go here.

Mm, yes, good point!

benbellick · 2026-04-17T13:59:16Z

+                }
+            }
+            Some(ReadType::VirtualTable(vt)) => {
+                let col_count = columns.len();


just a suggestion, take it or leave it, but what do you think about extracting this out into a helper function? The nesting is getting pretty deep here.

wackywendell · 2026-04-20T15:18:20Z

Closing in favor of #106 / #107.

wackywendell force-pushed the wendell/virtual-table branch from aa68bf3 to c2d2cd1 Compare April 14, 2026 18:41

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread src/parser/structural.rs

Comment thread src/parser/relations.rs Outdated

wackywendell force-pushed the wendell/virtual-table branch 4 times, most recently from 1623980 to 60845a4 Compare April 15, 2026 16:37

feat: support VirtualTable Read with inline and verbose text formats

ac72172

wackywendell force-pushed the wendell/virtual-table branch 2 times, most recently from 01b1d09 to 50e2228 Compare April 15, 2026 23:10

refactor: self-contained RelationParsePair with identity emit

b95662b

wackywendell force-pushed the wendell/virtual-table branch from 50e2228 to f52fe6e Compare April 15, 2026 23:17

wackywendell marked this pull request as ready for review April 16, 2026 15:24

wackywendell requested a review from a team as a code owner April 16, 2026 15:24

chatgpt-codex-connector Bot reviewed Apr 16, 2026

View reviewed changes

refactor: structured errors and code quality polish

6d2ec1d

wackywendell force-pushed the wendell/virtual-table branch from f52fe6e to 6d2ec1d Compare April 16, 2026 22:13

benbellick reviewed Apr 17, 2026

View reviewed changes

wackywendell closed this Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support VirtualTable Read with inline and verbose text formats#101

feat: support VirtualTable Read with inline and verbose text formats#101
wackywendell wants to merge 3 commits intomainfrom
wendell/virtual-table

wackywendell commented Apr 14, 2026 •

edited

Loading

Uh oh!

wackywendell commented Apr 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 16, 2026

Uh oh!

benbellick left a comment

Uh oh!

benbellick Apr 17, 2026

Uh oh!

wackywendell Apr 17, 2026

Uh oh!

benbellick Apr 17, 2026

Uh oh!

wackywendell Apr 17, 2026

Uh oh!

benbellick Apr 17, 2026

Uh oh!

wackywendell commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wackywendell commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Design decisions

Parser refactoring (included)

Type of Change

Testing

Checklist

Related Issues

Uh oh!

wackywendell commented Apr 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

benbellick left a comment

Choose a reason for hiding this comment

Uh oh!

benbellick Apr 17, 2026

Choose a reason for hiding this comment

same as inline but with new lines

using dashes to demarcate new rows

Uh oh!

wackywendell Apr 17, 2026

Choose a reason for hiding this comment

Why + for rows

Why [...] instead of (...)

On dropping _ =>:

Using - for rows

Uh oh!

benbellick Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

wackywendell Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

benbellick Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

wackywendell commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wackywendell commented Apr 14, 2026 •

edited

Loading

Why `+` for rows

Why `[...]` instead of `(...)`

On dropping `_ =>`:

Using `-` for rows