Skip to content

feat: support VirtualTable Read with inline and verbose text formats#101

Closed
wackywendell wants to merge 3 commits intomainfrom
wendell/virtual-table
Closed

feat: support VirtualTable Read with inline and verbose text formats#101
wackywendell wants to merge 3 commits intomainfrom
wendell/virtual-table

Conversation

@wackywendell
Copy link
Copy Markdown
Collaborator

@wackywendell wackywendell commented Apr 14, 2026

Description

Add support for VirtualTable in ReadRel (closes #56), enabling inline literal data tables in Substrait plans — similar to SQL VALUES clauses.

Two text formats are supported:

Inline (default for small tables, rows * columns <= 8):

Read:Virtual[(1, 'alice'), (2, 'bob') => id:i64, name:string]

Verbose (default for larger tables):

Read:Virtual[_ => id:i64, name:string]
  + Row[1, 'alice']
  + Row[2, 'bob']

Both forms are always accepted by the parser; the textifier selects based on a configurable threshold. Mixed form (inline rows + + Row addenda) is accepted but never produced.

Design decisions

  • Read:Virtual uses the Type:Subtype colon pattern (consistent with ExtensionLeaf:Name)
  • + Row uses the +-prefixed addendum pattern (same structural role as + Enh: / + Opt:)
  • Addendum enum unifies AdvExt and Row handling in the structural parser

Parser refactoring (included)

VirtualTable support exposed an opportunity to make the parser trait more self-contained:

  • RelationParsePair redesign: parse_pair_with_context now returns (Self, usize) — each relation type computes its own output field count during parsing rather than having it reconstructed afterward. into_rel takes RelationAddenda (pre-parsed rows + advanced extension) so each type handles its own addenda.
  • Identity emit: new make_emit helper produces EmitKind::Direct for identity output mappings instead of an explicit Emit, matching Substrait semantics. This also fixes a Fetch textifier regression where passthrough columns were emitted as empty when EmitKind::Direct was in effect.
  • Structured error messages: ParseError::ValidationError now carries ParseContext for line-precise diagnostics.
  • Extension relations now reject + Enh: / + Opt: addenda with an explicit error (previously silently applied).

Type of Change

  • New feature
  • Documentation update

Testing

  • Added 8 roundtrip tests (inline, verbose, empty table, single-column, large table uses verbose, VirtualTable in complex plan, error paths)
  • All existing tests pass (309 total)
  • GRAMMAR.md doc tests compile and pass

Checklist

  • Code follows Rust conventions
  • Documentation updated if needed
  • No breaking changes (or breaking changes documented)
  • Pre-commit hooks pass

Related Issues

Closes #56

@wackywendell wackywendell force-pushed the wendell/virtual-table branch from aa68bf3 to c2d2cd1 Compare April 14, 2026 18:41
@wackywendell
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2d2cd16ee

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/parser/structural.rs
Comment thread src/parser/relations.rs Outdated
@wackywendell wackywendell force-pushed the wendell/virtual-table branch 4 times, most recently from 1623980 to 60845a4 Compare April 15, 2026 16:37
@wackywendell wackywendell force-pushed the wendell/virtual-table branch 2 times, most recently from 01b1d09 to 50e2228 Compare April 15, 2026 23:10
@wackywendell wackywendell force-pushed the wendell/virtual-table branch from 50e2228 to f52fe6e Compare April 15, 2026 23:17
@wackywendell wackywendell marked this pull request as ready for review April 16, 2026 15:24
@wackywendell wackywendell requested a review from a team as a code owner April 16, 2026 15:24
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f52fe6e74b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/parser/relations.rs
let expression_list = unwrap_single_pair(pair);
assert_eq!(expression_list.as_rule(), Rule::expression_list);
nested::Struct {
fields: parse_expression_list(extensions, expression_list),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid panicking on invalid VirtualTable row expressions

parse_virtual_row now builds row fields via parse_expression_list, but that helper unwraps each Expression::parse_pair with expect(...) instead of returning a MessageParseError. If a Read:Virtual row contains a semantically invalid expression (for example, a function call with no matching extension declaration), parsing will panic and abort instead of returning a normal parse error. This makes malformed user input crash the parser path introduced for VirtualTable rows.

Useful? React with 👍 / 👎.

@wackywendell wackywendell force-pushed the wendell/virtual-table branch from f52fe6e to 6d2ec1d Compare April 16, 2026 22:13
Copy link
Copy Markdown
Contributor

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I didn't look too closely at the code because it may change based off of the discussion around the verbose form of the VirtualTables. One way we could move forwards though and make the PR smaller could be to open a new PR where we just handle the compact form, which I am fully in agreement with. Then after that we can tackle the verbose form separately. What do you think?

Comment thread GRAMMAR.md
("+ Row" "[" expression ("," expression)* "]")*
```

The `+ Row` lines are addenda attached to the `Read:Virtual` relation (same structural role as `+ Enh:` / `+ Opt:` lines). They must appear before any child relations.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the above short form is great, though I have some questions about the verbose form.

I want to ask why + Row has the same structural role as + Enh and + Opt. Both enhancements and optimizations are additions on top of the core substrait, and the + communicates that it is an addition on top of a core. On the other hand, virtual tables are a core part of the spec. It would make sense to me for them to be visually different.

One other thing, it is counterintuitive to me that the inline format would use parentheses, but the verbose form would use brackets. Parenthesis communicate tuple-like things with a fixed width, so I think they are more appropriate for representing relations in both cases.

What do you think about one of the following alternative representations?

same as inline but with new lines

Root[id, name]
  Read:Virtual[_ => id:i64, name:string]
    (1, 'alice')
    (2, 'bob')

using dashes to demarcate new rows

Root[id, name]
  Read:Virtual[_ => id:i64, name:string]
    - (1, 'alice')
    - (2, 'bob')

And then one more item, it looks like there is precedent in other relations for not having the => syntax in the brackets after the relation name. What do you think about just dropping the _ => after the verbose form?

E.g.

Root[id, name]
  Read:Virtual[id:i64, name:string]
    (1, 'alice')
    (2, 'bob')

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thoughtful feedback! I agree on splitting this. The inline form is ready now, and the verbose form deserves its own design discussion. I'll strip the verbose form out of this PR and file issues for the design questions you raised.

On the design questions:

Why + for rows

In my mind, + means "this is not a new operator in the tree; it's additional data attached to the operator above." It's about tree structure — how lines relate to their parent — rather than core-vs-non-core Substrait. + Enh, + Opt, and + Row all share the property that they modify a relation without introducing a new child in the plan hierarchy.

Why [...] instead of (...)

The + <name>[...] pattern is context-independent — the parser can identify and parse a + Row[...] line from its first tokens alone, without knowing what relation it's attached to. Bare (1, 'alice') on an indented line would be the only indented content that isn't a relation or a + ... addendum, breaking the invariant that line type is identifiable from the first token.

On dropping _ =>:

I think => should stay for the moment. Currently, => is required on every relation that can have arguments; it's what disambiguates "these are arguments" from "these are output columns." _ => means "I could have arguments here, I just don't". that's structurally different from Root and Project, which never take arguments and so never use =>. Dropping it would make VirtualTable the first relation where name:type column definitions appear inside [...] with no separator, blurring that visual pattern. That said, making => optional more broadly is worth exploring: see #104; I rather like the idea.

Using - for rows

I actually quite like using - as a lighter-weight prefix for "additional arguments" more generally. It could work for extension nodes with verbose argument lists too, not just VirtualTable rows. Filed #105 for that.

I'll update this PR to inline-only and we can iterate on the verbose form separately.

Comment thread src/textify/foundation.rs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but future work: it is not clear to me from the name foundation alone what is meant to go in this file. I propose that we either:

  • break this up into a few files with clearer names, or
  • add a docstring here clarfying what is intended to go here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, yes, good point!

Comment thread src/textify/rels.rs
}
}
Some(ReadType::VirtualTable(vt)) => {
let col_count = columns.len();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a suggestion, take it or leave it, but what do you think about extracting this out into a helper function? The nesting is getting pretty deep here.

@wackywendell
Copy link
Copy Markdown
Collaborator Author

Closing in favor of #106 / #107.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support Textify Representation of VirtualTable ReadRel Type

2 participants