feat: support VirtualTable Read with inline and verbose text formats#101
feat: support VirtualTable Read with inline and verbose text formats#101wackywendell wants to merge 3 commits intomainfrom
Conversation
aa68bf3 to
c2d2cd1
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2d2cd16ee
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
1623980 to
60845a4
Compare
01b1d09 to
50e2228
Compare
50e2228 to
f52fe6e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f52fe6e74b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let expression_list = unwrap_single_pair(pair); | ||
| assert_eq!(expression_list.as_rule(), Rule::expression_list); | ||
| nested::Struct { | ||
| fields: parse_expression_list(extensions, expression_list), |
There was a problem hiding this comment.
Avoid panicking on invalid VirtualTable row expressions
parse_virtual_row now builds row fields via parse_expression_list, but that helper unwraps each Expression::parse_pair with expect(...) instead of returning a MessageParseError. If a Read:Virtual row contains a semantically invalid expression (for example, a function call with no matching extension declaration), parsing will panic and abort instead of returning a normal parse error. This makes malformed user input crash the parser path introduced for VirtualTable rows.
Useful? React with 👍 / 👎.
f52fe6e to
6d2ec1d
Compare
benbellick
left a comment
There was a problem hiding this comment.
Thanks for this! I didn't look too closely at the code because it may change based off of the discussion around the verbose form of the VirtualTables. One way we could move forwards though and make the PR smaller could be to open a new PR where we just handle the compact form, which I am fully in agreement with. Then after that we can tackle the verbose form separately. What do you think?
| ("+ Row" "[" expression ("," expression)* "]")* | ||
| ``` | ||
|
|
||
| The `+ Row` lines are addenda attached to the `Read:Virtual` relation (same structural role as `+ Enh:` / `+ Opt:` lines). They must appear before any child relations. |
There was a problem hiding this comment.
I think that the above short form is great, though I have some questions about the verbose form.
I want to ask why + Row has the same structural role as + Enh and + Opt. Both enhancements and optimizations are additions on top of the core substrait, and the + communicates that it is an addition on top of a core. On the other hand, virtual tables are a core part of the spec. It would make sense to me for them to be visually different.
One other thing, it is counterintuitive to me that the inline format would use parentheses, but the verbose form would use brackets. Parenthesis communicate tuple-like things with a fixed width, so I think they are more appropriate for representing relations in both cases.
What do you think about one of the following alternative representations?
same as inline but with new lines
Root[id, name]
Read:Virtual[_ => id:i64, name:string]
(1, 'alice')
(2, 'bob')
using dashes to demarcate new rows
Root[id, name]
Read:Virtual[_ => id:i64, name:string]
- (1, 'alice')
- (2, 'bob')
And then one more item, it looks like there is precedent in other relations for not having the => syntax in the brackets after the relation name. What do you think about just dropping the _ => after the verbose form?
E.g.
Root[id, name]
Read:Virtual[id:i64, name:string]
(1, 'alice')
(2, 'bob')
There was a problem hiding this comment.
Thanks for the thoughtful feedback! I agree on splitting this. The inline form is ready now, and the verbose form deserves its own design discussion. I'll strip the verbose form out of this PR and file issues for the design questions you raised.
On the design questions:
Why + for rows
In my mind, + means "this is not a new operator in the tree; it's additional data attached to the operator above." It's about tree structure — how lines relate to their parent — rather than core-vs-non-core Substrait. + Enh, + Opt, and + Row all share the property that they modify a relation without introducing a new child in the plan hierarchy.
Why [...] instead of (...)
The + <name>[...] pattern is context-independent — the parser can identify and parse a + Row[...] line from its first tokens alone, without knowing what relation it's attached to. Bare (1, 'alice') on an indented line would be the only indented content that isn't a relation or a + ... addendum, breaking the invariant that line type is identifiable from the first token.
On dropping _ =>:
I think => should stay for the moment. Currently, => is required on every relation that can have arguments; it's what disambiguates "these are arguments" from "these are output columns." _ => means "I could have arguments here, I just don't". that's structurally different from Root and Project, which never take arguments and so never use =>. Dropping it would make VirtualTable the first relation where name:type column definitions appear inside [...] with no separator, blurring that visual pattern. That said, making => optional more broadly is worth exploring: see #104; I rather like the idea.
Using - for rows
I actually quite like using - as a lighter-weight prefix for "additional arguments" more generally. It could work for extension nodes with verbose argument lists too, not just VirtualTable rows. Filed #105 for that.
I'll update this PR to inline-only and we can iterate on the verbose form separately.
There was a problem hiding this comment.
Not for this PR but future work: it is not clear to me from the name foundation alone what is meant to go in this file. I propose that we either:
- break this up into a few files with clearer names, or
- add a docstring here clarfying what is intended to go here.
There was a problem hiding this comment.
Mm, yes, good point!
| } | ||
| } | ||
| Some(ReadType::VirtualTable(vt)) => { | ||
| let col_count = columns.len(); |
There was a problem hiding this comment.
just a suggestion, take it or leave it, but what do you think about extracting this out into a helper function? The nesting is getting pretty deep here.
Description
Add support for VirtualTable in ReadRel (closes #56), enabling inline literal data tables in Substrait plans — similar to SQL
VALUESclauses.Two text formats are supported:
Inline (default for small tables,
rows * columns <= 8):Verbose (default for larger tables):
Both forms are always accepted by the parser; the textifier selects based on a configurable threshold. Mixed form (inline rows +
+ Rowaddenda) is accepted but never produced.Design decisions
Read:Virtualuses theType:Subtypecolon pattern (consistent withExtensionLeaf:Name)+ Rowuses the+-prefixed addendum pattern (same structural role as+ Enh:/+ Opt:)Addendumenum unifiesAdvExtandRowhandling in the structural parserParser refactoring (included)
VirtualTable support exposed an opportunity to make the parser trait more self-contained:
RelationParsePairredesign:parse_pair_with_contextnow returns(Self, usize)— each relation type computes its own output field count during parsing rather than having it reconstructed afterward.into_reltakesRelationAddenda(pre-parsed rows + advanced extension) so each type handles its own addenda.make_emithelper producesEmitKind::Directfor identity output mappings instead of an explicitEmit, matching Substrait semantics. This also fixes a Fetch textifier regression where passthrough columns were emitted as empty whenEmitKind::Directwas in effect.ParseError::ValidationErrornow carriesParseContextfor line-precise diagnostics.+ Enh:/+ Opt:addenda with an explicit error (previously silently applied).Type of Change
Testing
Checklist
Related Issues
Closes #56