feat: implement real `relation` type #78

ingomueller-net · 2025-02-04T14:30:48Z

~~Once rebased onto #87, this PR will need llvm/llvm-project#127518 to make pyright happy.~~

This PR introduces a new `RelationType` as the operand and result type of `RelOp`s, where previously `tuple` had been used as a placeholder. While this changes little in the structure of the IR, it (1) underlines the subtle difference between the relation, a container of records, and the records themselves, both of which had been represented as tuples before, and (2) allows defining an even shorter type name, `rel`, for that type. The latter is implemented using a custom printer and parser for these "pseudo type names" (which are really keywords for the printer and parser), which lays the basis for similar shorthand forms for some of our other types with long type names.

ingomueller-net · 2025-02-11T13:35:22Z

I am still unsure about what assembly format is best. We have the following possibilities:

%0 = named_table @t1 as ["a"] : <si32>  // no typename by default
%0 = named_table @t1 as ["a"] : !substrait.relation<si32>  // always fully qualified name
%0 = named_table @t1 as ["a"] : relation<si32>  // spelled-out pseudo-typename
%0 = named_table @t1 as ["a"] : rel<si32>  // short pseudo-typename

If omitting the type name is the default, we have to rely on the reader's knowledge that the ops always return relations. If we always print the fully qualified name, some ops get quite verbose (set ops can have several operands plus the result, all of which would repeat that they are relations). The pseudo-typenames would really be keywords, which (1) would need to be defined for each op and (2) wouldn't be colored as types in IDEs. Any opinions, @jpienaar or @dshaaban01?

…88) This PR updates the LLVM submodule to llvm/llvm-project@0de2ccab7b, the latest version as of today. That version contains a fix for `nanobind`'s `stubgen` (in llvm/llvm-project#127518), which we require for the typing stubs for #78. This PR requires updating the git submodules in existing git clones:

```bash
git submodule update --recursive
```

Signed-off-by: Ingo Müller <[email protected]>

ingomueller-net · 2025-04-24T10:53:16Z

@mortbopet: Any opinion from your side?

This includes wjakob/nanobind#939 and wjakob/nanobind#940, which fix an issue I encountered while working on substrait-io#78, so we need the new version of `nanobind` for CI to pass and for stub generation to be correct. Signed-off-by: Ingo Müller <[email protected]>

mortbopet · 2025-04-25T08:08:53Z

Well, first, I support adding a relation type to get out of the whole aliasing-with-tuples-in-IR issue. But it is a shame that we can't reuse the tuple type as an internal storage mechanism and instead have to copy the type storage struct, which appears to me as non-trivial.
Is it not possible to have an implementation that keeps the custom printer/parser, but has a TupleType as parameter? That should take care of inferring a type storage, but we'd then still provide the custom builders for API neatness.

Re. assembly format, given the reasons that you highlight, I'd be in favor of the first (no typename). This would also be in line with substrait operations generally having elided their dialect name (getDialectNamespace). But mainly due to the size of the IR; I've already experienced IR bloat through substrait.decimal types (also a long name!), so adding in even more text would clearly worsen this.

ingomueller-net · 2025-05-13T09:05:27Z

> Is it not possible to have an implementation that keeps the custom printer/parser, but has a TupleType as parameter? That should take care of inferring a type storage, but we'd then still provide the custom builders for API neatness.

Turns out it is, and I think it's a good idea. See the latest commit I just pushed. The change is so small that no usage of the op had to be changed, so the fact that a TupleType is used under the hood is barely visible :)
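To illustrate the design being discussed, here is a minimal standalone sketch in plain C++ with no MLIR dependency. It is not the code from this PR: the `TupleType` and `RelationType` structs below are hypothetical stand-ins, and field types are modeled as plain strings. The point is only that a relation type can delegate all of its storage to a single tuple-typed parameter while still offering convenience builders and accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a builtin tuple type: an ordered list of field
// types, modeled here as plain strings.
struct TupleType {
  std::vector<std::string> types;
  bool operator==(const TupleType &other) const { return types == other.types; }
};

// Hypothetical relation type whose only parameter is a TupleType. It needs no
// storage of its own beyond that parameter, yet it remains a distinct type, so
// "a relation" and "a record" can no longer be confused.
class RelationType {
public:
  // Convenience builder: callers pass field types directly and never have to
  // construct the underlying tuple themselves.
  static RelationType get(std::vector<std::string> fieldTypes) {
    return RelationType(TupleType{std::move(fieldTypes)});
  }

  // The type of a single record of this relation.
  const TupleType &getStructType() const { return structType; }

  // Accessors that forward to the underlying tuple.
  std::size_t getNumFields() const { return structType.types.size(); }
  const std::string &getFieldType(std::size_t index) const {
    assert(index < structType.types.size() && "field index out of range");
    return structType.types[index];
  }

  bool operator==(const RelationType &other) const {
    return structType == other.structType;
  }

private:
  explicit RelationType(TupleType tuple) : structType(std::move(tuple)) {}
  TupleType structType;
};
```

With this shape, `RelationType::get({"si32", "timestamp_tz"})` builds a relation with two fields, and code that only uses the builder and the accessors does not need to know that a tuple is used for storage.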

> Re. assembly format, given the reasons that you highlight, I'd be in favor of the first (no typename). This would also be in line with substrait operations generally having elided their dialect name (getDialectNamespace). But mainly due to the size of the IR; I've already experienced IR bloat through substrait.decimal types (also a long name!), so adding in even more text would clearly worsen this.

I agree with the goal of avoiding IR bloat. How about the last option, though? It only adds three characters per type and may avoid a lot of confusion, in particular for new users. The biggest downside I see is that it is not a real type name, but the only practical implication I see is a different color in IDEs, which is probably acceptable. WDYT?

ingomueller-net · 2025-05-13T14:34:26Z

I thought a bit more about the "only a pseudo-type" issue. I think it can be acceptable, and I think it'll provide a better trade-off in other situations as well, namely to reduce the redundancy of nested types, including the field types of !substrait.relation. We have discussed this here for the list type. Concretely, I am thinking of using pseudo-types similar to how the LLVM dialect handles this upstream here:

static StringRef getTypeKeyword(Type type) {
  return TypeSwitch<Type, StringRef>(type)
      .Case<LLVMVoidType>([&](Type) { return "void"; })
      .Case<LLVMPPCFP128Type>([&](Type) { return "ppc_fp128"; })
      .Case<LLVMTokenType>([&](Type) { return "token"; })
      // ...
      .Default([](Type) -> StringRef {
        llvm_unreachable("unexpected 'llvm' type kind");
      });
}

void mlir::LLVM::detail::printType(Type type, AsmPrinter &printer) {
  // ....
  printer << getTypeKeyword(type);
  llvm::TypeSwitch<Type>(type)
      .Case<LLVMPointerType, LLVMArrayType, LLVMFunctionType, LLVMTargetExtType,
            LLVMStructType>([&](auto type) { type.print(printer); });
}

static Type dispatchParse(AsmParser &parser, bool allowAny = true) {
  // ...
  return StringSwitch<function_ref<Type()>>(key)
      .Case("void", [&] { return LLVMVoidType::get(ctx); })
      .Case("ppc_fp128", [&] { return LLVMPPCFP128Type::get(ctx); })
      .Case("token", [&] { return LLVMTokenType::get(ctx); })
      // ...
      .Default([&] {
        parser.emitError(keyLoc) << "unknown LLVM type: " << key;
        return Type();
      })();
}

That could lead to an assembly format like this:

substrait.plan version 0 : 42 : 1 {
  relation {
    %0 = named_table @t1 as ["a"] : rel<timestamp_tz>
    %1 = named_table @t2 as ["b"] : rel<timestamp_tz>
    %2 = join inner %0, %1 : rel<timestamp_tz>, rel<timestamp_tz> -> rel<timestamp_tz, timestamp_tz>
    yield %2 : rel<timestamp_tz, timestamp_tz>
  }
}

Would that be an acceptable trade-off?
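To make the keyword-dispatch idea concrete outside of MLIR, here is a minimal standalone sketch in plain C++. It is not the implementation from this PR or from the LLVM dialect: `Type`, `Parser`, `printType`, and `parseTypeString` are hypothetical names, and the set of leaf keywords (`si32`, `timestamp_tz`) is an assumption chosen to match the examples above. It shows a printer that emits a short keyword per type and a parser that dispatches on that keyword, recursing into the field types of `rel<...>`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical type model: a leaf type is just its keyword; a relation uses
// the keyword "rel" and additionally carries its field types.
struct Type {
  std::string keyword;
  std::vector<Type> fields;
  bool operator==(const Type &o) const {
    return keyword == o.keyword && fields == o.fields;
  }
};

// Printer: emit the keyword, then the nested field types for relations.
std::string printType(const Type &type) {
  if (type.keyword != "rel") {
    assert(type.fields.empty() && "only relations carry field types");
    return type.keyword;
  }
  std::string out = "rel<";
  for (std::size_t i = 0; i < type.fields.size(); ++i) {
    if (i > 0) out += ", ";
    out += printType(type.fields[i]);
  }
  return out + ">";
}

// Parser: read a keyword and dispatch on it, mirroring the printer.
class Parser {
public:
  explicit Parser(std::string_view text) : text(text) {}

  std::optional<Type> parseType() {
    std::string key = parseKeyword();
    if (key == "rel") return parseRelationBody();
    // Assumed leaf keywords for this sketch only.
    if (key == "si32" || key == "timestamp_tz") return Type{key, {}};
    return std::nullopt; // unknown or missing keyword
  }

  bool atEnd() {
    skipSpaces();
    return pos == text.size();
  }

private:
  std::optional<Type> parseRelationBody() {
    Type rel{"rel", {}};
    if (!consume('<')) return std::nullopt;
    if (consume('>')) return rel; // relation without fields
    do {
      std::optional<Type> field = parseType();
      if (!field) return std::nullopt;
      rel.fields.push_back(std::move(*field));
    } while (consume(','));
    if (!consume('>')) return std::nullopt;
    return rel;
  }

  std::string parseKeyword() {
    skipSpaces();
    std::size_t start = pos;
    while (pos < text.size() &&
           (std::isalnum(static_cast<unsigned char>(text[pos])) ||
            text[pos] == '_'))
      ++pos;
    return std::string(text.substr(start, pos - start));
  }

  bool consume(char c) {
    skipSpaces();
    if (pos < text.size() && text[pos] == c) {
      ++pos;
      return true;
    }
    return false;
  }

  void skipSpaces() {
    while (pos < text.size() && text[pos] == ' ') ++pos;
  }

  std::string_view text;
  std::size_t pos = 0;
};

// Parses a complete type string; fails on trailing input.
std::optional<Type> parseTypeString(std::string_view text) {
  Parser parser(text);
  std::optional<Type> type = parser.parseType();
  if (!type || !parser.atEnd()) return std::nullopt;
  return type;
}
```

In this sketch, parsing and then printing a type such as `rel<timestamp_tz, timestamp_tz>` is a round trip, while unknown keywords and unbalanced angle brackets are rejected by returning an empty optional instead of emitting a diagnostic.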

ingomueller-net · 2025-05-16T08:58:40Z

> I thought a bit more about the "only a pseudo-type" issue.

I implemented part of this idea in the last commit of this PR and in #131, stacked on top of it, in order to see better what this could look like. @dshaaban01, @mortbopet: what are your impressions?

mortbopet

@ingomueller-net sorry for the delay. Glad to see that there was already a version upstream for handling this kind of scenario. I'm on-board with the shorthand version (#131).

mortbopet · 2025-05-17T08:20:20Z

lib/Dialect/Substrait/IR/Substrait.cpp

+  return TypeSwitch<Type, StringRef>(type)
+      .Case<RelationType>([&](Type) { return "rel"; })
+      .Default([](Type) -> StringRef {
+        llvm_unreachable("unexpected 'llvm' type kind");


is llvm intended here?

Good catch! No, not intended. Will fix in the next iteration.

ingomueller-net · 2025-05-18T08:13:34Z

> @ingomueller-net sorry for the delay. Glad to see that there was already a version upstream for handling this kind of scenario. I'm on-board with the shorthand version (#131).

OK, cool, thanks for the feedback and no worries about delays -- I have quite a few distractions these days myself... OK, will try to finish this iteration at some point next week.

ingomueller-net · 2025-06-10T12:49:12Z

@mortbopet: I finally got around to wrapping up this PR. Do you want to take another look before I merge?

ingomueller-net force-pushed the relation-type branch 3 times, most recently from d5e458e to 6939882 Compare February 11, 2025 13:29

ingomueller-net mentioned this pull request Feb 19, 2025

chore: update LLVM submodule to latest as of 02/19/2025 (0de2ccab7b) #88

Merged

ingomueller-net force-pushed the relation-type branch from 349c09a to 864ff3d Compare April 24, 2025 10:51

ingomueller-net mentioned this pull request Apr 24, 2025

chore(deps): update to latest version of nanobind #128

Merged

ingomueller-net force-pushed the relation-type branch from 864ff3d to 141a3af Compare April 24, 2025 13:10

ingomueller-net force-pushed the relation-type branch from 141a3af to ae0d8d4 Compare May 13, 2025 09:02

mortbopet reviewed May 17, 2025

ingomueller-net force-pushed the relation-type branch from e648b83 to 0a64c30 Compare June 10, 2025 12:40

ingomueller-net changed the title ~~feat: implement real relation type [WIP]~~ feat: implement real relation type Jun 10, 2025

ingomueller-net added 6 commits June 10, 2025 12:42

feat: implement real relation type

b6b60d1

This commit introduces a new `RelationType` as the operands and result types of `RelOp`s, where previously `tuple` had been used as a placeholder. Signed-off-by: Ingo Müller <[email protected]>

refactor: use TupleType parameter and remove storage class

adcbe7c

Signed-off-by: Ingo Müller <[email protected]>

feat: implement short-hand rel type for cross op

6e0bd45

Signed-off-by: Ingo Müller <[email protected]>

feat: implement short-hand rel type for remaining rel ops

9571ce7

feat: implement short-hand rel type for yield op

3c54aca

This required extending and fixing the custom printer and parser to (1) multiple types and (2) non-Substrait types.

doc: add comments to RelationType that had been left as TODOs

f6af0ca

ingomueller-net force-pushed the relation-type branch from 0a64c30 to f6af0ca Compare June 10, 2025 12:43

ingomueller-net requested a review from mortbopet June 10, 2025 12:48

mortbopet approved these changes Jun 11, 2025

ingomueller-net merged commit 763519d into substrait-io:main Jun 11, 2025
11 checks passed

ingomueller-net deleted the relation-type branch June 11, 2025 18:15