-
Notifications
You must be signed in to change notification settings - Fork 5
feat: implement real relation type
#78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement real relation type
#78
Conversation
d5e458e to
6939882
Compare
|
I am still unsure about what assembly format is best. We have the following possibilities: If no type name is the default, we have to rely on the reader's knowledge that the ops always return relations. If we always print the fully qualified name, some ops get quite verbose (set ops can have several operands plus the result, which would all repeat that it's a relation). The pseudo-typenames would really be a keyword, which (1) would need to be defined for each op and (2) wouldn't color as a type in IDEs. Any opinions, @jpienaar or @dshaaban01? |
This PR updates the LLVM submodule to llvm/llvm-project@0de2ccab7b, the latest version as of today. That version contains a fix for `nanobind`'s `stubgen` (in llvm/llvm-project#127518), which we require for the typing stubs for substrait-io#78. Signed-off-by: Ingo Müller <[email protected]>
…88) This PR updates the LLVM submodule to llvm/llvm-project@0de2ccab7b, the latest version as of today. That version contains a fix for `nanobind`'s `stubgen` (in llvm/llvm-project#127518), which we require for the typing stubs for #78. This PR requires to update the git modules in existing git clones: ```bash git submodule update --recursive ``` Signed-off-by: Ingo Müller <[email protected]>
349c09a to
864ff3d
Compare
|
@mortbopet: Any opinion from your side? |
This includes wjakob/nanobind#939 and wjakob/nanobind#940, which fixes an issue I encountered while working on substrait-io#78, so we need the new version of `nanobind` for CI to pass and stub generation to be correct for that. Signed-off-by: Ingo Müller <[email protected]>
864ff3d to
141a3af
Compare
|
Well, first, i support adding a relation type to get out of the whole aliasing-with-tuples-in-IR issue. But it is a shame that we can't re-use the tuple type as an internal storage mechanism, and have to copy the type storage struct - which appears to me as non-trivial. Re. assembly format, given the reasons that you highlight, i'd be in favor of the first (no typename). This would also be in line with substrait operations generally having elided their dialect name ( |
This includes wjakob/nanobind#939 and wjakob/nanobind#940, which fixes an issue I encountered while working on #78, so we need the new version of `nanobind` for CI to pass and stub generation to be correct for that. Signed-off-by: Ingo Müller <[email protected]>
141a3af to
ae0d8d4
Compare
Turns out it is and I think it's a good idea. See the latest commit I just pushed. The change is so small that no usage of the op had to changed, so the fact that a
I agree with the goal to avoid IR bloat. How about the last option, though? It only adds three characters per type and may avoid a lot of confusion, in particular, for new users. The biggest downside I see is that it is not a real type name but the only practical implication of that that I see is a different color in IDEs, which is probably acceptable. WDYT? |
|
I though a bit more about the "only a pseudo-type" issue. I think it can be acceptable, and I think it'll provide a better trade-off in other situations as well, namely to reduce the redundancy of nested types, including the field types of static StringRef getTypeKeyword(Type type) {
return TypeSwitch<Type, StringRef>(type)
.Case<LLVMVoidType>([&](Type) { return "void"; })
.Case<LLVMPPCFP128Type>([&](Type) { return "ppc_fp128"; })
.Case<LLVMTokenType>([&](Type) { return "token"; })
// ...
.Default([](Type) -> StringRef {
llvm_unreachable("unexpected 'llvm' type kind");
});
}
void mlir::LLVM::detail::printType(Type type, AsmPrinter &printer) {
// ....
printer << getTypeKeyword(type);
llvm::TypeSwitch<Type>(type)
.Case<LLVMPointerType, LLVMArrayType, LLVMFunctionType, LLVMTargetExtType,
LLVMStructType>([&](auto type) { type.print(printer); });
}
static Type dispatchParse(AsmParser &parser, bool allowAny = true) {
// ...
return StringSwitch<function_ref<Type()>>(key)
.Case("void", [&] { return LLVMVoidType::get(ctx); })
.Case("ppc_fp128", [&] { return LLVMPPCFP128Type::get(ctx); })
.Case("token", [&] { return LLVMTokenType::get(ctx); })
// ...
.Default([&] {
parser.emitError(keyLoc) << "unknown LLVM type: " << key;
return Type();
})();
}That could lead to an assembly format like this: substrait.plan version 0 : 42 : 1 {
relation {
%0 = named_table @t1 as ["a"] : rel<timestamp_tz>
%1 = named_table @t2 as ["b"] : rel<timestamp_tz>
%2 = join inner %0, %1 : rel<timestamp_tz>, rel<timestamp_tz> -> rel<timestamp_tz, timestamp_tz>
yield %2 : rel<timestamp_tz>
}
}Would that be an acceptable trade-off? |
I implemented part of this idea in the last commit of this PR and #131 stacked on top of it in order to see betterhow this could look like. @dshaaban01, @mortbopet: what are your impressions? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ingomueller-net sorry for the delay. Glad to see that there was already a version upstream for handling this kind of scenario. I'm on-board with the shorthand version (#131).
| return TypeSwitch<Type, StringRef>(type) | ||
| .Case<RelationType>([&](Type) { return "rel"; }) | ||
| .Default([](Type) -> StringRef { | ||
| llvm_unreachable("unexpected 'llvm' type kind"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is llvm intended here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! No, not intended. Will fix in the next iteration.
OK, cool, thanks for the feedback and no worries about delays -- I am have quite a few distractions these days myself... OK, will try to finish this iteration at some point next week. |
e648b83 to
0a64c30
Compare
relation type [WIP]relation type
This commit introduces a new `RelationType` as the operands and result types of `RelOp`s, where previously `tuple` had been used as a placeholder. Signed-off-by: Ingo Müller <[email protected]>
Signed-off-by: Ingo Müller <[email protected]>
Signed-off-by: Ingo Müller <[email protected]>
This required extending and fixing the custom printer and parser to (1) multiple types and (2) non-Substrait types.
0a64c30 to
f6af0ca
Compare
|
@mortbopet: I finally got around to wrapping up this PR. Do you want to take another look before I merge? |
Once rebased onto #87, this PR will need llvm/llvm-project#127518 to makepyrighthappy.This PR introduces a new
RelationTypeas the operands and result types ofRelOps, where previouslytuplehad been used as a placeholder. While this changes little in the structure of the IR, it (1) underlines the subtle difference between the relation, a container of records, and the records themselves, which had both been represented astuples before and (2) it allows defining an even shorter type name,rel, for that type. The latter is implemented using a custom printer and parser for these "pseudo type names" (which are really keywords for the printer and parser), which lays the basis for similar short-hand forms for some of our other types with long type names.