-
Notifications
You must be signed in to change notification settings - Fork 817
[SM6.10][LinAlg] Impl MatrixRef, CreateMatrix builtins #7883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| @@ -0,0 +1,8 @@ | |||
| // RUN: %dxc -T cs_6_9 -E main %s -verify | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure I also want a .ll test but I don't know the magic incantation to generate the IR upto but right before the dxilgen pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
./utils/hct/ExtractIRForPassTest.py -p dxilgen -o output.ll input.hlsl -- -T cs_6_9 -E main
damyanp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through this and didn't spot any obvious typos.
Do we need to apply the same "experimental DXIL" stuff to this that we're talking about for other SM 6.10 features?
| LICOMPTYPE_VK_BUFFER_POINTER = 55, | ||
| LICOMPTYPE_COUNT = 56 | ||
| #else | ||
| LICOMPTYPE_COUNT = 54 | ||
| LICOMPTYPE_COUNT = 55 | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter that these values have changed?
I've no reason to think it does, just checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming no but maybe someone else can weigh in. The ifdef basically forces them to change though since the last number is only incremented for SPIRV. Beyond that this seems to be an internal enum so changing it shouldn't be observable to end users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mostly internal details, but the built-in intrinsic extension mechanism relies on this header (and this enum) as well. That's currently only used by Xbox backend, and wouldn't be impacted by these changes unless updated to depend on them, and even then, using the latest header would keep it in sync.
Ideally, I think we should get rid of the ifdef here (so LICOMPTYPE_VK_BUFFER_POINTER is always part of the enumeration), and update the corresponding table g_LegalIntrinsicCompTypes in SemaHLSL.cpp. But that should be a separate change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of the ifdef SPIRV conditionals were removed in similar areas. This one was missed possibly as an oversight.
Yep |
|
In a discussion about a month ago, we agreed that createMatrix as written wasn't useful because it has no way of uniquely identifying the matrix in question, so any and all createMatrix calls will get merged as duplicates. From this discussion, we agreed to remove it, which would make the builtin unnecessary. In my own further investigation that I've been trying to get the opportunity to discuss for weeks, I've determined that we may need a createMatrix that is generated for local or global allocas, but not parameters since, without global variables, we have to use the alloca as the unique identifier and the fact that they are opaque handles makes it difficult to cleanly merge duplicate parameter allocas through inlining and subsequent passes. In any event, a creatematrix call that only takes the opcode isn't of much use. |
|
Per our discussion, you can disregard my last comment, however I just remembered that when I proposed a builtin for creatematrix and annotatematrix, @llvm-beanz rejected it on account of it would allow the header implementation or other method using builtins to generate invalid dxil. I found that a compelling argument. I don't know if he changed his mind since. Similar create dxil ops are generated as part of clang CodeGen. As an example I wouldn't strongly recommend following exactly, but as the broad idea, see how resource handle creation is generated:
The way nodehandle creation is generated is the way I would/have approached it though it will be through local/global variables instead of entry function params: DirectXShaderCompiler/tools/clang/lib/CodeGen/CGHLSLMS.cpp Lines 3852 to 3853 in 6679358
|
I think a builtin for The annotate builtin is a bit more complicated but it isn't the "creating invalid dxil" that concerns me as much as I'm unsure that the annotate builtin could be used to create valid DXIL. I'm not strongly opposed to either of these additions a lot of it just depends on the implementation details as we progress. |
Are you saying that create will have a builtin, but annotate will not? As a sanity check, where do you see the create builtin being called in the linalg.h header? |
Currently the plan is that they both have builtins, and the annotate is just overemitted and then cleaned up/deduped in a late pass
Create and initial annotate are in the constructor of the |
|
Using the implicit constructor for cases where the matrix is just declared and not initialized makes sense. I requested/suggested further explanation of expected behavior for uninitialized matrices in microsoft/hlsl-specs#691. I haven't heard back about that issue. I considered that a default splat of zeros or something would give a reasonable outcome. For what I've taken to calling the "matrix population functions" (loads, splat, cast, and outerproduct), were you still thinking of relying on the constructor or calling createMatrix and annotateMatrix directly there? If not, I don't see how either builtin would ever be explicitly called from HLSL. The constructor presents a complication if we allow these to be declared in global scope. I said in microsoft/hlsl-specs#695 that I assume that cbuffer is out, but statics might be in. I haven't heard back about that issue. Is the intent to allow statics? @llvm-beanz said that annotate is more complicated and it is. It may have some relation to the createMatrix approach. I presume you meant you were going to call it from the functions taking a matrix parameter, but if the user creates a function that takes a dx::linalg::Matrix parameter, there will need to be an annotate on that. How were you thinking about inserting that annotate? Overall, this sounds like my rejected proposal of two months ago. I'm not against it, I'm just trying to see if that's the case and whether there are other details that need considering before committing to it. |
pow2clk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pointing out a potential pitfall for annotateMatrix. If you've started on that work and already have a solution, I'd be interested to know how it works without templated type information.
| CXXRecordDecl *hlsl::DeclareMatrixRefType(ASTContext &Context) { | ||
| // MatrixRef { ... } | ||
| BuiltinTypeDeclBuilder TypeDeclBuilder(Context.getTranslationUnitDecl(), | ||
| "__builtin_la_MatrixRef"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked this before, without template arguments, I don't know where you will get the property information that the matrix provides as it is solely stored in the header-defined type that contains this type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I guess this relates to my previous question of whether annotateMatrix is expected to be called explicitly from HLSL. in that case, the type information from the header would be available, but that's not available for user function parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The property information would be tracked by the templated header class and forwarded along to annotate as needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have that working in a branch somewhere? I just don't see how that information would be available to clang in a user function needing an annotate without adding a dependency at the compiler implementation level to a class written in an HLSL header, which seems like a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't no. I was planning on getting deep into things once I got back but this whole discussion preempted that :)
I'm not sure where you see a potential compiler dependency on the class though, this works (although optimization seems to be deleting everything) without any special knowledge from the compiler and it is pretty much a strawman of what I plan to do. https://godbolt.org/z/dnYfn7eTv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that. That's what I suspected. That works fine for methods of the matrix class defined in the header, but what about user functions that pass matrices around? They shouldn't pass the MatrixRef around, but there shouldn't be any reason they can't pass the dx::linalg::matrix class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have specific user functions in mind? The whole point of dx::linalg::Matrix is to restrict the matrix usage down to only the allowed operations. Anything that a user would want to do with the MatrixRef should be accessable via the header API
| void [[min_sm=6.10]] __builtin_VectorAccumulate(in LinAlg<c> InputVector, in RWByteAddressBuffer MatrixBuffer, in uint MatrixOffset); | ||
|
|
||
|
|
||
| MatrixRef [[min_sm=6.10]] __builtin_la_CreateMatrix(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to my previous comment, if MatrixRef has template parameters, then it can't be uniquely a return type. It can be done with an output variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is for MatrixRef to not have template parameters
|
While the current plan (as I discussed with @V-FEXrt) is to add the annotate call to every method that consumes or produces a matrix, I'm not so sure the memory semantics for the I think the crux of the problem is that in HLSL the object looks like it is the memory for the matrix, or at least automatically manages the memory. In DXIL, this translates to a For instance, depending on the |
These are interesting issues. At least in the case of cast, I expected that the output matrixref was distinct from the one passed in. I was expecting that we'd follow the intuition of resources in which case the handle does reference the memory, but this is different in that the operations don't just sample from that memory, but completely populate or dump it to memory. The decisions made here might affect whether a default splat for uninitialized matrices like I described above will be possible or not because it will determine if the compiiler can identify that the splat isn't necessary if it is subsequently assigned. Here I think the recently added hitobjects are illustrative as they default to a noop assignment, but can be eliminated if it is subsequently assigned to something that overrides it. Interesting as all that is, can you help me understand how it relates to the question at hand? Regardless of whether they refer to the memory, I would think the matrixref would be consigned to the type that its parent class was declared as. Perhaps it was in reference to something else I said? Are you saying that user functions might not need annotates for their parameters or that there is another way to solve that problem? |
|
I opened this issue microsoft/hlsl-specs#756 against the spec, because I think the root issue is a lack of clear memory semantics/lifetime/ownership definition in the spec for the HLSL object and the matrix data it references. |
The explicit operations aren't the problem, since they would simply take these handles that refer to some memory managed elsewhere. The problem is the management of that memory, and how the HLSL code translates to the correct number of objects with appropriate lifetimes that the driver can reason about in all cases.
The key difference from resources is that a resource handle references something externally managed, rather than something managed by the shader itself.
No, the plan was to annotate on input and on create (constructor would create+annotate) for every operation. Annotate on input includes To illustrate, there would be something like a meant-for-internal-use namespace linalg {
template <...>
class Matrix {
// Meant to be private, but HLSL doesn't support private:
MatrixRef _Matrix;
MatrixRef _getAnnotatedHandle() {
return __builtin_la_AnnotateMatrix(_Matrix, <matrix properties...>);
}
// Public part of the interface:
Matrix() {
_Matrix = __builtin_la_CreateMatrix();
_Matrix = _getAnnotatedHandle();
}
template <ComponentType NewCompTy, MatrixUse NewUse = Use>
Matrix<NewCompTy, M, N, NewUse, Scope> Matrix::Cast() {
Matrix<NewCompTy, M, N, NewUse, Scope> Dest; // will create+annotate
// Need to annotate our own handle locally for the operation:
__builtin_la_CastMatrix(Dest._Matrix, _getAnnotatedHandle());
return Dest;
}
template <ComponentEnum LHSTy, ComponentEnum RHSTy, uint K,
MatrixUseEnum UseLocal = Use>
typename hlsl::enable_if<Use == MatrixUse::Accumulator && UseLocal == Use,
void>::type
MultiplyAccumulate(const Matrix<LHSTy, M, K, MatrixUse::A, Scope> A,
const Matrix<RHSTy, K, N, MatrixUse::B, Scope> B) {
__builtin_la_MultiplyAccumulateMatrix(_getAnnotatedHandle(),
A._getAnnotatedHandle(),
B._getAnnotatedHandle());
}
...
};
}This doesn't solve the memory management problem I'm referring to though. |
tex3d
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be updated for the experimental DXIL op table changes, however, I had some concerns about the naming of the HLSL intrinsic and DXIL op, and I think other questions may get in the way related to how the current approach will solve the memory management of the matrix contents of any given HLSL Matrix object.
| ) | ||
|
|
||
| self.add_dxil_op( | ||
| "CreateMatrix", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CreateMatrix seems overly generic. What about CreateLinAlgMatrix or CreateMatrixRef?
I'm a bit concerned with this just being "Matrix" when we have a different concept of matrix in HLSL, and with preserving vectors in IR, I could even imagine doing matrix operations on those vectors instead of scalarizing those operations, and needing to differentiate this sort of operation from the more abstract LinAlg MatrixRef.
This has me concerned with the other DXIL op names too, which seem overly generic for how specific they actually are, targeting a specialized hardware feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to update this but it's important to know that it would be a spec change and needs to be updated over there first.
https://github.com/microsoft/hlsl-specs/blob/main/proposals/0035-linalg-matrix.md#dxil-operations
| ) | ||
| for i in ( | ||
| "MatVecMul,MatVecMulAdd,OuterProductAccumulate,VectorAccumulate" | ||
| "MatVecMul,MatVecMulAdd,OuterProductAccumulate,VectorAccumulate,CreateMatrix" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rebase, move the new one to the end of populate_ExperimentalOps:
DirectXShaderCompiler/utils/hct/hctdb.py
Line 6061 in ba86cc0
| def populate_ExperimentalOps(self): |
| % next_op_idx | ||
| ) | ||
|
|
||
| self.add_dxil_op( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rebase, move this to the end of populate_categories_and_models_ExperimentalOps:
DirectXShaderCompiler/utils/hct/hctdb.py
Line 1042 in ba86cc0
| def populate_categories_and_models_ExperimentalOps(self): |
| void [[min_sm=6.10]] __builtin_VectorAccumulate(in LinAlg<c> InputVector, in RWByteAddressBuffer MatrixBuffer, in uint MatrixOffset); | ||
|
|
||
|
|
||
| MatrixRef [[min_sm=6.10]] __builtin_la_CreateMatrix(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this extra level of abbreviation is helpful: la.
Can we just use linalg instead?
Fixes #7846
Implments:
__builtin_la_MatrixRef__builtin_la_CreateMatrix()%dx.types.MatrixRef@dx.op.createMatrix(i32 312)Note: The DXIL op codes will change in response to microsoft/hlsl-specs#698