Conversation

@V-FEXrt (Collaborator) commented Nov 7, 2025

Fixes #7846

Implements:

  • the HLSL type __builtin_la_MatrixRef
  • the HLSL intrinsic __builtin_la_CreateMatrix()
  • the DXIL type %dx.types.MatrixRef
  • the DXIL op @dx.op.createMatrix(i32 312)

Note: The DXIL op codes will change in response to microsoft/hlsl-specs#698

@@ -0,0 +1,8 @@
// RUN: %dxc -T cs_6_9 -E main %s -verify
Collaborator Author (@V-FEXrt):

Pretty sure I also want a .ll test, but I don't know the magic incantation to generate the IR up to, but right before, the dxilgen pass.

@pow2clk (Collaborator) Nov 24, 2025:

./utils/hct/ExtractIRForPassTest.py -p dxilgen -o output.ll input.hlsl -- -T cs_6_9 -E main

@damyanp (Member) left a comment:

I've read through this and didn't spot any obvious typos.

Do we need to apply the same "experimental DXIL" stuff to this that we're talking about for other SM 6.10 features?

Comment on lines +140 to 144
LICOMPTYPE_VK_BUFFER_POINTER = 55,
LICOMPTYPE_COUNT = 56
#else
- LICOMPTYPE_COUNT = 54
+ LICOMPTYPE_COUNT = 55
#endif
Member (@damyanp):

Does it matter that these values have changed?

I've no reason to think it does, just checking.

Collaborator Author (@V-FEXrt):

I'm assuming not, but maybe someone else can weigh in. The ifdef basically forces them to change, though, since the last value is only incremented for SPIR-V. Beyond that, this seems to be an internal enum, so changing it shouldn't be observable to end users.

Contributor (@tex3d):

These are mostly internal details, but the built-in intrinsic extension mechanism relies on this header (and this enum) as well. That's currently only used by the Xbox backend, and wouldn't be impacted by these changes unless updated to depend on them, and even then, using the latest header would keep it in sync.

Ideally, I think we should get rid of the ifdef here (so LICOMPTYPE_VK_BUFFER_POINTER is always part of the enumeration), and update the corresponding table g_LegalIntrinsicCompTypes in SemaHLSL.cpp. But that should be a separate change.

Collaborator:

A lot of the ifdef SPIRV conditionals were removed in similar areas; this one was possibly missed as an oversight.

@V-FEXrt (Collaborator Author) commented Nov 7, 2025

> Do we need to apply the same "experimental DXIL" stuff to this that we're talking about for other SM 6.10 features?

Yep

@pow2clk (Collaborator) commented Nov 24, 2025

In a discussion about a month ago, we agreed that createMatrix as written wasn't useful, because it has no way of uniquely identifying the matrix in question, so any and all createMatrix calls will get merged as duplicates. On that basis we agreed to remove it, which would make the builtin unnecessary.

In my own further investigation, which I've been trying to find the opportunity to discuss for weeks, I've determined that we may need a createMatrix that is generated for local or global allocas, but not for parameters: without global variables, we have to use the alloca as the unique identifier, and because these are opaque handles it is difficult to cleanly merge duplicate parameter allocas through inlining and subsequent passes.

In any event, a createMatrix call that only takes the opcode isn't of much use.
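
To make that concrete, here is a rough HLSL-level sketch of the failure mode; the template arguments are hypothetical placeholders (not the final API), and the op number is the one from this PR:

// Two logically distinct accumulators. With a createMatrix that takes only its
// opcode, each declaration lowers to the same call:
//   call %dx.types.MatrixRef @dx.op.createMatrix(i32 312)
// Because the two calls are indistinguishable, CSE/GVN-style deduplication is
// free to fold them into a single MatrixRef, and A and B silently become the
// same matrix.
dx::linalg::Matrix<ComponentType::F16, 16, 16,
                   MatrixUse::Accumulator, MatrixScope::Thread> A;
dx::linalg::Matrix<ComponentType::F16, 16, 16,
                   MatrixUse::Accumulator, MatrixScope::Thread> B;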

@pow2clk (Collaborator) commented Dec 4, 2025

Per our discussion, you can disregard my last comment. However, I just remembered that when I proposed builtins for createMatrix and annotateMatrix, @llvm-beanz rejected the idea on the grounds that it would allow the header implementation, or any other approach using the builtins, to generate invalid DXIL. I found that a compelling argument. I don't know if he has changed his mind since.

Similar create DXIL ops are generated as part of clang CodeGen. As an example I wouldn't strongly recommend following exactly, but which shows the broad idea, see how resource handle creation is generated:

Value *Handle = CreateHandleFromResPtr(arg, HLM, HandleTy, Builder);

The way node handle creation is generated is how I would (and have) approached it, though it would be through local/global variables instead of entry function parameters:

TranslateInputNodeRecordArgToHandle(HLM, NodeInputRecordParams);
TranslateNodeOutputParamToHandle(HLM, NodeOutputParams);

@llvm-beanz (Collaborator):

> Per our discussion, you can disregard my last comment. However, I just remembered that when I proposed builtins for createMatrix and annotateMatrix, @llvm-beanz rejected the idea on the grounds that it would allow the header implementation, or any other approach using the builtins, to generate invalid DXIL. I found that a compelling argument. I don't know if he has changed his mind since.

I think a builtin for createMatrix is fine, and probably necessary. The exact shape it takes might need to shift because we may want to have a single builtin function that generates both a create and an annotate, but what is in this PR is a fine first step.

The annotate builtin is a bit more complicated, but it isn't "creating invalid DXIL" that concerns me so much as whether the annotate builtin could be used to create valid DXIL at all. I'm not strongly opposed to either of these additions; a lot of it just depends on the implementation details as we progress.

@pow2clk (Collaborator) commented Dec 4, 2025

> The annotate builtin is a bit more complicated, but it isn't "creating invalid DXIL" that concerns me so much as whether the annotate builtin could be used to create valid DXIL at all. I'm not strongly opposed to either of these additions; a lot of it just depends on the implementation details as we progress.

Are you saying that create will have a builtin, but annotate will not?

As a sanity check, where do you see the create builtin being called in the linalg.h header?

@V-FEXrt (Collaborator Author) commented Dec 5, 2025

> Are you saying that create will have a builtin, but annotate will not?

Currently the plan is that they both have builtins, and the annotate is just over-emitted and then cleaned up/deduplicated in a late pass.

> As a sanity check, where do you see the create builtin being called in the linalg.h header?

The create and initial annotate are in the constructor of the Matrix type.

@pow2clk (Collaborator) commented Dec 8, 2025

Using the implicit constructor for cases where the matrix is just declared and not initialized makes sense. I requested/suggested further explanation of expected behavior for uninitialized matrices in microsoft/hlsl-specs#691. I haven't heard back about that issue. I considered that a default splat of zeros or something would give a reasonable outcome. For what I've taken to calling the "matrix population functions" (loads, splat, cast, and outerproduct), were you still thinking of relying on the constructor or calling createMatrix and annotateMatrix directly there? If not, I don't see how either builtin would ever be explicitly called from HLSL.

The constructor presents a complication if we allow these to be declared in global scope. I said in microsoft/hlsl-specs#695 that I assume that cbuffer is out, but statics might be in. I haven't heard back about that issue. Is the intent to allow statics?

@llvm-beanz said that annotate is more complicated and it is. It may have some relation to the createMatrix approach. I presume you meant you were going to call it from the functions taking a matrix parameter, but if the user creates a function that takes a dx::linalg::Matrix parameter, there will need to be an annotate on that. How were you thinking about inserting that annotate?

Overall, this sounds like my rejected proposal of two months ago. I'm not against it, I'm just trying to see if that's the case and whether there are other details that need considering before committing to it.

@pow2clk (Collaborator) left a comment:

Just pointing out a potential pitfall for annotateMatrix. If you've started on that work and already have a solution, I'd be interested to know how it works without templated type information.

CXXRecordDecl *hlsl::DeclareMatrixRefType(ASTContext &Context) {
  // MatrixRef { ... }
  BuiltinTypeDeclBuilder TypeDeclBuilder(Context.getTranslationUnitDecl(),
                                         "__builtin_la_MatrixRef");
Collaborator (@pow2clk):

I overlooked this before: without template arguments, I don't know where you will get the property information that the matrix provides, as it is stored solely in the header-defined type that contains this type.

Collaborator (@pow2clk):

Though I guess this relates to my previous question of whether annotateMatrix is expected to be called explicitly from HLSL. In that case, the type information from the header would be available, but that's not available for user function parameters.

Collaborator Author (@V-FEXrt):

The property information would be tracked by the templated header class and forwarded along to annotate as needed.

Collaborator (@pow2clk):

Do you have that working in a branch somewhere? I just don't see how that information would be available to clang in a user function needing an annotate without adding a dependency at the compiler implementation level on a class written in an HLSL header, which seems like a bad idea.

@V-FEXrt (Collaborator Author) Dec 8, 2025:

I don't, no. I was planning on getting deep into things once I got back, but this whole discussion preempted that :)

I'm not sure where you see a potential compiler dependency on the class, though. This works (although optimization seems to be deleting everything) without any special knowledge from the compiler, and it is pretty much a strawman of what I plan to do: https://godbolt.org/z/dnYfn7eTv

Collaborator (@pow2clk):

Thanks for that. That's what I suspected. That works fine for methods of the matrix class defined in the header, but what about user functions that pass matrices around? They shouldn't pass the MatrixRef around, but there shouldn't be any reason they can't pass the dx::linalg::Matrix class.
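
Concretely, the sort of user code I have in mind looks something like this (a hedged sketch; the template arguments and the MatrixScope value are hypothetical placeholders, not the final API):

// A user-written helper that passes the header's Matrix type by value. Any
// annotate needed for Acc/A/B has to be produced inside this function (or its
// callees), since the create in the caller isn't visible across the call
// boundary.
void UserAccumulate(
    inout dx::linalg::Matrix<ComponentType::F16, 16, 16,
                             MatrixUse::Accumulator, MatrixScope::Thread> Acc,
    dx::linalg::Matrix<ComponentType::F16, 16, 16,
                       MatrixUse::A, MatrixScope::Thread> A,
    dx::linalg::Matrix<ComponentType::F16, 16, 16,
                       MatrixUse::B, MatrixScope::Thread> B) {
  Acc.MultiplyAccumulate(A, B);
}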

Collaborator Author (@V-FEXrt):

Do you have specific user functions in mind? The whole point of dx::linalg::Matrix is to restrict matrix usage down to only the allowed operations. Anything that a user would want to do with the MatrixRef should be accessible via the header API.

void [[min_sm=6.10]] __builtin_VectorAccumulate(in LinAlg<c> InputVector, in RWByteAddressBuffer MatrixBuffer, in uint MatrixOffset);


MatrixRef [[min_sm=6.10]] __builtin_la_CreateMatrix();
Collaborator (@pow2clk):

Related to my previous comment: if MatrixRef has template parameters, then it can't be uniquely determined as a return type. It can be done with an output variable instead.

Collaborator Author (@V-FEXrt):

The intention is for MatrixRef to not have template parameters.

@tex3d (Contributor) commented Dec 8, 2025

While the current plan (as I discussed with @V-FEXrt) is to add the annotate call to every method that consumes or produces a matrix, I'm not so sure the memory semantics for the linalg::Matrix type are entirely worked out in a consistent way.

I think the crux of the problem is that in HLSL the object looks like it is the memory for the matrix, or at least automatically manages the memory. In DXIL, this translates to a dx.types.MatrixRef, which just refers to a matrix. Presumably the MatrixRef value in DXIL does not represent the actual memory allocated, but the dx.op.createMatrix operation essentially allocates the matrix referred to by the MatrixRef. So, how do we track allocation lifetime? What happens when you assign one linalg::Matrix to another, or pass it through a function parameter (which copies it)? What rules guarantee the driver can track lifetimes and eliminate copies or re-use memory?

For instance, depending on the linalg::MatrixScope, the linalg::Matrix could refer to a per-thread matrix loaded from global memory, which is meant to be folded into an operation through optimization without actually allocating any local memory. Or it could refer to local per-thread fragments making up a wave-scope matrix, or to groupshared memory, which isn't counted in our normal groupshared memory accounting for a shader.
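
To make those questions concrete, a hedged sketch (hypothetical template arguments, not the final API):

void Consume(dx::linalg::Matrix<ComponentType::F16, 16, 16,
                                MatrixUse::A, MatrixScope::Thread> M);

void Example() {
  dx::linalg::Matrix<ComponentType::F16, 16, 16,
                     MatrixUse::A, MatrixScope::Thread> A;     // createMatrix: what storage does the ref denote, and who owns it?
  dx::linalg::Matrix<ComponentType::F16, 16, 16,
                     MatrixUse::A, MatrixScope::Thread> B = A; // a copy of the matrix data, or a second ref to the same matrix?
  Consume(A);   // by-value parameter: another allocation, or an aliased ref?
}               // when may the driver reclaim or reuse the storage behind A and B?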

@pow2clk (Collaborator) commented Dec 8, 2025

> While the current plan (as I discussed with @V-FEXrt) is to add the annotate call to every method that consumes or produces a matrix, I'm not so sure the memory semantics for the linalg::Matrix type are entirely worked out in a consistent way.
>
> I think the crux of the problem is that in HLSL the object looks like it is the memory for the matrix, or at least automatically manages the memory. In DXIL, this translates to a dx.types.MatrixRef, which just refers to a matrix. Presumably the MatrixRef value in DXIL does not represent the actual memory allocated, but the dx.op.createMatrix operation essentially allocates the matrix referred to by the MatrixRef. So, how do we track allocation lifetime? What happens when you assign one linalg::Matrix to another, or pass it through a function parameter (which copies it)? What rules guarantee the driver can track lifetimes and eliminate copies or re-use memory?
>
> For instance, depending on the linalg::MatrixScope, the linalg::Matrix could refer to a per-thread matrix loaded from global memory, which is meant to be folded into an operation through optimization without actually allocating any local memory. Or it could refer to local per-thread fragments making up a wave-scope matrix, or to groupshared memory, which isn't counted in our normal groupshared memory accounting for a shader.

These are interesting issues. At least in the case of cast, I expected that the output matrixref was distinct from the one passed in. I was expecting that we'd follow the intuition of resources in which case the handle does reference the memory, but this is different in that the operations don't just sample from that memory, but completely populate or dump it to memory. The decisions made here might affect whether a default splat for uninitialized matrices like I described above will be possible or not, because they determine whether the compiler can identify that the splat isn't necessary if the matrix is subsequently assigned. Here I think the recently added hitobjects are illustrative as they default to a noop assignment, but can be eliminated if it is subsequently assigned to something that overrides it.

Interesting as all that is, can you help me understand how it relates to the question at hand? Regardless of whether they refer to the memory, I would think the matrixref would be consigned to the type that its parent class was declared as. Perhaps it was in reference to something else I said?

Are you saying that user functions might not need annotates for their parameters or that there is another way to solve that problem?

@tex3d (Contributor) commented Dec 9, 2025

I opened this issue microsoft/hlsl-specs#756 against the spec, because I think the root issue is a lack of clear memory semantics/lifetime/ownership definition in the spec for the HLSL object and the matrix data it references.

@tex3d (Contributor) commented Dec 9, 2025

> These are interesting issues. At least in the case of cast, I expected that the output matrixref was distinct from the one passed in.

The explicit operations aren't the problem, since they would simply take these handles that refer to some memory managed elsewhere. The problem is the management of that memory, and how the HLSL code translates to the correct number of objects with appropriate lifetimes that the driver can reason about in all cases.

> I was expecting that we'd follow the intuition of resources in which case the handle does reference the memory

The key difference from resources is that a resource handle references something externally managed, rather than something managed by the shader itself.

> Here I think the recently added hitobjects are illustrative as they default to a noop assignment, but can be eliminated if it is subsequently assigned to something that overrides it.

HitObject is well-defined because the object is immutable, and the storage requirement is fixed (though device dependent) and local. The driver compiler can replace dx.types.HitObject with the internal device-dependent type, and everything is correct. The noop assignment just gives the object a well-defined initial value.

> Interesting as all that is, can you help me understand how it relates to the question at hand? Regardless of whether they refer to the memory, I would think the matrixref would be consigned to the type that its parent class was declared as. Perhaps it was in reference to something else I said?
>
> Are you saying that user functions might not need annotates for their parameters or that there is another way to solve that problem?

No, the plan was to annotate on input and on create (constructor would create+annotate) for every operation. Annotate on input includes this and is necessary for when the call is in another function, or there's some other distance between the initial create+allocate and the use. This ensures that the operation has preserved the local type information in all cases, even if there's an assignment through an alloca (even a dynamic one). Every operand of a local DXIL operation would have come from an annotate handle which describes the type information necessary to interpret the operation.

To illustrate, there would be something like a meant-for-internal-use _getAnnotatedHandle() method on the linalg::Matrix class, which is used by this class to ensure preservation of type information:

namespace linalg {
template <...>
class Matrix {
  // Meant to be private, but HLSL doesn't support private:
  MatrixRef _Matrix;
  MatrixRef _getAnnotatedHandle() {
    return __builtin_la_AnnotateMatrix(_Matrix, <matrix properties...>);
  }

  // Public part of the interface:
  Matrix() {
    _Matrix = __builtin_la_CreateMatrix();
    _Matrix = _getAnnotatedHandle();
  }
  
  template <ComponentType NewCompTy, MatrixUse NewUse = Use>
  Matrix<NewCompTy, M, N, NewUse, Scope> Cast() {
    Matrix<NewCompTy, M, N, NewUse, Scope> Dest; // will create+annotate
    // Need to annotate our own handle locally for the operation:
    __builtin_la_CastMatrix(Dest._Matrix, _getAnnotatedHandle());
    return Dest;
  }

  template <ComponentType LHSTy, ComponentType RHSTy, uint K,
            MatrixUse UseLocal = Use>
  typename hlsl::enable_if<Use == MatrixUse::Accumulator && UseLocal == Use,
                           void>::type
  MultiplyAccumulate(const Matrix<LHSTy, M, K, MatrixUse::A, Scope> A,
                     const Matrix<RHSTy, K, N, MatrixUse::B, Scope> B) {
    __builtin_la_MultiplyAccumulateMatrix(_getAnnotatedHandle(),
                                          A._getAnnotatedHandle(),
                                          B._getAnnotatedHandle());
  }

  ...
};
}

This doesn't solve the memory management problem I'm referring to though.

@tex3d (Contributor) left a comment:

This needs to be updated for the experimental DXIL op table changes. However, I had some concerns about the naming of the HLSL intrinsic and DXIL op, and I think other questions may get in the way, related to how the current approach will solve the memory management of the matrix contents of any given HLSL Matrix object.

)

self.add_dxil_op(
"CreateMatrix",
Contributor (@tex3d):

CreateMatrix seems overly generic. What about CreateLinAlgMatrix or CreateMatrixRef?

I'm a bit concerned with this just being "Matrix" when we have a different concept of matrix in HLSL. With vectors now preserved in IR, I could even imagine doing matrix operations on those vectors instead of scalarizing them, and needing to differentiate that sort of operation from the more abstract LinAlg MatrixRef.

This has me concerned with the other DXIL op names too, which seem overly generic for how specific they actually are, targeting a specialized hardware feature.

@V-FEXrt (Collaborator Author) Dec 10, 2025:

I'm happy to update this, but it's important to note that it would be a spec change and needs to be updated over there first.

https://github.com/microsoft/hlsl-specs/blob/main/proposals/0035-linalg-matrix.md#dxil-operations

)
for i in (
"MatVecMul,MatVecMulAdd,OuterProductAccumulate,VectorAccumulate"
"MatVecMul,MatVecMulAdd,OuterProductAccumulate,VectorAccumulate,CreateMatrix"
Contributor (@tex3d):

After rebase, move the new one to the end of populate_ExperimentalOps:

def populate_ExperimentalOps(self):

% next_op_idx
)

self.add_dxil_op(
Contributor (@tex3d):

After rebase, move this to the end of populate_categories_and_models_ExperimentalOps:

def populate_categories_and_models_ExperimentalOps(self):

void [[min_sm=6.10]] __builtin_VectorAccumulate(in LinAlg<c> InputVector, in RWByteAddressBuffer MatrixBuffer, in uint MatrixOffset);


MatrixRef [[min_sm=6.10]] __builtin_la_CreateMatrix();
Contributor (@tex3d):

I don't think this extra level of abbreviation (la) is helpful. Can we just use linalg instead?


Successfully merging this pull request may close these issues.

Enable creation of a LinAlg Matrix (type, intrinsic, dxil op)
