
Add AMAX, AVG, NORM1, NORM2, MUL, MUL_NO_ZEROS reduction modes#325

Open
rsuderman wants to merge 3 commits into iree-org:main from rsuderman:reduction_rest

Conversation

@rsuderman (Contributor)

Enable the remaining cuDNN reduction modes in ReductionAttr and add the corresponding MLIR schemas to the asm emitter:

  • NORM1 lowers to abs + sum.dim_IntList.
  • AMAX lowers to abs + amax.
  • AVG lowers to mean.dim (float dtypes only — torch.aten.mean.dim is not defined on integer tensors, so the sample skips int32 for AVG).
  • NORM2 lowers to mul + sum.dim_IntList + sqrt.
  • MUL lowers directly to torch.prims.prod.
  • MUL_NO_ZEROS uses aten.ne.Scalar to build an i1 mask, then aten.where.ScalarOther to substitute 1 for zero entries before feeding the result to torch.prims.prod, so zero inputs are excluded from the product.

Extend samples/reduction/reduction_ops.cpp to exercise every new mode. Input data is built by a per-mode generateReductionInputData helper so MUL/MUL_NO_ZEROS get a non-trivial pattern (mostly 1s with a 2 and a 3, plus injected zeros for MUL_NO_ZEROS) that stays in range for fp16/int32, and the expected value is computed by the existing reference reduction loop rather than hardcoded.

Add lit tests for each new mode under tests/lit/ and register them in tests/CMakeLists.txt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Rob Suderman <rob.suderman@gmail.com>

# Conflicts:
#	include/fusilli/support/asm_emitter.h
#	samples/reduction/reduction_ops.cpp
@sjain-stanford (Member) left a comment


Might need a rebase / CI seems unclean.

#define FUSILLI_REDUCTION_MODES(OP) \
OP(NOT_SET) \
OP(SUM) \
/* OP(ADD) */ \
(Member)

Is ADD dropped because it's equivalent to SUM or some other reason?

Comment on lines +2002 to +2010
permuteX, // {0}
dimListOss.str(), // {1}
suffix, // {2}
getResultNamesAsm(), // {3}
getOperandNamesAsm(), // {4}
getOperandTypesAsm(), // {5}
getResultTypesAsm(), // {6}
permuteY, // {7}
boolType // {8}
(Member)

hyper nit: Use /* {0} */ style comments in format string placeholders for consistency with the rest.

{0}
{1}
%dtype_{2} = torch.constant.none
{3}_{2}_perm = torch.prims.prod {4}, %reduction_dims_{2}, %dtype_{2} : {5}, !torch.list<int>, !torch.none -> {6}
(Member)

Claude tells me this is not registered in the torch dialect. Is that true?

Do we need to use torch.aten.prod.dim_int (with chaining for multiple reduction dims)?

2 participants