Add nvfp4 dual_gemm example #76

vickiw973 · 2025-11-11T13:49:44Z

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_dual_gemm$ python3 eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; n: 256; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; n: 1536; k: 7168; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; n: 3072; k: 1536; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; n: 7168; k: 256; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; n: 7168; k: 2048; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2304; n: 4608; k: 7168; l: 1; seed: 1111
test.5.status: pass
test.6.spec: m: 384; n: 7168; k: 2304; l: 1; seed: 1111
test.6.status: pass
test.7.spec: m: 512; n: 512; k: 7168; l: 1; seed: 1111
test.7.status: pass
test.8.spec: m: 512; n: 4096; k: 512; l: 1; seed: 1111
test.8.status: pass
test.9.spec: m: 512; n: 1536; k: 7168; l: 1; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 160051.9973784685
benchmark.0.std: 23031.866455664996
benchmark.0.err: 1628.5988954183692
benchmark.0.best: 152575.9994983673
benchmark.0.worst: 472128.0038356781
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 99979.84111309052
benchmark.1.std: 20095.203008511555
benchmark.1.err: 1420.945431663883
benchmark.1.best: 93184.00174379349
benchmark.1.worst: 378879.99415397644
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 74724.80170428753
benchmark.2.std: 21870.720279291818
benchmark.2.err: 1546.4934618921386
benchmark.2.best: 69632.00122117996
benchmark.2.worst: 374783.992767334
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 253803.03986370564
benchmark.0.std: 9263.37772232849
benchmark.0.err: 655.019720415087
benchmark.0.best: 230399.9960422516
benchmark.0.worst: 283648.0140686035
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 143465.91904759407
benchmark.1.std: 26430.637287532198
benchmark.1.err: 1868.9282857096032
benchmark.1.best: 136191.99395179749
benchmark.1.worst: 509952.00872421265
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 114432.32048302889
benchmark.2.std: 28682.384716506524
benchmark.2.err: 2028.1508733643152
benchmark.2.best: 107519.99914646149
benchmark.2.worst: 506911.9930267334
check: pass
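
For readers comparing the runs above, the reported `err` values are numerically consistent with the standard error of the mean, `std / sqrt(runs)`. This is inferred from the numbers in the log, not confirmed against the `eval.py` source. The sketch below parses the `key: value` lines and checks that relationship for one benchmark entry. The helper names `parse_eval_log` and `standard_error` are hypothetical and are not part of this PR.

```python
import math


def parse_eval_log(text):
    """Parse 'key: value' lines from the pasted eval.py output into a dict.

    Hypothetical helper: splits each line on the first ': ' only, so spec
    values such as 'm: 7168; n: 128; ...' stay intact as one string.
    Lines without ': ' (for example shell prompts) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and " " not in key:
            fields[key.strip()] = value.strip()
    return fields


def standard_error(std, runs):
    """Standard error of the mean, assuming `std` is the per-run deviation."""
    return std / math.sqrt(runs)


# Values copied from the first `benchmark` run above.
sample = """\
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 160051.9973784685
benchmark.0.std: 23031.866455664996
benchmark.0.err: 1628.5988954183692
"""

fields = parse_eval_log(sample)
runs = int(fields["benchmark.0.runs"])
std = float(fields["benchmark.0.std"])
reported_err = float(fields["benchmark.0.err"])
computed_err = standard_error(std, runs)

# Assumption being checked: err == std / sqrt(runs), up to rounding.
matches = math.isclose(computed_err, reported_err, rel_tol=1e-6)
print(f"computed err = {computed_err:.4f}, matches reported: {matches}")
```

The same check can be repeated for the other entries by changing the `benchmark.0` prefix. The log does not state the time unit, so the sketch treats the values as unitless.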

add nvfp4 dual_gemm example

9f0b150
