@vickiw973

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_dual_gemm$ python3 eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; n: 256; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; n: 1536; k: 7168; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; n: 3072; k: 1536; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; n: 7168; k: 256; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; n: 7168; k: 2048; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2304; n: 4608; k: 7168; l: 1; seed: 1111
test.5.status: pass
test.6.spec: m: 384; n: 7168; k: 2304; l: 1; seed: 1111
test.6.status: pass
test.7.spec: m: 512; n: 512; k: 7168; l: 1; seed: 1111
test.7.status: pass
test.8.spec: m: 512; n: 4096; k: 512; l: 1; seed: 1111
test.8.status: pass
test.9.spec: m: 512; n: 1536; k: 7168; l: 1; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 160051.9973784685
benchmark.0.std: 23031.866455664996
benchmark.0.err: 1628.5988954183692
benchmark.0.best: 152575.9994983673
benchmark.0.worst: 472128.0038356781
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 99979.84111309052
benchmark.1.std: 20095.203008511555
benchmark.1.err: 1420.945431663883
benchmark.1.best: 93184.00174379349
benchmark.1.worst: 378879.99415397644
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 74724.80170428753
benchmark.2.std: 21870.720279291818
benchmark.2.err: 1546.4934618921386
benchmark.2.best: 69632.00122117996
benchmark.2.worst: 374783.992767334
check: pass
(env13_8) nvfp4_dual_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 253803.03986370564
benchmark.0.std: 9263.37772232849
benchmark.0.err: 655.019720415087
benchmark.0.best: 230399.9960422516
benchmark.0.worst: 283648.0140686035
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 143465.91904759407
benchmark.1.std: 26430.637287532198
benchmark.1.err: 1868.9282857096032
benchmark.1.best: 136191.99395179749
benchmark.1.worst: 509952.00872421265
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 114432.32048302889
benchmark.2.std: 28682.384716506524
benchmark.2.err: 2028.1508733643152
benchmark.2.best: 107519.99914646149
benchmark.2.worst: 506911.9930267334
check: pass
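For anyone comparing these numbers, here is a minimal sketch of how the `benchmark.<i>.<field>` lines could be parsed and sanity-checked. The helper name `parse_eval_log` is hypothetical and not part of `eval.py`. The check assumes `err` is the standard error of the mean, i.e. `std / sqrt(runs)`. The values reported above are consistent with that, but the harness source was not consulted.

```python
import math


def parse_eval_log(text):
    """Collect `benchmark.<i>.<field>: <value>` lines into {index: {field: value}}.

    Hypothetical helper for illustration; it is not part of eval.py.
    """
    results = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        parts = key.split(".")
        # Skip lines such as "compile: pass" or "benchmark-count: 3".
        if not sep or len(parts) != 3 or parts[0] != "benchmark":
            continue
        index, field = int(parts[1]), parts[2]
        value = value.strip()
        if field != "spec":
            value = float(value)  # runs, mean, std, err, best, worst
        results.setdefault(index, {})[field] = value
    return results


# First benchmark entry copied from the `benchmark` run above.
sample = """\
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 160051.9973784685
benchmark.0.std: 23031.866455664996
benchmark.0.err: 1628.5988954183692
"""

stats = parse_eval_log(sample)[0]

# Assumption: err == std / sqrt(runs) (standard error of the mean).
sem = stats["std"] / math.sqrt(stats["runs"])
err_matches = math.isclose(sem, stats["err"], rel_tol=1e-6)
print("err consistent with std / sqrt(runs):", err_matches)
```

The same parser can be run over the `leaderboard` output to compare the two modes entry by entry.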
