
vickiw973 commented Nov 11, 2025

Test results on a B200 GPU with Python 3.13.8 and CuTe DSL 4.3.0.dev0:

(env13_8) nvfp4_gemm$ python3 eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; n: 256; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; n: 1536; k: 7168; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; n: 3072; k: 1536; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; n: 7168; k: 256; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; n: 7168; k: 2048; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2304; n: 4608; k: 7168; l: 1; seed: 1111
test.5.status: pass
test.6.spec: m: 384; n: 7168; k: 2304; l: 1; seed: 1111
test.6.status: pass
test.7.spec: m: 512; n: 512; k: 7168; l: 1; seed: 1111
test.7.status: pass
test.8.spec: m: 512; n: 4096; k: 512; l: 1; seed: 1111
test.8.status: pass
test.9.spec: m: 512; n: 1536; k: 7168; l: 1; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_gemm$ python3 eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 122583.36134254932
benchmark.0.std: 24482.462646219574
benchmark.0.err: 1731.1715357288206
benchmark.0.best: 115712.0019197464
benchmark.0.worst: 455711.9905948639
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 80429.76021766663
benchmark.1.std: 27085.550673724414
benchmark.1.err: 1915.2376553562394
benchmark.1.best: 72704.0022611618
benchmark.1.worst: 353311.9857311249
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 59335.5206400156
benchmark.2.std: 18873.720483479116
benchmark.2.err: 1334.5735740087525
benchmark.2.best: 54271.99974656105
benchmark.2.worst: 320576.012134552
check: pass
(env13_8) nvfp4_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 196519.99942958355
benchmark.0.std: 10528.200054341341
benchmark.0.err: 744.456165211334
benchmark.0.best: 174079.9993276596
benchmark.0.worst: 243744.00079250336
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 118216.48139506578
benchmark.1.std: 26523.21540881495
benchmark.1.err: 1875.4745474444578
benchmark.1.best: 109567.99983978271
benchmark.1.worst: 486431.9860935211
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 94226.08111053705
benchmark.2.std: 4438.020500021594
benchmark.2.err: 313.81543906101814
benchmark.2.best: 89088.00035715103
benchmark.2.worst: 129023.99897575378
check: pass
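
For anyone comparing these numbers, the reported `err` values look consistent with the standard error of the mean, `std / sqrt(runs)`. That relation is inferred from the figures in the log, not confirmed from the `eval.py` source, and the log does not state the time unit. A minimal sketch that checks it on the first `benchmark` run follows; `parse_log` and `standard_error` are hypothetical helper names, not part of the harness.

```python
import math

# Values copied from the first `benchmark` run in the log above.
LOG = """\
benchmark.0.runs: 200
benchmark.0.std: 24482.462646219574
benchmark.0.err: 1731.1715357288206
benchmark.1.runs: 200
benchmark.1.std: 27085.550673724414
benchmark.1.err: 1915.2376553562394
benchmark.2.runs: 200
benchmark.2.std: 18873.720483479116
benchmark.2.err: 1334.5735740087525
"""


def parse_log(text):
    """Group `benchmark.<i>.<field>: <value>` lines into {i: {field: float}}."""
    results = {}
    for line in text.splitlines():
        key, _, value = line.partition(": ")
        parts = key.split(".")
        # Skip anything that is not a numeric benchmark field (e.g. `spec`).
        if len(parts) != 3 or parts[0] != "benchmark" or parts[2] == "spec":
            continue
        results.setdefault(int(parts[1]), {})[parts[2]] = float(value)
    return results


def standard_error(std, runs):
    """Standard error of the mean (assumed definition of `err`)."""
    return std / math.sqrt(runs)


results = parse_log(LOG)
for idx, fields in sorted(results.items()):
    estimate = standard_error(fields["std"], fields["runs"])
    matches = math.isclose(estimate, fields["err"], rel_tol=1e-6)
    print(f"benchmark.{idx}: estimated err = {estimate:.4f}, matches log: {matches}")
```

The same check can be run on the `leaderboard` numbers by pasting those lines into `LOG`.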
