Test results on B200 with Python 3.13.8 and CuTe DSL 4.3.0.dev0:
(env13_8) nvfp4_gemm$ python3 eval.py test task.yml
compile: start
compile: pass
test-count: 10
test.0.spec: m: 128; n: 256; k: 256; l: 1; seed: 1111
test.0.status: pass
test.1.spec: m: 128; n: 1536; k: 7168; l: 1; seed: 1111
test.1.status: pass
test.2.spec: m: 128; n: 3072; k: 1536; l: 1; seed: 1111
test.2.status: pass
test.3.spec: m: 256; n: 7168; k: 256; l: 1; seed: 1111
test.3.status: pass
test.4.spec: m: 256; n: 7168; k: 2048; l: 1; seed: 1111
test.4.status: pass
test.5.spec: m: 2304; n: 4608; k: 7168; l: 1; seed: 1111
test.5.status: pass
test.6.spec: m: 384; n: 7168; k: 2304; l: 1; seed: 1111
test.6.status: pass
test.7.spec: m: 512; n: 512; k: 7168; l: 1; seed: 1111
test.7.status: pass
test.8.spec: m: 512; n: 4096; k: 512; l: 1; seed: 1111
test.8.status: pass
test.9.spec: m: 512; n: 1536; k: 7168; l: 1; seed: 1111
test.9.status: pass
check: pass
(env13_8) nvfp4_gemm$ python3 eval.py benchmark task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 122583.36134254932
benchmark.0.std: 24482.462646219574
benchmark.0.err: 1731.1715357288206
benchmark.0.best: 115712.0019197464
benchmark.0.worst: 455711.9905948639
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 80429.76021766663
benchmark.1.std: 27085.550673724414
benchmark.1.err: 1915.2376553562394
benchmark.1.best: 72704.0022611618
benchmark.1.worst: 353311.9857311249
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 59335.5206400156
benchmark.2.std: 18873.720483479116
benchmark.2.err: 1334.5735740087525
benchmark.2.best: 54271.99974656105
benchmark.2.worst: 320576.012134552
check: pass
(env13_8) nvfp4_gemm$ python3 eval.py leaderboard task.yml
compile: start
compile: pass
benchmark-count: 3
benchmark.0.spec: m: 7168; n: 128; k: 16384; l: 1; seed: 1111
benchmark.0.runs: 200
benchmark.0.mean: 196519.99942958355
benchmark.0.std: 10528.200054341341
benchmark.0.err: 744.456165211334
benchmark.0.best: 174079.9993276596
benchmark.0.worst: 243744.00079250336
benchmark.1.spec: m: 4096; n: 128; k: 7168; l: 1; seed: 1111
benchmark.1.runs: 200
benchmark.1.mean: 118216.48139506578
benchmark.1.std: 26523.21540881495
benchmark.1.err: 1875.4745474444578
benchmark.1.best: 109567.99983978271
benchmark.1.worst: 486431.9860935211
benchmark.2.spec: m: 7168; n: 128; k: 2048; l: 1; seed: 1111
benchmark.2.runs: 200
benchmark.2.mean: 94226.08111053705
benchmark.2.std: 4438.020500021594
benchmark.2.err: 313.81543906101814
benchmark.2.best: 89088.00035715103
benchmark.2.worst: 129023.99897575378
check: pass
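In both runs the reported `err` appears to equal `std / sqrt(runs)`, the standard error of the mean. For example, 10528.2 / sqrt(200) ≈ 744.46 for leaderboard benchmark 0. The sketch below parses the `benchmark.<i>.<field>: <value>` lines and checks this. Two things are assumptions: the line format is inferred from the output above, and the meaning of `err` is inferred because eval.py's own computation is not shown. The helper names are hypothetical.

```python
import math
import re

# Excerpt of the leaderboard output above (spec, best and worst lines omitted).
LOG = """\
benchmark.0.runs: 200
benchmark.0.mean: 196519.99942958355
benchmark.0.std: 10528.200054341341
benchmark.0.err: 744.456165211334
benchmark.1.runs: 200
benchmark.1.mean: 118216.48139506578
benchmark.1.std: 26523.21540881495
benchmark.1.err: 1875.4745474444578
benchmark.2.runs: 200
benchmark.2.mean: 94226.08111053705
benchmark.2.std: 4438.020500021594
benchmark.2.err: 313.81543906101814
"""

# Matches "benchmark.<idx>.<field>: <number>"; all other lines are ignored.
# Format inferred from the eval.py output above (assumption).
LINE_RE = re.compile(r"^benchmark\.(\d+)\.(runs|mean|std|err|best|worst): (\S+)$")


def parse_benchmarks(log: str) -> dict[int, dict[str, float]]:
    """Hypothetical helper: group numeric benchmark fields by benchmark index."""
    results: dict[int, dict[str, float]] = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            idx = int(match.group(1))
            field = match.group(2)
            value = float(match.group(3))
            results.setdefault(idx, {})[field] = value
    return results


def standard_error(std: float, runs: float) -> float:
    """Standard error of the mean, assumed to be what eval.py reports as err."""
    return std / math.sqrt(runs)


if __name__ == "__main__":
    for idx, r in sorted(parse_benchmarks(LOG).items()):
        expected = standard_error(r["std"], r["runs"])
        print(f"benchmark.{idx}: reported err={r['err']:.3f}, std/sqrt(runs)={expected:.3f}")
```

The same check applied to the `benchmark` run's values (e.g. 24482.46 / sqrt(200) ≈ 1731.17) also matches its reported `err`.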