feat(report): show failed sweep cells with provenance by dutifulbob · Pull Request #11 · dutifuldev/localperf

dutifulbob · 2026-07-03T02:29:41Z

Opened on behalf of Onur Solmaz (osolmaz). Work is ready for review.

Summary

Benchmark sweeps can have valid partial failures, especially when high-concurrency or high-context points hit memory limits.
This change keeps mixed-success sweeps reportable instead of treating one failed point as a failed overall run.
It also makes failed and skipped cells visible in the main HTML grid, and lets a reviewer click a cell to see the exact model/profile/workload settings and commands behind that result.

What Changed

The runner now distinguishes partial workload failures from all-failed runs.
A sweep with at least one completed measurement can finish as completed while still preserving failed/skipped measurements in the artifact.

Changed final run error handling so partial sweep failures do not force a nonzero run error.
Changed stored run status so mixed completed/failed sweeps are marked completed, while all-failed sweeps remain failed.
Added failed/skipped labels such as mem floor, not ready, and server exit to headline report cells.
Added native HTML detail disclosures for report cells with run, model, profile, workload, shape, profile config, vLLM serve command, benchmark command, engine args, serve JSON, and env JSON.
Removed the command loading cap so larger sweeps keep command provenance available for every measurement.

Testing

I tested the changed runner behavior, the HTML rendering path, and the required local quality gates.
The dry benchmark path was also checked without starting a real model server.

go test ./internal/report ./internal/vllmbench
go test ./...
go vet ./...
npx -y @simpledoc/simpledoc check
go run github.com/dutifuldev/slophammer/go/cmd/slophammer-go@v0.4.0 check .
go run ./cmd/localperf bench run --dry-run --spec examples/diffusiongemma-vllm-standard/spec.json --profile 4k-reference --workload claim-repro-1k-out1024 --concurrency 1 --run-dir /tmp/localperf-onecase-dry
go run ./cmd/localperf artifact check /tmp/localperf-onecase-dry.sqlite
go run ./cmd/localperf artifact render /tmp/localperf-onecase-dry.sqlite --output /tmp/localperf-onecase-dry.html

Risks

The main behavior change is intentional: a run with partial failed points no longer exits as failed if at least one point completed.
All-failed runs and infrastructure errors still fail, so completely broken sweeps should not look successful.

The HTML report now includes more command/config provenance in hidden cell details.
That can make very large reports somewhat larger, but it is needed for clickable per-cell auditability.

dutifulbob · 2026-07-03T03:43:30Z

Final report:

Implemented failed/skipped benchmark cells in the HTML report with native click-through details for status, reason, profile config, vLLM serve command, benchmark command, engine args, serve JSON, and env JSON.
Partial benchmark-point failures stay reportable; all-failed sweeps and fatal run errors still fail the run. Fatal partial runs persist as failed.
Aggregate repeat/model-level rows now say they are aggregates and no longer claim first-repeat run or command provenance.
Dry-run planned rows stay neutral instead of rendering as failure labels.
Secret-bearing command argv values are redacted before rendering in standalone HTML reports.
Review loop: latest codex review --base main completed with no findings.
Validation passed: go test ./...; go vet ./...; npx -y @simpledoc/simpledoc check; go run github.com/dutifuldev/slophammer/go/cmd/slophammer-go@v0.4.0 check .; one-case dry localperf bench run; localperf artifact check; localperf artifact render; dry-render grep for planned failure labels.
CI: green (go checks passed).

feat(report): show failed sweep cells with provenance

32a3851

dutifulbob assigned osolmaz Jul 3, 2026

dutifulbob added 6 commits July 3, 2026 10:40

fix(report): tighten failed-cell provenance

de85cfa

fix(bench): keep fatal partial runs failed

9dc8a4a

fix(report): replace stale success counts

5928770

fix(report): label aggregate cell provenance

41724f8

fix(report): keep planned cells neutral

e4dc7ee

fix(report): redact secret argv values

9f2c40c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(report): show failed sweep cells with provenance#11

feat(report): show failed sweep cells with provenance#11
dutifulbob wants to merge 7 commits into
mainfrom
feat/report-failed-sweep-cells

dutifulbob commented Jul 3, 2026

Uh oh!

dutifulbob commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

dutifulbob commented Jul 3, 2026

Summary

What Changed

Testing

Risks

Uh oh!

dutifulbob commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants