Conversation
@claude review this PR
Claude finished @hanaol's task in 1m 26s. Code Review:
Fix the silent-discard bug when --out is omitted (the docstring promises stdout as the default) and move status/peak_gb/epoch_s to module level, since they don't close over loop variables.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
forklady42 left a comment:
Addressed the bug and one of the minor comments. Okay to ignore the other minor comment and the nit for this script.
Summary
This adds a GPU comparison benchmark script that runs the same per-sample training experiment on both an A100 and an H200, recording peak GPU memory, forward/backward times, and OOM status for 10 large-grid Materials Project samples under f32 and bf16-mixed precision.
The 10 task IDs are Materials Project entries with relatively large charge-density grids, spanning 3.4M to 46.7M voxels across a variety of shapes and aspect ratios.
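The per-sample loop has to record peak memory, timing, and OOM status for each (task, precision) pair. A minimal sketch of that bookkeeping, with hypothetical helper and field names (the actual script's structure may differ; in a real run the peak bytes would come from `torch.cuda.max_memory_allocated()`):

```python
import json
import time

def run_sample(task_id, precision, train_step):
    """Run one benchmark sample, catching OOM and recording stats.

    `train_step` is a hypothetical callable returning peak GPU memory
    in bytes; CUDA out-of-memory surfaces as a RuntimeError whose
    message contains "out of memory".
    """
    record = {"task_id": task_id, "precision": precision}
    start = time.perf_counter()
    try:
        peak_bytes = train_step(task_id, precision)
    except RuntimeError as err:
        if "out of memory" in str(err):
            # Record the failure instead of crashing the whole sweep.
            record.update(status="oom", peak_gb=None, epoch_s=None)
            return record
        raise
    record.update(
        status="ok",
        peak_gb=round(peak_bytes / 1024**3, 2),
        epoch_s=round(time.perf_counter() - start, 2),
    )
    return record

# Fake train step standing in for a real epoch; "mp-1234" is an
# illustrative task ID, not one of the PR's actual samples.
rec = run_sample("mp-1234", "bf16-mixed", lambda t, p: 12 * 1024**3)
print(json.dumps(rec))
```

Keeping OOM as a recorded status rather than an exception is what lets a single sweep cover samples that fit on the H200 but not the A100.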
Benchmark results
Model: resunet.ResUNet3D, n_channels=32, n_residual_blocks=1, kernel_size=5, depth=2, batch_size=1, single GPU, 3 epochs per experiment. Precisions tested: f32 and bf16-mixed.
Files
scripts/benchmark_gpus.py -- Reads two JSON result files (one per GPU) produced by
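A comparison script of this shape typically merges the two per-GPU result lists on a shared key. A sketch of that merge, assuming hypothetical field names (task_id, precision, peak_gb, status) rather than the script's actual JSON schema:

```python
import json

def compare_results(a100, h200):
    """Join two per-GPU result lists keyed by (task_id, precision).

    Field names here are assumptions for illustration, not taken
    from the real benchmark output.
    """
    index = {(r["task_id"], r["precision"]): r for r in h200}
    rows = []
    for r in a100:
        other = index.get((r["task_id"], r["precision"]))
        rows.append({
            "task_id": r["task_id"],
            "precision": r["precision"],
            "a100_peak_gb": r.get("peak_gb"),
            "h200_peak_gb": other.get("peak_gb") if other else None,
            "a100_status": r.get("status"),
            "h200_status": other.get("status") if other else "missing",
        })
    return rows

# Usage: load the two files the benchmark wrote, then merge.
# a100 = json.load(open("a100_results.json"))
# h200 = json.load(open("h200_results.json"))
# for row in compare_results(a100, h200): print(row)
```

Joining on (task_id, precision) rather than list position keeps the comparison correct even if one GPU's run skipped or reordered samples.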