@roclark roclark commented Oct 24, 2025

What does this PR do ?

Added the theoretical TFlops for H200 GPUs so that achieved throughput on that hardware can be compared against the theoretical peak when measuring efficiency.

Issues

N/A

Usage

N/A

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • Confirmed the format of the GPU output on a cluster with H200s:
$ python3 -c 'import torch; print(torch.cuda.get_device_name())'
NVIDIA H200
  • Confirmed theoretical TFlops in the H200 data sheet
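
The device-name check above matters if the tracker looks up the peak value by the exact string that `torch.cuda.get_device_name()` returns. A minimal sketch of that idea, assuming a plain dictionary keyed by device name; `flops_efficiency` and `peak_tflops_by_device` are hypothetical names for illustration, not the actual `nemo_rl` API:

```python
def flops_efficiency(
    achieved_tflops: float,
    device_name: str,
    peak_tflops_by_device: dict[str, float],
) -> float:
    """Return achieved throughput as a fraction of the theoretical peak.

    Hypothetical helper: the key must match the reported device name exactly,
    which is why the string "NVIDIA H200" was confirmed on a real cluster.
    """
    if device_name not in peak_tflops_by_device:
        raise KeyError(f"No theoretical TFLOPS entry for device {device_name!r}")
    return achieved_tflops / peak_tflops_by_device[device_name]


# Assumed dense bfloat16 peak for the H200 (1979 / 2), as discussed in this PR.
peaks = {"NVIDIA H200": 1979 / 2}
print(flops_efficiency(494.75, "NVIDIA H200", peaks))
```

With a lookup of this shape, a device whose name is missing from the table fails loudly instead of silently reporting a wrong efficiency.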

Summary by CodeRabbit

  • New Features
    • Extended GPU performance tracking support to include NVIDIA H200 with bfloat16 and float32 precision types.

Added the theoretical TFlops for H200 GPUs, which are equivalent to the
H100 80GB HBM3 estimates.

Signed-Off-By: Robert Clark <[email protected]>
@roclark roclark requested a review from a team as a code owner October 24, 2025 15:46
@roclark roclark changed the title from "Add theoretical TFlops for H200 GPU" to "fix: Add theoretical TFlops for H200 GPU" Oct 24, 2025
@roclark roclark changed the title from "fix: Add theoretical TFlops for H200 GPU" to "fix: add theoretical TFlops for H200 GPU" Oct 24, 2025

coderabbitai bot commented Oct 24, 2025

Walkthrough

The pull request adds theoretical TFLOPS benchmark entries for NVIDIA H200 GPUs in bfloat16 and float32 data types to the THEORETICAL_TFLOPS lookup table, extending device-dtype coverage without modifying control flow or behavioral logic.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| H200 TFLOPS Benchmark Additions: nemo_rl/utils/flops_tracker.py | Adds two THEORETICAL_TFLOPS entries for H200: bfloat16 (1979/2 TFLOPS) and float32 (989/2 TFLOPS with TF32 conditional, else 67.0), mirroring existing H100 structure. |
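
For readers without the diff at hand, the entries summarized above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in `nemo_rl/utils/flops_tracker.py`: the function name, the string dtype keys, and the `tf32_enabled` flag are assumptions made to keep the example free of a `torch` dependency. The division by 2 reflects that the data sheet Tensor Core figures (1979 for BF16, 989 for TF32) are quoted with sparsity, so the dense peak is half of each.

```python
def build_theoretical_tflops(tf32_enabled: bool) -> dict[tuple[str, str], float]:
    """Return dense peak TFLOPS keyed by (device name, dtype name).

    Hypothetical sketch: the real table may use torch dtypes as keys and
    detect TF32 differently.
    """
    # Data sheet BF16 Tensor Core figure assumes sparsity; halve it for dense math.
    bf16_dense = 1979 / 2
    # float32 reaches the TF32 Tensor Core rate only when TF32 is enabled;
    # otherwise it falls back to the plain FP32 rate of 67 TFLOPS.
    fp32 = 989 / 2 if tf32_enabled else 67.0
    return {
        ("NVIDIA H100 80GB HBM3", "bfloat16"): bf16_dense,
        ("NVIDIA H100 80GB HBM3", "float32"): fp32,
        # H200 entries mirror the H100 values, as described in this PR.
        ("NVIDIA H200", "bfloat16"): bf16_dense,
        ("NVIDIA H200", "float32"): fp32,
    }


table = build_theoretical_tflops(tf32_enabled=True)
print(table[("NVIDIA H200", "bfloat16")], table[("NVIDIA H200", "float32")])
```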

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

  • guyueh1
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Test Results For Major Changes | ✅ Passed | The PR adds two new entries to the THEORETICAL_TFLOPS dictionary for the NVIDIA H200 GPU with specific bfloat16 and float32 values, mirroring the H100 entries. This is a minor change: purely a data addition to a lookup table with no new logic, behavioral changes, or code modifications that could affect numerics, convergence, or performance. The existing test file test_flops_counter.py validates FLOPS calculations for various models and configurations, and while it doesn't specifically test H200 values, the PR description explicitly states that pre-checks were completed including running unit and functional tests locally with no issues reported. The H200 TFLOPS values match official NVIDIA specifications and align with the established H100 values in the same table, making them straightforward reference data additions. |
| Title check | ✅ Passed | The title accurately describes the main change: adding theoretical TFLOPS values for H200 GPU to the benchmark table in the flops_tracker.py file. |

@terrykong terrykong left a comment

@guyueh1 to review

@terrykong terrykong added the CI:L0 Run doctests and unit tests label Nov 4, 2025
@terrykong

@guyueh1 bump

@terrykong

closing in favor of #1543 which has some tests

@terrykong terrykong closed this Nov 20, 2025