
Conversation


@shanmugamr1992 shanmugamr1992 commented Nov 11, 2025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Introduced dynamic inference engine for improved generation performance with CUDA graph optimization support.
    • Added configuration for GRPO Llama 3.2 1B Instruct model with Megatron generation backend.
  • Bug Fixes

    • Fixed potential error in policy generation initialization.
  • Tests

    • Added new functional tests for GRPO Megatron generation workflow.
    • Enabled previously disabled generation tests.

@shanmugamr1992 shanmugamr1992 requested review from a team as code owners November 11, 2025 00:05
Contributor

coderabbitai bot commented Nov 11, 2025

📝 Walkthrough

This PR introduces GRPO support for Llama 3.2-1B with dynamic Megatron generation inference, refactors MegatronPolicyWorker to use DynamicInferenceEngine instead of StaticInferenceEngine, adds new test scripts to the GPU and nightly test suites, adds a None-check guard to the policy_generation setup, and enables previously skipped Megatron generation unit tests.

Changes

Cohort / File(s) — Summary

Configuration Addition
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
New YAML config for GRPO with Llama-3.2-1B-Instruct, the Megatron generation backend, and settings for checkpointing, logging, and inference (max_new_tokens 512).

Core Algorithm Logic
nemo_rl/algorithms/grpo.py
Adds a None-check guard around the prepare_refit_info(state_dict_info) call to prevent an AttributeError when policy_generation is None.

Inference Engine Refactor
nemo_rl/models/policy/megatron_policy_worker.py
Replaces StaticInferenceEngine with DynamicInferenceContext and DynamicInferenceEngine in the generate() method; introduces per-prompt request handling, dynamic sampling-parameter configuration, explicit detokenization, and CUDA graph optimization support (see the sketch after this table).

GPU Functional Tests
tests/functional/L1_Functional_Tests_GPU.sh, tests/functional/grpo_megatron_generation.sh
Extends the GPU L1 test suite with the new GRPO Megatron generation functional test; the new test script sets up the experiment environment, runs GRPO training, generates metrics, and validates that the token probability error is < 1.05.

Nightly & Unit Tests
tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh, tests/test_suites/nightly.txt, tests/unit/models/policy/test_megatron_worker.py
Adds a new nightly test script for GRPO Megatron generation with config and metric validation; registers the test in the nightly suite; removes the pytest skip decorator to enable the Megatron worker unit tests.
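
To make the ordering guarantees concrete, here is a minimal, self-contained sketch of the collect-and-sort pattern the refactor relies on: one request per prompt, results gathered keyed by request_id, and batch order restored by sorting. The engine below is a stub standing in for Megatron-Core's DynamicInferenceEngine; its method names and signatures are assumptions for illustration (the real stepping call is step_modern per the review below), not the actual API.

import random

# StubEngine mimics a dynamic engine that may finish requests out of
# order; the real DynamicInferenceEngine API differs (assumption).
class StubEngine:
    def __init__(self):
        self.pending = []

    def add_request(self, request_id, prompt_tokens, sampling_params):
        # A real engine would enqueue the request for dynamic batching.
        self.pending.append((request_id, prompt_tokens))

    def step(self):
        # Simulate out-of-order completion of in-flight requests.
        random.shuffle(self.pending)
        finished, self.pending = self.pending, []
        # Pretend each request generated one new token (0).
        return [(rid, tokens + [0]) for rid, tokens in finished]

def generate_ordered(prompts, engine):
    # One request per prompt, keyed by its position in the batch.
    for rid, tokens in enumerate(prompts):
        engine.add_request(rid, tokens, sampling_params=None)
    results = {}
    while len(results) < len(prompts):
        for rid, output in engine.step():
            results[rid] = output
    # Sort by request_id so outputs line up with the input batch.
    return [results[rid] for rid in sorted(results)]

print(generate_ordered([[1, 2], [3], [4, 5, 6]], StubEngine()))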

Sequence Diagram(s)

sequenceDiagram
    participant caller as Caller
    participant old as Old: StaticInferenceEngine
    participant new as New: DynamicInferenceEngine
    participant model as Model
    participant engine as Engine

    rect rgb(200, 220, 250)
    Note over old,engine: Previous Flow (Static)
    caller->>old: generate(prompts, ...)
    old->>old: run_mcore_engine()
    old->>engine: forward pass
    engine-->>old: output tokens + logprobs
    old-->>caller: return results
    end

    rect rgb(220, 250, 200)
    Note over new,engine: New Flow (Dynamic)
    caller->>new: generate(prompts, ...)
    new->>new: DynamicInferenceContext setup
    new->>new: GPTInferenceWrapper init
    new->>model: prep_model_for_inference()
    model->>model: enable CUDA graphs (local)
    new->>new: compute tokens_to_generate
    new->>new: configure SamplingParams (temp, top_k, top_p)
    loop per-prompt
        new->>engine: create request with params
        engine->>engine: dynamic batching
    end
    new->>engine: collect results by request_id
    new->>new: detokenize per-prompt
    new->>new: sort by request_id
    new-->>caller: return ordered results
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • MegatronPolicyWorker.generate() refactor (nemo_rl/models/policy/megatron_policy_worker.py): Significant internal logic change replacing inference engine approach; requires careful verification of per-prompt batching, sampling parameter handling, result ordering, and CUDA graph integration.
  • Test script logic (tests/functional/grpo_megatron_generation.sh and the corresponding test suite script): Verify environment setup, metric validation thresholds, and log parsing correctness (a hypothetical sketch of the threshold check follows this list).
  • Policy generation None-check (nemo_rl/algorithms/grpo.py): Minor defensive fix; straightforward guard condition.
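
For illustration only, the threshold check could take roughly this shape; the metrics file name, key, and JSON format below are invented, since the actual log-parsing logic is not shown on this page.

import json

# Hypothetical validation step: load metrics produced by the run and
# enforce the functional test's token-probability-error bound (< 1.05).
with open("metrics.json") as f:  # invented file name
    metrics = json.load(f)

token_prob_error = metrics["token_probability_error"]  # invented key
assert token_prob_error < 1.05, (
    f"token probability error {token_prob_error} exceeds threshold 1.05"
)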

Suggested labels

CI:L1, Run CICD

Suggested reviewers

  • parthchadha
  • terrykong
  • guyueh1

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning
Explanation: The PR implements a major inference engine refactoring and performance optimizations, but the PR description lacks test results, performance metrics, or convergence analysis documentation.
Resolution: Document performance benchmarks, test execution results, and convergence/regression analysis validating the StaticInferenceEngine to DynamicInferenceEngine refactoring.

✅ Passed checks (3 passed)

Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed: The title 'build: Use dynamic engine for generate' directly reflects the main change in the PR: replacing the hard-coded StaticInferenceEngine with DynamicInferenceEngine in megatron_policy_worker.py, which is the core technical change across the PR.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a035bc and 8a0f86b.

📒 Files selected for processing (8)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml (1 hunks)
  • nemo_rl/algorithms/grpo.py (1 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (2 hunks)
  • tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
  • tests/functional/grpo_megatron_generation.sh (1 hunks)
  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh (1 hunks)
  • tests/test_suites/nightly.txt (1 hunks)
  • tests/unit/models/policy/test_megatron_worker.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/unit/models/policy/test_megatron_worker.py
🧰 Additional context used
📓 Path-based instructions (10)
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/functional/grpo_megatron_generation.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
tests/test_suites/llm/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/test_suites/nightly.txt
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes/<domain>/

Files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/grpo.py
  • nemo_rl/models/policy/megatron_policy_worker.py
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
🧠 Learnings (10)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/test_suites/nightly.txt
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/llm/*.sh : LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/test_suites/nightly.txt
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/**/*.{sh} : For new model support, add a matching driver shell script under tests/test_suites/<domain>/ that sources common.env and invokes 'uv run ... --config <yaml>'

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/functional/grpo_megatron_generation.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/functional/grpo_megatron_generation.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/grpo_megatron_generation.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Applied to files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.

Applied to files:

  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/** : Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
🧬 Code graph analysis (2)
nemo_rl/algorithms/grpo.py (10)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • prepare_refit_info (1984-1998)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
  • prepare_refit_info (90-97)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • prepare_refit_info (1733-1740)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)
  • prepare_refit_info (1694-1701)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
  • prepare_refit_info (631-633)
nemo_rl/models/generation/interfaces.py (1)
  • prepare_refit_info (239-241)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • prepare_refit_info (751-768)
nemo_rl/models/policy/lm_policy.py (1)
  • prepare_refit_info (682-691)
nemo_rl/models/policy/interfaces.py (1)
  • prepare_refit_info (157-158)
tests/unit/algorithms/test_distillation.py (2)
  • prepare_refit_info (549-550)
  • prepare_refit_info (565-566)
nemo_rl/models/policy/megatron_policy_worker.py (1)
nemo_rl/distributed/batched_data_dict.py (1)
  • size (814-823)
🪛 Ruff (0.14.4)
nemo_rl/models/policy/megatron_policy_worker.py

1796-1796: Local variable max_batch_size is assigned to but never used

Remove assignment to unused variable max_batch_size

(F841)

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/functional/grpo_megatron_generation.sh

[error] 39-39: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (7)
nemo_rl/algorithms/grpo.py (1)

514-515: LGTM! Defensive None-check prevents AttributeError.

This correctly guards against calling prepare_refit_info when policy_generation is None (which occurs when backend == "megatron" at lines 439-444).
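
In essence, the fix is a two-line guard; a minimal sketch, assuming the names from the summary above (the wrapper function here is hypothetical, added only to make the fragment self-contained):

def setup_refit(policy_generation, state_dict_info):
    # policy_generation is None when backend == "megatron"
    # (grpo.py lines 439-444, per the review comment above).
    if policy_generation is not None:
        policy_generation.prepare_refit_info(state_dict_info)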

nemo_rl/models/policy/megatron_policy_worker.py (2)

1753-1756: LGTM! Ensures model is on CUDA for generation.

The model movement before disabling forward pre-hook is appropriate for the colocated inference setup.


1869-1897: LGTM! Dynamic engine request handling and result collection.

The per-prompt request loop, result collection via step_modern, and sorting by request_id correctly maintain batch order and integrate with the new dynamic inference flow.

tests/test_suites/nightly.txt (1)

15-15: LGTM! Test path correctly registered.

The new Megatron generation test path follows the established pattern and aligns with the new test script added to the suite.

tests/functional/L1_Functional_Tests_GPU.sh (1)

26-26: LGTM! Extends L1 GPU test suite.

The new grpo_megatron_generation.sh test invocation follows the existing pattern and appropriately expands test coverage for the dynamic Megatron generation path.

examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.yaml (1)

1-36: LGTM! Well-structured Megatron generation configuration.

The YAML configuration correctly defines all necessary settings for GRPO with Megatron-based generation, including:

  • Appropriate defaults inheritance
  • Megatron backend with dynamic generation parameters
  • Consistent resource allocation (8 GPUs per node)
  • Proper logging and checkpointing setup

The configuration aligns well with the dynamic inference changes in the Megatron policy worker.

tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh (1)

29-29: Quote the $@ expansion to handle arguments with spaces.

Unquoted $@ can cause word splitting and glob expansion issues when arguments contain spaces or special characters.

Apply this diff:

-    $@ \
+    "$@" \
⛔ Skipped due to learnings
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

The flagged snippet (nemo_rl/models/policy/megatron_policy_worker.py, around line 1796):

from megatron.core.inference.sampling_params import SamplingParams

inference_max_batch_size = inference_wrapper_config.inference_max_requests
max_batch_size = inference_max_batch_size

⚠️ Potential issue | 🟡 Minor

Remove unused variable max_batch_size.

The variable is assigned but never referenced, as confirmed by static analysis.

Apply this diff:

-max_batch_size = inference_max_batch_size

🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around line 1796, the local
variable assignment "max_batch_size = inference_max_batch_size" is unused;
remove that line (or if intended to be used, replace subsequent references to
inference_max_batch_size with max_batch_size). Ensure no other code depends on
max_batch_size after removal and run tests/linter to confirm the unused-variable
warning is resolved.

Comment on lines +1849 to 1867
sampling_params = SamplingParams(
    temperature=temperature,
    top_k=top_k_val,
    top_p=top_p_val,
    return_segments=False,
    return_log_probs=True,
    num_tokens_to_generate=tokens_to_generate,
    top_n_logprobs=0,
    return_prompt_top_n_logprobs=False,
)
sampling_params = SamplingParams(
    temperature=1.0,
    top_k=0,
    top_p=0.0,
    return_segments=True,
    return_log_probs=True,
    num_tokens_to_generate=self.cfg["generation"]["max_new_tokens"]
    - data["input_ids"].size(1),
)

⚠️ Potential issue | 🟠 Major

Fix duplicate SamplingParams initialization.

Lines 1849-1858 create a SamplingParams object, but lines 1859-1867 immediately overwrite it with different parameters. This appears to be a copy-paste error where the first initialization should be removed.

Apply this diff to keep only the second initialization:

-sampling_params = SamplingParams(
-    temperature=temperature,
-    top_k=top_k_val,
-    top_p=top_p_val,
-    return_segments=False,
-    return_log_probs=True,
-    num_tokens_to_generate=tokens_to_generate,
-    top_n_logprobs=0,
-    return_prompt_top_n_logprobs=False,
-)
 sampling_params = SamplingParams(
     temperature=1.0,
     top_k=0,
     top_p=0.0,
     return_segments=True,
     return_log_probs=True,
     num_tokens_to_generate=self.cfg["generation"]["max_new_tokens"]
         - data["input_ids"].size(1),
 )

Alternatively, if the first initialization is intended, remove the second one and verify the parameter values are correct for your use case.

🤖 Prompt for AI Agents
In nemo_rl/models/policy/megatron_policy_worker.py around lines 1849 to 1867
there are two consecutive SamplingParams initializations where the first (lines
1849-1858) is immediately overwritten by the second (lines 1859-1867); remove
the duplicate by deleting the first SamplingParams block (lines 1849-1858) so
only the intended second initialization remains, or if the first was the
intended configuration, delete the second instead and confirm the parameter
values match the desired behavior.

The flagged snippet (tests/functional/grpo_megatron_generation.sh, around line 39):

logger.wandb_enabled=false \
logger.monitor_gpus=true \
checkpointing.enabled=false \
$@ \

⚠️ Potential issue | 🟡 Minor

Quote the $@ expansion to handle arguments with spaces.

Unquoted $@ can cause word splitting and glob expansion issues when arguments contain spaces or special characters.

Apply this diff:

-    $@ \
+    "$@" \

🤖 Prompt for AI Agents
In tests/functional/grpo_megatron_generation.sh around line 39, the script uses
an unquoted $@ which can split arguments with spaces and allow globbing; update
the invocation to quote the expansion (use "$@") so each original argument is
preserved as a single word and special characters are not subject to word
splitting or glob expansion.
