Conversation

@adil-a adil-a commented Nov 4, 2025

What does this PR do?

Adds GPT-OSS SFT support using AutoModel custom models and DeepEP.

To run, launch the nightly container and execute:

NRL_FORCE_REBUILD_VENVS=true uv run examples/run_sft.py --config examples/configs/recipes/llm/sft-gpt-oss-20b-1n8g-fsdp8ep8-automodel.yaml cluster.gpus_per_node=8 logger.wandb_enabled=false

adil-a and others added 17 commits November 1, 2025 17:23
Signed-off-by: adil-a <[email protected]>
@adil-a adil-a changed the title Hemil/automodel moe feat: DTensorPolicyV2 GPT-OSS support Nov 5, 2025
@adil-a adil-a marked this pull request as ready for review November 5, 2025 06:50
@adil-a adil-a requested review from a team as code owners November 5, 2025 06:50
github-actions bot commented Nov 5, 2025

⚠️ File Consistency Check

Check based on commit: e936ebf (PR #1470 from hemil/automodel-moe)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

github-actions bot commented Nov 5, 2025

❌ Submodule Fast-Forward Check Failed

Check based on commit: e936ebf (PR #1470 from hemil/automodel-moe)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1470 from hemil/automodel-moe): https://github.com/NVIDIA-NeMo/Automodel/commits/5e995e9535e63cbe3358dc2bd81a8ed3a696cee7/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@adil-a adil-a requested a review from a team as a code owner November 5, 2025 19:34
@terrykong
Contributor

@adil-a what's the current status of this PR?

# when FSDP reduces the gradients over the DP dim, they're automatically averaged
# but we want to sum them so we cancel out the average here
loss *= self.dp_size * self.cp_size
# loss *= self.dp_size * self.cp_size
Contributor

Let's remove this line and ensure that the grad norm and loss match for HF models with different TP sizes.
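For context on the line under discussion: FSDP mean-reduces gradients over the DP ranks, so multiplying the loss by `dp_size` turns that implicit mean back into a sum. The single-process sketch below (no torch required; the per-rank gradients and `dp_size` are made-up illustrative values, not from this PR) shows the identity the comment relies on.

```python
# Model FSDP's gradient reduction as an average over DP ranks, and show
# that scaling the loss (hence each rank's gradient) by dp_size makes the
# mean-reduced result equal the plain sum of per-rank gradients.

def fsdp_mean_reduce(per_rank_grads):
    """Stand-in for FSDP's reduction: an average over DP ranks."""
    return sum(per_rank_grads) / len(per_rank_grads)

dp_size = 4
per_rank_grads = [0.5, 1.0, 1.5, 2.0]  # hypothetical dL/dw on each DP rank

# Scaling the loss by dp_size scales each rank's gradient by dp_size,
# so mean-reduction of the scaled gradients equals the unscaled sum.
scaled = fsdp_mean_reduce([g * dp_size for g in per_rank_grads])
summed = sum(per_rank_grads)
print(scaled, summed)  # both 5.0
```

The same reasoning extends to the `cp_size` factor when context parallelism also shards the batch dimension.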


with get_train_context(False, False, context_parallel_ctx)():
with torch.autocast(device_type="cuda", dtype=self.dtype):
with nullcontext():
Contributor

Make this configurable, with the default set to autocast to maintain backwards compatibility.
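One way the suggestion above could look (the function name and flag here are hypothetical, not actual NeMo-RL config keys): select the forward-pass context from config, defaulting to autocast so existing behavior is preserved. A labelled `nullcontext` stands in for `torch.autocast(device_type="cuda", dtype=self.dtype)` so the selection logic is runnable anywhere.

```python
from contextlib import nullcontext

def make_forward_context(use_autocast: bool = True):
    """Return the context manager to wrap the forward pass.

    In the real worker this would return torch.autocast(...) when
    use_autocast is True; a labelled nullcontext is used here purely
    to illustrate the config-gated selection.
    """
    if use_autocast:
        return nullcontext("autocast")  # placeholder for torch.autocast(...)
    return nullcontext("no-autocast")

with make_forward_context() as mode:
    print(mode)  # autocast (the backwards-compatible default)
with make_forward_context(use_autocast=False) as mode:
    print(mode)  # no-autocast
```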

Contributor

@yuki-97 yuki-97 left a comment

@adil-a @hemildesai thanks for the great effort! Left some comments.

@@ -0,0 +1,29 @@
defaults: ../../sft.yaml
Contributor

Can you add a nightly test for this? You can refer to tests/test_suites/llm/grpo-deepscaler-1.5b-8K.sh.

else OffloadPolicy(),
sequence_parallel=sequence_parallel_enabled,
else None,
backend="nccl",
Contributor

Just curious: don't we need to set backend=backend here?

# Manually broadcast buffers
for _, buf in self.model.named_buffers():
torch.distributed.broadcast(to_local_if_dtensor(buf), src=0)

Contributor

Do you know whether this will affect other models? @ffrujeri

# Load base model weights across all ranks using Automodel Checkpointer
# This mirrors build_model_and_optimizer's is_meta_device + load_weights path
print(self.model)
self._ensure_checkpointer(
Contributor

Do you mind moving all the checkpoint-related code to nemo_rl/utils/automodel_checkpoint.py to make the code clearer?

I think you can add a class in automodel_checkpoint.py and only call its functions from dtensor_policy_worker_v2.py.

Also, we should have unit tests for the new Automodel checkpoint code.

cc @hemildesai @ffrujeri @joyang-nv

e.g.,

class AutoModelCheckpointer:
    def __init__(self):
        ...

    def save_checkpoint(self):
        ...

    def load_checkpoint(self):
        ...

@@ -0,0 +1,29 @@
defaults: ../../sft.yaml
policy:
model_name: openai/gpt-oss-20b
Contributor

I believe you have some convergence plots for GPT-OSS; can you paste them into the PR so that others can see this recipe's results?

Also, have you tested other models (e.g., Llama, Qwen) with this PR to make sure it won't affect them? There are a lot of changes in the DTensor v2 worker.

Labels

CI:L0 Run doctests and unit tests

5 participants