Skip to content

[megatron] support qwen3.5 models for megatron, bump mbridge + megatron-core to latest#1425

Merged
erictang000 merged 12 commits intomainfrom
qwen3_5_megatron
Apr 7, 2026
Merged

[megatron] support qwen3.5 models for megatron, bump mbridge + megatron-core to latest#1425
erictang000 merged 12 commits intomainfrom
qwen3_5_megatron

Conversation

@erictang000
Copy link
Copy Markdown
Collaborator

@erictang000 erictang000 commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a Megatron training script for Qwen 3.5, updates dependencies, and introduces a monkey-patch for transformers v5 compatibility. Review feedback identifies a likely version typo in pyproject.toml, an undefined variable and inconsistent model naming in the shell script, and suggests more specific exception handling for the vLLM engine workaround.

trainer.policy.megatron_config.expert_model_parallel_size=$MEGATRON_EP \
trainer.policy.megatron_config.expert_tensor_parallel_size=$MEGATRON_ETP \
trainer.use_sample_packing=false \
trainer.flash_attn=$FLASH_ATTN \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The variable $FLASH_ATTN is used here, but its definition on line 30 is commented out. This will result in an empty value being passed to the trainer, which may cause a parsing error in the entrypoint.

Suggested change
trainer.flash_attn=$FLASH_ATTN \
trainer.flash_attn=false \

@@ -0,0 +1,77 @@
set -x

# Colocated GRPO training+generation for Moonlight-16B-A3B-Instruct on GSM8K with Megatron.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment mentions Moonlight-16B-A3B-Instruct, but the script is configured for Qwen/Qwen3.5-0.8B (line 12). This should be updated to reflect the correct model.

generator.inference_engine.gpu_memory_utilization=0.6 \
trainer.logger="$LOGGER" \
trainer.project_name="gsm8k_megatron" \
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The run_name suffix refers to moonlight16b-a3b. It should be updated to match the Qwen 3.5 model being used.

Suggested change
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_qwen3.5-0.8b" \

Comment on lines +57 to +58
except Exception:
pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching a broad Exception and passing silently is generally discouraged. While this is a monkey-patch workaround, it would be safer to catch specific errors (like ImportError or AttributeError) or at least log a warning if the patch fails, to aid in debugging if the library structure changes unexpectedly.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@erictang000 erictang000 merged commit 29c11ba into main Apr 7, 2026
3 of 4 checks passed
@erictang000 erictang000 deleted the qwen3_5_megatron branch April 7, 2026 00:00
tyler-griggs pushed a commit that referenced this pull request Apr 8, 2026
The uv.lock regeneration in #1425 bumped griffe2md from 1.3.4 to 1.5.0,
which renders docstring content slightly differently. A `<=` operator at
the start of a line in the FullyAsyncConfig docstring was left unescaped
in the MDX output, causing the Next.js MDX parser to interpret `<` as the
start of a JSX tag and fail with:

  Unexpected character `=` (U+003D) before name

Add a second regex pass in `sanitize_for_mdx()` that escapes any `<` not
followed by a valid tag-start character (letter, `/`, `!`), converting
patterns like `<=` to `&lt;=`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SumanthRH pushed a commit that referenced this pull request Apr 8, 2026
## Summary
- The `uv.lock` regeneration in #1425 bumped `griffe2md` from 1.3.4 to
1.5.0, which renders docstring content slightly differently
- A `<=` operator at the start of a line in the `FullyAsyncConfig`
docstring (`skyrl/train/config/config.py:394`) was left unescaped in the
MDX output, causing the Next.js MDX parser to fail with: `Unexpected
character '=' (U+003D) before name`
- Adds a second regex pass in `sanitize_for_mdx()` that escapes any bare
`<` not followed by a valid HTML tag-start character (letter, `/`, `!`)
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/novasky-ai/skyrl/pull/1478"
target="_blank">
  <picture>
<source media="(prefers-color-scheme: dark)"
srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img
src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1"
alt="Open with Devin">
  </picture>
</a>
<!-- devin-review-badge-end -->

Co-authored-by: Tyler <tyler@oci.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant