[megatron] support qwen3.5 models for megatron, bump mbridge + megatron-core to latest #1425
erictang000 merged 12 commits into main
Code Review
This pull request adds a Megatron training script for Qwen 3.5, updates dependencies, and introduces a monkey-patch for transformers v5 compatibility. Review feedback identifies a likely version typo in pyproject.toml, an undefined variable and inconsistent model naming in the shell script, and suggests more specific exception handling for the vLLM engine workaround.
trainer.policy.megatron_config.expert_model_parallel_size=$MEGATRON_EP \
trainer.policy.megatron_config.expert_tensor_parallel_size=$MEGATRON_ETP \
trainer.use_sample_packing=false \
trainer.flash_attn=$FLASH_ATTN \
@@ -0,0 +1,77 @@
set -x

# Colocated GRPO training+generation for Moonlight-16B-A3B-Instruct on GSM8K with Megatron.
generator.inference_engine.gpu_memory_utilization=0.6 \
trainer.logger="$LOGGER" \
trainer.project_name="gsm8k_megatron" \
trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \
The run_name suffix refers to moonlight16b-a3b. It should be updated to match the Qwen 3.5 model being used.
Suggested change:
- trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_moonlight16b-a3b" \
+ trainer.run_name="gsm8k_megatron_tp${MEGATRON_TP}_pp${MEGATRON_PP}_cp${MEGATRON_CP}_ep${MEGATRON_EP}_etp${MEGATRON_ETP}_qwen3.5-0.8b" \
except Exception:
    pass
Catching a broad Exception and passing silently is generally discouraged. While this is a monkey-patch workaround, it would be safer to catch specific errors (like ImportError or AttributeError) or at least log a warning if the patch fails, to aid in debugging if the library structure changes unexpectedly.
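As a rough illustration of this suggestion, the sketch below narrows the handler to `ImportError` and `AttributeError` and logs a warning instead of passing silently. The helper name `apply_optional_patch` and its arguments are hypothetical and do not come from the PR; the actual monkey-patch target is not shown in this thread.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def apply_optional_patch(module_name: str, attr_name: str, replacement) -> bool:
    """Hypothetical helper: swap module.attr for `replacement` if both exist.

    Returns True when the patch was applied, False when it was skipped.
    """
    try:
        module = importlib.import_module(module_name)
        # Raises AttributeError if the library layout changed and the
        # attribute we expect to patch is no longer there.
        getattr(module, attr_name)
        setattr(module, attr_name, replacement)
        return True
    except (ImportError, AttributeError) as exc:
        # Only the two expected failure modes are handled; anything else
        # propagates so that real bugs are not hidden.
        logger.warning("Skipping patch of %s.%s: %s", module_name, attr_name, exc)
        return False
```

Returning a boolean lets the caller decide whether a skipped patch is fatal, and the warning leaves a trace in the logs if a future library release moves the patched attribute.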
The uv.lock regeneration in #1425 bumped griffe2md from 1.3.4 to 1.5.0, which renders docstring content slightly differently. A `<=` operator at the start of a line in the FullyAsyncConfig docstring was left unescaped in the MDX output, causing the Next.js MDX parser to interpret `<` as the start of a JSX tag and fail with:

    Unexpected character `=` (U+003D) before name

Add a second regex pass in `sanitize_for_mdx()` that escapes any `<` not followed by a valid tag-start character (letter, `/`, `!`), converting patterns like `<=` to `&lt;=`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- The `uv.lock` regeneration in #1425 bumped `griffe2md` from 1.3.4 to 1.5.0, which renders docstring content slightly differently
- A `<=` operator at the start of a line in the `FullyAsyncConfig` docstring (`skyrl/train/config/config.py:394`) was left unescaped in the MDX output, causing the Next.js MDX parser to fail with: `Unexpected character '=' (U+003D) before name`
- Adds a second regex pass in `sanitize_for_mdx()` that escapes any bare `<` not followed by a valid HTML tag-start character (letter, `/`, `!`)

Co-authored-by: Tyler <tyler@oci.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
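The escaping rule described above can be sketched with a negative lookahead. This is a minimal illustration and not the actual body of `sanitize_for_mdx()`: the function name `escape_bare_lt` is hypothetical, and the use of the `&lt;` entity as the replacement is an assumption.

```python
import re

# "<" that is NOT followed by a letter, "/", or "!", i.e. not the start of
# an opening tag, a closing tag, or a comment/doctype.
_BARE_LT = re.compile(r"<(?![A-Za-z/!])")


def escape_bare_lt(text: str) -> str:
    """Hypothetical helper: escape bare "<" so an MDX parser sees plain text."""
    # Assumption: the HTML entity is an acceptable escape in the MDX output.
    return _BARE_LT.sub("&lt;", text)


if __name__ == "__main__":
    print(escape_bare_lt("<= max_staleness_steps"))
    print(escape_bare_lt("<div>kept as-is</div>"))
```

The lookahead leaves anything that looks like a tag untouched, so the pass only changes comparison operators and stray `<` characters.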
GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869520430
Megatron GPU CI: https://github.com/NovaSky-AI/SkyRL/actions/runs/23869278330
Megatron GPU CI #2: https://github.com/NovaSky-AI/SkyRL/actions/runs/24045414612
Megatron GPU CI #3: https://github.com/NovaSky-AI/SkyRL/actions/runs/24054807024
WandB run for Qwen3.5-0.8B:

https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_megatron/runs/5cm9tg0j