
Conversation

@hushenwei2000 (Contributor) commented Nov 6, 2025

PR types

CI/CE

PR changes

Others

Description

Add Qwen3MoE CI/CE configs.
Distributed config: TP2SPSD2EP4PP2-packing (see the sketch after the TODO list below)

Covered configs:

  • SFT / DPO
  • With LoRA / without LoRA
  • DeepEP / AllToAll

TODO:

  • Pretrain
  • DPO + LoRA: currently uses TP4SPSD2EP4PP2-packing (otherwise a "parameters not trainable" error is raised).
  • DPO + AllToAll: currently has a precision problem; setting ep_communication_type: "deepep" is recommended.
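
For reference, a minimal sketch of the parallelism block the name TP2SPSD2EP4PP2-packing implies. The mapping (TP2 = tensor parallel 2, SP = sequence parallel, SD2 = sharding degree 2, EP4 = expert parallel 4, PP2 = pipeline parallel 2) is inferred from the name; only tensor_parallel_degree, sharding_parallel_config, and packing appear verbatim in the snippets reviewed below, and the remaining key names are assumptions:

# Hypothetical block inferred from "TP2SPSD2EP4PP2-packing"; not the merged config.
tensor_parallel_degree: 2                # TP2 (key quoted later in this review)
sequence_parallel: true                  # SP (assumed key name)
sharding_parallel_degree: 2              # SD2 (assumed key name)
expert_parallel_degree: 4                # EP4 (assumed key name)
pipeline_parallel_degree: 2              # PP2 (assumed key name)
sharding_parallel_config: "split_param"  # quoted later in this review
packing: true                            # the "-packing" suffix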


paddle-bot bot commented Nov 6, 2025

Thanks for your contribution!

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@0ee5333). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #2876   +/-   ##
==========================================
  Coverage           ?   31.00%           
==========================================
  Files              ?      355           
  Lines              ?    59111           
  Branches           ?        0           
==========================================
  Hits               ?    18327           
  Misses             ?    40784           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.

train_dataset_path: data-sft/train_gsm8k.json
train_dataset_prob: "1.0"
eval_dataset_path: data-sft/test_gsm8k.json
eval_dataset_prob: "1.0"
Collaborator:

train_dataset_type: erniekit
eval_dataset_type: erniekit
train_dataset_path: ./data/sft/train.json
train_dataset_prob: "1.0"
eval_dataset_path: ./data/sft/dev.json
eval_dataset_prob: "1.0"
max_seq_len: 8192
packing: true
mix_strategy: concat
Keep this the same as the other models.

do_eval: false
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
num_train_epochs: 5
Collaborator:

num_train_epochs: 1
max_steps: -1

# use_filtered_label_loss: true
optim: adamw_custom
tensorwise_offload_optimizer: true
recompute: true
Collaborator:

offload_optim: false
use_fused_head_and_loss_fn: true
# use_filtered_label_loss: true
optim: adamw_custom
tensorwise_offload_optimizer: true

Just keep tensorwise_offload_optimizer: true and drop the rest.
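
Per the comment, the quoted block would shrink to a single line (a literal reading of the suggestion; whether optim: adamw_custom is also kept is not stated):

tensorwise_offload_optimizer: true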

fp16_opt_level: O2
unified_checkpoint: true

sharding_parallel_config: "split_param"
Collaborator:

The four SFT yamls can be reduced to two; only full_tp_pp_ep.yaml and lora_tp_pp_ep.yaml are needed, defaulting to alltoall: ep_communication_type: "alltoall"  # choices: [deepep, alltoall]; deepep is only for Hopper GPUs.
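
Spelled out, the consolidated SFT yamls would carry the switch like this (filenames from the comment above; top-level placement is an assumption based on the other quoted snippets):

# sft/full_tp_pp_ep.yaml and sft/lora_tp_pp_ep.yaml (assumed paths)
ep_communication_type: "alltoall"  # choices: [deepep, alltoall]; deepep runs only on Hopper GPUs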

learning_rate: 1.0e-6

# performance
tensor_parallel_degree: 2
Collaborator:

Add a LoRA variant of this as well.
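
A sketch of what the requested LoRA variant could add on top of the same parallelism settings; the keys lora and lora_rank are assumptions modeled on common PaddleNLP-style fine-tuning configs, not taken from this PR:

# sft/lora_tp_pp_ep.yaml (filename from the earlier comment)
lora: true                 # assumed key name
lora_rank: 8               # assumed key name and value
tensor_parallel_degree: 2  # unchanged from the full-parameter yaml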

Collaborator:

dpo/full_tp_pp_ep.yaml and dpo/lora_tp_pp_ep.yaml
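
Read together with the TODO list in the description, the two DPO yamls would differ from their SFT counterparts roughly as follows (filenames from the comment, values from the TODO items; a sketch, not the merged files):

# dpo/full_tp_pp_ep.yaml — DPO + AllToAll currently has a precision problem,
# so the recommended setting is:
ep_communication_type: "deepep"

# dpo/lora_tp_pp_ep.yaml — uses TP4 to avoid the "parameters not trainable" error:
tensor_parallel_degree: 4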

@hushenwei2000 changed the title from "Add Qwen3MoE CI Config" to "[CI/CE] Add Qwen3MoE CI Config" on Nov 10, 2025