[CI/CE] Add Qwen3MoE CI Config #2876
base: develop
Conversation

Thanks for your contribution!
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           develop    #2876   +/-   ##
==========================================
  Coverage         ?   31.00%
==========================================
  Files            ?      355
  Lines            ?    59111
  Branches         ?        0
==========================================
  Hits             ?    18327
  Misses           ?    40784
  Partials         ?        0
==========================================
train_dataset_path: data-sft/train_gsm8k.json
train_dataset_prob: "1.0"
eval_dataset_path: data-sft/test_gsm8k.json
eval_dataset_prob: "1.0"
train_dataset_type: erniekit
eval_dataset_type: erniekit
train_dataset_path: ./data/sft/train.json
train_dataset_prob: "1.0"
eval_dataset_path: ./data/sft/dev.json
eval_dataset_prob: "1.0"
max_seq_len: 8192
packing: true
mix_strategy: concat
Same as the other models.
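For reference, one line of a train file such as ./data/sft/train.json under the erniekit dataset type is, as far as I know, a JSON record with parallel source/target lists; a minimal GSM8K-style sketch (the src/tgt field names are assumed from other ERNIEKit SFT examples, not taken from this PR):

{"src": ["Natalia sold clips to 48 of her friends in April, and then half as many in May. How many clips did she sell altogether?"], "tgt": ["She sold 48 / 2 = 24 clips in May, so 48 + 24 = 72 in total. The answer is 72."]}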
examples/config/sft/qwen3moe.yaml (Outdated)
do_eval: false
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
num_train_epochs: 5
num_train_epochs: 1
max_steps: -1
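Presumably the intent here is to cap CI cost: with max_steps set to -1 the step limit is disabled and training runs for the full num_train_epochs: 1, following the usual Trainer convention where a positive max_steps would override the epoch count.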
# use_filtered_label_loss: true
optim: adamw_custom
tensorwise_offload_optimizer: true
recompute: true
offload_optim: false
use_fused_head_and_loss_fn: true
# use_filtered_label_loss: true
optim: adamw_custom
tensorwise_offload_optimizer: true
Just keep tensorwise_offload_optimizer: true.
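Applying that suggestion, the offload-related block would presumably shrink to the single tensorwise flag, with the other quoted keys left untouched (a sketch of the result, not the final committed config):

use_fused_head_and_loss_fn: true
# use_filtered_label_loss: true
optim: adamw_custom
tensorwise_offload_optimizer: true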
fp16_opt_level: O2
unified_checkpoint: true

sharding_parallel_config: "split_param"
The four SFT YAMLs can be reduced to two; only full_tp_pp_ep.yaml and lora_tp_pp_ep.yaml are needed. Default to alltoall: ep_communication_type: "alltoall" # choices: [deepep, alltoall]; deepep is only for Hopper GPUs.
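A minimal sketch of how the consolidated full_tp_pp_ep.yaml could expose this switch (the key name and comment come from the review; treating alltoall as the default is the reviewer's suggestion, not something verified against the merged config):

ep_communication_type: "alltoall"  # choices: [deepep, alltoall]; deepep only for Hopper GPUs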
learning_rate: 1.0e-6

# performance
tensor_parallel_degree: 2
Add a LoRA variant as well.
dpo/full_tp_pp_ep.yaml and dpo/lora_tp_pp_ep.yaml
PR types
CI/CE
PR changes
Others
Description
Add Qwen3MoE CI/CE configs.
Distributed config: TP2SPSD2EP4PP2-packing
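Reading the TP2SPSD2EP4PP2-packing shorthand, the distributed section would look roughly like the sketch below; the mapping to YAML keys is my interpretation (tensor_parallel_degree and sharding_parallel_config appear in the diffs above, the other key names are assumptions):

tensor_parallel_degree: 2      # TP2
sequence_parallel: true        # SP
sharding_parallel_degree: 2    # SD2, with sharding_parallel_config: "split_param"
expert_parallel_degree: 4      # EP4
pipeline_parallel_degree: 2    # PP2
packing: true                  # -packing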
Cover configs:
TODO: ep_communication_type: "deepep".