[distill][phase1-2] Decouple DMD2 from Wan + YAML training args + checkpoint/resume #1122
alexzms wants to merge 45 commits into hao-ai-lab:main
Conversation
Summary of Changes

Hello @alexzms, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the distillation framework by decoupling the core algorithm from model-specific implementations and introducing a robust, YAML-driven configuration system. The changes aim to improve scalability, reusability, and maintainability of distillation training pipelines, enabling more flexible experimentation and easier integration of new models and methods. It also introduces a comprehensive checkpointing system and independent validation, laying a solid foundation for future distillation efforts.
Code Review
This is a significant and well-executed refactoring that decouples the distillation method from the model specifics, introduces a flexible YAML-based configuration system, and adds robust checkpointing and resume capabilities. The new structure with Trainer, Method, Adapter, and Builder is clean and promotes reusability.
I've added a couple of minor suggestions to improve code style and maintainability. Overall, this is a great contribution that significantly improves the distillation framework.
fastvideo/distillation/builder.py
```python
checkpointing_type=training_args.
enable_gradient_checkpointing_type,
```
This line break in the middle of an attribute access is unusual and harms readability. It's better to keep `training_args.enable_gradient_checkpointing_type` on a single line.
```diff
-checkpointing_type=training_args.
-enable_gradient_checkpointing_type,
+checkpointing_type=training_args.enable_gradient_checkpointing_type,
```
Up to this version, only the validator is not decoupled; decoupling it will require rewriting WanPipeline to accept an sde/ode mode. Planned to be done during phase 3.
1) Motivation
Phase 0 #1120 introduced a new distillation scaffold (Trainer ↔ Method ↔ Adapter + ModelBundle), but it still had two big limitations:

- One `*_distillation_vN.py` per model family.

This PR lands Phase 1 + Phase 2, pushing the refactor to the point where we can run few-step distillation via a YAML-only entrypoint and keep the method/algorithm reusable, while the adapter absorbs model/pipeline quirks.
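To make the intended split concrete, here is a minimal sketch of the Trainer ↔ Method ↔ Adapter contract; every name and signature below is an illustrative assumption, not the actual FastVideo code:

```python
# Illustrative sketch only — not the real FastVideo interfaces.
from dataclasses import dataclass, field
from typing import Protocol

class DistillMethod(Protocol):
    """Owns the algorithm: losses and the multi-optimizer update policy."""
    def step(self, batch, models) -> dict: ...

class DistillAdapter(Protocol):
    """Owns model/pipeline specifics: forward context, batch normalization."""
    def prepare_batch(self, raw_batch) -> dict: ...

@dataclass
class DistillTrainer:
    """Infra only: loop / accumulation / stepping / logging."""
    method: DistillMethod
    adapter: DistillAdapter
    logs: list = field(default_factory=list)

    def train(self, dataloader, models, steps: int):
        for _, raw in zip(range(steps), dataloader):
            batch = self.adapter.prepare_batch(raw)   # model-specific
            metrics = self.method.step(batch, models)  # algorithm-specific
            self.logs.append(metrics)                  # infra-only concern
        return self.logs
```

The point of the split is that the trainer never needs to know which model family or algorithm it is driving; swapping DMD2 for another method, or Wan for another backbone, only changes which `DistillMethod`/`DistillAdapter` gets plugged in.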
2) Phase 1: Decouple DistillMethod
What Phase 1 changes
- `DistillTrainer` is infra-only (loop/accum/step/logging).
- `DistillMethod` owns the algorithm and update policy (multi-optimizer schedules, stepping cadence).
- `DistillAdapter` owns model/pipeline-specific forward context and batch normalization.

Key outcomes

- `fastvideo/distillation/methods/distribution_matching/dmd2.py`
- `fastvideo/distillation/adapters/wan.py`

3) Phase 2: YAML-only entrypoint + builder runtime + fully decouple
Phase 2 makes the new path standalone (no legacy distillation pipeline dependency):
New entrypoint (YAML-only)
- `fastvideo/training/distillation.py`
- CLI flags: `--config`, `--resume-from-checkpoint`, `--override-output-dir`, `--dry-run`.

Example YAML:
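As an illustration only, such a config might look roughly like the following; all keys and values here are assumptions, not the actual schema (the real config shipped with this PR is `distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml`):

```yaml
# Hypothetical sketch — key names are illustrative, not the real schema.
method: dmd2
roles:
  student:
    model_path: Wan-AI/Wan2.1-T2V-1.3B
  teacher:
    model_path: Wan-AI/Wan2.1-T2V-1.3B
training:
  output_dir: outputs/distill_wan_dmd2_8steps
  max_train_steps: 4000
  checkpointing_steps: 500
```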
YAML config loading + typed spec
- `fastvideo/distillation/yaml_config.py`
- `fastvideo/distillation/specs.py`

Builder that instantiates runtime from roles + method
- `fastvideo/distillation/builder.py`

Validation without legacy pipelines
- `fastvideo/distillation/validators/base.py`
- `fastvideo/distillation/validators/wan.py` (`_log_validation`).
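A rough sketch of what such a model-agnostic validator base might look like; the class names, cadence logic, and logging hook below are all assumptions for illustration:

```python
# Illustrative sketch only — not the real validator classes.
from abc import ABC, abstractmethod

class BaseValidator(ABC):
    """Model-agnostic base: owns the validation cadence and logging."""

    def __init__(self, every_n_steps: int):
        self.every_n_steps = every_n_steps

    def maybe_validate(self, step: int, model) -> bool:
        """Run validation on the configured cadence; return True if it ran."""
        if step % self.every_n_steps != 0:
            return False
        self.log_samples(self.sample(model))
        return True

    @abstractmethod
    def sample(self, model):
        """Model-specific sampling, supplied by a subclass."""

    def log_samples(self, samples):
        # Stand-in for _log_validation-style logging.
        self.last = samples
```

A Wan-specific subclass would then only implement `sample` (e.g. by running the few-step student pipeline), while cadence and logging stay shared.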
- `fastvideo/distillation/checkpoint.py`
- `fastvideo/distillation/trainer.py`

“outside/” config tree (non-invasive)
- `fastvideo/distillation/outside/fastvideo/configs/distillation/...`
- Doesn't touch `fastvideo/configs/*` yet.
✅ Done in this PR (Phase 1 + 2)
- Adapter (`WanAdapter`) that owns forward context (keeps methods clean).
- `fastvideo/training/distillation.py`.
- `fastvideo/distillation/outside/fastvideo/configs/distillation/distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml`
- `examples/distillation/phase2/temp.sh` (will be cleaned in phase 3)

❌ Not done yet (planned follow-ups)
- Absorb finetuning (only `student` and dataset would be provided): our new distillation framework is fully capable of absorbing the finetuning code into itself, resulting in an even cleaner training pipeline.

Tests / Evidence
Feedback and suggestions are highly welcome!