
[distill][phase1-2] Decouple DMD2 from Wan + YAML training args + checkpoint/resume#1122

Open
alexzms wants to merge 45 commits into hao-ai-lab:main from FoundationResearch:distill-phase1+2

Conversation

@alexzms
Collaborator

@alexzms alexzms commented Feb 22, 2026

1) Motivation

Phase 0 #1120 introduced a new distillation scaffold (Trainer ↔ Method ↔ Adapter + ModelBundle), but it still had two big limitations:

  • Algorithm/model coupling still leaked through (e.g. Wan-specific method naming and pipeline-backed behavior).
  • Entrypoints and configs were still “legacy-shaped”, which makes it hard to scale to many models/methods/roles without re-creating a new *_distillation_vN.py per model family.

This PR lands Phase 1 + Phase 2, pushing the refactor to the point where we can run few-step distillation via a YAML-only entrypoint and keep the method/algorithm reusable, while the adapter absorbs model/pipeline quirks.


2) Phase 1: Decouple DistillMethod

What Phase 1 changes

  • Move distillation logic toward a FastGen-style hierarchy where:
    • DistillTrainer is infra-only (loop/accum/step/logging).
    • DistillMethod owns the algorithm and update policy (multi-optimizer schedules, stepping cadence).
    • DistillAdapter owns model/pipeline-specific forward context and batch normalization.
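
The split above can be sketched as three small Python classes (a simplified illustration, not the actual fastvideo code; names follow the hierarchy described in this PR but method signatures are assumptions):

```python
from abc import ABC, abstractmethod
from typing import Any


class DistillAdapter(ABC):
    """Owns model/pipeline specifics: forward context and batch normalization."""

    @abstractmethod
    def normalize_batch(self, raw_batch: dict[str, Any]) -> dict[str, Any]:
        ...


class DistillMethod(ABC):
    """Owns the algorithm and update policy (losses, multi-optimizer cadence)."""

    @abstractmethod
    def training_step(self, batch: dict[str, Any], step: int) -> dict[str, float]:
        ...


class DistillTrainer:
    """Infra-only: the loop, stepping, and logging; no algorithm knowledge."""

    def __init__(self, method: DistillMethod, adapter: DistillAdapter,
                 dataloader: Any) -> None:
        self.method = method
        self.adapter = adapter
        self.dataloader = dataloader

    def train(self, max_steps: int) -> list[dict[str, float]]:
        history = []
        for step, raw_batch in enumerate(self.dataloader):
            if step >= max_steps:
                break
            # Adapter absorbs model quirks before the method ever sees the batch.
            batch = self.adapter.normalize_batch(raw_batch)
            history.append(self.method.training_step(batch, step))
        return history
```

The point of the sketch: the trainer never branches on model family, and the method never touches pipeline details.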

Key outcomes

  • Generic algorithm method: introduce a reusable DMD2 implementation under a method taxonomy:
    • fastvideo/distillation/methods/distribution_matching/dmd2.py
  • Model-family adapter: Wan specifics live in the adapter (forward context, pipeline normalization):
    • fastvideo/distillation/adapters/wan.py
  • Validation boundary: validation is no longer a “pipeline side-effect”; it becomes an explicit component rather than being baked into legacy pipelines.
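
One thing a reusable DMD2Method has to own is the multi-optimizer stepping cadence. The helper below is a hypothetical sketch of such a cadence (the ratio default and role names are illustrative, not taken from dmd2.py): the critic/fake-score model updates every step, while the student generator updates only on a fixed interval.

```python
def plan_updates(num_steps: int, gen_update_ratio: int = 5) -> list[str]:
    """Sketch of a DMD2-style stepping cadence: the critic updates every
    step; the student generator updates only every `gen_update_ratio`-th
    step (the default of 5 is a hypothetical choice for illustration)."""
    plan = []
    for step in range(num_steps):
        roles = ["critic"]
        if step % gen_update_ratio == 0:
            roles.append("student")
        plan.append("+".join(roles))
    return plan
```

Because the method owns this policy, the trainer can stay a dumb loop and the same cadence works for any model family the adapter supports.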

3) Phase 2: YAML-only entrypoint + builder runtime + fully decouple

Phase 2 makes the new path standalone (no legacy distillation pipeline dependency):

New entrypoint (YAML-only)

  • fastvideo/training/distillation.py
    • Only accepts new YAML configs (no legacy config fallback / merging).
    • CLI stays minimal: runtime controls like --config, --resume-from-checkpoint, --override-output-dir, --dry-run.
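
A minimal argparse surface matching the flags listed above might look like this (a sketch of the shape, not the actual entrypoint code):

```python
import argparse


def parse_cli(argv=None) -> argparse.Namespace:
    # Runtime-only controls; everything describing the run lives in the YAML.
    p = argparse.ArgumentParser(description="YAML-only distillation entrypoint")
    p.add_argument("--config", required=True, help="path to the YAML run config")
    p.add_argument("--resume-from-checkpoint", default=None,
                   help="checkpoint directory to resume from")
    p.add_argument("--override-output-dir", default=None)
    p.add_argument("--dry-run", action="store_true",
                   help="build the runtime, then exit before training")
    return p.parse_args(argv)
```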

Example YAML:

distill:
  model: wan
  method: dmd2

models:
  student:
    family: wan
    path: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true
  teacher:
    family: wan
    path: Wan-AI/Wan2.1-T2V-14B-Diffusers
    trainable: false
  critic:
    family: wan
    path: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
    trainable: true

training:
  # Distributed
  num_gpus: 8
  sp_size: 1
  tp_size: 1

  # Data (parquet dataset folder)
  data_path: data/Wan-Syn_77x448x832_600k
  dataloader_num_workers: 4
  ...

YAML config loading + typed spec

  • fastvideo/distillation/yaml_config.py
  • fastvideo/distillation/specs.py
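
The typed-spec idea can be sketched with plain dataclasses (field names are assumptions mirroring the YAML above, not the actual contents of specs.py); `cfg` below stands for the dict a `yaml.safe_load` of the run config would produce:

```python
from dataclasses import dataclass


@dataclass
class RoleSpec:
    family: str      # model family, e.g. "wan"
    path: str        # HF repo id or local path
    trainable: bool


@dataclass
class DistillSpec:
    model: str                  # adapter/model-family key
    method: str                 # algorithm key, e.g. "dmd2"
    roles: dict[str, RoleSpec]  # "student" / "teacher" / "critic" / ...


def spec_from_dict(cfg: dict) -> DistillSpec:
    # Validate-and-type the raw YAML dict into a spec object.
    roles = {name: RoleSpec(**entry) for name, entry in cfg["models"].items()}
    return DistillSpec(model=cfg["distill"]["model"],
                       method=cfg["distill"]["method"],
                       roles=roles)
```

Typing the config up front means the builder and trainer downstream never touch raw dicts.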

Builder that instantiates runtime from roles + method

  • fastvideo/distillation/builder.py
    • Builds roles (student/teacher/critic/…) → adapter → model bundle → method → trainer.
    • Keeps “how to assemble a run” separate from both trainer and method.
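
The assembly order can be illustrated with registries keyed by the YAML strings (placeholder constructors; the real builder.py wires actual adapter/method classes and a model bundle in between):

```python
# Hypothetical registries mapping YAML keys to constructors. Neither the
# trainer nor the method ever imports a model family directly; the builder
# is the only place that knows how to assemble a run.
ADAPTER_REGISTRY = {"wan": lambda roles: {"kind": "WanAdapter", "roles": roles}}
METHOD_REGISTRY = {"dmd2": lambda adapter: {"kind": "DMD2Method", "adapter": adapter}}


def build_runtime(model_key: str, method_key: str, roles: dict) -> dict:
    adapter = ADAPTER_REGISTRY[model_key](roles)   # model-family specifics
    method = METHOD_REGISTRY[method_key](adapter)  # reusable algorithm
    return {"adapter": adapter, "method": method}
```

Adding a new model family or method then means registering one constructor, not writing a new entrypoint.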

Validation without legacy pipelines

  • fastvideo/distillation/validators/base.py
  • fastvideo/distillation/validators/wan.py
    • Phase 2 runs validation via the new validator path (no legacy _log_validation).
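
"Validation as an explicit component" can be sketched as a small base class plus a cadence helper (signatures are assumptions, not the actual validators/base.py API):

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class DistillValidator(ABC):
    """Explicit validation component: the trainer invokes it on an interval
    instead of validation happening as a legacy-pipeline side-effect."""

    @abstractmethod
    def validate(self, student: Any, step: int) -> dict[str, Any]:
        """Sample with the current student and return metrics/artifacts."""


def maybe_validate(validator: DistillValidator, student: Any,
                   step: int, every: int) -> Optional[dict]:
    # Hypothetical helper: only run validation on the cadence boundary.
    if every > 0 and step % every == 0:
        return validator.validate(student, step)
    return None
```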

Checkpoint save/resume (new system)

  • fastvideo/distillation/checkpoint.py
    • Save/resume trainable roles, optimizer/scheduler, dataloader state, and RNG states (including adapter-exposed generators).
    • Integrated into fastvideo/distillation/trainer.py.
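
The save-plus-retention behavior can be sketched like this (a JSON stub for illustration only; the real manager also persists optimizer/scheduler, dataloader, and RNG state, and all field names here are hypothetical):

```python
import json
import os


def save_checkpoint(out_dir: str, step: int, state: dict, keep_last: int = 3) -> str:
    """Persist run state for `step`, then prune old checkpoints so at most
    `keep_last` remain (a simple retention policy)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"checkpoint-{step:08d}.json")
    with open(path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    # Zero-padded names sort in step order, so pruning is a sorted slice.
    ckpts = sorted(n for n in os.listdir(out_dir) if n.startswith("checkpoint-"))
    for old in ckpts[:-keep_last]:
        os.remove(os.path.join(out_dir, old))
    return path


def load_checkpoint(path: str) -> dict:
    with open(path) as f:
        return json.load(f)
```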

“outside/” config tree (non-invasive)

  • fastvideo/distillation/outside/fastvideo/configs/distillation/...
    • Used to iterate on a better distillation config scheme without touching fastvideo/configs/* yet.
    • This PR’s entrypoint reads explicit paths; no “outside path auto-completion” magic.

4) What’s done vs. not done

✅ Done in this PR (Phase 1 + 2)

  • Land a generic DMD2 method in our new distillation framework.
  • Land a model-family adapter (WanAdapter) that owns forward context (keeps methods clean).
  • Add a YAML-only distillation entrypoint: fastvideo/training/distillation.py.
  • Add a role-based builder that instantiates adapter/bundle/method/trainer from YAML.
  • Add independent validation (no legacy distillation pipeline dependency).
  • Add checkpoint save/resume to the new path (with retention policy).
  • Provide a runnable few-step distillation config + example script:
    • YAML: fastvideo/distillation/outside/fastvideo/configs/distillation/distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml
    • Runner: examples/distillation/phase2/temp.sh (will be cleaned up in Phase 3)

❌ Not done yet (planned follow-ups)

  • Additional distillation methods beyond DMD2 (e.g. Self-Forcing + ODE init path).
  • Additional adapters/models beyond Wan.
  • Long-term cleanup: remove legacy distillation pipelines (left intact for now; new code path is additive).
  • More rigorous automated tests for end-to-end training/validation (GPU CI, SSIM regression, etc.).
  • Since finetuning is a special case of distillation (only a student and a dataset are provided), the new distillation framework can absorb the finetuning code, resulting in an even cleaner training pipeline.
  • Better YAML config design to support future distillation methods and even finetuning.
  • Decouple the build functions in builder.py.

Tests / Evidence


Feedback and suggestions are highly welcome!

@gemini-code-assist
Contributor

Summary of Changes

Hello @alexzms, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the distillation framework by decoupling the core algorithm from model-specific implementations and introducing a robust, YAML-driven configuration system. The changes aim to improve scalability, reusability, and maintainability of distillation training pipelines, enabling more flexible experimentation and easier integration of new models and methods. It also introduces a comprehensive checkpointing system and independent validation, laying a solid foundation for future distillation efforts.

Highlights

  • Decoupled Distillation Framework: Introduced a new distillation scaffold with a clear separation of concerns: DistillTrainer for infrastructure, DistillMethod for algorithms, and DistillAdapter for model/pipeline specifics. This refactors the distillation logic towards a FastGen-style hierarchy.
  • YAML-only Entrypoint: Implemented a new standalone entrypoint for distillation training that exclusively uses YAML configurations, moving away from legacy command-line argument structures. This simplifies configuration management and scalability.
  • Generic DMD2 Method and Wan Adapter: Landed a reusable DMD2 (Distribution Matching Distillation) implementation as a generic method and a WanAdapter to handle Wan-specific model and pipeline quirks, ensuring algorithm reusability across different models.
  • New Checkpoint and Resume System: Integrated a new checkpointing system that supports saving and resuming trainable roles, optimizers, schedulers, dataloader states, and RNG states, enhancing training robustness and flexibility.
  • Independent Validation: Established an independent validation mechanism that is no longer a side-effect of legacy pipelines, making validation an explicit and separate component of the distillation process.


Changelog
  • examples/distillation/phase0/distill_wan2.1_t2v_1.3B_dmd2_8steps.sh
    • Added a new shell script for Phase 0 DMD2 distillation.
  • examples/distillation/phase0/temp.sh
    • Added a temporary shell script for Phase 0 Wan DMD2 distillation.
  • examples/distillation/phase1/distill_wan2.1_t2v_1.3B_dmd2_8steps.sh
    • Added a new shell script for Phase 1 DMD2 distillation using the new method/adapter entrypoint.
  • examples/distillation/phase1/run.md
    • Added a markdown file for Phase 1 run links.
  • examples/distillation/phase1/temp.sh
    • Added a temporary shell script for Phase 1 Wan DMD2 distillation.
  • examples/distillation/phase2/README.md
    • Added a README for Phase 2 YAML-only distillation examples.
  • examples/distillation/phase2/distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml
    • Added a YAML configuration file for Phase 2 Wan DMD2 distillation.
  • examples/distillation/phase2/run_wan2.1_t2v_1.3B_dmd2_8steps.sh
    • Added a shell script to run Phase 2 distillation from a YAML config.
  • examples/distillation/phase2/temp.sh
    • Added a temporary shell script for Phase 2 Wan DMD2 distillation.
  • fastvideo/distillation/__init__.py
    • Added an __init__.py file to define the distillation module's public API.
  • fastvideo/distillation/adapters/__init__.py
    • Added an __init__.py file for distillation adapters.
  • fastvideo/distillation/adapters/base.py
    • Added a base abstract class for distillation adapters.
  • fastvideo/distillation/adapters/wan.py
    • Added the WanAdapter implementation for Wan-specific distillation logic.
  • fastvideo/distillation/builder.py
    • Added a builder module to construct the distillation runtime from configuration.
  • fastvideo/distillation/bundle.py
    • Added data structures for ModelBundle and RoleHandle to manage models and their roles.
  • fastvideo/distillation/checkpoint.py
    • Added a checkpoint manager for saving and resuming distillation training states.
  • fastvideo/distillation/methods/__init__.py
    • Added an __init__.py file for distillation methods.
  • fastvideo/distillation/methods/consistency_model/__init__.py
    • Added an empty __init__.py for consistency model methods.
  • fastvideo/distillation/methods/distribution_matching/__init__.py
    • Added an __init__.py for distribution matching methods.
  • fastvideo/distillation/methods/distribution_matching/dmd2.py
    • Added the DMD2Method implementation, encapsulating the DMD2 algorithm.
  • fastvideo/distillation/methods/fine_tuning/__init__.py
    • Added an empty __init__.py for fine-tuning methods.
  • fastvideo/distillation/methods/knowledge_distillation/__init__.py
    • Added an empty __init__.py for knowledge distillation methods.
  • fastvideo/distillation/outside/README.md
    • Added a README explaining the purpose of the outside/ directory for config iteration.
  • fastvideo/distillation/outside/fastvideo/configs/distillation/distill_wan2.1_t2v_1.3B_dmd2_8steps.yaml
    • Added a YAML configuration file for Phase 2 Wan DMD2 distillation.
  • fastvideo/distillation/specs.py
    • Added data classes for DistillSpec and RoleSpec to define distillation configurations.
  • fastvideo/distillation/trainer.py
    • Added a generic DistillTrainer class for managing the training loop.
  • fastvideo/distillation/validators/__init__.py
    • Added an __init__.py for distillation validators.
  • fastvideo/distillation/validators/base.py
    • Added a base abstract class for distillation validators.
  • fastvideo/distillation/validators/wan.py
    • Added the WanValidator implementation for Wan-specific validation.
  • fastvideo/distillation/yaml_config.py
    • Added a module for loading distillation run configurations from YAML.
  • fastvideo/tests/distillation/test_optimizer_scheduler_alignment.py
    • Added a test for optimizer and scheduler alignment in distillation methods.
  • fastvideo/training/distillation.py
    • Added the main entrypoint for Phase 2 distillation training.
Activity
  • The pull request introduces a significant refactoring of the distillation framework, moving from a tightly coupled system to a more modular and extensible architecture.
  • Phase 1 focused on decoupling the DistillMethod from model specifics, introducing DistillTrainer, DistillMethod, and DistillAdapter components.
  • Phase 2 completed the refactor by enabling a YAML-only entrypoint, a role-based builder for runtime instantiation, and a new checkpointing system.
  • Manual GPU runs were performed to validate the expected behavior, showing progressively clearer and higher-quality validation outputs as training advanced.
  • The training behavior in Phase 1 and 2 closely matched that of Phase 0, indicating a successful refactor without regression in core functionality.

@alexzms alexzms added the go Trigger Buildkite CI label Feb 22, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This is a significant and well-executed refactoring that decouples the distillation method from the model specifics, introduces a flexible YAML-based configuration system, and adds robust checkpointing and resume capabilities. The new structure with Trainer, Method, Adapter, and Builder is clean and promotes reusability.

I've added a couple of minor suggestions to improve code style and maintainability. Overall, this is a great contribution that significantly improves the distillation framework.

Comment on lines +230 to +231
checkpointing_type=training_args.
enable_gradient_checkpointing_type,
Contributor

Severity: medium

This line break in the middle of an attribute access is a bit unusual and harms readability. It's better to keep training_args.enable_gradient_checkpointing_type on a single line.

Suggested change
checkpointing_type=training_args.
enable_gradient_checkpointing_type,
checkpointing_type=training_args.enable_gradient_checkpointing_type,

@alexzms
Collaborator Author

alexzms commented Feb 24, 2026

Up to this version, only the validator is not fully decoupled; decoupling it will require rewriting WanPipeline to accept an SDE/ODE mode. Planned for Phase 3.

