On-policy Distillation (Qwen3-style) #1254

euronymous-aithal · 2025-10-02T05:16:55Z

euronymous-aithal
Oct 2, 2025
Maintainer

The student generates on-policy sequences and aligns its logits to those of a larger teacher via a KL loss, approaching the larger model's quality at lower cost than RL. See On-policy Distillation.
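To make the idea concrete, here is a minimal sketch of the per-token KL objective in plain Python. It assumes the sequence was already sampled from the student and that both models have scored every position of it. The KL direction (student relative to teacher, i.e. reverse KL) is an assumption, since the post does not specify it, and the names `token_kl` and `sequence_kl` are hypothetical helpers, not part of any existing API.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single position (direction is an assumption)."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def sequence_kl(student_logits_seq, teacher_logits_seq):
    """Mean per-token KL over the positions of one student-sampled sequence."""
    kls = [token_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)


# Toy logits for a 2-token sequence over a 3-word vocabulary (made-up values).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[2.0, 0.5, -1.0], [1.5, -0.5, 0.0]]
loss = sequence_kl(student, teacher)  # quantity the student would minimize
```

In a real training loop the same quantity would be computed on tensors with an autodiff framework, with gradients flowing only through the student's logits.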

Replies: 0 comments
