
Raise a clear error when GKD student and teacher vocab sizes differ #6252

Merged
sergiopaniego merged 1 commit into main from gkd-clear-vocab-size-error on Jul 3, 2026


sergiopaniego (Member) commented Jul 2, 2026

What does this PR do?

White-box logit KD in GKDTrainer requires the student and teacher to share the same vocab_size. When the sizes differ, training used to fail with a cryptic broadcast error in kl_div (e.g. Qwen2.5-0.5B has a vocab_size of 151936, while Qwen2.5-7B has 152064). GKDTrainer now validates this at init and raises a clear error that points to GOLD for cross-tokenizer distillation. DistillationTrainer already handles the mismatch by resizing the teacher embeddings.
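To illustrate why mismatched vocab sizes surface as a broadcast error, here is a minimal pure-Python sketch of the trailing-dimension broadcasting rule that elementwise tensor operations follow. The helper name broadcast_shape and the batch and sequence sizes are assumptions made for illustration; this is not the code path inside GKDTrainer or kl_div.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Illustrative helper (hypothetical name) following the usual
    NumPy/PyTorch-style rule: align shapes from the right; two sizes are
    compatible when they are equal or one of them is 1.
    """
    result = []
    # Walk dimensions from the trailing end; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"shapes {a} and {b} are not broadcastable: {x} vs {y}"
            )
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


# Assumed toy batch: 2 sequences of 8 tokens; the last dim is vocab_size.
student_logits_shape = (2, 8, 151936)  # Qwen2.5-0.5B
teacher_logits_shape = (2, 8, 152064)  # Qwen2.5-7B

try:
    broadcast_shape(student_logits_shape, teacher_logits_shape)
except ValueError as err:
    # The vocab dimensions differ and neither is 1, so no broadcast exists.
    print(err)
```

The failure says nothing about tokenizers or vocabularies, only about mismatched sizes in the last dimension, which is why an init-time check is easier to act on.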

Before submitting

AI writing disclosure

  • AI-assisted: some parts were suggested or improved by AI, but the PR was written and reviewed by a human.

Note

Low Risk
Init-time validation only; no change to training logic when vocab sizes already match.

Overview
GKDTrainer now checks that the student and teacher share the same vocab_size right after the teacher is loaded, instead of failing later with an obscure kl_div broadcast error when the logit shapes disagree.

On mismatch it raises a ValueError that states both sizes, notes that GKD needs a shared vocabulary for full next-token distributions, and suggests matching teachers or GOLD for cross-tokenizer distillation (unlike DistillationTrainer, which can resize teacher embeddings).
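The check described above can be sketched roughly as follows. This is a hedged illustration, not the code merged in this PR: ToyConfig, check_shared_vocab, and the error wording are hypothetical stand-ins, assuming only that each model config exposes an integer vocab_size.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a model config that exposes vocab_size."""
    vocab_size: int


def check_shared_vocab(student_config, teacher_config):
    """Raise ValueError if the two configs disagree on vocab_size.

    Illustrative only: the message wording is an assumption, not the
    actual text raised by GKDTrainer.
    """
    student_vocab = student_config.vocab_size
    teacher_vocab = teacher_config.vocab_size
    if student_vocab != teacher_vocab:
        raise ValueError(
            f"Student vocab_size ({student_vocab}) and teacher vocab_size "
            f"({teacher_vocab}) differ. GKD compares full next-token "
            "distributions, so both models need a shared vocabulary. Use a "
            "teacher with a matching vocabulary, or GOLD for "
            "cross-tokenizer distillation."
        )


# Matching sizes pass silently.
check_shared_vocab(ToyConfig(151936), ToyConfig(151936))

# Mismatched sizes fail early with both numbers in the message.
try:
    check_shared_vocab(ToyConfig(151936), ToyConfig(152064))
except ValueError as err:
    print(err)
```

Running the check at init means the failure happens before any batch is processed, and the message names both sizes rather than a tensor shape.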

Reviewed by Cursor Bugbot for commit 408920f. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment (Bot) commented Jul 2, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego force-pushed the gkd-clear-vocab-size-error branch from 9e2c4d7 to 408920f on July 2, 2026 15:53
sergiopaniego requested a review from kashif on July 2, 2026 16:02
sergiopaniego merged commit 012dc77 into main on Jul 3, 2026
5 checks passed
sergiopaniego deleted the gkd-clear-vocab-size-error branch on July 3, 2026 12:33