fix: on-policy distillation missing temperature parameter fix #103

Andrewzh112 · 2025-11-20T08:11:41Z

The do_group_rollout_and_filter_constant_reward sampling step requires a temperature parameter. Right now it is not specified in distillation/train_on_policy.py, so runs fail. This PR adds a temperature parameter to the config and passes it to do_group_rollout_and_filter_constant_reward in distillation/train_on_policy.py and in the recipes, which fixes the problem.
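
For readers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch. It does not use the real function or config: do_group_rollout_stub, DistillationConfig, its fields, and the 1.0 default are hypothetical stand-ins, and the real signature in the repository may differ. The sketch only illustrates the failure mode (a required temperature argument that is never passed) and the fix (a config field threaded through to the call).

```python
from dataclasses import dataclass


@dataclass
class DistillationConfig:
    # Hypothetical stand-in for the on-policy distillation config.
    model_name: str
    # Field added by the fix; the default of 1.0 is an assumption for this sketch.
    temperature: float = 1.0


def do_group_rollout_stub(env_group: str, *, temperature: float) -> dict:
    """Hypothetical stub for the rollout step: temperature is a required argument."""
    return {"env_group": env_group, "temperature": temperature}


def rollout_before_fix(cfg: DistillationConfig, env_group: str) -> dict:
    # Mirrors the bug: temperature is never passed, so the call raises TypeError.
    return do_group_rollout_stub(env_group)


def rollout_after_fix(cfg: DistillationConfig, env_group: str) -> dict:
    # Mirrors the fix: the value comes from the config and is passed explicitly.
    return do_group_rollout_stub(env_group, temperature=cfg.temperature)


if __name__ == "__main__":
    cfg = DistillationConfig(model_name="example-model", temperature=0.7)
    print(rollout_after_fix(cfg, "group-0"))
    try:
        rollout_before_fix(cfg, "group-0")
    except TypeError as err:
        print(f"before the fix: {err}")
```

Keeping the value on the config rather than hard-coding it at the call site means each recipe can set its own sampling temperature.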

joschu · 2025-11-23T20:07:32Z

@claude please merge in main then review

claude · 2025-11-23T20:07:45Z

Claude encountered an error.

Failed with exit code 128

I'll analyze this and get back to you.

fix: the do_group_rollout_and_filter_constant_reward sampling step requires a temperature parameter, right now it is not specified so the run will fail. Added this parameter for distillation/train_on_policy.py and the recipes

21a6648

Merge remote-tracking branch 'origin/main' into pr-103

61c0382

joschu approved these changes Nov 23, 2025

joschu merged commit 41f93a8 into thinking-machines-lab:main Nov 23, 2025
2 checks passed
