@Andrewzh112
Contributor

The do_group_rollout_and_filter_constant_reward sampling step requires a temperature parameter. Right now it is not specified in distillation/train_on_policy.py, so runs fail. This PR adds a temperature parameter to the config and passes it to do_group_rollout_and_filter_constant_reward in distillation/train_on_policy.py and the recipes, which fixes the problem.
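As a rough illustration of the pattern described above (not the actual diff), the sketch below adds a temperature field to a config and threads it through to a rollout call that requires it. The `Config` class, its `group_size` field, the default values, and `rollout_stub` are all hypothetical stand-ins; the real signature of do_group_rollout_and_filter_constant_reward is not shown in this thread and is not assumed here.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical subset of a training config. Only the existence of a
    # `temperature` setting comes from the PR text; the defaults are assumed.
    group_size: int = 4
    temperature: float = 1.0


def rollout_stub(prompts, *, group_size, temperature):
    """Hypothetical stand-in for the rollout helper.

    `temperature` is a required keyword argument, so a caller that
    forgets to pass it fails with a TypeError.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Pretend to sample `group_size` completions per prompt.
    return [(prompt, temperature) for prompt in prompts for _ in range(group_size)]


cfg = Config()

# Before the fix (conceptually): the call omits temperature and raises.
try:
    rollout_stub(["a", "b"], group_size=cfg.group_size)
    missing_arg_failed = False
except TypeError:
    missing_arg_failed = True

# After the fix: the value is read from the config and passed explicitly.
rollouts = rollout_stub(
    ["a", "b"], group_size=cfg.group_size, temperature=cfg.temperature
)
```

Keeping the value in the config rather than hard-coding it at the call site lets each recipe choose its own sampling temperature.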

…quires a temperature parameter, right now it is not specified so the run will fail. Added this parameter for distillation/train_on_policy.py and the recipes
@joschu
Collaborator

joschu commented Nov 23, 2025

@claude please merge in main then review

@claude

claude bot commented Nov 23, 2025

Claude encountered an error.

Failed with exit code 128

I'll analyze this and get back to you.

@joschu joschu merged commit 41f93a8 into thinking-machines-lab:main Nov 23, 2025
2 checks passed
