Skip to content

Align KTO with DPO: Support tool calling#6259

Open
qgallouedec wants to merge 2 commits into
mainfrom
kto-tool-calling-support
Open

Align KTO with DPO: Support tool calling#6259
qgallouedec wants to merge 2 commits into
mainfrom
kto-tool-calling-support

Conversation

@qgallouedec

@qgallouedec qgallouedec commented Jul 2, 2026

Copy link
Copy Markdown
Member

KTOTrainer ignored the tools column during tokenization, so tool schemas were never rendered into the prompt. Tool-calling datasets trained as if no tools were defined.

The existing test_train_toolcall_data (mirrors DPO's, uses trl-internal-testing/toolcall) now genuinely exercises tool rendering; it passes.

Matches DPO's behavior: tool calling is supported on the text path; the vision collator does not wire tools through in either trainer.

#4786


Note

Low Risk
Small change to KTO preprocessing only; it corrects tokenization for datasets that already carry a tools column and does not touch loss, reference model, or auth paths.

Overview
KTO text-path training now renders tool schemas in prompts, matching DPO behavior for tool-calling datasets.

During dataset tokenization, KTOTrainer reads each example’s optional tools field (JSON-parsing when stored as a string) and forwards it into apply_chat_template for both the generation prompt and the full prompt+completion sequence. Previously those columns were ignored, so KTO trained on tool data as if no tools existed.

For vision datasets, "tools" is added to the trainer’s signature columns so it is retained when unused columns are removed (the vision collator still does not wire tools through, same as DPO).

Reviewed by Cursor Bugbot for commit c198c65. Bugbot is set up for automated code reviews on this repo. Configure here.

@bot-ci-comment

bot-ci-comment Bot commented Jul 2, 2026

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant