Add ORPOTrainer tests to align coverage with DPO by DaoyuanLi2816 · Pull Request #6229 · huggingface/trl

DaoyuanLi2816 · 2026-07-01T02:33:04Z

ORPOTrainer has thin test coverage (5 tests) compared to DPOTrainer (~40). This ports a set of DPO's generic trainer tests to ORPO, in the spirit of the maintainers' "Align KTO with DPO: Add tests" work (e.g. #6034, #6160):

test_train_model_dtype — training with model_init_kwargs={"dtype": torch.float16} keeps the trained params in fp16.
test_train_with_eval — with eval_strategy="steps", an eval_loss is logged.
test_train_with_gradient_checkpointing — regression guard for the (default-on) gradient-checkpointing path.
test_tag_added / test_tag_added_peft — the ["orpo", "trl"] model-card tags are set (plain and with PEFT).

All pass on a single GPU (RTX 4080); ruff check / ruff format --check clean.
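
As a rough illustration of what the test_train_with_eval check amounts to, the sketch below scans a list of log entries for an eval_loss key. It assumes the log history is a list of dicts, as with transformers' TrainerState.log_history. The helper name find_eval_losses and the sample entries are made up for illustration and are not taken from the test module.

```python
def find_eval_losses(log_history):
    """Return every eval_loss value found in a list of log-entry dicts."""
    return [entry["eval_loss"] for entry in log_history if "eval_loss" in entry]


# Made-up entries shaped like a trainer log history (values are illustrative only).
sample_log_history = [
    {"loss": 1.25, "step": 1},
    {"eval_loss": 1.5, "step": 1},
    {"loss": 1.0, "step": 2},
]

eval_losses = find_eval_losses(sample_log_history)
assert eval_losses, "expected at least one eval_loss entry"
```

In a real test the list would come from the trainer after training with eval_strategy="steps", rather than from hand-written entries.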

Note: while porting I found two DPO tests that don't apply yet because ORPOTrainer doesn't support the underlying features: training on an IterableDataset (__init__ calls .map(num_proc=...), which IterableDataset doesn't accept) and passing a dict as eval_dataset (__init__ calls .map() on the dict itself). I left both out; supporting them would need small __init__ changes and could be a separate follow-up.
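
A minimal sketch of the kind of __init__ change such a follow-up might involve, under stated assumptions: forward num_proc only to datasets whose map() accepts it, and map each entry when the eval dataset is a dict. MapStyleDataset, StreamingDataset, map_dataset and map_eval_datasets are hypothetical stand-ins written for this example. This is not the actual ORPOTrainer or DPOTrainer code.

```python
class MapStyleDataset:
    """Hypothetical stand-in for a map-style dataset whose map() accepts num_proc."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn, num_proc=None):
        return MapStyleDataset(fn(row) for row in self.rows)


class StreamingDataset:
    """Hypothetical stand-in for an iterable dataset whose map() has no num_proc."""

    def __init__(self, rows):
        self.rows = rows

    def map(self, fn):
        return StreamingDataset(fn(row) for row in self.rows)


def map_dataset(dataset, fn, num_proc=None):
    """Apply fn, forwarding num_proc only to datasets that support it."""
    map_kwargs = {}
    if isinstance(dataset, MapStyleDataset):
        map_kwargs["num_proc"] = num_proc
    return dataset.map(fn, **map_kwargs)


def map_eval_datasets(eval_dataset, fn, num_proc=None):
    """Handle a single eval dataset or a dict of named eval datasets."""
    if isinstance(eval_dataset, dict):
        return {name: map_dataset(ds, fn, num_proc) for name, ds in eval_dataset.items()}
    return map_dataset(eval_dataset, fn, num_proc)
```

With these stand-ins, calling StreamingDataset.map(fn, num_proc=2) directly raises a TypeError, which mirrors the failure described in the note, while map_dataset avoids it by leaving num_proc out.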

Note

Low Risk
Test-only changes in the experimental ORPO test module; no runtime or training logic modified.

Overview
Expands ORPOTrainer integration tests in tests/experimental/test_orpo_trainer.py so coverage matches the pattern used for DPO / KTO (e.g. prior “align KTO with DPO” work). No trainer or library behavior is changed—only new regression tests.

Added cases cover fp16 training via model_init_kwargs, step-based eval (eval_loss in logs), gradient checkpointing (params still update), and model-card tags orpo and trl for full finetune and PEFT setups. Iterable-dataset and dict eval_dataset DPO-style tests were intentionally omitted because ORPOTrainer.__init__ does not support those paths yet.

Reviewed by Cursor Bugbot for commit 0c26e90. Bugbot is set up for automated code reviews on this repo.

ORPOTrainer had thin test coverage (5 tests) vs DPO (~40). Port DPO's generic trainer tests: test_train_model_dtype, test_train_with_eval, test_train_with_gradient_checkpointing, test_tag_added, test_tag_added_peft. All pass on a single GPU. (IterableDataset and dict eval_dataset tests were intentionally not ported: ORPOTrainer.__init__ doesn't yet support those, unlike DPO -- left as a possible follow-up.)

DaoyuanLi2816 wants to merge 1 commit into huggingface:main from DaoyuanLi2816:test/orpo-coverage
