
Add ORPOTrainer tests to align coverage with DPO #6229

Open
DaoyuanLi2816 wants to merge 1 commit into huggingface:main from DaoyuanLi2816:test/orpo-coverage

DaoyuanLi2816 (Contributor) commented Jul 1, 2026

ORPOTrainer has thin test coverage (5 tests) compared to DPOTrainer (~40). This ports a set of DPO's generic trainer tests to ORPO, in the spirit of the maintainers' "Align KTO with DPO: Add tests" work (e.g. #6034, #6160):

  • test_train_model_dtype — training with model_init_kwargs={"dtype": torch.float16} keeps the trained params in fp16.
  • test_train_with_eval — with eval_strategy="steps", an eval_loss is logged.
  • test_train_with_gradient_checkpointing — regression guard for the (default-on) gradient-checkpointing path.
  • test_tag_added / test_tag_added_peft — the ["orpo", "trl"] model-card tags are set (plain and with PEFT).
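The eval-logging and tag tests in the list above come down to a few small assertions. A minimal sketch in plain Python, assuming the trainer exposes a list-of-dicts log history (as the `log_history` of a transformers `TrainerState` does) and a list of model tags. The helper names `logged_keys` and `missing_tags` and the sample data are hypothetical illustrations, not code from this PR:

```python
def logged_keys(log_history):
    """Collect every metric name that appears in a Trainer-style log history."""
    keys = set()
    for entry in log_history:
        keys.update(entry)  # each entry is a dict of metric name -> value
    return keys


def missing_tags(model_tags, expected=("orpo", "trl")):
    """Return the expected model-card tags that are absent from model_tags."""
    present = model_tags or []  # treat an unset tag list as empty
    return [tag for tag in expected if tag not in present]


# Hypothetical log history for a run with step-based evaluation enabled.
sample_history = [
    {"loss": 1.23, "step": 1},
    {"eval_loss": 1.10, "step": 2},
    {"train_runtime": 3.4, "step": 3},
]

assert "eval_loss" in logged_keys(sample_history)
assert missing_tags(["orpo", "trl", "peft"]) == []
```

In an actual test, these helpers would presumably be applied to the trainer's own log history and the model's tag list after training rather than to hand-written data.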

All five tests pass on a single GPU (RTX 4080); ruff check and ruff format --check are both clean.

Note: while porting, I found two DPO tests that don't apply yet because ORPOTrainer doesn't support the underlying features: training on an IterableDataset (__init__ calls .map(num_proc=...), which IterableDataset doesn't accept) and passing a dict as eval_dataset (__init__ calls .map() directly on the dict). I left both out; supporting them would need small __init__ changes and could be a separate follow-up.
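One possible shape for that follow-up, sketched under stated assumptions: the two dataset classes below are hypothetical stand-ins that only mimic the behavior described in the note (one `map()` accepts `num_proc`, the other does not), and `map_any` is an illustrative helper, not ORPOTrainer's or DPOTrainer's actual code:

```python
class MapStyleDataset:
    """Hypothetical stand-in for a dataset whose map() accepts num_proc."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn, num_proc=None):
        # num_proc is accepted but ignored in this stand-in.
        return MapStyleDataset(fn(row) for row in self.rows)


class IterableStyleDataset:
    """Hypothetical stand-in for a dataset whose map() rejects num_proc."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn):
        return IterableStyleDataset(fn(row) for row in self.rows)


def map_any(dataset, fn, num_proc=None):
    """Apply fn to a single dataset or to every dataset in a dict."""
    if isinstance(dataset, dict):
        # A plain dict has no .map(); process each named dataset separately.
        return {name: map_any(split, fn, num_proc) for name, split in dataset.items()}
    if isinstance(dataset, IterableStyleDataset):
        # Leave out num_proc where the dataset type does not accept it.
        return dataset.map(fn)
    return dataset.map(fn, num_proc=num_proc)


def add_flag(row):
    """Toy preprocessing step used only for this illustration."""
    return {**row, "seen": True}


mapped = map_any(
    {"a": MapStyleDataset([{"x": 1}]), "b": IterableStyleDataset([{"x": 2}])},
    add_flag,
    num_proc=2,
)
```

Dispatching on type in one helper keeps the preprocessing call site unchanged; whether this matches how DPOTrainer handles the same cases is not confirmed here.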


Note

Low Risk
Test-only changes in the experimental ORPO test module; no runtime or training logic modified.

Overview
Expands ORPOTrainer integration tests in tests/experimental/test_orpo_trainer.py so coverage matches the pattern used for DPO / KTO (e.g. prior “align KTO with DPO” work). No trainer or library behavior is changed—only new regression tests.

Added cases cover fp16 training via model_init_kwargs, step-based eval (eval_loss in logs), gradient checkpointing (params still update), and model-card tags orpo and trl for full finetune and PEFT setups. Iterable-dataset and dict eval_dataset DPO-style tests were intentionally omitted because ORPOTrainer.__init__ does not support those paths yet.

Reviewed by Cursor Bugbot for commit 0c26e90. Bugbot is set up for automated code reviews on this repo.

ORPOTrainer had thin test coverage (5 tests) vs DPO (~40). Port DPO's generic
trainer tests: test_train_model_dtype, test_train_with_eval,
test_train_with_gradient_checkpointing, test_tag_added, test_tag_added_peft.
All pass on a single GPU. (IterableDataset and dict eval_dataset tests were
intentionally not ported: ORPOTrainer.__init__ doesn't yet support those, unlike
DPO -- left as a possible follow-up.)