Align ORPO with DPO: support iterable and dict eval datasets by DaoyuanLi2816 · Pull Request #6230 · huggingface/trl

DaoyuanLi2816 · 2026-07-01T03:58:40Z

ORPOTrainer prepares its datasets in __init__ with .map(..., num_proc=...), which fails on two inputs that DPOTrainer already accepts:

1. IterableDataset (streaming). IterableDataset.map() does not accept num_proc, so training on a streaming dataset fails immediately:

TypeError: IterableDataset.map() got an unexpected keyword argument 'num_proc'

2. dict eval_dataset. Passing multiple eval datasets as a dict calls .map() on the dict itself:

AttributeError: 'dict' object has no attribute 'map'

Dataset preparation is factored into a _prepare_dataset helper (mirroring DPOTrainer): num_proc is passed only for map-style Datasets, and a dict eval_dataset is prepared per key. For iterable datasets, the raw (string) columns are dropped during tokenization, because streaming batches are passed through accelerate's find_batch_size, which only handles tensors. Regular datasets keep those columns since generate_during_eval relies on them; that path needs select / len, so it isn't available for iterable datasets anyway.
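The dispatch described above can be sketched in plain Python. This is only an illustration, not the code in this PR: `MapStyleDataset` and `StreamingDataset` are hypothetical stand-ins that mimic the two `map()` signatures, and `prepare_dataset` / `prepare_eval_dataset` are assumed names rather than the trainer's actual helpers.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class MapStyleDataset:
    """Hypothetical stand-in for a map-style dataset: map() accepts num_proc."""

    def __init__(self, rows: List[Row]):
        self.rows = list(rows)

    def map(self, fn: Callable[[Row], Row], num_proc: Optional[int] = None) -> "MapStyleDataset":
        # num_proc is accepted but ignored in this toy version.
        return MapStyleDataset([fn(row) for row in self.rows])


class StreamingDataset:
    """Hypothetical stand-in for a streaming dataset: map() has no num_proc."""

    def __init__(self, rows: List[Row]):
        self.rows = list(rows)

    def map(self, fn: Callable[[Row], Row]) -> "StreamingDataset":
        return StreamingDataset([fn(row) for row in self.rows])


def prepare_dataset(dataset, tokenize_row: Callable[[Row], Row], num_proc: Optional[int] = None):
    """Tokenize a dataset, passing num_proc only where map() supports it."""
    map_kwargs: Dict[str, Any] = {}
    if isinstance(dataset, MapStyleDataset):
        map_kwargs["num_proc"] = num_proc
    return dataset.map(tokenize_row, **map_kwargs)


def prepare_eval_dataset(eval_dataset, tokenize_row: Callable[[Row], Row], num_proc: Optional[int] = None):
    """Prepare a single eval dataset, or each entry of a dict of eval datasets."""
    if isinstance(eval_dataset, dict):
        return {
            name: prepare_dataset(ds, tokenize_row, num_proc)
            for name, ds in eval_dataset.items()
        }
    return prepare_dataset(eval_dataset, tokenize_row, num_proc)
```

Calling `StreamingDataset(...).map(fn, num_proc=4)` directly raises a `TypeError` in this sketch, which is the same class of failure as the first error above; routing through `prepare_dataset` avoids it.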

Verification

Added test_train_with_iterable_dataset (streaming=True) and test_train_with_multiple_eval_dataset (dict eval_dataset). Both fail on main with the errors above and pass with this change.
Full tests/experimental/test_orpo_trainer.py passes locally (RTX 4080); ruff check / ruff format --check clean.

Note

Low Risk
Changes are confined to dataset preparation and Accelerate config defaults in ORPOTrainer, with regression tests; core ORPO loss/training logic is unchanged.

Overview
ORPOTrainer dataset setup is refactored to match DPOTrainer: preparation lives in _prepare_dataset, which only passes num_proc for map-style Datasets so streaming IterableDataset training no longer breaks on .map(..., num_proc=...).

For iterable training, dispatch_batches is forced to False (with a warning if it was True), and tokenization drops raw string columns so Accelerate’s batch handling only sees tensors. Eval now accepts a dict of datasets; each entry is prepared separately instead of calling .map on the dict.

Tests cover streaming train and multi-key eval (eval_data1_loss / eval_data2_loss in logs).
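As a rough illustration of where per-dataset log keys such as eval_data1_loss come from, the sketch below prefixes each metric with the dict key of the eval dataset it was computed on. The dict keys `data1` / `data2`, the `evaluate_all` function, and the toy `evaluate_one` callback are assumptions made for this example; it is not the trainer's evaluation loop.

```python
from typing import Callable, Dict, List


def evaluate_all(
    eval_datasets: Dict[str, List[float]],
    evaluate_one: Callable[[List[float]], Dict[str, float]],
) -> Dict[str, float]:
    """Evaluate each named dataset and prefix its metrics with eval_<name>_."""
    metrics: Dict[str, float] = {}
    for name, dataset in eval_datasets.items():
        for key, value in evaluate_one(dataset).items():
            metrics[f"eval_{name}_{key}"] = value
    return metrics


def mean_loss(per_example_losses: List[float]) -> Dict[str, float]:
    """Toy evaluation: the 'dataset' is just a list of per-example losses."""
    return {"loss": sum(per_example_losses) / len(per_example_losses)}


logs = evaluate_all({"data1": [1.0, 3.0], "data2": [2.0]}, mean_loss)
```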

Reviewed by Cursor Bugbot for commit 13bdeaf.

`ORPOTrainer.__init__` prepared datasets with `.map(..., num_proc=...)`, which fails on two inputs that `DPOTrainer` already supports:

- `IterableDataset`: `IterableDataset.map()` does not accept `num_proc`, so training on a streaming dataset raised `TypeError`. Streaming batches also keep the raw (string) columns, which `accelerate`'s `find_batch_size` rejects when iterating the dataloader; those columns are now dropped during tokenization for iterable datasets (regular datasets keep them, as `generate_during_eval` relies on them).
- dict `eval_dataset`: `.map()` was called on the dict itself, raising `AttributeError`. Each dataset in the dict is now prepared individually.

Dataset preparation is factored into a `_prepare_dataset` helper, mirroring `DPOTrainer`. Adds a test for each case.
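The column handling for the two dataset kinds can be sketched as follows. This is a toy under stated assumptions: the tokenizer maps characters to code points, the `*_input_ids` column names and the `drop_raw_columns` flag are hypothetical, and the real trainer may drop columns differently.

```python
from typing import Any, Dict, List


def toy_tokenize(text: str) -> List[int]:
    """Toy tokenizer: one integer id per character."""
    return [ord(ch) for ch in text]


def tokenize_row(row: Dict[str, str], drop_raw_columns: bool) -> Dict[str, Any]:
    """Tokenize every string column; optionally drop the raw strings."""
    tokenized = {f"{name}_input_ids": toy_tokenize(text) for name, text in row.items()}
    if drop_raw_columns:
        # Streaming case: only numeric columns reach the batching code.
        return tokenized
    # Map-style case: keep raw strings for code paths that still read them.
    return {**row, **tokenized}


def has_only_numeric_columns(row: Dict[str, Any]) -> bool:
    """True if every column is a list of ints, i.e. convertible to a tensor."""
    return all(
        isinstance(value, list) and all(isinstance(tok, int) for tok in value)
        for value in row.values()
    )


row = {"prompt": "hi", "chosen": "ok", "rejected": "no"}
streaming_row = tokenize_row(row, drop_raw_columns=True)
regular_row = tokenize_row(row, drop_raw_columns=False)
```

Here `streaming_row` contains only integer lists, while `regular_row` still carries the original `prompt`, `chosen`, and `rejected` strings alongside the token ids.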

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fead16e.

Mirror the guard already present in DPO/KTO/TPO/Reward/SFT: when training on an `IterableDataset`, Accelerate's dispatch mode may concatenate batches across processes and mis-batch data, so `dispatch_batches` is forced to `False` (with a warning if the user explicitly set it to `True`).
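A minimal sketch of such a guard is shown below. `DataLoaderConfig` is a hypothetical stand-in for the Accelerate-related settings, and `disable_dispatch_for_iterable` is an assumed name; only the behavior (force `False`, warn on an explicit `True`) is taken from the commit message.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataLoaderConfig:
    """Hypothetical stand-in for the dataloader settings passed to Accelerate."""

    dispatch_batches: Optional[bool] = None


def disable_dispatch_for_iterable(config: DataLoaderConfig, train_is_iterable: bool) -> DataLoaderConfig:
    """Force dispatch_batches=False when the training dataset is iterable."""
    if not train_is_iterable:
        return config
    if config.dispatch_batches is True:
        # Only warn when the user explicitly asked for dispatch mode.
        warnings.warn(
            "dispatch_batches=True is not supported with an iterable training "
            "dataset; setting it to False.",
            UserWarning,
        )
    config.dispatch_batches = False
    return config
```

With this shape, an unset value (`None`) is switched to `False` silently, and a map-style training dataset leaves the configuration untouched.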

cursor Bot reviewed Jul 1, 2026

Comment thread trl/experimental/orpo/orpo_trainer.py

DaoyuanLi2816 mentioned this pull request Jul 1, 2026

Fix dataset fingerprinting in experimental KTO and TPO trainers #6233

Closed

DaoyuanLi2816 wants to merge 2 commits into huggingface:main from DaoyuanLi2816:fix/orpo-iterable-and-dict-eval
