
Align ORPO with DPO: support iterable and dict eval datasets #6230

Open
DaoyuanLi2816 wants to merge 2 commits into huggingface:main from DaoyuanLi2816:fix/orpo-iterable-and-dict-eval

DaoyuanLi2816 commented Jul 1, 2026

`ORPOTrainer` prepares its datasets in `__init__` with `.map(..., num_proc=...)`, which fails on two inputs that `DPOTrainer` already accepts:

1. `IterableDataset` (streaming). `IterableDataset.map()` does not accept `num_proc`, so training on a streaming dataset fails immediately:

   `TypeError: IterableDataset.map() got an unexpected keyword argument 'num_proc'`

2. A dict `eval_dataset`. Passing multiple eval datasets as a dict calls `.map()` on the dict itself:

   `AttributeError: 'dict' object has no attribute 'map'`
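Both failures can be reproduced without `datasets` installed. The sketch below is a minimal illustration, assuming a hypothetical `StreamingDataset` stand-in whose `map` signature, like the `IterableDataset.map()` described above, has no `num_proc` parameter. `prepare_like_main` is a hypothetical reduction of the pre-fix code path, not the actual `ORPOTrainer` code.

```python
class StreamingDataset:
    """Hypothetical stand-in for a streaming dataset: `map` has no `num_proc`."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, function, remove_columns=None):
        # Accepts remove_columns but, as the PR describes for streaming, not num_proc.
        return StreamingDataset(function(row) for row in self.rows)


def prepare_like_main(dataset, function, num_proc=4):
    """Hypothetical reduction of the pre-fix path: always forwards num_proc."""
    return dataset.map(function, num_proc=num_proc)


def failure_for(dataset):
    """Return (exception name, message) raised by the pre-fix path, or None."""
    try:
        prepare_like_main(dataset, lambda row: row)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__, str(exc)
    return None


# Streaming input: the unexpected keyword argument raises TypeError.
print(failure_for(StreamingDataset([{"prompt": "hi"}])))
# Dict of eval datasets: the dict itself has no `map` attribute.
print(failure_for({"data1": StreamingDataset([]), "data2": StreamingDataset([])}))
```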

Dataset preparation is factored into a `_prepare_dataset` helper (mirroring `DPOTrainer`): `num_proc` is passed only for map-style `Dataset`s, and a dict `eval_dataset` is prepared one key at a time. For iterable datasets, the raw (string) columns are dropped during tokenization, because streaming batches pass through `accelerate`'s `find_batch_size`, which only handles tensors. Regular datasets keep those columns because `generate_during_eval` relies on them; that path needs `select` and `len`, so it is unavailable for iterable datasets anyway.
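A minimal sketch of the dispatch described above, assuming hypothetical `MapStyleDataset` and `StreamingDataset` stand-ins (not the real `datasets` classes) and a hypothetical `toy_tokenize` function. `prepare_dataset` and `prepare_eval_dataset` are illustrative names that follow the stated behavior; they are not the actual `_prepare_dataset` implementation. The stand-ins follow the `datasets` convention that `remove_columns` are dropped before the mapped function's output is merged in.

```python
class MapStyleDataset:
    """Hypothetical stand-in for a map-style dataset (has len(), accepts num_proc)."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]
        self.column_names = list(self.rows[0]) if self.rows else []

    def __len__(self):
        return len(self.rows)

    def map(self, function, num_proc=None, remove_columns=None):
        # A real library may fan out over num_proc workers; this stand-in runs serially.
        return MapStyleDataset(_apply(function, self.rows, remove_columns))


class StreamingDataset:
    """Hypothetical stand-in for a streaming dataset: no len(), no num_proc."""

    def __init__(self, rows):
        # Eager for brevity; a real streaming dataset maps lazily.
        self.rows = [dict(row) for row in rows]
        self.column_names = list(self.rows[0]) if self.rows else []

    def map(self, function, remove_columns=None):
        return StreamingDataset(_apply(function, self.rows, remove_columns))


def _apply(function, rows, remove_columns):
    """Apply function to each row; drop remove_columns before merging its output."""
    dropped = set(remove_columns or [])
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in dropped}
        new_row.update(function(row))
        out.append(new_row)
    return out


def prepare_dataset(dataset, tokenize_row, num_proc=None):
    """Hypothetical helper following the PR's description of `_prepare_dataset`."""
    if isinstance(dataset, MapStyleDataset):
        # Map-style: num_proc is safe, and raw columns stay for generate_during_eval.
        return dataset.map(tokenize_row, num_proc=num_proc)
    # Streaming: no num_proc; drop raw string columns so batches hold only token ids.
    return dataset.map(tokenize_row, remove_columns=dataset.column_names)


def prepare_eval_dataset(eval_dataset, tokenize_row, num_proc=None):
    """Prepare a single eval dataset, or each value of a dict of eval datasets."""
    if isinstance(eval_dataset, dict):
        return {
            name: prepare_dataset(ds, tokenize_row, num_proc)
            for name, ds in eval_dataset.items()
        }
    return prepare_dataset(eval_dataset, tokenize_row, num_proc)


def toy_tokenize(row):
    """Hypothetical tokenizer: maps each prompt character to its code point."""
    return {"prompt_input_ids": [ord(ch) for ch in row["prompt"]]}


rows = [{"prompt": "ab", "chosen": "c", "rejected": "d"}]
train = prepare_dataset(StreamingDataset(rows), toy_tokenize, num_proc=4)
evals = prepare_eval_dataset(
    {"data1": MapStyleDataset(rows), "data2": MapStyleDataset(rows)}, toy_tokenize, num_proc=4
)
print(train.rows[0])  # {'prompt_input_ids': [97, 98]}
print(sorted(evals), evals["data1"].rows[0])
```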

Verification

- Added `test_train_with_iterable_dataset` (`streaming=True`) and `test_train_with_multiple_eval_dataset` (dict `eval_dataset`). Both fail on `main` with the errors above and pass with this change.
- The full `tests/experimental/test_orpo_trainer.py` suite passes locally (RTX 4080); `ruff check` and `ruff format --check` are clean.

Note

Low Risk
Changes are confined to dataset preparation and Accelerate config defaults in `ORPOTrainer`, with regression tests; the core ORPO loss and training logic is unchanged.

Overview
`ORPOTrainer` dataset setup is refactored to match `DPOTrainer`: preparation lives in `_prepare_dataset`, which passes `num_proc` only for map-style `Dataset`s, so streaming `IterableDataset` training no longer breaks on `.map(..., num_proc=...)`.

For iterable training, `dispatch_batches` is forced to `False` (with a warning if it was `True`), and tokenization drops raw string columns so Accelerate's batch handling only sees tensors. Eval now accepts a dict of datasets; each entry is prepared separately instead of calling `.map` on the dict.

Tests cover streaming training and multi-key eval (`eval_data1_loss` / `eval_data2_loss` appear in the logs).

Reviewed by Cursor Bugbot for commit 13bdeaf.

`ORPOTrainer.__init__` prepared datasets with `.map(..., num_proc=...)`, which
failed on two inputs that `DPOTrainer` already supports:

- `IterableDataset`: `IterableDataset.map()` does not accept `num_proc`, so
  training on a streaming dataset raised `TypeError`. Streaming batches also
  keep the raw (string) columns, which `accelerate`'s `find_batch_size` rejects
  when iterating the dataloader; those columns are now dropped during
  tokenization for iterable datasets (regular datasets keep them, as
  `generate_during_eval` relies on them).
- dict `eval_dataset`: `.map()` was called on the dict itself, raising
  `AttributeError`. Each dataset in the dict is now prepared individually.

Dataset preparation is factored into a `_prepare_dataset` helper, mirroring
`DPOTrainer`. Adds a test for each case.

cursor bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fead16e.

Comment thread on `trl/experimental/orpo/orpo_trainer.py`
Mirror the guard already present in DPO/KTO/TPO/Reward/SFT: when training on
an `IterableDataset`, Accelerate's dispatch mode may concatenate batches across
processes and mis-batch data, so `dispatch_batches` is forced to `False` (with a
warning if the user explicitly set it to `True`).
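The guard can be sketched as follows, under stated assumptions: `AcceleratorSettings` is a hypothetical one-field stand-in for the trainer's accelerator config, `StreamingDataset` is a hypothetical marker for an iterable training dataset, and `guard_dispatch_batches` is an illustrative name rather than TRL's code. Matching the "forced to `False`" wording above, the sketch also overrides an unset (`None`) value, and it warns only when the user explicitly asked for `True`.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceleratorSettings:
    """Hypothetical stand-in for the trainer's accelerator config (one field only)."""

    dispatch_batches: Optional[bool] = None  # None defers to Accelerate's default


class StreamingDataset:
    """Hypothetical marker class for an iterable (streaming) training dataset."""


def guard_dispatch_batches(train_dataset, settings):
    """Force dispatch_batches=False for streaming data; warn only on an explicit True."""
    if isinstance(train_dataset, StreamingDataset):
        if settings.dispatch_batches is True:
            warnings.warn(
                "dispatch_batches=True is not supported with an iterable training "
                "dataset; setting dispatch_batches=False."
            )
        settings.dispatch_batches = False
    return settings


explicit = guard_dispatch_batches(StreamingDataset(), AcceleratorSettings(dispatch_batches=True))
default = guard_dispatch_batches(StreamingDataset(), AcceleratorSettings())
print(explicit.dispatch_batches, default.dispatch_batches)  # False False
```

Map-style training datasets pass through untouched, so an explicit `dispatch_batches=True` still applies there.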