Align ORPO with DPO: support iterable and dict eval datasets #6230
Open
DaoyuanLi2816 wants to merge 2 commits into
`ORPOTrainer.__init__` prepared datasets with `.map(..., num_proc=...)`, which failed on two inputs that `DPOTrainer` already supports:

- `IterableDataset`: `IterableDataset.map()` does not accept `num_proc`, so training on a streaming dataset raised a `TypeError`. Streaming batches also keep the raw (string) columns, which `accelerate`'s `find_batch_size` rejects when iterating the dataloader. Those columns are now dropped during tokenization for iterable datasets; regular datasets keep them, because `generate_during_eval` relies on them.
- dict `eval_dataset`: `.map()` was called on the dict itself, raising an `AttributeError`. Each dataset in the dict is now prepared individually.

Dataset preparation is factored into a `_prepare_dataset` helper, mirroring `DPOTrainer`. Adds a test for each case.
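A minimal sketch of the kind of type-based dispatch such a helper performs. It runs against hypothetical stand-ins (`MapStyleDataset`, `StreamingDataset`) instead of the real `datasets` classes so it has no dependencies, and `prepare_dataset` / `prepare_eval_dataset` are illustrative names, not TRL's actual `_prepare_dataset` signature. Map-style data gets `num_proc`; streaming data gets `remove_columns` instead; a dict is prepared key by key.

```python
from typing import Any, Callable, Dict, List, Optional


class _RecordingDataset:
    """Shared base for the stand-ins: remembers the kwargs passed to map()."""

    def __init__(self, column_names: List[str]):
        self.column_names = column_names
        self.map_kwargs: Optional[Dict[str, Any]] = None

    def map(self, function: Callable, **kwargs: Any) -> "_RecordingDataset":
        self.map_kwargs = kwargs
        return self


class MapStyleDataset(_RecordingDataset):
    """Hypothetical stand-in for `datasets.Dataset` (map() accepts num_proc)."""


class StreamingDataset(_RecordingDataset):
    """Hypothetical stand-in for `datasets.IterableDataset` (map() rejects num_proc)."""

    def map(self, function: Callable, **kwargs: Any) -> "_RecordingDataset":
        if "num_proc" in kwargs:
            raise TypeError("num_proc is not accepted by this stand-in's map()")
        return super().map(function, **kwargs)


def prepare_dataset(dataset: _RecordingDataset, tokenize_row: Callable, num_proc: Optional[int] = None):
    """Hypothetical helper: pick map() kwargs based on the dataset type."""
    if isinstance(dataset, StreamingDataset):
        # Streaming: no num_proc, and drop the raw string columns during tokenization.
        map_kwargs: Dict[str, Any] = {"remove_columns": list(dataset.column_names)}
    else:
        # Map-style: parallelize, and keep raw columns (generate_during_eval reads them).
        map_kwargs = {"num_proc": num_proc}
    return dataset.map(tokenize_row, **map_kwargs)


def prepare_eval_dataset(eval_dataset, tokenize_row: Callable, num_proc: Optional[int] = None):
    """Prepare a dict of eval datasets entry by entry instead of calling .map() on the dict."""
    if isinstance(eval_dataset, dict):
        return {name: prepare_dataset(ds, tokenize_row, num_proc) for name, ds in eval_dataset.items()}
    return prepare_dataset(eval_dataset, tokenize_row, num_proc)


def identity(row):
    """Placeholder for the real tokenization function."""
    return row


train = prepare_dataset(StreamingDataset(["prompt", "chosen", "rejected"]), identity, num_proc=4)
print(train.map_kwargs)  # {'remove_columns': ['prompt', 'chosen', 'rejected']}

evals = prepare_eval_dataset(
    {"data1": MapStyleDataset(["prompt"]), "data2": MapStyleDataset(["prompt"])}, identity, num_proc=4
)
print({name: ds.map_kwargs for name, ds in evals.items()})
# {'data1': {'num_proc': 4}, 'data2': {'num_proc': 4}}
```

Branching on the dataset type keeps a single code path for `train_dataset` and each `eval_dataset` entry, which is the shape the description attributes to `DPOTrainer`.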
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit fead16e.
Mirror the guard already present in DPO/KTO/TPO/Reward/SFT: when training on an `IterableDataset`, Accelerate's dispatch mode may concatenate batches across processes and mis-batch data, so `dispatch_batches` is forced to `False` (with a warning if the user explicitly set it to `True`).
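The guard can be sketched as below. `AcceleratorConfigStub` and `guard_dispatch_batches` are hypothetical names standing in for the trainer's handling of its accelerator config, and the warning is emitted through Python's `warnings` module here, which may differ from how the trainers actually log it.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceleratorConfigStub:
    """Hypothetical stand-in for the accelerator config carried by the training arguments."""

    dispatch_batches: Optional[bool] = None  # None means "let Accelerate decide"


def guard_dispatch_batches(config: AcceleratorConfigStub, train_is_iterable: bool) -> AcceleratorConfigStub:
    """Force dispatch_batches=False for iterable training data; warn if the user asked for True."""
    if not train_is_iterable:
        return config  # map-style data: leave the user's choice untouched
    if config.dispatch_batches is True:
        warnings.warn(
            "dispatch_batches=True is not supported with an IterableDataset; setting it to False.",
            UserWarning,
            stacklevel=2,
        )
    config.dispatch_batches = False
    return config


cfg = guard_dispatch_batches(AcceleratorConfigStub(), train_is_iterable=True)
print(cfg.dispatch_batches)  # False
```

The value is overwritten even when it was left unset, not only when it is `True`: per Accelerate's documentation, an unset `dispatch_batches` defaults to dispatching when the dataloader wraps an `IterableDataset`.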

`ORPOTrainer` prepares its datasets in `__init__` with `.map(..., num_proc=...)`, which fails on two inputs that `DPOTrainer` already accepts:

1. `IterableDataset` (streaming). `IterableDataset.map()` does not accept `num_proc`, so training on a streaming dataset fails immediately with a `TypeError`.
2. dict `eval_dataset`. Passing multiple eval datasets as a dict calls `.map()` on the dict itself, which raises an `AttributeError`.

Dataset preparation is factored into a `_prepare_dataset` helper (mirroring `DPOTrainer`): `num_proc` is only passed for map-style `Dataset`s, and a dict `eval_dataset` is prepared per key. For iterable datasets the raw (string) columns are dropped during tokenization, because streaming batches are passed through `accelerate`'s `find_batch_size`, which only handles tensors. Regular datasets keep those columns since `generate_during_eval` relies on them (and that path, which needs `select`/`len`, isn't available for iterable datasets anyway).

Verification
- Added `test_train_with_iterable_dataset` (`streaming=True`) and `test_train_with_multiple_eval_dataset` (dict `eval_dataset`). Both fail on `main` with the errors above and pass with this change.
- `tests/experimental/test_orpo_trainer.py` passes locally (RTX 4080); `ruff check` / `ruff format --check` are clean.

Note
Low Risk
Changes are confined to dataset preparation and Accelerate config defaults in ORPOTrainer, with regression tests; core ORPO loss/training logic is unchanged.
Overview
`ORPOTrainer` dataset setup is refactored to match `DPOTrainer`: preparation lives in `_prepare_dataset`, which only passes `num_proc` for map-style `Dataset`s, so streaming `IterableDataset` training no longer breaks on `.map(..., num_proc=...)`.

For iterable training, `dispatch_batches` is forced to `False` (with a warning if it was `True`), and tokenization drops raw string columns so Accelerate's batch handling only sees tensors. Eval now accepts a `dict` of datasets; each entry is prepared separately instead of calling `.map` on the dict. Tests cover streaming training and multi-key eval (`eval_data1_loss`/`eval_data2_loss` in logs).

Reviewed by Cursor Bugbot for commit 13bdeaf.
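Why the raw string columns have to go can be illustrated with a simplified, dependency-free mimic of a recursive batch-size probe that only accepts tensors. `FakeTensor` and `find_batch_size_sketch` are hypothetical stand-ins for `torch.Tensor` and `accelerate`'s `find_batch_size`, and the sketch assumes the probe descends into the first element of a list and the first key of a mapping. A batch whose first key is a raw text column is rejected, while a batch holding only tokenized fields yields its batch size.

```python
from collections.abc import Mapping
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only carries a shape."""

    def __init__(self, *shape: int):
        self.shape = shape


def find_batch_size_sketch(data: Any) -> int:
    """Simplified probe: descend into the first element/key and require a tensor there."""
    if isinstance(data, (list, tuple, Mapping)) and len(data) == 0:
        raise ValueError(f"Cannot find the batch size from empty {type(data)}.")
    if isinstance(data, (list, tuple)):
        return find_batch_size_sketch(data[0])
    if isinstance(data, Mapping):
        first_key = next(iter(data))
        return find_batch_size_sketch(data[first_key])
    if not isinstance(data, FakeTensor):
        raise TypeError(f"Can only find the batch size of tensors but got {type(data)}.")
    return data.shape[0]


# A streaming batch that still carries the raw "prompt" column (a list of strings) ...
raw_batch = {"prompt": ["Hello", "Hi"], "prompt_input_ids": FakeTensor(2, 16)}
# ... versus the same batch after the raw columns were removed during tokenization.
tokenized_batch = {"prompt_input_ids": FakeTensor(2, 16)}

print(find_batch_size_sketch(tokenized_batch))  # 2
try:
    find_batch_size_sketch(raw_batch)
except TypeError as err:
    print("raw batch rejected:", err)
```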