feat: additional validation losses for preference data #1367

jveronvialard · 2025-10-15T18:30:51Z

What does this PR do?

For validation preference datasets only, you can now choose the loss function used on each validation set. Three options are available: PreferenceLoss (the default, and the same loss used for training); PreferredAmongNAccuracy, for datasets in which one response is preferred among N responses; and RewardBestofNValue, a weighted average, over i = 1, ..., N, of the value of the response with the highest reward score among the first i responses.
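The three validation metrics can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code added in this PR: PreferenceLoss is assumed to be the standard Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected); PreferredAmongNAccuracy is assumed to count a prompt as correct when the preferred response gets the highest reward among its N responses; and for RewardBestofNValue each response is assumed to carry a reference value, with uniform weights over i unless weights are supplied. All function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    Assumption: PreferenceLoss follows the standard Bradley-Terry formulation.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), which equals -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preferred_among_n_accuracy(
    rewards: Sequence[Sequence[float]], preferred_idx: Sequence[int]
) -> float:
    """Fraction of prompts whose preferred response receives the highest reward.

    Assumption (hypothetical): ties are broken in favor of the earliest response.
    """
    if not rewards or len(rewards) != len(preferred_idx):
        raise ValueError("expected a non-empty batch with one preferred index per prompt")
    correct = 0
    for scores, target in zip(rewards, preferred_idx):
        # max() returns the first index among equal maxima.
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == target)
    return correct / len(rewards)


def reward_best_of_n_value(
    rewards: Sequence[float],
    values: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average over i = 1..N of the value of the highest-reward
    response among the first i responses.

    Assumptions (hypothetical): each response has a reference `value`,
    and weights default to uniform.
    """
    n = len(rewards)
    if n == 0 or len(values) != n:
        raise ValueError("rewards and values must be non-empty and equally long")
    w = [1.0] * n if weights is None else list(weights)
    if len(w) != n or sum(w) <= 0:
        raise ValueError("weights must have length N and a positive sum")
    best = 0
    total = 0.0
    for i in range(n):
        # Keep the earliest response with the highest reward seen so far.
        if rewards[i] > rewards[best]:
            best = i
        total += w[i] * values[best]
    return total / sum(w)
```

For example, with rewards [0.1, 0.9, 0.5] and values [1.0, 3.0, 2.0], the best-so-far responses for i = 1, 2, 3 have values 1.0, 3.0 and 3.0, so the uniformly weighted RewardBestofNValue is 7/3.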

Usage

uv run ./examples/run_rm.py \
   ...
   ++data.train_data_path=<PathToTrainingDataset> \
   ++data.val_data_paths.<NameOfValidationDataset1>=<PathToValidationDataset1> \
   ++data.val_loss_fns.<NameOfValidationDataset1>=<LossFunctionName>
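To illustrate how the per-dataset keys in the command above might be resolved, here is a minimal sketch, assuming that any validation set without an entry under data.val_loss_fns falls back to PreferenceLoss (the stated default). The function resolve_val_loss_names and the KNOWN_VAL_LOSSES tuple are hypothetical and are not taken from the NeMo RL code base.

```python
from typing import Dict, Mapping, Optional

# Hypothetical list of loss names accepted under data.val_loss_fns.
KNOWN_VAL_LOSSES = ("PreferenceLoss", "PreferredAmongNAccuracy", "RewardBestofNValue")


def resolve_val_loss_names(
    val_data_paths: Mapping[str, str],
    val_loss_fns: Optional[Mapping[str, str]] = None,
    default: str = "PreferenceLoss",
) -> Dict[str, str]:
    """Map every validation set name to the loss name it should use."""
    val_loss_fns = dict(val_loss_fns or {})
    # Assumed design choice: reject losses configured for unknown validation sets.
    unknown_sets = set(val_loss_fns) - set(val_data_paths)
    if unknown_sets:
        raise ValueError(f"val_loss_fns refers to unknown validation sets: {sorted(unknown_sets)}")
    resolved = {}
    for name in val_data_paths:
        # Validation sets without an explicit entry use the default loss.
        loss_name = val_loss_fns.get(name, default)
        if loss_name not in KNOWN_VAL_LOSSES:
            raise ValueError(f"unknown validation loss {loss_name!r} for {name!r}")
        resolved[name] = loss_name
    return resolved
```

Failing fast on a misspelled dataset or loss name keeps a typo in the command line from silently falling back to the default loss.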

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines.
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Signed-off-by: Julien Veron Vialard <[email protected]>

jveronvialard added 3 commits October 15, 2025 10:28

41db360 adding per val set loss, PreferredAmongNAccuracy, and RewardBestofNValue
69f179c Merge branch 'main' of github.com:NVIDIA-NeMo/RL into jveronvialard/r…m-losses
e4373e0 update docs

github-actions bot added the documentation label (Improvements or additions to documentation) on Oct 15, 2025

jveronvialard requested a review from odelalleau October 15, 2025 18:31
