@jveronvialard
Contributor

What does this PR do?

For validation preference datasets only, you can now specify which validation loss function to use:

  • PreferenceLoss: the default, which is also used as the training loss.
  • PreferredAmongNAccuracy: for datasets where one response is preferred among N responses.
  • RewardBestofNValue: weighted average of the response with the highest reward score among the first i responses, for i = 1, ..., N.
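To make the two new metrics concrete, here is a minimal Python sketch of how they might be computed from per-response reward scores. It is not the code added in this PR: the function names, the argument layout, and the per-response `values` (a hypothetical quality label for each response) are assumptions made for illustration, and the weighting step of RewardBestofNValue is left out because the description does not spell it out.

```python
from typing import List, Sequence


def preferred_among_n_accuracy(
    rewards: Sequence[Sequence[float]],
    preferred_idx: Sequence[int],
) -> float:
    """Fraction of prompts whose highest-reward response is the preferred one.

    Hypothetical layout: rewards[p][j] is the reward score of response j
    for prompt p, and preferred_idx[p] is the index of the preferred response.
    """
    if not rewards:
        raise ValueError("need at least one prompt")
    correct = 0
    for scores, pref in zip(rewards, preferred_idx):
        # Index of the response the reward model scores highest.
        predicted = max(range(len(scores)), key=lambda j: scores[j])
        correct += int(predicted == pref)
    return correct / len(rewards)


def best_of_i_values(
    rewards: Sequence[float],
    values: Sequence[float],
) -> List[float]:
    """For i = 1..N, the value of the highest-reward response among the first i.

    Hypothetical layout: rewards[j] is the reward score of response j and
    values[j] is its quality label. Ties keep the earlier response.
    """
    out: List[float] = []
    best = 0  # index of the best-scoring response seen so far
    for i in range(len(rewards)):
        if rewards[i] > rewards[best]:
            best = i
        out.append(values[best])
    return out
```

For example, `preferred_among_n_accuracy([[0.1, 0.9, 0.3], [0.5, 0.2, 0.1]], [1, 2])` scores the first prompt as correct and the second as incorrect, and `best_of_i_values` returns one entry per prefix length i, which a caller could then average with whatever weights the real loss function uses.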

Usage

uv run ./examples/run_rm.py \
   ...
   ++data.train_data_path=<PathToTrainingDataset> \
   ++data.val_data_paths.<NameOfValidationDataset1>=<PathToValidationDataset1> \
   ++data.val_loss_fns.<NameOfValidationDataset1>=<LossFunctionName>

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

@github-actions bot added the documentation label (Improvements or additions to documentation) on Oct 15, 2025