@v-shobhit
Contributor

In the future, benchmarks (like gpt-oss) may have separate performance and accuracy datasets.

This PR adds a separate config field, accuracy_sample_count, that sets the number of samples in the accuracy evaluation dataset. It is independent of the existing performance_sample_count, which will be used for the size of the performance evaluation dataset.

This new field defaults to performance_sample_count for backwards compatibility.
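
The fallback described above can be sketched roughly as follows. This is only an illustration of the defaulting rule, not the code in this PR: the class name SampleCountSettings and the helper effective_accuracy_sample_count are hypothetical, and using None to mean "not set" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleCountSettings:
    """Hypothetical stand-in for the real settings object (assumption)."""

    performance_sample_count: int
    # None is assumed here to mean "the user did not set this field".
    accuracy_sample_count: Optional[int] = None

    def effective_accuracy_sample_count(self) -> int:
        # Fall back to performance_sample_count when the new field is unset,
        # so existing configs keep their current behavior.
        if self.accuracy_sample_count is None:
            return self.performance_sample_count
        return self.accuracy_sample_count


# Existing config: only performance_sample_count is set.
legacy = SampleCountSettings(performance_sample_count=100)

# New config: the accuracy dataset has its own size.
split = SampleCountSettings(performance_sample_count=100, accuracy_sample_count=25)
```

With this sketch, the legacy config resolves to the same count for both datasets, while the split config sizes the accuracy dataset independently.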

@v-shobhit v-shobhit requested a review from a team as a code owner December 17, 2025 13:15
@github-actions
Contributor

github-actions bot commented Dec 17, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@v-shobhit v-shobhit force-pushed the shobhitv/acc_sample_count branch from 865c33b to 3f2f719 Compare December 17, 2025 13:37
@nvzhihanj
Contributor

@pgmpablo157321 @tanvi-mlcommons @mrmhodak please help review this PR. The accuracy sample count is a new setting we are adding to separate the accuracy and performance test datasets. Could you also suggest what else is needed for this feature?

@mrmhodak
Contributor

@pgmpablo157321: Please take a look to see if you agree with this.
