feat: Add AceMathRL recipe #1484
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
Adds support for the AceReason-Math dataset with a GRPO training recipe for 7B models with 16K context length.
This PR introduces a new dataset adapter for
nvidia/AceReason-Math, a comprehensive GRPO training configuration for DeepSeek-R1-Distill-Qwen-7B, and corresponding test infrastructure for validating the training pipeline.Issues
List issues that this PR closes (syntax):
Usage
Training with the AceReason-Math dataset:
Running GRPO training with the new recipe:
uv run examples/run_grpo_math.py \ --config examples/configs/recipes/llm/grpo-acereason-math-7b-16K.yaml \ grpo.max_num_steps=1000 \ logger.wandb_enabled=TrueChanges in this PR
New Dataset Adapter:
nemo_rl/data/datasets/response_datasets/acereason_math.pyAceReasonMathDatasetclass with train/validation splitsnvidia/AceReason-Mathfor training andHuggingFaceH4/aime_2024for validationGRPO Recipe:
examples/configs/recipes/llm/grpo-acereason-math-7b-16K.yamlPrompt Template:
examples/prompts/acemath_qwen_cot.txt\boxed{}Test Suite:
tests/test_suites/llm/grpo-acereason-math-7b-16K.shBefore your PR is "Ready for review"
Pre checks:
Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.
Additional Information