Fix prompt path resolution when CWD differs from project root#474
Merged
simonrosenberg merged 3 commits intomainfrom Mar 3, 2026
Merged
Fix prompt path resolution when CWD differs from project root#474simonrosenberg merged 3 commits intomainfrom
simonrosenberg merged 3 commits intomainfrom
Conversation
The old code used `p.relative_to(Path.cwd())` which raises ValueError when the process is launched from a different directory than the project root. Replace with `str(p)` (absolute paths) and extract the duplicated logic into `add_prompt_path_argument()` in args_parser.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solves Real Problem, but with Trade-offs
Good: Fixes actual bug (ValueError when CWD ≠ project root) and eliminates code duplication.
Issues:
- UX regression: Help shows ugly absolute paths instead of clean relative ones
- Breaking change risk: Existing users passing relative paths will fail validation
- Missing tests: Bug fix without regression tests
See inline comments for specifics.
- Use bare filenames in --help (via metavar) instead of absolute paths - Accept bare filenames, relative paths, and absolute paths via custom type function for backwards compatibility - Drop argparse choices in favor of runtime validation with clear error messages listing available templates - Add tests/test_prompt_path.py with 7 test cases covering: default value, bare filename, absolute path, different CWD (the original bug), invalid template, missing default.j2, and relative path resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
all-hands-bot
approved these changes
Mar 3, 2026
Collaborator
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ Clean refactor that fixes a real bug
Eliminates 84 lines of duplication while fixing ValueError when CWD ≠ project root. Path resolution logic is pragmatic, tests cover real scenarios (not mocks), and backwards compatibility is preserved.
Previous review concerns (UX regression, breaking changes) were properly addressed. Ready to ship.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
run_infer.pyfiles. The old code usedp.relative_to(Path.cwd())which raisesValueErrorwhen the process is launched from a directory other than the project root (e.g. when running as an installed package via NeMo Evaluator).add_prompt_path_argument()utility inbenchmarks/utils/args_parser.py, replacing ~14 identical lines in each file with a single function call.Extracted from #455 (NeMo Evaluator Integration).
Files changed
benchmarks/utils/args_parser.pyadd_prompt_path_argument(parser, caller_file)functionbenchmarks/swebench/run_infer.pybenchmarks/swtbench/run_infer.pybenchmarks/swebenchmultimodal/run_infer.pybenchmarks/swefficiency/run_infer.pybenchmarks/multiswebench/run_infer.pybenchmarks/commit0/run_infer.pyBefore / After
Validation
All benchmarks pass on the
nemo-evaluatorbranch (which includes this change):Test plan
--prompt-pathargument still works when running from the project root--prompt-pathargument works when running from a different directory (the bug this fixes)--helpstill lists available prompt templates🤖 Generated with Claude Code