Fix prompt path resolution when CWD differs from project root by simonrosenberg · Pull Request #474 · OpenHands/benchmarks

simonrosenberg · 2026-03-03T09:44:14Z

Summary

Fixes prompt template path resolution in all 6 run_infer.py files. The old code used p.relative_to(Path.cwd()) which raises ValueError when the process is launched from a directory other than the project root (e.g. when running as an installed package via NeMo Evaluator).
Extracts duplicated logic into a shared add_prompt_path_argument() utility in benchmarks/utils/args_parser.py, replacing ~14 identical lines in each file with a single function call.
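The failure mode described above can be reproduced without touching the filesystem. The sketch below uses `PurePosixPath` and made-up directory names (`/opt/project`, `/tmp/elsewhere`) purely for illustration; the helper `relative_choice` is hypothetical and only mimics the old pattern.

```python
from pathlib import PurePosixPath


def relative_choice(template: PurePosixPath, cwd: PurePosixPath) -> str:
    """Mimic the old pattern: express a template path relative to the CWD."""
    return str(template.relative_to(cwd))


# Hypothetical locations, chosen only for this illustration.
project_root = PurePosixPath("/opt/project")
template = project_root / "benchmarks" / "swebench" / "prompts" / "default.j2"

# Launched from the project root: the template is below the CWD, so this works.
from_root = relative_choice(template, project_root)
print(from_root)  # benchmarks/swebench/prompts/default.j2

# Launched from elsewhere: the template is not below the CWD, so
# relative_to() raises ValueError before argparse is even configured.
try:
    relative_choice(template, PurePosixPath("/tmp/elsewhere"))
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

Because the exception is raised while the `choices` list is being built, the script fails at parser construction time, regardless of which arguments the user passes.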

Extracted from #455 (NeMo Evaluator Integration).

Files changed

| File | Change |
| --- | --- |
| `benchmarks/utils/args_parser.py` | New `add_prompt_path_argument(parser, caller_file)` function |
| `benchmarks/swebench/run_infer.py` | Use shared utility |
| `benchmarks/swtbench/run_infer.py` | Use shared utility |
| `benchmarks/swebenchmultimodal/run_infer.py` | Use shared utility |
| `benchmarks/swefficiency/run_infer.py` | Use shared utility |
| `benchmarks/multiswebench/run_infer.py` | Use shared utility |
| `benchmarks/commit0/run_infer.py` | Use shared utility |

Before / After

# Before (broken when CWD != project root):
prompt_dir = (Path(__file__).parent / "prompts").resolve()
choices = [str(p.relative_to(Path.cwd())) for p in prompt_dir.glob("*.j2")]
default_prompt_path = prompt_dir / "default.j2"
assert default_prompt_path.exists(), ...
parser.add_argument("--prompt-path", ..., choices=choices, default=str(default_prompt_path))

# After (single line per benchmark):
parser = get_parser()
add_prompt_path_argument(parser, __file__)

Validation

All benchmarks pass on the nemo-evaluator branch (which includes this change).

Test plan

Verify --prompt-path argument still works when running from the project root
Verify --prompt-path argument works when running from a different directory (the bug this fixes)
Verify --help still lists available prompt templates

🤖 Generated with Claude Code

The old code used `p.relative_to(Path.cwd())` which raises ValueError when the process is launched from a different directory than the project root. Replace with `str(p)` (absolute paths) and extract the duplicated logic into `add_prompt_path_argument()` in args_parser.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
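As a rough illustration of what this first commit describes, the shared helper might look like the sketch below. Only the function name, its `(parser, caller_file)` signature, the `prompts/*.j2` layout, and the switch to `str(p)` come from the PR text; the body, the help string, and the `FileNotFoundError` (the original used an `assert`) are assumptions.

```python
import argparse
import tempfile
from pathlib import Path


def add_prompt_path_argument(parser: argparse.ArgumentParser, caller_file: str) -> None:
    """Register --prompt-path with absolute template paths as choices (sketch)."""
    # Templates are assumed to live in a "prompts" directory next to the caller.
    prompt_dir = (Path(caller_file).parent / "prompts").resolve()
    # str(p) instead of p.relative_to(Path.cwd()): no dependence on the CWD.
    choices = sorted(str(p) for p in prompt_dir.glob("*.j2"))
    default_prompt_path = prompt_dir / "default.j2"
    if not default_prompt_path.exists():
        raise FileNotFoundError(f"Default prompt not found: {default_prompt_path}")
    parser.add_argument(
        "--prompt-path",
        type=str,
        choices=choices,
        default=str(default_prompt_path),
        help="Path to the prompt template file",
    )


# Usage against a throwaway directory standing in for a benchmark package.
with tempfile.TemporaryDirectory() as tmp:
    benchmark_dir = Path(tmp).resolve()
    (benchmark_dir / "prompts").mkdir()
    (benchmark_dir / "prompts" / "default.j2").write_text("{{ instruction }}")

    parser = argparse.ArgumentParser()
    # A real caller would pass __file__; here a fake run_infer.py path is used.
    add_prompt_path_argument(parser, str(benchmark_dir / "run_infer.py"))
    args = parser.parse_args([])
    expected_default = str(benchmark_dir / "prompts" / "default.j2")
    print(args.prompt_path == expected_default)
```

With this shape, both `choices` and the default are absolute paths, which avoids the `ValueError` but is also what makes `--help` list long absolute paths.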

all-hands-bot

🟡 Acceptable - Solves Real Problem, but with Trade-offs

Good: Fixes actual bug (ValueError when CWD ≠ project root) and eliminates code duplication.

Issues:

UX regression: Help shows ugly absolute paths instead of clean relative ones
Breaking change risk: Existing users passing relative paths will fail validation
Missing tests: Bug fix without regression tests

See inline comments for specifics.
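The "breaking change risk" follows from how argparse validates `choices`: the supplied string is compared against the listed values as-is, with no path normalization. A small sketch, using a hypothetical absolute path:

```python
import argparse

# Hypothetical absolute path, standing in for what the first commit generates.
ABSOLUTE = "/opt/project/benchmarks/swebench/prompts/default.j2"

parser = argparse.ArgumentParser(prog="run_infer")
parser.add_argument("--prompt-path", choices=[ABSOLUTE], default=ABSOLUTE)

# The exact absolute string is accepted.
accepted = parser.parse_args(["--prompt-path", ABSOLUTE]).prompt_path

# A relative spelling of the same file is a different string, so argparse
# reports "invalid choice" on stderr and exits with status 2.
try:
    parser.parse_args(["--prompt-path", "benchmarks/swebench/prompts/default.j2"])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code

print(accepted, exit_code)
```

Any invocation that previously passed a relative path would therefore be rejected once the choices become absolute.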

benchmarks/utils/args_parser.py

- Use bare filenames in --help (via metavar) instead of absolute paths
- Accept bare filenames, relative paths, and absolute paths via custom type function for backwards compatibility
- Drop argparse choices in favor of runtime validation with clear error messages listing available templates
- Add tests/test_prompt_path.py with 7 test cases covering: default value, bare filename, absolute path, different CWD (the original bug), invalid template, missing default.j2, and relative path resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
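The revised design can be sketched as follows. This is an illustration under assumptions, not the merged code: the inner `resolve_prompt_path` name, the lookup order for relative inputs (prompt directory first, then the CWD), and the exact error wording are all hypothetical. The function name, the `metavar`, the custom `type` function, and the absence of `choices` are taken from the commit message.

```python
import argparse
import tempfile
from pathlib import Path


def add_prompt_path_argument(parser: argparse.ArgumentParser, caller_file: str) -> None:
    """Register --prompt-path with a resolving type function (sketch)."""
    prompt_dir = (Path(caller_file).parent / "prompts").resolve()
    default = prompt_dir / "default.j2"
    if not default.is_file():
        raise FileNotFoundError(f"Default prompt not found: {default}")
    available = sorted(p.name for p in prompt_dir.glob("*.j2"))

    def resolve_prompt_path(value: str) -> str:
        candidate = Path(value)
        if candidate.is_absolute():
            candidates = [candidate]
        else:
            # Assumed order: bare filename in the prompt dir, then CWD-relative.
            candidates = [prompt_dir / candidate, Path.cwd() / candidate]
        for path in candidates:
            if path.is_file():
                return str(path.resolve())
        raise argparse.ArgumentTypeError(
            f"unknown prompt template {value!r}; available: {', '.join(available)}"
        )

    parser.add_argument(
        "--prompt-path",
        type=resolve_prompt_path,  # runtime validation instead of choices
        default=str(default),
        metavar="{" + ",".join(available) + "}",  # bare filenames in --help
        help="Prompt template: bare filename, relative path, or absolute path",
    )


# Usage against a throwaway directory standing in for a benchmark package.
with tempfile.TemporaryDirectory() as tmp:
    prompts = Path(tmp).resolve() / "prompts"
    prompts.mkdir()
    (prompts / "default.j2").write_text("default")
    (prompts / "custom.j2").write_text("custom")

    parser = argparse.ArgumentParser(prog="run_infer")
    add_prompt_path_argument(parser, str(prompts.parent / "run_infer.py"))

    default_path = parser.parse_args([]).prompt_path
    bare_path = parser.parse_args(["--prompt-path", "custom.j2"]).prompt_path
    abs_path = parser.parse_args(["--prompt-path", str(prompts / "custom.j2")]).prompt_path
    expected_custom = str(prompts / "custom.j2")
    expected_default = str(prompts / "default.j2")
    try:
        parser.parse_args(["--prompt-path", "missing.j2"])
        invalid_exit = None
    except SystemExit as exc:
        invalid_exit = exc.code  # argparse exits with status 2 on a type error
```

Raising `argparse.ArgumentTypeError` from the type function lets argparse print the message in its usual `error:` format, so the list of available templates appears next to the rejected value.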


all-hands-bot

✅ Clean refactor that fixes a real bug

Eliminates 84 lines of duplication while fixing ValueError when CWD ≠ project root. Path resolution logic is pragmatic, tests cover real scenarios (not mocks), and backwards compatibility is preserved.

Previous review concerns (UX regression, breaking changes) were properly addressed. Ready to ship.

all-hands-bot reviewed Mar 3, 2026


simonrosenberg and others added 2 commits March 3, 2026 06:51

Fix ruff formatting

628f734

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

simonrosenberg requested a review from all-hands-bot March 3, 2026 10:00

all-hands-bot approved these changes Mar 3, 2026


simonrosenberg merged commit 71407ca into main Mar 3, 2026
3 checks passed

simonrosenberg deleted the fix/prompt-path-use-absolute-paths branch March 3, 2026 11:52

simonrosenberg merged 3 commits into main from fix/prompt-path-use-absolute-paths
