refactor(swtbench): regroup all hyperparameters in constants.py#372
refactor(swtbench): regroup all hyperparameters in constants.py#372simonrosenberg wants to merge 4 commits intomainfrom
Conversation
This commit creates a single source of truth for all SWTBench hyperparameters and constant values by: 1. Creating benchmarks/swtbench/constants.py with all constants: - Docker/Image related constants (prefixes, registries, build targets) - Dataset related constants (default dataset, split, model name) - Environment variable names - Default values for various parameters - File/directory paths (repo dir, evaluation results dir, report filename) - Git/repository related constants - Patch processing constants - Environment setup commands 2. Updating all swtbench modules to import from constants.py: - run_infer.py - eval_infer.py - build_eval_env_images.py - image_utils.py This makes it easier to check and modify hyperparameters for the benchmark as they are now centralized in one location. Fixes #364 Co-authored-by: openhands <openhands@all-hands.dev>
e275ab3 to
328490e
Compare
…ility
- Add typing.Final annotations to all constants for type safety
- Convert mutable lists to immutable tuples:
- SETUP_FILES_TO_REMOVE
- BUILD_MODE_CHOICES
- DEFAULT_ENV_SETUP_COMMANDS
- Add BuildMode enum for type-safe build mode selection
- Convert string numeric constants to proper int types:
- DEFAULT_STARTUP_TIMEOUT: '600' -> 600
- DEFAULT_EVAL_WORKERS: '12' -> 12
- Update callers to handle type changes:
- eval_infer.py: workers parameter now int, add type=int to argparse
- run_infer.py: convert tuple to list for env_setup_commands,
convert int to str for os.getenv default
Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
- Add TargetType alias to constants.py matching openhands.sdk.workspace.TargetType - Update DEFAULT_BUILD_TARGET to use TargetType instead of str - Update run_infer.py to use TargetType for build_target variable and function parameter Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
🔴 Code Review: PR #372 - SWTBench Constants RefactorTaste Rating: 🟡 Acceptable - Works but has one behavioral change that needs attention[CRITICAL ISSUES] (Must fix - these break fundamental principles)[benchmarks/swtbench/eval_infer.py, Lines 246-248] # Original:
def run_swtbench_evaluation(..., workers: str = "12") -> None:
# New:
def run_swtbench_evaluation(..., workers: int = DEFAULT_EVAL_WORKERS) -> None:While this works at runtime (Python is dynamically typed), it's a function signature change that could break type-checked callers. The argparse also changed from accepting a string to Recommendation: If this is intentional (improving type correctness), document it in the PR description as a breaking change. Otherwise, keep [IMPROVEMENT OPPORTUNITIES] (Should fix - violates good taste)[benchmarks/swtbench/constants.py, Line 35] 🔍 Magic Number: [benchmarks/swtbench/constants.py, Lines 38-49] 🔧 Inconsistent Enum Usage: You define DEFAULT_BUILD_MODE: Final[str] = BuildMode.CLI.value # Why not just BuildMode.CLI?
BUILD_MODE_CHOICES: Final[Tuple[str, ...]] = tuple(m.value for m in BuildMode)Consider using the enum directly and converting to string only at the argparse boundary. [benchmarks/swtbench/constants.py, Line 13] 🔧 Inconsistent Type Definition: [STYLE NOTES] (Minor observations)[benchmarks/swtbench/constants.py] 📝 The heavy use of section separators ( [benchmarks/swtbench/eval_infer.py, Line 214] ✅ VERDICT:❌ Needs minor rework: The KEY INSIGHT:The refactoring successfully centralizes constants without changing their values, but inadvertently "improves" the Suggested Next Steps for Cleaner Code:
|
Remove the BuildMode enum class and replace with simple string constants. The enum was only used to generate choices and default values, which can be done more simply with a tuple and string constant. Co-authored-by: openhands <openhands@all-hands.dev>
Summary
This PR creates a single source of truth for all SWTBench hyperparameters and constant values by introducing
benchmarks/swtbench/constants.py.Fixes #364
Changes
1. Created
benchmarks/swtbench/constants.pyThis new file contains all constant values organized into logical categories:
TargetTypeliteral type alias for build targetsBuildModeenum with API and CLI options2. Updated all swtbench modules to import from constants.py
run_infer.pyeval_infer.pybuild_eval_env_images.pyimage_utils.py3. Type Safety Improvements
typing.Finalannotations for immutabilitySETUP_FILES_TO_REMOVEBUILD_MODE_CHOICESDEFAULT_ENV_SETUP_COMMANDSBuildModeenum for type-safe build mode selectionTargetTypeliteral type for build targets (matchesopenhands.sdk.workspace.TargetType)inttypes instead of strings:DEFAULT_STARTUP_TIMEOUT: 600 (int)DEFAULT_EVAL_WORKERS: 12 (int)Benefits
Finalannotations and appropriate types