Example for training on SWE (agentic software-engineering) tasks? #573

@dipta007

Is there an example or recipe for GRPO training on multi-turn SWE-style tasks (SWE-bench / SWE-Gym)?

Looking for:

  • Multi-turn agentic rollouts with tool calls (shell, file edits, tests)
  • How the environment and reward are wired (e.g. tests passing as the reward)
  • Setup for long trajectories / large context
  • How Docker/Podman containers are handled efficiently during rollout

If nothing SWE-specific exists, a pointer to the closest multi-turn example to adapt would help. Happy to contribute one back.

Thanks!
