Add multi-agent support: Actor, Protocol, MultiAgentEnv, MultiAgentRu… #784
Bhoy1 wants to merge 15 commits into PrimeIntellect-ai:main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
verifiers/__init__.py

```python
from .envs.actor import Actor  # noqa # isort: skip
from .envs.protocol import EpisodeRequest, GenerateResult, Protocol  # noqa # isort: skip
from .envs.multiagent_env import MultiAgentEnv  # noqa # isort: skip
from .rubrics.multiagent_rubric import MultiAgentRubric  # noqa # isort: skip
```
New multi-agent classes lack documentation updates
Medium Severity · Bugbot Rules
This PR adds major new user-facing classes (Actor, Protocol, MultiAgentEnv, MultiAgentRubric, MultiAgentOrchestrator) exported in __all__, but no corresponding documentation updates are included. Per the review rules, any PR adding core user-facing functionality needs to update relevant documentation in docs/environments.md, docs/training.md, and docs/reference.md.
Additional Locations (2)
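As a concrete illustration of the documentation gap, here is a minimal sketch of the top-level import surface the docs would need to cover. The names are taken directly from the diff above; only the doc-snippet framing is hypothetical:

```python
# All of these are re-exported from verifiers/__init__.py in this PR.
from verifiers import (
    Actor,
    Protocol,
    EpisodeRequest,
    GenerateResult,
    MultiAgentEnv,
    MultiAgentRubric,
)
```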
verifiers/envs/protocol.py (outdated diff)
```python
# Inside Protocol.spawn(): resolve the target env, then score the rollout.
env_name = inp.get("task") or self._get_default_env()
env = self.get_env(env_name)
if env.rubric:
    await env.rubric.score_rollout(state, score_sem=score_sem)
```
spawn() uses wrong scoring method for multi-agent rubrics
Medium Severity
The spawn() method calls score_rollout() for scoring when score=True (the default). However, MultiAgentRubric stores per-actor reward functions in actor_reward_funcs, which score_rollout() (inherited from the parent Rubric) never touches; it only runs the functions in self.funcs. score_rollout() also does not compute advantages, which GRPO training requires. Spawned multi-agent states would therefore have incomplete rewards and missing advantages.
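A minimal sketch of one possible fix, assuming MultiAgentRubric exposes a scoring entry point that runs actor_reward_funcs and fills in advantages. The method name score_multiagent_rollout below is hypothetical, not this PR's API:

```python
# Hypothetical routing inside Protocol.spawn(); only score_rollout and
# actor_reward_funcs are names confirmed by this PR.
if env.rubric:
    if isinstance(env.rubric, MultiAgentRubric):
        # Run the per-actor reward functions and compute the GRPO
        # advantages that spawned multi-agent states currently lack.
        await env.rubric.score_multiagent_rollout(state, score_sem=score_sem)
    else:
        await env.rubric.score_rollout(state, score_sem=score_sem)
```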
…lay different models per actor. Should show in the logs in the CLI display. Examples in twenty_questions/poker.
… Also, you can currently set up different models with different system prompts to play each other. Still validating the environments.
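A hypothetical sketch of the asymmetric setup described above; the Actor constructor arguments shown (name, model, system_prompt) are assumptions for illustration, not necessarily this PR's signature:

```python
from verifiers import Actor

# Two actors backed by different models with different system prompts,
# as in the twenty_questions example (alternating turns, asymmetric actors).
guesser = Actor(
    name="guesser",
    model="gpt-4.1-mini",  # assumed parameter name
    system_prompt="Ask yes/no questions to identify the hidden word.",
)
answerer = Actor(
    name="answerer",
    model="Qwen/Qwen2.5-7B-Instruct",  # a different model per actor
    system_prompt="Answer only 'yes' or 'no' about the hidden word.",
)
```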
Description
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes
Note
Adds first-class multi-agent support across the library with per-actor rollouts, scoring, and training.
- Actor, Protocol, MultiAgentEnv, and MultiAgentRubric for multi-agent turn management, spawning, per-actor trajectory tagging, and per-actor rewards/advantages
- MultiAgentEnv.generate() flattens game states into per-actor states and computes per-actor GRPO advantages; results now include actor_id
- MultiAgentOrchestrator drives training via Protocol.generate() and builds microbatches from flattened, per-actor trajectories
- New exports in verifiers/__init__.py; eval_utils.save_rollout_results writes actor_id
- Example environments: rock_paper_scissors (simultaneous moves with a custom rollout) and twenty_questions (alternating turns, asymmetric actors), each with datasets and rubrics for per-actor rewards

Written by Cursor Bugbot for commit 1e5f474.
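The summary above references per-actor GRPO advantages without showing the computation. As shown below, a minimal sketch of the standard formulation grouped by actor_id; this is the textbook GRPO normalization, not code from the PR, and the flat rollout dicts with "actor_id"/"reward" keys are an assumed shape:

```python
from collections import defaultdict

def per_actor_grpo_advantages(rollouts: list[dict]) -> list[dict]:
    """Mean-center and scale each reward within its actor's group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for r in rollouts:
        groups[r["actor_id"]].append(r["reward"])

    for r in rollouts:
        rewards = groups[r["actor_id"]]
        mean = sum(rewards) / len(rewards)
        std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
        # Small epsilon avoids division by zero when a group's rewards are all equal.
        r["advantage"] = (r["reward"] - mean) / (std + 1e-6)
    return rollouts
```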