Add multi-agent support: Actor, Protocol, MultiAgentEnv, MultiAgentRu… #784
Bhoy1 wants to merge 15 commits into PrimeIntellect-ai:main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
verifiers/__init__.py

```python
from .envs.actor import Actor  # noqa # isort: skip
from .envs.protocol import EpisodeRequest, GenerateResult, Protocol  # noqa # isort: skip
from .envs.multiagent_env import MultiAgentEnv  # noqa # isort: skip
from .rubrics.multiagent_rubric import MultiAgentRubric  # noqa # isort: skip
```
New multi-agent classes lack documentation updates
Medium Severity · Bugbot Rules
This PR adds major new user-facing classes (Actor, Protocol, MultiAgentEnv, MultiAgentRubric, MultiAgentOrchestrator) exported in __all__, but no corresponding documentation updates are included. Per the review rules, any PR adding core user-facing functionality needs to update relevant documentation in docs/environments.md, docs/training.md, and docs/reference.md.
Additional Locations (2)
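As a concrete illustration of the documentation gap, here is a minimal sketch of the top-level import surface the docs would need to cover. The names are taken directly from the diff above; only the doc-snippet framing is hypothetical:

```python
# All of these are re-exported from verifiers/__init__.py in this PR.
from verifiers import (
    Actor,
    Protocol,
    EpisodeRequest,
    GenerateResult,
    MultiAgentEnv,
    MultiAgentRubric,
)
```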
verifiers/envs/protocol.py (outdated diff)
```python
# Inside Protocol.spawn(): resolve the target env, then score the rollout.
env_name = inp.get("task") or self._get_default_env()
env = self.get_env(env_name)
if env.rubric:
    await env.rubric.score_rollout(state, score_sem=score_sem)
```
spawn() uses wrong scoring method for multi-agent rubrics
Medium Severity
The spawn() method calls score_rollout() for scoring when score=True (the default). However, MultiAgentRubric stores per-actor reward functions in actor_reward_funcs, which score_rollout() (inherited from the parent Rubric) never touches; it only runs the functions in self.funcs. score_rollout() also does not compute advantages, which GRPO training requires. Spawned multi-agent states would therefore have incomplete rewards and missing advantages.
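A minimal sketch of one possible fix, assuming MultiAgentRubric exposes a scoring entry point that runs actor_reward_funcs and fills in advantages. The method name score_multiagent_rollout below is hypothetical, not this PR's API:

```python
# Hypothetical routing inside Protocol.spawn(); only score_rollout and
# actor_reward_funcs are names confirmed by this PR.
if env.rubric:
    if isinstance(env.rubric, MultiAgentRubric):
        # Run the per-actor reward functions and compute the GRPO
        # advantages that spawned multi-agent states currently lack.
        await env.rubric.score_multiagent_rollout(state, score_sem=score_sem)
    else:
        await env.rubric.score_rollout(state, score_sem=score_sem)
```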
…lay different models per actor. Should show in the logs in the CLI display. Examples in twenty_questions/poker.
… Also, you can currently set up different models with different system prompts to play each other. Still validating the environments.
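A hypothetical sketch of the asymmetric setup described above; the Actor constructor arguments shown (name, model, system_prompt) are assumptions for illustration, not necessarily this PR's signature:

```python
from verifiers import Actor

# Two actors backed by different models with different system prompts,
# as in the twenty_questions example (alternating turns, asymmetric actors).
guesser = Actor(
    name="guesser",
    model="gpt-4.1-mini",  # assumed parameter name
    system_prompt="Ask yes/no questions to identify the hidden word.",
)
answerer = Actor(
    name="answerer",
    model="Qwen/Qwen2.5-7B-Instruct",  # a different model per actor
    system_prompt="Answer only 'yes' or 'no' about the hidden word.",
)
```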
Description
Type of Change
Testing
Ran `uv run pytest` locally.
Checklist
Additional Notes
Note
Adds first-class multi-agent support across the library with per-actor rollouts, scoring, and training.
- Actor, Protocol, MultiAgentEnv, and MultiAgentRubric for multi-agent turn management, spawning, per-actor trajectory tagging, and per-actor rewards/advantages
- MultiAgentEnv.generate() flattens game states into per-actor states and computes per-actor GRPO advantages; results now include actor_id
- MultiAgentOrchestrator drives training via Protocol.generate() and builds microbatches from flattened, per-actor trajectories
- New exports in verifiers/__init__.py; eval_utils.save_rollout_results writes actor_id
- Example environments: rock_paper_scissors (simultaneous moves with a custom rollout) and twenty_questions (alternating turns, asymmetric actors), each with datasets and rubrics for per-actor rewards

Written by Cursor Bugbot for commit 1e5f474.
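The summary above references per-actor GRPO advantages without showing the computation. As shown below, a minimal sketch of the standard formulation grouped by actor_id; this is the textbook GRPO normalization, not code from the PR, and the flat rollout dicts with "actor_id"/"reward" keys are an assumed shape:

```python
from collections import defaultdict

def per_actor_grpo_advantages(rollouts: list[dict]) -> list[dict]:
    """Mean-center and scale each reward within its actor's group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for r in rollouts:
        groups[r["actor_id"]].append(r["reward"])

    for r in rollouts:
        rewards = groups[r["actor_id"]]
        mean = sum(rewards) / len(rewards)
        std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5
        # Small epsilon avoids division by zero when a group's rewards are all equal.
        r["advantage"] = (r["reward"] - mean) / (std + 1e-6)
    return rollouts
```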