Add cogames-watch-replay skill and headless frame capture script #7

Open
SolbiatiAlessandro wants to merge 3 commits into Metta-AI:main from SolbiatiAlessandro:pr/cogames-watch-replay-skill

SolbiatiAlessandro commented Mar 29, 2026

What this adds

scripts/capture_frames.py — runs an episode headlessly and saves emoji grid snapshots to a text file at regular intervals. No GUI, no TTY, no interactive input. Works with any policy; defaults to StarterPolicy (no LLM required).

.claude/skills/cogames-watch-replay/SKILL.md — a Claude Code skill that invokes the script and guides structured analysis of the output.

Why

Watching what the policy is actually doing spatially is the highest-leverage debugging tool: are agents stuck, are they reaching gear stations, are they spreading across the map? The existing Unicode renderer requires interactive keyboard input (SPACE to unpause), which makes it unusable by autonomous Claude agents or in CI.

This script hooks into Rollout.event_handlers directly, runs headlessly, and writes a plain text file that Claude (or a human) can read and parse.
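
The capture pattern can be sketched as follows. This is an illustration, not the script's actual code: the `FrameRecorder` class, its `on_step` method, and the `render` callable are hypothetical names, and the real `Rollout.event_handlers` interface may look different. The sketch only shows the idea of a per-step callback that appends every Nth frame to a plain text file.

```python
from pathlib import Path
from typing import Callable


class FrameRecorder:
    """Hypothetical per-step handler that saves every Nth frame as text."""

    def __init__(self, out_path: str, every: int, render: Callable[[], str]) -> None:
        self.out_path = Path(out_path)
        self.every = every
        self.render = render  # assumed to return the current grid as a string
        self.out_path.write_text("", encoding="utf-8")  # start from an empty file

    def on_step(self, step: int) -> None:
        # Snapshot only on multiples of `every` (step 0 included).
        if step % self.every != 0:
            return
        with self.out_path.open("a", encoding="utf-8") as f:
            f.write(f"=== step {step} ===\n{self.render()}\n\n")


if __name__ == "__main__":
    # Toy stand-in for an episode loop; a real rollout would call the handler.
    recorder = FrameRecorder("frames.txt", every=50, render=lambda: "⬛🟦⬛")
    for step in range(200):
        recorder.on_step(step)
```

Writing a labelled header per snapshot keeps the file easy to split back into frames later.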

Real example: StarterPolicy gets stuck after step 50

Running the script on the default machina_1 mission with 4 agents reveals an immediate problem in the starter policy:

A0: moved 4 cells step 0→50, then STUCK for all 250 remaining steps (row=47, col≈41)
A1: moved 2 cells step 0→50, then STUCK for all 250 remaining steps (row=48, col≈41)
A2: moved 2 cells step 0→50, then STUCK for all 250 remaining steps (row=48, col≈42)
A3: moved 14 cells step 0→50, then STUCK for all 250 remaining steps (row=55, col≈46)
Total reward: 0.0000 across all 300 steps

All 4 agents freeze near their spawn point around step 50 and never move again. This is exactly the kind of spatial insight the script is designed to surface: reward alone (reward=0) does not say why the episode failed, but the frame sequence shows where and when each agent stops.
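
A per-agent summary like the one above can be derived from snapshot positions alone. The following is a minimal sketch, assuming the positions have already been extracted as one `(row, col)` pair per snapshot; `summarize_track` and its return shape are illustrative names, not part of the script.

```python
from typing import List, Tuple

Position = Tuple[int, int]


def manhattan(a: Position, b: Position) -> int:
    """Grid distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def summarize_track(track: List[Position], every: int) -> Tuple[int, int, int]:
    """Return (cells_moved, last_move_step, stuck_steps) for one agent.

    track[i] is the agent's position at step i * every. Movement is
    measured between consecutive snapshots, so back-and-forth motion
    inside one interval is not counted.
    """
    if not track:
        return 0, 0, 0
    cells_moved = 0
    last_move_index = 0
    for i in range(1, len(track)):
        delta = manhattan(track[i - 1], track[i])
        if delta > 0:
            cells_moved += delta
            last_move_index = i
    last_move_step = last_move_index * every
    stuck_steps = (len(track) - 1 - last_move_index) * every
    return cells_moved, last_move_step, stuck_steps


if __name__ == "__main__":
    # Made-up track: the agent moves once, then stays put.
    track = [(45, 39), (47, 41), (47, 41), (47, 41)]
    moved, last_step, stuck = summarize_track(track, every=50)
    print(f"moved {moved} cells by step {last_step}, then stuck for {stuck} steps")
```

Dividing `stuck_steps` by the episode length gives a fraction that can be compared against a threshold such as 30%.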

How to use

# Default: StarterPolicy, 500 steps, snapshot every 50
python scripts/capture_frames.py

# Gear-up phase (watch first 200 steps closely)
python scripts/capture_frames.py --steps 200 --every 10

# Full episode
python scripts/capture_frames.py --steps 1000 --every 100

# Single agent to isolate behavior
python scripts/capture_frames.py --agents 1 --steps 500 --every 50

# Your own policy
python scripts/capture_frames.py --policy class=cogames.policy.my_policy.MyPolicy

# Output to a specific file
python scripts/capture_frames.py --out docs/replay_frames.txt

From Claude Code: /cogames-watch-replay --steps 500 --every 50
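
When several configurations need to be compared, the same flags can be assembled programmatically. The sketch below only builds the argument lists, one run per agent count. It assumes the flags shown above can be combined in a single invocation; the output file naming and the helper names are hypothetical.

```python
import sys
from typing import List


def build_command(agents: int, steps: int, every: int, out: str) -> List[str]:
    """Build an argv list for scripts/capture_frames.py from the flags shown above."""
    return [
        sys.executable, "scripts/capture_frames.py",
        "--agents", str(agents),
        "--steps", str(steps),
        "--every", str(every),
        "--out", out,
    ]


def sweep_commands(agent_counts: List[int], steps: int = 500, every: int = 50) -> List[List[str]]:
    # One output file per agent count so the runs do not overwrite each other.
    return [
        build_command(n, steps, every, f"frames_{n}_agents.txt")
        for n in agent_counts
    ]


if __name__ == "__main__":
    for cmd in sweep_commands([1, 3, 8]):
        print(" ".join(cmd))
        # To execute: subprocess.run(cmd, check=True) from the repository root.
```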

What the skill teaches Claude to do

The skill guides structured analysis beyond just reading the grid visually:

  1. Extract agent positions programmatically — search for 🟦🟧🟩🟨 symbols, record (row, col) per frame
  2. Compute movement deltas — Manhattan distance between frames; flag agents stuck for >30% of episode
  3. Track reward growth rate — deceleration signals hub depletion or navigation failure
  4. Zoom into stuck areas — extract 15×15 subgrid around frozen agent to identify blocker (wall, extractor, wrong-gear station)
  5. Compare configs — run 1-, 3-, and 8-agent episodes and compare per-agent reward to distinguish individual problems from contention problems
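
Steps 1, 3, and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the skill's own code: each grid cell is taken to be exactly one Unicode code point (multi-code-point emoji would shift column indices), each agent symbol is assumed to appear at most once per frame, and runs with more than four agents may use symbols not listed here. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
AGENT_SYMBOLS = "🟦🟧🟩🟨"  # symbols named in the skill description


def find_agents(frame: str) -> Dict[str, Position]:
    """Map each agent symbol to the (row, col) of its first occurrence."""
    positions: Dict[str, Position] = {}
    for row, line in enumerate(frame.splitlines()):
        for col, cell in enumerate(line):
            if cell in AGENT_SYMBOLS and cell not in positions:
                positions[cell] = (row, col)
    return positions


def crop(frame: str, center: Position, size: int = 15) -> str:
    """Return the size x size window around `center`, clipped at the grid edges."""
    half = size // 2
    row, col = center
    rows = frame.splitlines()
    top, left = max(0, row - half), max(0, col - half)
    return "\n".join(line[left:col + half + 1] for line in rows[top:row + half + 1])


def reward_rates(cumulative: List[float], every: int) -> List[float]:
    """Per-step reward gained in each interval between snapshots."""
    return [(b - a) / every for a, b in zip(cumulative, cumulative[1:])]
```

A falling sequence from `reward_rates` is the deceleration signal mentioned in step 3, and `crop` around a frozen agent's position gives the subgrid used in step 4.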

🤖 Generated with Claude Code

relh and others added 3 commits March 26, 2026 14:13
scripts/capture_frames.py: runs an episode headlessly and saves emoji
grid snapshots to a text file at regular intervals. Works with any
policy (defaults to StarterPolicy, no LLM required). Useful for
diagnosing navigation, gear acquisition, and routing without a GUI.

.claude/skills/cogames-watch-replay/SKILL.md: Claude skill that invokes
the script and guides analysis — extract agent coordinates
programmatically, detect stuck agents by movement delta, zoom into
blocked areas with a 15x15 subgrid, compare 1/3/8-agent configs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nishu-builder force-pushed the main branch 4 times, most recently from a577b23 to 0454f65, on April 1, 2026 19:13