[copilot-cli-research] Copilot CLI Deep Research - 2026-05-16 #32544

2026-05-16T04:53:57Z

github-actions[bot]
Bot May 16, 2026

Analysis Date: 2026-05-16
Repository: github/gh-aw
Scope: 229 total workflows, 128 using Copilot engine (56%), 99 simple form + 29 extended form

📊 Executive Summary

This is the 6th consecutive deep research run tracking Copilot CLI feature adoption across this repository. The overall Copilot footprint has grown from 121 → 128 workflows (56% of the repo). Strong cache-memory adoption continues (94 workflows), version pinning has rebounded (10 workflows, up from 0 last run), and web-search via MCP has made its first appearance (2 workflows). However, three persistent gaps remain unresolved for 9+ consecutive runs: engine.args, engine.env, and engine.api-target are all at zero usage despite being documented and supported features.

The most significant finding is that 5 custom agent files go completely unused in production workflows, and max-runs — the invocation cap feature — is used by only 1 workflow (vs. 128 that could benefit). Model selection is largely ignored (only 6 workflows use model: small out of 128), leaving significant cost-optimization potential on the table.

Critical Findings

🔴 High Priority Issues

1. engine.args, engine.env, engine.api-target — Zero Usage (9+ Consecutive Runs)
These three documented features have never been used in any production workflow. engine.env would be valuable for BYOK mode and custom debugging. engine.args enables passing custom CLI arguments for specialized scenarios. These persistent gaps suggest either poor discoverability or unclear use cases.

2. 5 Unused Custom Agent Files
.github/agents/ contains 10 files, but 5 are never referenced in any workflow: grumpy-reviewer, interactive-agent-designer, w3c-specification-writer, create-safe-output-type, custom-engine-implementation. These represent development investment that is not delivering value.

3. max-runs Severely Underused
Only 1 workflow (daily-safe-output-optimizer, max-runs: 200) uses this invocation cap. All other 127 Copilot workflows implicitly use the default of 500, meaning runaway workflows could consume 500 invocations before stopping. For daily/weekly scheduled tasks, a lower cap (20–50) is more appropriate.

🟡 Medium Priority Opportunities

4. Model Selection Gap (119/128 workflows use default model)
Only 6 workflows explicitly use model: small and 3 use model: large. With the remaining 119 workflows on the default model, there's potential for significant cost reduction by using model: small for read-only analysis tasks (architecture-guardian, breaking-change-checker, etc.).

5. engine.harness — Never Used
The custom harness script feature (engine.harness) allows replacing the built-in retry/error-handling wrapper. Zero adoption may indicate teams don't know it exists or don't need it yet.

6. 20 Copilot Workflows Missing safe-outputs
Workflows like copilot-pr-merged-report, dead-code-remover, daily-issues-report, terminal-stylist have no safe-outputs configuration — they cannot publish results back to GitHub. It's unclear if these are intentionally output-less or if they're missing a required configuration.

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Runtime Configuration

engine: copilot / engine: { id: copilot } — engine selection
engine.version — pin to specific CLI version (e.g., "0.0.422")
engine.model — override the AI model (e.g., gpt-5, gpt-5-mini, small, large)
engine.command — custom executable path
engine.args — additional CLI arguments passed to Copilot CLI
engine.env — environment variables injected at engine runtime
engine.agent — custom agent file from .github/agents/ (Copilot-exclusive)
engine.api-target — custom API endpoint (GHEC/GHES)
engine.bare — disable automatic context loading (AGENTS.md, custom instructions)
engine.harness — replace built-in retry/error-handling harness script (Copilot-exclusive)

Execution Control

max-continuations — autopilot mode with N consecutive runs (Copilot-exclusive)
max-runs — invocation cap per workflow execution (default: 500)
timeout-minutes — job timeout

Security & Sandboxing

sandbox.agent: awf — AWF (Agentic Workflow Firewall) network sandbox
sandbox.agent: srt — SRT sandbox variant
network.allowed — allowlist of domains/presets
strict: true — strict mode validation
BYOK via engine.env: COPILOT_PROVIDER_BASE_URL, COPILOT_PROVIDER_API_KEY, COPILOT_PROVIDER_BEARER_TOKEN
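A minimal sketch of how the sandboxing options above might sit together in a workflow's frontmatter. The key paths (strict, sandbox.agent, network.allowed) come from the inventory; the `defaults` preset name and the extra domain are assumptions for illustration only.

```yaml
# Sketch only: values are illustrative, not copied from a real workflow
strict: true          # strict mode validation
sandbox:
  agent: awf          # Agentic Workflow Firewall network sandbox
network:
  allowed:
    - defaults        # assumed preset name
    - example.com     # hypothetical additional domain
```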

Tools & Integrations

tools.github — GitHub MCP server (toolsets, mode: gh-proxy)
tools.bash — shell access with glob patterns
tools.edit — file editing
tools.cli-proxy — CLI proxy tool
tools.web-fetch — HTTP fetch
tools.web-search — web search via MCP
tools.cache-memory — persistent cross-run memory
tools.playwright — browser automation
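As a rough illustration, a tools block using a few of the entries above could look like the following. The key names follow the inventory; the toolset name, the bash pattern, and the boolean form of cache-memory are assumptions rather than confirmed syntax.

```yaml
# Sketch only: option values are hypothetical
tools:
  github:
    mode: gh-proxy        # mode named in the inventory
    toolsets: [default]   # assumed toolset name
  bash: ["git log*"]      # hypothetical glob pattern for allowed commands
  cache-memory: true      # assumed boolean form; persistent cross-run memory
```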

Usage Statistics

Total Workflows: 229
Copilot Workflows: 128 (56%) — 99 simple form + 29 extended object form
model: small: 6 workflows (4.7%)
model: large: 3 workflows (2.3%)
default model: 119 workflows (93%)
sandbox (AWF): 19 workflows (15%)
bare mode: 10 workflows (8%)
max-continuations: 6 workflows (4.7%)
engine.agent (non-AWF): 7 workflows (5.5%)
cache-memory: 94 workflows (73%)
version pinning: 10 workflows (8%)
web-search: 2 workflows (1.6%)
max-runs: 1 workflow (0.8%)
engine.args: 0 workflows (0%)
engine.env: 0 workflows (0%)
engine.api-target: 0 workflows (0%)
engine.harness: 0 workflows (0%)
BYOK: 0 workflows (0%)

2️⃣ Feature Usage Matrix

Feature Category	Available Features	Used Count	Not Used
Engine Runtime	version, model, command, args, env, agent, api-target, bare, harness	4/9	args, env, api-target, command, harness
Execution Control	max-continuations, max-runs, timeout-minutes	2/3	max-runs (1 only)
Security	sandbox.awf, network.allowed, strict, bare	4/4	✅ all used
BYOK	provider base-url, api-key, bearer-token	0/3	all three
Tools	github, bash, edit, cli-proxy, web-fetch, web-search, cache-memory	6/7	playwright (separate)
Custom Agents	10 agent files available	5/10 used	grumpy-reviewer, interactive-agent-designer, w3c-specification-writer, create-safe-output-type, custom-engine-implementation

3️⃣ Missed Opportunities

🔴 High Priority

Opportunity 1: `max-runs` for All Scheduled Workflows

What: The max-runs invocation cap defaults to 500 but is only configured in 1 workflow
Why It Matters: A runaway daily workflow can consume up to the default 500 invocations before stopping, wasting significant compute and cost
Where: All 128 Copilot workflows, especially daily/weekly scheduled ones
How to Implement: Add max-runs: 30 (or appropriate value) to each scheduled workflow

Example:

engine: copilot
max-runs: 30       # Reasonable cap for a daily report workflow
timeout-minutes: 20

Opportunity 2: 5 Unused Custom Agent Files

What: grumpy-reviewer, interactive-agent-designer, w3c-specification-writer, create-safe-output-type, custom-engine-implementation exist but are never used
Why It Matters: Custom agents provide specialized behavior, system prompts, and tool configurations for specific use cases
Where: Review workflows that could benefit from specialized behavior (e.g., code reviews could use grumpy-reviewer)

How to Implement:

engine:
  id: copilot
  agent: grumpy-reviewer   # for thorough code review workflows

Action: Either create workflows using these agents, or remove unused agent files to reduce maintenance overhead

Opportunity 3: `engine.env` for BYOK and Custom Configuration

What: engine.env allows injecting environment variables to customize Copilot behavior
Why It Matters: Enables BYOK (Bring Your Own Key) for using alternative LLM providers, custom endpoints, and debug flags
Where: High-cost workflows could benefit from alternative providers; debug sessions benefit from custom env

How to Implement:

engine:
  id: copilot
  env:
    COPILOT_PROVIDER_BASE_URL: "(customllm.example.com/redacted)"
    COPILOT_PROVIDER_API_KEY: ${{ secrets.CUSTOM_LLM_KEY }}

🟡 Medium Priority

Opportunity 4: Model Selection for Cost Optimization

What: 119/128 Copilot workflows (93%) use the default model with no explicit selection
Why It Matters: model: small costs significantly less for analysis/read-only tasks; model: large for complex reasoning
Where: Read-only analysis workflows: architecture-guardian, breaking-change-checker, ci-coach, daily-syntax-error-quality, linter-miner

How to Implement:

engine:
  id: copilot
  model: small    # For read-only analysis workflows

Opportunity 5: Missing `safe-outputs` on 20 Copilot Workflows

What: 20 Copilot workflows have no safe-outputs configuration
Why It Matters: Without safe-outputs, workflows cannot publish results to GitHub (issues, comments, discussions, PRs)
Affected: copilot-pr-merged-report, dead-code-remover, daily-issues-report, daily-secrets-analysis, terminal-stylist, daily-testify-uber-super-expert, mcp-inspector, etc.
Action: Audit these workflows — either add appropriate safe-outputs or document that they're intentionally output-less (e.g., they write files only)

Opportunity 6: `max-continuations` for Long-Running Workflows

What: Only 6 workflows use max-continuations despite it being a Copilot-exclusive feature
Why It Matters: Without autopilot mode, agents stop after one run and require manual re-triggering for multi-phase tasks
Where: dead-code-remover, repository-quality-improver, daily-workflow-updater, and other refactoring/improvement workflows

How to Implement:

engine:
  id: copilot
  max-continuations: 5   # Allow up to 5 consecutive runs

Opportunity 7: `engine.bare` for Self-Contained Workflows

What: Only 10 workflows use bare mode to suppress automatic context loading
Why It Matters: For research/analysis workflows that don't need the repository's custom instructions (AGENTS.md, copilot-instructions.md), bare mode reduces context overhead
Where: External-facing workflows (poem-bot, daily-news, daily-fact, constraint-solving) should use bare mode; internal repo workflows should not
Observation: Current usage seems correct — bare mode is used for external/independent tasks
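For reference, a bare-mode engine block might be written as below. This is a sketch that assumes engine.bare takes a boolean; the report names the field but does not show its value.

```yaml
engine:
  id: copilot
  bare: true   # assumed boolean; skips automatic loading of AGENTS.md and custom instructions
```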

🟢 Low Priority

Opportunity 8: Version Pinning for Critical Workflows

What: Version pinning (10 workflows) has rebounded from 0 last run, but critical production workflows remain unpinned
Why It Matters: Ensures reproducible behavior when a new Copilot CLI release introduces breaking changes
Where: High-frequency scheduled workflows like daily-issues-report, daily-performance-summary
Example: version: "0.0.422" in the engine block
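Written out in the extended engine form, the pin could look like this sketch. The version string is the example value from this report, not a recommendation for a specific release.

```yaml
engine:
  id: copilot
  version: "0.0.422"   # quoted so YAML keeps it as a string
```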

Opportunity 9: `engine.harness` for Custom Retry Logic

What: The harness script feature has zero adoption despite being Copilot-exclusive
Why It Matters: Allows replacing the built-in CAPIError retry wrapper with custom logic
Where: Workflows with known flakiness or specific error handling requirements
Note: Low priority — most teams don't need custom harness behavior; the default handles common cases

4️⃣ Specific Workflow Recommendations

Workflow: `dead-code-remover.md`

Current State: No safe-outputs, no max-continuations, 30-min timeout
Recommended Changes: Add max-continuations: 5 (dead code removal is multi-step), add push-to-pull-request-branch safe-output to surface changes
Expected Benefits: Completes the full removal cycle without manual re-triggering
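One possible shape for these changes, as a sketch: the safe-output name is taken from the recommendation above, but its options are omitted because the report does not list them, and the top-level placement of timeout-minutes follows the report's earlier examples.

```yaml
engine:
  id: copilot
  max-continuations: 5            # allow the multi-step removal to continue unattended
safe-outputs:
  push-to-pull-request-branch:    # options omitted; not specified in this report
timeout-minutes: 30               # existing timeout, unchanged
```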

Workflow: `architecture-guardian.md`

Current State: model: small ✅, safe-outputs configured ✅, 20-min timeout
Recommended Changes: Add max-runs: 20 to cap invocations for this analysis workflow
Expected Benefits: Cost control for a frequent read-only analysis
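A sketch of the resulting frontmatter, assuming max-runs sits at the top level as in the Opportunity 1 example; only the max-runs line is new.

```yaml
engine:
  id: copilot
  model: small        # already set, per the current state above
max-runs: 20          # recommended addition: cap invocations
timeout-minutes: 20   # existing timeout, unchanged
```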

Workflow: `daily-issues-report.md`

Current State: No safe-outputs, 30-min timeout
Recommended Changes: Add discussion safe-output to publish daily report, or clarify it writes to files only
Expected Benefits: Visible output for stakeholders
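If the discussion route is chosen, the addition might look roughly like this. The `create-discussion` key and its `max` option are assumptions about the safe-outputs syntax and should be checked against the safe-outputs reference before use.

```yaml
safe-outputs:
  create-discussion:   # assumed key for the discussion output type
    max: 1             # assumed option: one discussion per daily run
```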

Workflow: `contribution-check.md`

Current State: max-continuations: 20 ✅, uses contribution-checker agent ✅
Assessment: This workflow is the gold standard — uses custom agent + autopilot mode together. Other review workflows should follow this pattern.
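The pattern described here combines two engine fields. A minimal sketch using the values reported for this workflow:

```yaml
engine:
  id: copilot
  agent: contribution-checker   # custom agent file from .github/agents/
  max-continuations: 20         # autopilot mode, as currently configured
```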

Workflow: `archie.md`

Current State: Uses adr-writer custom agent ✅, imports serena-go MCP
Assessment: Great use of custom agents for specialized ADR writing. Model could be explicitly set to large for better reasoning on architectural decisions.
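The suggested model change would be a one-line addition to the engine block, sketched below; the serena-go import is left out because its syntax is not shown in this report.

```yaml
engine:
  id: copilot
  agent: adr-writer   # existing custom agent
  model: large        # suggested: explicit larger model for architectural reasoning
```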

5️⃣ Trends & Insights

Historical Trends (6 Runs)

Metric	May-10	May-11	May-12	May-13	May-14	May-16	Trend
Total Workflows	218	218	219	223	225	229	📈 +5%
Copilot Workflows	96	115	96	121	121	128	📈 +33%
engine.agent	?	18	7	15	25	14	🔄 fluctuating
max-continuations	2	2	4	4	4	6	📈 growing
cache-memory	89	89	10	~89	92	94	📈 stable high
bare mode	9	9	9	1	10	10	🔄 volatile
version pinning	0	2	0	0	0	10	🔄 inconsistent
web-search	0	0	0	0	0	2	🆕 new!
engine.api-target	0	0	0	0	0	0	❌ persistent gap
engine.harness	0	0	0	0	0	0	❌ persistent gap
engine.args	0	0	0	0	0	0	❌ persistent gap
BYOK	0	0	0	0	0	0	❌ persistent gap

Key Trend: Web-search adoption started this run. Version pinning rebounded strongly (0→10). The Copilot CLI ecosystem is growing steadily. Persistent zero-usage features (api-target, harness, args, BYOK) likely indicate these are enterprise/advanced features not needed for this internal repo.

6️⃣ Best Practice Guidelines

Based on this research, here are recommended best practices for Copilot workflows in this repository:

Always set max-runs: Default of 500 is too high for most scheduled workflows. Use 20–50 for analysis, 100–200 for complex refactoring.
Use model: small for read-only workflows: Architecture analysis, code review, report generation don't need the default (larger) model.
Prefer custom agents for specialized tasks: contribution-checker and adr-writer patterns show the value — encapsulate domain expertise in agent files.
Use max-continuations for multi-phase work: Workflows that improve/refactor code need multiple passes; configure autopilot mode.
Add safe-outputs or document why not: Every workflow should either publish results to GitHub or have a clear comment explaining the output mechanism.
Set strict: true: Already used by ~60% of workflows; should be standard for all new workflows.

7️⃣ Action Items

Immediate Actions (this week):

Review the 5 unused agent files — create workflows for them or remove them
Add max-runs to the top 10 highest-timeout scheduled Copilot workflows
Add safe-outputs or a comment to the 20 workflows missing it

Short-term (this month):

Add model: small to read-only analysis workflows (architecture-guardian, breaking-change-checker, ci-coach)
Add max-continuations: 3-5 to refactoring workflows (dead-code-remover, repository-quality-improver)
Document engine.args and engine.env use cases with examples from this repo

Long-term (this quarter):

Evaluate BYOK for cost-sensitive high-frequency workflows
Consider a shared import (shared/copilot-defaults.md) with standard max-runs, strict, and timeout-minutes
Track web-search adoption — confirm it's meeting expectations in the 2 early-adopter workflows
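The shared-import idea could be sketched as two frontmatter fragments. Both are hypothetical: the file does not exist yet, the values reuse the examples from this report, and the sketch assumes an `imports:` list is supported and that these fields merge from an imported file.

```yaml
# shared/copilot-defaults.md (hypothetical frontmatter)
strict: true
max-runs: 30
timeout-minutes: 20
---
# a consuming workflow (hypothetical frontmatter)
engine: copilot
imports:
  - shared/copilot-defaults.md   # assumes imports are resolved relative to the workflows directory
```

A workflow that needs a higher cap would presumably override max-runs locally, though the precedence rules are not covered by this report.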

📚 References

Copilot Engine Documentation: docs/src/content/docs/reference/engines.md
Copilot Engine Go Implementation: pkg/workflow/copilot_engine.go, pkg/workflow/copilot_engine_execution.go
Available Custom Agents: .github/agents/ (10 files)
Previous Research: Repo memory at memory/copilot-cli-research branch

Research Methodology

Analysis used grep and shell scripting to survey all 229 workflow markdown files in .github/workflows/. Features were counted by searching for specific YAML keys in the frontmatter of Copilot-engine workflows. The Go source code (pkg/workflow/copilot_engine*.go) was reviewed to understand available but undocumented features. Historical trends were retrieved from repo-memory (/tmp/gh-aw/repo-memory/default/). Prior research notes spanning 5 runs from 2026-05-10 to 2026-05-14 were used for trend comparison.

Generated by Copilot CLI Deep Research (Run: §25953071091)

expires on May 17, 2026, 4:53 AM UTC

2026-05-17T05:05:51Z

github-actions[bot]
Bot May 17, 2026
Author

This discussion has been marked as outdated by Copilot CLI Deep Research Agent.

A newer discussion is available at Discussion #32749.
