[copilot-cli-research] Copilot CLI Deep Research - 2026-05-17 #32749
Closed
This discussion has been marked as outdated by Copilot CLI Deep Research Agent. A newer discussion is available at Discussion #32950.
Analysis Date: 2026-05-17
Repository: github/gh-aw
Scope: 229 total workflows, 126 using Copilot engine (55%)
Previous Analysis: 2026-05-16
📊 Executive Summary
Research Topic: Copilot CLI Optimization Opportunities
Key Findings: 5 persistent feature gaps, 5 unused agent files, 0 version-pinned workflows (a sharp drop from 10 to 0), and a rebound in custom agent usage (14→25)
Primary Recommendation: Enable `max-runs` caps on all scheduled workflows. Only 1 of 229 workflows currently uses this guardrail, which leaves the repository exposed to runaway cost scenarios.
This analysis compares the rich Copilot engine feature set available in gh-aw against actual usage across 126 Copilot workflows. The repository makes excellent use of core features (network allowlists, strict mode, cache-memory, safe-outputs, imports) but consistently leaves advanced features untouched across consecutive analysis runs. Five features (`engine.args`, `engine.env`, `engine.api-target`, `engine.harness`, and BYOK mode) have recorded zero usage in 10+ consecutive analysis runs, suggesting a documentation visibility or discoverability issue rather than intentional avoidance.
On the positive side, custom agent adoption (`engine.agent`) rebounded from 14 to 25 workflows this run, and bare mode ticked up to 11 workflows. The persistent blind spot remains `max-runs`: with only 1 workflow capping invocations, scheduled workflows are exposed to cost spikes when GitHub Actions failures or loops cause repeated runs.
Critical Findings
🔴 High Priority Issues
1. Almost no `max-runs` guardrails on scheduled workflows
Only `daily-safe-output-optimizer.md` uses `max-runs: 200`. All other scheduled workflows (57+ in total) have no cap of their own, so a bug that causes repeated retriggers can keep them running until they hit the default cap of 500 invocations. A single runaway loop of 500 invocations × a 30-minute timeout is significant cost exposure (up to 250 runner-hours).
2. Version pinning dropped from 10 → 0
Last run had 10 version-pinned workflows; this run shows 0. (`grep " version: \"" .github/workflows/*.md` returns 18 hits, but all appear in smoke tests or multi-job prompt sections, not in engine config.) Either the pinned workflows were removed or the measurement methodology changed. This needs investigation: unpinned workflows receive automatic upgrades that could silently break production workflows.
🟡 Medium Priority Opportunities
3. 5 agent files exist but are never used in workflows
`grumpy-reviewer`, `interactive-agent-designer`, `w3c-specification-writer`, `create-safe-output-type`, and `custom-engine-implementation` agent files exist in `.github/agents/` but appear in zero workflows. These represent invested tooling going to waste.
4. `max-continuations` used by only 5/126 Copilot workflows
Autopilot mode (`max-continuations`) enables multi-run sessions for complex tasks. Only 5 workflows use it, even though many workflows are open-ended research or analysis tasks that could benefit from extended sessions.
1️⃣ Current State Analysis
Copilot CLI Capabilities
- `engine: copilot` or `engine.id: copilot`
- `engine.version: "0.0.422"`
- `engine.model: gpt-5`
- `COPILOT_MODEL` env var
- `engine.agent: agent-id` (`.github/agents/*.agent.md`)
- `engine.args: [...]`
- `engine.env: { KEY: val }`
- `COPILOT_PROVIDER_*`
- `engine.bare: true`
- `engine.harness: script.cjs`
- `engine.api-target: hostname`
- `engine.env.COPILOT_PROVIDER_BASE_URL`
- `max-continuations: N`
- `sandbox: awf`
- `sandbox: srt`
- `network.allowed: [...]`
- `max-runs: N`
- `cache-memory:`
- `safe-outputs:`
- `tools.mcp-server:`
- `tools.web-search:`
Usage Statistics
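- `engine.bare: true` (11 workflows)
- `engine.model` override
- `engine.agent` custom (25 workflows)
- `engine.version` pinned (0 workflows)
- `engine.args` (0)
- `engine.env` (0)
- `engine.harness` (0)
- `engine.api-target` (0)
- `max-continuations` (5 of 126 Copilot workflows)
- `max-runs` (1 of 229 workflows)
- `sandbox:`
- `strict: true`*
- `cache-memory`
- `network:`
- `safe-outputs`
- `imports`
- `timeout-minutes`
*Strict mode appears in non-Copilot workflows too.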
Timeout Distribution (all engines)
2️⃣ Feature Usage Matrix
3️⃣ Missed Opportunities
🔴 Opportunity 1: Add `max-runs` to all scheduled workflows
What: Set an explicit invocation cap to prevent cost spikes from runaway loops.
Why It Matters: The default cap is 500 runs. With 30-minute timeouts, a bug that triggers a loop could consume 250 hours of runner time before hitting the cap.
Where: All 57+ scheduled workflows.
How to Implement:
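A minimal frontmatter sketch follows. It assumes `max-runs` is a top-level frontmatter key, as the capabilities inventory lists it (`max-runs: N`); confirm the placement in `docs/src/content/docs/reference/engines.md` before copying. The cron schedule is a hypothetical example.

```yaml
---
on:
  schedule:
    - cron: "0 6 * * *"   # hypothetical daily schedule
engine: copilot
# Assumed top-level key (per the capabilities inventory).
# Caps invocations far below the default of 500.
max-runs: 10
timeout-minutes: 30
---
```

With a 30-minute timeout, a cap of 10 bounds a runaway loop at 5 hours of runner time rather than the 250 hours possible under the default cap.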
Expected Benefits: Cost predictability, protection against accidental loops, explicit documentation of expected invocation count.
🔴 Opportunity 2: Verify and restore version pinning
What: Previously 10 workflows pinned Copilot CLI versions. Now 0 are pinned.
Why It Matters: Uncontrolled upgrades can silently break workflows when the CLI changes behavior. Pinning enables reproducible, reviewable upgrades.
How to Implement:
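A sketch of a pinned engine block, reusing the example version string from the capabilities inventory; substitute whichever release has actually been tested. Upgrades then become a one-line, reviewable diff.

```yaml
---
engine:
  id: copilot
  # Pin the Copilot CLI so it only changes through a reviewed edit.
  # "0.0.422" is the inventory's example value, not a recommendation.
  version: "0.0.422"
---
```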
🟡 Opportunity 3: Deploy unused agent files in appropriate workflows
What: 5 agent files in `.github/agents/` have never been used in any workflow:
- `grumpy-reviewer.agent.md`: critical code review persona
- `interactive-agent-designer.agent.md`: workflow design assistant
- `w3c-specification-writer.agent.md`: spec writing
- `create-safe-output-type.agent.md`: safe output type creation
- `custom-engine-implementation.agent.md`: engine implementation helper
Why It Matters: These files represent invested tooling that shapes agent behavior. They exist for a reason.
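Each of these files would be wired in through `engine.agent`. A minimal sketch, assuming the agent id is the file's base name without the `.agent.md` suffix (the capabilities inventory maps `engine.agent: agent-id` to `.github/agents/*.agent.md`):

```yaml
---
engine:
  id: copilot
  # Assumed to resolve to .github/agents/grumpy-reviewer.agent.md
  agent: grumpy-reviewer
---
```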
How to Implement: Identify matching workflows and add `engine.agent: grumpy-reviewer` (and the equivalent for the other agents).
🟡 Opportunity 4: Expand `max-continuations` for complex research workflows
What: Autopilot mode lets Copilot continue working across multiple sessions automatically.
Current: Only 5 workflows use it (`contribution-check`: 20, `test-quality-sentinel`: 15, `mattpocock-skills-reviewer`: 10, and 2 smoke tests).
Where: Complex research/analysis workflows like `ab-testing-advisor`, `copilot-agent-analysis`, `daily-agentrx-trace-optimizer`, `spec-enforcer`.
How to Implement:
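A sketch using the value 3 suggested in the action items. Placement is an assumption: the capabilities inventory lists `max-continuations: N` without an `engine.` prefix, so it is shown here as a top-level key; verify against the engines reference before use.

```yaml
---
engine: copilot
# Assumed top-level key (per the capabilities inventory); verify before use.
# Allows up to 3 autopilot continuations for open-ended research.
max-continuations: 3
---
```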
🟡 Opportunity 5: Use `engine.bare` for read-only analytical workflows
What: `engine.bare: true` prevents Copilot from loading AGENTS.md and memory files, reducing noise for workflows that don't benefit from repository context.
Current: Only 11 workflows use it.
Where: Pure analysis workflows (log analyzers, metric reporters, scheduled summaries) that provide complete prompts and don't need codebase context.
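A minimal sketch for a self-contained reporting workflow; triggers, tools, and safe-outputs are omitted and would come from the existing workflow.

```yaml
---
engine:
  id: copilot
  # Do not load AGENTS.md or memory files; the prompt supplies all context.
  bare: true
---
```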
Example workflows: `agent-performance-analyzer`, `artifacts-summary`, `blog-auditor`, `copilot-pr-merged-report`
🟢 Opportunity 6: Use `engine.env` for BYOK mode experimentation
What: BYOK mode routes requests to an external LLM provider via `COPILOT_PROVIDER_BASE_URL`. This enables using GPT-5, Claude via the Anthropic API, or local models.
Why It Matters: Zero-cost testing against local Ollama/vLLM instances, and cost optimization for high-frequency workflows.
Example:
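A hedged sketch of BYOK routing through `engine.env`. The endpoint is hypothetical (11434 is Ollama's default port and `/v1` its OpenAI-compatible path). Any other `COPILOT_PROVIDER_*` variables a provider requires, such as credentials, are deliberately left out because this report does not name them.

```yaml
---
engine:
  id: copilot
  env:
    # Hypothetical local OpenAI-compatible endpoint (Ollama's default port).
    # Other COPILOT_PROVIDER_* settings (e.g. credentials) are not shown.
    COPILOT_PROVIDER_BASE_URL: "http://localhost:11434/v1"
---
```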
🟢 Opportunity 7: Use `engine.args` for custom CLI flags
What: `engine.args` passes additional CLI arguments directly to the Copilot CLI binary.
Use cases: Custom `--add-dir` paths for additional filesystem access, experimental flags.
Note: Zero usage for 10+ consecutive runs suggests this is either not needed or undiscoverable.
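A sketch passing the `--add-dir` flag mentioned above, assuming `engine.args` takes a list of strings forwarded to the Copilot CLI as-is; the directory is hypothetical.

```yaml
---
engine:
  id: copilot
  # Forwarded verbatim to the Copilot CLI binary.
  # /tmp/shared-data is a hypothetical directory to grant access to.
  args: ["--add-dir", "/tmp/shared-data"]
---
```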
🟢 Opportunity 8: Consider `engine.harness` for retry logic customization
What: Copilot-only feature that replaces the built-in `copilot_harness.cjs` retry wrapper.
Use case: Custom error handling, specialized retry logic, workflow-specific resilience patterns.
Note: High complexity, low priority unless specific retry issues arise.
4️⃣ Workflow-Specific Recommendations
Workflows that could use `engine.agent`:
- `grumpy-reviewer.agent.md` → add to code review workflows like `code-simplifier.md`
- `interactive-agent-designer.agent.md` → add to workflow authoring workflows
- `w3c-specification-writer.agent.md` → add to `spec-librarian.md` or `spec-enforcer.md`
Workflows that could use `engine.bare` (analytical workflows with self-contained prompts):
- `ab-testing-advisor.md`: check whether its engine configuration already sets bare mode
- `copilot-agent-analysis.md`: pure metrics analysis
- `artifacts-summary.md`: report generation
Workflows that could use `max-continuations` (long-running research or iterative improvement workflows):
- `daily-agentrx-trace-optimizer.md`: trace analysis could benefit from multiple sessions
- `spec-enforcer.md`: iterative spec enforcement
- `ab-testing-advisor.md`: A/B experiment management
5️⃣ Trends & Insights
Observations:
- Version pinning volatility (0→10→0) suggests conditional configuration (experiments?) rather than stable frontmatter
- `engine.agent` volatility (25→14→25) suggests agents get added and removed during workflow authoring cycles
- Five features (`args`, `env`, `api-target`, `harness`, BYOK) have never been used across all analyzed runs, likely an adoption/discoverability gap
6️⃣ Best Practice Guidelines
- Always set `max-runs`: Every scheduled workflow should cap invocations. Even `max-runs: 10` is better than the default 500 and prevents runaway cost scenarios.
- Use `engine.bare` for analytical workflows: If the workflow prompt is self-contained and doesn't need codebase context (log analysis, metric reports, summaries), add `engine.bare: true` to avoid noise from AGENTS.md and memory files.
- Pin versions before major releases: Use `engine.version: "X.Y.Z"` when a Copilot CLI update could affect workflow behavior. A pin → test → upgrade cycle is safer than uncontrolled upgrades to the latest release.
- Deploy agent files you create: The `.github/agents/` directory has 5 files never referenced in any workflow. Review each and either wire it to an appropriate workflow or remove it to reduce maintenance burden.
- Consider `max-continuations` for open-ended tasks: Research, analysis, and iterative improvement workflows often need more than one session to complete; a `max-continuations` value of 3 to 5 gives them room to succeed.
7️⃣ Action Items
Immediate Actions (this week):
- Add `max-runs: 5` (or another appropriate value) to all scheduled workflows
Short-term (this month):
- Wire the `grumpy-reviewer` and `w3c-specification-writer` agent files to appropriate workflows
- Add `engine.bare: true` to 5-10 pure analytical workflows
- Add `max-continuations: 3` to 3-5 long-running research workflows
Long-term (this quarter):
- `engine.args`, `engine.env`, `engine.api-target`
📚 References
Research Methodology
- `pkg/workflow/copilot_engine*.go` for available CLI flags and features
- `docs/src/content/docs/reference/engines.md` for documented capabilities
- `grep` across all 229 `.github/workflows/*.md` files for feature presence/absence
- Previous-run data from repo memory (`/tmp/gh-aw/repo-memory/default/`)
- `.github/agents/` vs. workflow `engine.agent:` references
Data confidence: grep-based counts are accurate for explicit frontmatter fields. Some metrics (e.g., cache-memory) may undercount if tools are declared in imported shared configs.
Generated by Copilot CLI Deep Research Agent · Run §25981819267