Description
Problem
When something goes wrong with a DevLake deployment, users must manually:
- Run gh devlake status and read the output
- Test each connection individually
- Check pipeline logs in the Config UI
- Correlate error messages across services
There's no single command that inspects the entire stack, identifies problems, and explains what's wrong with actionable remediation steps.
Proposed Solution
Add gh devlake diagnose — an AI-powered diagnostic command that runs all health checks, connection tests, and pipeline inspections, then synthesizes a diagnosis with remediation commands.
Command surface
# Full diagnostic
gh devlake diagnose
# Focus on a specific area
gh devlake diagnose --scope connections
gh devlake diagnose --scope pipelines
How it works
- Gather data: run all checks programmatically (no user interaction needed):
  - Ping all endpoints (backend, Config UI, Grafana)
  - Test all saved connections across all plugins
  - Fetch recent pipeline runs and their error messages
  - Check DB connectivity
  - Read state file for deployment context
- Send to Copilot SDK: package all results into a structured context and send it to the LLM with a diagnostic prompt
- Render diagnosis: the LLM synthesizes the findings into a plain-language explanation with actionable gh devlake commands
Example output
$ gh devlake diagnose
🔍 Running diagnostics...
✅ Backend API: http://localhost:8080 (healthy)
✅ Config UI: http://localhost:4000 (healthy)
✅ Grafana: http://localhost:3002 (healthy)
❌ Connection "GitHub - my-org" (github, id=1): 401 Unauthorized
✅ Connection "Copilot - my-ent" (gh-copilot, id=2): healthy
⚠️ Pipeline #12: FAILED (2 hours ago)
📋 Diagnosis:
Your GitHub connection "GitHub - my-org" is returning 401 Unauthorized.
This typically means the PAT has expired or been revoked.
To fix:
1. Generate a new PAT with scopes: repo, read:org, read:user
2. Update the connection:
gh devlake configure connection update --plugin github --id 1 --token ghp_NEW_TOKEN
Pipeline #12 failed because it depends on this connection.
After updating the token, re-trigger collection:
gh devlake configure project add --project-name my-team
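As a rough illustration of how the check lines in the example output might be produced, here is a minimal sketch. CheckResult and renderLine are hypothetical names chosen for this example, and the real command would use the CLI's existing formatting helpers rather than this function.

```go
package main

import "fmt"

// CheckResult is a hypothetical, already-gathered result for one check.
type CheckResult struct {
	Label   string // e.g. `Connection "GitHub - my-org" (github, id=1)`
	Healthy bool
	Detail  string // e.g. "401 Unauthorized"
}

// renderLine formats one result in the style of the example output:
// a status icon, the label, a colon, then the detail text.
func renderLine(r CheckResult) string {
	icon := "✅"
	if !r.Healthy {
		icon = "❌"
	}
	return fmt.Sprintf("%s %s: %s", icon, r.Label, r.Detail)
}

func main() {
	results := []CheckResult{
		{Label: "Backend API", Healthy: true, Detail: "http://localhost:8080 (healthy)"},
		{Label: `Connection "GitHub - my-org" (github, id=1)`, Healthy: false, Detail: "401 Unauthorized"},
	}
	for _, r := range results {
		fmt.Println(renderLine(r))
	}
}
```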
Architecture
Reuses the internal/copilot/ package from #63. Adds diagnostic-specific tools:
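// Tool: test_all_connections
// Batch-tests every connection across all plugins and returns results.
// Failures are recorded as unhealthy results instead of being dropped, so the
// diagnosis still gets partial data when a plugin or connection is unreachable.
var testAllConnectionsTool = copilot.DefineTool("test_all_connections",
	"Test all saved DevLake connections and return pass/fail status for each",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		client := devlake.NewClient(apiURL)
		var results []ConnectionTestResult
		for _, def := range connectionRegistry {
			conns, err := client.ListConnections(def.Plugin)
			if err != nil {
				results = append(results, ConnectionTestResult{
					Plugin: def.Plugin, Message: "list connections: " + err.Error(),
				})
				continue
			}
			for _, conn := range conns {
				res := ConnectionTestResult{Plugin: def.Plugin, ID: conn.ID, Name: conn.Name}
				if test, err := client.TestSavedConnection(def.Plugin, conn.ID); err != nil {
					res.Message = "test connection: " + err.Error()
				} else {
					res.Healthy, res.Message = test.Success, test.Message
				}
				results = append(results, res)
			}
		}
		return results, nil
	})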
// Tool: get_recent_pipeline_errors
// Fetches recent failed pipelines with error details
var getRecentPipelineErrorsTool = copilot.DefineTool("get_recent_pipeline_errors",
	"Get recent failed DevLake pipeline runs with error messages and timestamps",
	func(params struct{ Limit int `json:"limit,omitempty"` }, inv copilot.ToolInvocation) (any, error) {
		// ... fetch pipelines, filter for failures, include error details ...
	})
// Tool: check_all_endpoints
// Pings backend, Config UI, Grafana and returns status for each
var checkEndpointsTool = copilot.DefineTool("check_all_endpoints",
	"Check health of all DevLake endpoints (backend API, Config UI, Grafana)",
	func(params struct{}, inv copilot.ToolInvocation) (any, error) {
		// ... ping each endpoint from state file or discovery ...
	})
Output mode
Unlike insights (which streams), diagnose uses batch mode: collect the full response, then render with the CLI's standard emoji/box-drawing formatting. This ensures the diagnostic output has consistent visual structure.
// Wait for the full response instead of streaming
response, err := session.SendAndWait(ctx, copilot.MessageOptions{
	Prompt: diagnosticPrompt,
})
if err != nil {
	return fmt.Errorf("diagnose: %w", err)
}
// Format and print response with standard CLI output conventions
System prompt for diagnosis
The system message includes:
- DevLake architecture context (three-layer model, plugin structure)
- Available gh devlake commands for remediation
- Common failure patterns and their fixes
- The user's deployment type (local vs Azure) from the state file
Files to create/modify
| File | Change |
|---|---|
| cmd/diagnose.go | NEW: gh devlake diagnose command |
| internal/copilot/tools.go | ADD: diagnostic-specific tools (test_all_connections, get_recent_pipeline_errors, check_all_endpoints) |
| internal/copilot/system.go | ADD: diagnostic system prompt variant |
Acceptance Criteria
- gh devlake diagnose gathers all health/connection/pipeline data and produces a synthesis
- --scope connections limits diagnosis to connection health only
- --scope pipelines limits diagnosis to pipeline failures only
- Diagnosis includes actionable gh devlake commands for remediation
- Graceful error if Copilot CLI is not installed (same as insights)
- Diagnostic data gathering works even if some endpoints are down (partial results)
- Output uses batch mode with standard CLI formatting (not streaming)
- go build ./... and go test ./... pass
- README updated
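The --scope criteria above could be backed by a small validation helper along these lines. This is a sketch under assumptions: the Scope type, parseScope, and checksFor are hypothetical names, and treating an empty value as "run everything" is a guess at how the default would be represented.

```go
package main

import "fmt"

// Scope selects which part of the stack a diagnosis covers.
type Scope string

const (
	ScopeFull        Scope = "" // flag not set: run every check (assumed default)
	ScopeConnections Scope = "connections"
	ScopePipelines   Scope = "pipelines"
)

// parseScope validates the raw --scope value and rejects unknown scopes.
func parseScope(raw string) (Scope, error) {
	switch Scope(raw) {
	case ScopeFull, ScopeConnections, ScopePipelines:
		return Scope(raw), nil
	default:
		return "", fmt.Errorf("invalid --scope %q (want connections or pipelines)", raw)
	}
}

// checksFor maps a scope to the data-gathering steps that should run.
func checksFor(s Scope) []string {
	switch s {
	case ScopeConnections:
		return []string{"connections"}
	case ScopePipelines:
		return []string{"pipelines"}
	default:
		return []string{"endpoints", "connections", "pipelines", "database"}
	}
}

func main() {
	for _, raw := range []string{"", "connections", "bogus"} {
		scope, err := parseScope(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("scope %q runs %v\n", raw, checksFor(scope))
	}
}
```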
Target Version
v0.4.x — AI-powered operations.
Dependencies
- #63 Integrate Copilot SDK (Go): internal/copilot package + gh devlake insights. Provides the Copilot SDK integration (internal/copilot/ package, SDK dependency).
- #62 Add gh devlake query command with extensible query engine. Provides the query engine (for pipeline/metric data).
- #60 Add --json output flag to read commands. Provides the --json output flag (for --json mode if desired).
References
- Copilot SDK (Go): github/copilot-sdk/go (DefineTool, SendAndWait (batch mode), system messages)
  - go/README.md: full API reference
  - go/definetool.go: type-safe tool definitions
  - go/session.go: Send, SendAndWait, event handling
- Copilot SDK overview: github/copilot-sdk (architecture, auth, custom tools)
- DevLake API patterns: apache/incubator-devlake/AGENTS.md (plugin structure, API routes)
  - cmd/status.go: existing health check logic to reuse
  - cmd/configure_connection_test_cmd.go: existing connection test logic
  - internal/devlake/client.go: Health(), TestSavedConnection(), ListConnections(), GetPipeline()
  - internal/copilot/: shared SDK client from #63 (Integrate Copilot SDK (Go): internal/copilot package + gh devlake insights)