[safe-output-health] Safe Output Health Report — 2026-05-17 #32768

2026-05-17T05:59:27Z

github-actions[bot]
Bot May 17, 2026

Executive Summary

24-hour window: 2026-05-16 05:42 UTC → 2026-05-17 05:27 UTC

Runs analyzed: 593
Safe-output jobs that actually executed (Result: true): 239
Safe-output jobs that emitted ##[error] markers: 16
Safe-output jobs that reported message-level failures (Failed: ≥1): 2
Job-level conclusion = failure: 0 (all 16 jobs that emitted ##[error] still concluded success, because the handler logs these errors or falls back gracefully without failing the job)
Verdict: ⚠️ Degraded — a new, silent PR-review submission failure pattern dominates this window.

The headline finding is a new cluster of 11 silent PR-review submission failures (HTTP 422) — almost all in Test Quality Sentinel. The safe-output handler emits ##[error] and ##[warning] lines for these but does not count them in the Failed counter, so the workflow run is marked successful in the UI even though the review was never posted on the pull request.

Safe Output Job Statistics

| Job Type / Tool | Observed in 24h | Logged Errors | Message-level Failures | Net effect on PR/Issue |
| --- | --- | --- | --- | --- |
| `submit_pull_request_review` + `create_pull_request_review_comment` | many | 11 | 0 (silently recorded as success) | Review not posted |
| `create_pull_request` | several | 3 (workflows-perm push reject) | 0 (fell back to issue) | Issue created instead of PR |
| `hide_comment` | rare | 1 | 1 | Comment not hidden |
| `update_pull_request` | several | 1 | 1 | Branch not updated |
| `add_comment`, `add_labels`, `noop`, `missing_*`, `report_incomplete`, `create_discussion` | many | 0 | 0 | ✅ healthy |

Critical Issues

🚨 Cluster 1 — PR review submit 422 (silent partial failure)

Count: 11 occurrences
Affected workflows: Test Quality Sentinel (10), Matt Pocock Skills Reviewer (1)
Root cause (sub-pattern A, 10 cases): the agent calls submit_pull_request_review with no body and no inline review comments. The handler still sends POST /repos/{owner}/{repo}/pulls/{n}/reviews with event=COMMENT, comments=0, bodyLength=0, which GitHub rejects with 422 Unprocessable Entity: "".
Root cause (sub-pattern B, 1 case — run §25963355666): the agent buffered review comments at file paths/lines that are not part of the PR's diff. GitHub responds 422: "Path could not be resolved, Path could not be resolved, and Path could not be resolved".
Severity: High — silent. Workflow UI shows success; the actual review is never visible on the PR.

Sample error from §25963330141 (Test Quality Sentinel):

=== Finalizing PR Review ===
Submitting PR review on github/gh-aw#32588: event=COMMENT, comments=0, bodyLength=0
POST /repos/github/gh-aw/pulls/32588/reviews - 422 with id ... in 504ms
##[error]Failed to submit PR review: Unprocessable Entity: "" - https://docs.github.com/rest/pulls/reviews
##[warning]✗ Failed to submit PR review: Unprocessable Entity: ""

=== Processing Summary ===
Total messages: 2
Successful: 2     ← misleading
Failed: 0         ← does not reflect the 422
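
The mismatch in this sample (an ##[error] line alongside Failed: 0) can be checked mechanically. The following is a minimal sketch, not the monitor's actual implementation; `auditSafeOutputLog` and the `LogAudit` shape are hypothetical names, and the only assumption about the log format is what the sample above shows.

```typescript
// Hypothetical helper: flags a safe-output job log whose ##[error] lines
// are not reflected in the "Failed:" counter of the Processing Summary.
interface LogAudit {
  errorLines: number;
  reportedFailed: number | null; // null when no "Failed: N" line is present
  silentFailure: boolean;
}

function auditSafeOutputLog(log: string): LogAudit {
  const lines = log.split(/\r?\n/);

  // Count lines that carry the GitHub Actions error marker.
  const errorLines = lines.filter((l) => l.indexOf("##[error]") !== -1).length;

  // Use the last "Failed: N" line as the summary value.
  let reportedFailed: number | null = null;
  for (const line of lines) {
    const m = /^\s*Failed:\s*(\d+)/.exec(line);
    if (m) {
      reportedFailed = parseInt(m[1], 10);
    }
  }

  // Errors were logged, yet the summary claims nothing failed.
  const silentFailure = errorLines > 0 && reportedFailed === 0;
  return { errorLines, reportedFailed, silentFailure };
}
```

A check like this only produces candidates for review: the graceful-fallback runs in Cluster 3 also log ##[error] with Failed: 0, so they would be flagged too and need to be told apart by the error text.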

All 11 evidence runs

| Run | Workflow | Sub-pattern |
| --- | --- | --- |
| §25963330141 | Test Quality Sentinel | A (empty) |
| §25963344677 | Test Quality Sentinel | A (empty) |
| §25963355666 | Matt Pocock Skills Reviewer | B (paths) |
| §25963355688 | Test Quality Sentinel | A (empty) |
| §25964635944 | Test Quality Sentinel | A (empty) |
| §25964660443 | Test Quality Sentinel | A (empty) |
| §25966197731 | Test Quality Sentinel | A (empty) |
| §25968749492 | Test Quality Sentinel | A (empty) |
| §25969164447 | Test Quality Sentinel | A (empty) |
| §25970175071 | Test Quality Sentinel | A (empty) |
| §25974650377 | Test Quality Sentinel | A (empty) |

🟡 Cluster 2 — Real message-level failures (counted correctly)

These two are doing the right thing — the handler classifies them as Failed: 1 and the failure is visible:

hide_comment — §25959481433 (AI Moderator):
Could not resolve to a node with the global id of '4466568319' — comment was deleted before the safe-output job ran, or the id is wrong.
update_pull_request — §25964472382 (PR Sous Chef):
PUT /pulls/32550/update-branch → 422: head ref does not exist — PR branch was deleted before the safe-output job ran. Handler annotates the error as Retryable: false.

🟢 Cluster 3 — Graceful fallback (working as designed but noisy)

create_pull_request blocked by workflows permission — 3 occurrences, all "Q" workflow:
§25963425157, §25964282481, §25964500595
GraphQL push + git push both fail because the changeset includes .github/workflows/*.md. Handler falls back to creating a review issue. End-state is correct, but ##[error]Git push failed: The process '/usr/bin/git' failed with exit code 1 is alarming in the UI.

Root Cause Analysis

API-Related Issues

The dominant issue is the submit_pull_request_review finalization step:

The handler accumulates per-message submit_pull_request_review and create_pull_request_review_comment items during the processing loop, marking each as ✓ successful.
After the loop, a "Finalizing PR Review" stage submits the aggregated review via POST /repos/{owner}/{repo}/pulls/{n}/reviews.
If the aggregated body is empty AND there are no buffered comments, GitHub rejects with 422 "".
If buffered comments target paths that aren't in the diff, GitHub rejects with 422 "Path could not be resolved".

Both cases produce ##[error] log lines but do not roll back the success count for the original message, so Processing Summary shows Failed: 0 and the safe_outputs job exits 0.
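
One way to close this gap is to apply the finalize outcome to the per-message results before the summary is computed. The sketch below is illustrative only; `MessageResult`, `applyFinalizeOutcome`, and `summarize` are hypothetical names and do not describe the handler's real data structures.

```typescript
// Hypothetical per-message result as tracked during the processing loop.
interface MessageResult {
  type: string;
  ok: boolean;
  error?: string;
}

interface ProcessingSummary {
  total: number;
  successful: number;
  failed: number;
}

// If the finalize POST returned a non-2xx status, mark the message that
// requested the review as failed instead of leaving it as a success.
function applyFinalizeOutcome(
  results: MessageResult[],
  finalizeStatus: number
): MessageResult[] {
  const finalizeOk = finalizeStatus >= 200 && finalizeStatus < 300;
  if (finalizeOk) {
    return results;
  }
  return results.map((r) =>
    r.type === "submit_pull_request_review"
      ? { ...r, ok: false, error: "finalize returned HTTP " + finalizeStatus }
      : r
  );
}

function summarize(results: MessageResult[]): ProcessingSummary {
  const successful = results.filter((r) => r.ok).length;
  return {
    total: results.length,
    successful,
    failed: results.length - successful,
  };
}
```

With two messages and a 422 on finalize, `summarize(applyFinalizeOutcome(results, 422))` would report one failure instead of zero. Whether the buffered comment messages should also be flipped is a design decision this sketch leaves open.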

Data Validation Issues

hide_comment: passes the comment id straight to GraphQL minimizeComment without checking the comment still exists.
update_pull_request: calls PUT /pulls/{n}/update-branch without first verifying the PR is still open and the head ref still exists.
Test Quality Sentinel agent emits submit_pull_request_review even when it has nothing to say.

Permission Issues

"Q" workflow agent regularly proposes changes to .github/workflows/*.md. The GitHub App lacks the workflows permission for those paths. The fallback path (create issue) works correctly. The noisy ##[error] line is just cosmetic.

Recommendations

Critical (immediate)

Stop submitting empty PR reviews (Cluster 1, sub-pattern A — 10/11 cases).
- Where: the "Finalizing PR Review" stage of the safe-output handler manager.
- Change: before POST /pulls/{n}/reviews, skip the call when bufferedComments.length === 0 && (reviewBody == null || reviewBody.trim() === ''). Record the original message as a no-op or Failed (preferred so it's visible), and skip the ##[error] line.
- Priority: Critical — eliminates the silent-failure mode for 10 of the 11 cases observed.
Audit Test Quality Sentinel prompt.
- Where: prompt source for Test Quality Sentinel.
- Change: add an explicit "do not call submit_pull_request_review if you have no inline comment and no review body" instruction. This stops the bad call at the source.
- Priority: Critical — Test Quality Sentinel accounts for 10/11 occurrences of this cluster.
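
The empty-review guard from the first recommendation could look roughly like this. It is a sketch under stated assumptions: `hasReviewContent` and `BufferedComment` are hypothetical names, and only the condition itself is taken from the recommendation.

```typescript
// Hypothetical shape of a buffered inline review comment.
interface BufferedComment {
  path: string;
  line: number;
  body: string;
}

// Returns false exactly when the recommendation says to skip the POST:
// no buffered comments and a missing or whitespace-only review body.
function hasReviewContent(
  reviewBody: string | null | undefined,
  bufferedComments: BufferedComment[]
): boolean {
  const hasBody = reviewBody != null && reviewBody.trim() !== "";
  return hasBody || bufferedComments.length > 0;
}
```

The finalize stage would call this before issuing the request and record a no-op or a failure when it returns false. The guard ignores the review event; the observed failures all used event=COMMENT, and other events may need different treatment.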

High

Surface 422-on-finalize as a real failure.
- Where: Processing Summary aggregation in the safe-output handler manager.
- Change: when POST /pulls/{n}/reviews returns non-2xx, increment the failure counter for the message that requested the review so the job conclusion reflects the real outcome.
- Risk: flips currently-success runs to failure for the affected workflows; coordinate with Test Quality Sentinel owner before rollout.
Validate review-comment paths against the PR diff (Cluster 1, sub-pattern B).
- Where: before POST /pulls/{n}/reviews.
- Change: GET /pulls/{n}/files once; drop or annotate comments whose path isn't in the diff.

Medium

Pre-flight hide_comment and update_pull_request.
- hide_comment: REST GET /repos/{o}/{r}/issues/comments/{id} before calling GraphQL minimizeComment. On 404 treat as no-op.
- update_pull_request: GET /pulls/{n} first; if state == 'closed' or the head ref no longer exists, skip the update-branch call.
Downgrade non-retryable safe-output errors to warnings.
- When the error formatter sees Retryable: false, log ##[warning] instead of ##[error]. These can't be fixed by retrying and the red marker is misleading.
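
The downgrade rule amounts to choosing the log marker from the retryable flag. A minimal sketch follows, assuming the formatter receives an object like the hypothetical `SafeOutputError` below; the real formatter's input is not shown in this report.

```typescript
// Hypothetical error shape; only the Retryable flag is taken from the report.
interface SafeOutputError {
  message: string;
  retryable: boolean;
}

// Non-retryable errors are logged as warnings, retryable ones as errors.
function formatAnnotation(err: SafeOutputError): string {
  const marker = err.retryable ? "##[error]" : "##[warning]";
  return marker + err.message;
}
```

Note that this changes only the log marker. Whether the message is counted as Failed in the Processing Summary is a separate decision and should not depend on the marker.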

Low

Skip the push-and-fall-back dance for .github/workflows/* changesets.
- In create_pull_request, detect protected paths up-front and go directly to fallback-issue. Removes the ##[error]Git push failed noise for "Q".
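
Detecting the protected paths up front could be as simple as the check below. `touchesWorkflowFiles` is a hypothetical name, and the sketch assumes the handler already has the list of changed paths relative to the repository root.

```typescript
// Returns true when any changed path lives under .github/workflows/,
// which is the condition that makes the push fail for "Q".
function touchesWorkflowFiles(changedPaths: string[]): boolean {
  return changedPaths.some((p) => {
    // Tolerate a leading "./" in front of the repository-relative path.
    const normalized = p.replace(/^\.\//, "");
    return normalized.startsWith(".github/workflows/");
  });
}
```

When this returns true, create_pull_request would go straight to the fallback issue. The sketch treats a changeset as blocked if it includes any workflow file, not only when it consists exclusively of them.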

Work Item Plans

Work Item 1: Eliminate silent PR-review submission failures

Type: Bug fix
Priority: Critical
Description: When the aggregated PR review has no body and no comments, the safe-output handler still POSTs to GitHub, takes a 422, logs ##[error], but reports Failed: 0. Make this either a no-op (preferred when also fixing the prompt) or a counted failure.
Acceptance criteria:
- safe_outputs no longer logs ##[error]Failed to submit PR review: Unprocessable Entity: "" when the review has no content.
- If a non-empty review fails with 422 (e.g., unresolvable paths), the Processing Summary increments Failed and the job concludes accordingly.
- Existing unit tests pass; add a test case for empty-review short-circuit and 422-from-paths.
Technical approach: patch the safe-output handler manager around the "Finalizing PR Review" stage. Add empty-check; on 422 increment failed count.
Effort: Small
Dependencies: none

Work Item 2: Tighten Test Quality Sentinel prompt

Type: Prompt fix
Priority: Critical
Description: Test Quality Sentinel is by far the largest emitter of empty submit_pull_request_review calls (10/11 of cluster 1). Add a guard in the prompt.
Acceptance criteria:
- Prompt explicitly tells the agent not to call submit_pull_request_review without a body or at least one buffered comment.
- After deployment, no new occurrences of empty PR review submissions from Test Quality Sentinel.
Effort: Small
Dependencies: none

Work Item 3: Pre-flight `update_pull_request` and `hide_comment`

Type: Defensive fix
Priority: Medium
Description: Both handlers blindly call the GitHub API and emit ##[error] when the target has changed underfoot. Add cheap pre-checks.
Acceptance criteria:
- update_pull_request: GET the PR first, skip update-branch when closed or head ref missing; emit ##[warning] instead of ##[error].
- hide_comment: GET the comment first; on 404 treat as no-op.
Effort: Small
Dependencies: none
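
The two pre-checks can be expressed as small decision functions that run on the result of the preliminary GET. This is a sketch, not the handlers' actual code: `PreflightDecision`, `PullSnapshot`, and both function names are hypothetical, and how `headRefExists` is determined is deliberately left to the caller.

```typescript
// Hypothetical result of a pre-flight check.
type PreflightDecision =
  | { action: "proceed" }
  | { action: "skip"; reason: string };

// Hypothetical summary of the PR state gathered before update-branch.
interface PullSnapshot {
  state: "open" | "closed";
  headRefExists: boolean; // assumption: resolved by the caller
}

// Skip update-branch when the PR is closed or its head ref is gone.
function preflightUpdateBranch(pr: PullSnapshot): PreflightDecision {
  if (pr.state === "closed") {
    return { action: "skip", reason: "pull request is closed" };
  }
  if (!pr.headRefExists) {
    return { action: "skip", reason: "head ref no longer exists" };
  }
  return { action: "proceed" };
}

// Treat a 404 on the comment lookup as a no-op for hide_comment.
function preflightHideComment(lookupStatus: number): PreflightDecision {
  if (lookupStatus === 404) {
    return { action: "skip", reason: "comment no longer exists" };
  }
  return { action: "proceed" };
}
```

A skip decision would be logged with ##[warning] and its reason, which matches the acceptance criteria above. The target can still change between the pre-check and the actual call, so the existing error handling has to stay in place.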

Work Item 4: Quiet down "Q" workflow create_pull_request fallback

Type: Cosmetic / UX
Priority: Low
Description: Agent's create_pull_request keeps trying to modify .github/workflows/*.md, which is always blocked. Detect this up-front and skip directly to fallback-issue path.
Acceptance criteria:
- No ##[error]Git push failed line when the diff is exclusively workflow files.
- Fallback issue still created with the same content.
Effort: Small

Historical Context

| Audit date | Runs analyzed | Actionable failures | Verdict |
| --- | --- | --- | --- |
| 2026-05-15 | 57 | 0 | healthy |
| 2026-05-16 | 62 | 1 | one actionable failure (add_labels schema bug, Scout run §25950059255) |
| 2026-05-17 | 593 | 13 | degraded: new silent PR-review cluster (11x), plus 2 real message failures |

Trends:

Increase in actionable failures: 0 → 1 → 13, although runs analyzed also rose from 62 to 593, so the raw counts are not directly comparable. Most of today's volume comes from the new PR-review 422 cluster; without it the audit would be in line with yesterday.
The 2026-05-16 add_labels_wrong_field_name pattern did NOT recur today.
The new pr_review_submit_422 pattern is concentrated in a single workflow (Test Quality Sentinel), which suggests prompt-level rather than infrastructure-level remediation.

Metrics and KPIs

Safe-output job success rate (job-level): 239/239 = 100.0% (every job concluded success, including the 16 that emitted ##[error])
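Safe-output success rate (counting silent submit-review failures as failures): ≈ 94.6% (226/239 jobs; 13 jobs each had one effectively failed message). The denominator is jobs, not messages; jobs typically process 1–5 messages each, so the true message-level rate is somewhat higher.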
Most reliable handler types: add_comment, add_labels, noop, create_discussion (0 failures observed)
Most problematic handler type: submit_pull_request_review (11/11 of the high-severity cluster)
Most affected workflow: Test Quality Sentinel (10 silent PR-review failures in 24h)
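
The two success rates above can be reproduced from the counts in the Executive Summary. The snippet is only a worked calculation; the variable names are illustrative, and it shows that the 94.6% figure is computed over jobs (226/239) rather than over individual messages.

```typescript
// Formats a ratio as a percentage with one decimal place.
function percent(numerator: number, denominator: number): string {
  return ((100 * numerator) / denominator).toFixed(1) + "%";
}

const executedJobs = 239; // safe-output jobs that actually executed
const jobsConcludedFailure = 0; // job-level conclusion = failure
const jobsWithEffectiveFailure = 13; // 11 silent 422s + 2 counted failures
const jobsWithErrorMarkers = 16; // jobs that emitted ##[error]

// Job-level success rate: "100.0%"
const jobLevelRate = percent(executedJobs - jobsConcludedFailure, executedJobs);

// Effective success rate once silent failures are counted: "94.6%"
const effectiveRate = percent(executedJobs - jobsWithEffectiveFailure, executedJobs);

// Share of executed jobs that logged at least one ##[error]: "6.7%"
const errorMarkerShare = percent(jobsWithErrorMarkers, executedJobs);
```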

Next Steps

Open a fix PR addressing Work Item 1 (skip empty PR review submission + count 422 as failed).
Open a prompt PR for Test Quality Sentinel addressing Work Item 2.
Re-run this audit on 2026-05-18 and confirm the cluster has dropped; if so, mark pr_review_submit_422_empty_or_unresolvable non-recurring. If it persists, escalate.
After Work Item 1, re-baseline the message-level success rate to ensure 422s are now counted.

References:

§25963330141 — sample silent PR-review 422 (Test Quality Sentinel)
§25963355666 — sample 422 with unresolvable paths (Matt Pocock Skills Reviewer)
§25964472382 — sample update_pull_request head-ref-missing (PR Sous Chef)

Generated by 🔒 Safe Output Health Monitor · ● 100.8M

expires on May 18, 2026, 5:59 AM UTC
