
feat(runner): add global /feedback slash command for all sessions#687

Open
jeremyeder wants to merge 2 commits into main from
amber/issue-678-add-global-feedback-sdk-tool-available-in-every-se

Conversation

@jeremyeder
Collaborator

Summary

  • Adds a /feedback slash command available in every session, backed by the submit_feedback MCP tool that logs to Langfuse
  • When Langfuse is not configured, feedback gracefully falls back to stdout (pod logs) instead of returning an error
  • Shared commands/ directory at the runner level so future runners get /feedback for free

Changes

  • components/runners/commands/feedback.md — shared slash command file with frontmatter metadata; instructs Claude to collect rating + comment and call submit_feedback
  • Dockerfile — COPY commands/ /app/commands/ to bake commands into the runner image
  • platform/commands.py — reusable helper that copies bundled commands into .claude/commands/ at startup
  • bridge.py — calls inject_platform_commands() during _setup_platform() before Claude Code launches
  • feedback_tool.py — MCP tool with graceful fallback: logs to stdout when Langfuse is unavailable (not enabled, missing creds, or package not installed)
  • prompts.py — removed LANGFUSE_ENABLED gate from feedback/correction prompt instructions (tools handle degradation themselves)
  • Tests — 20 tests covering command injection (5) and feedback tool behavior (15)

Architecture

components/runners/commands/feedback.md   ← shared across runners
       ↓ (COPY in Dockerfile)
/app/commands/feedback.md                 ← baked into image
       ↓ (inject_platform_commands at startup)
/app/.claude/commands/feedback.md         ← Claude Code picks up as /feedback
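For reference, a Claude Code slash-command file of this kind pairs optional YAML frontmatter with plain-markdown instructions. The actual feedback.md is not reproduced in this PR description; a plausible shape, assuming only the standard `description` frontmatter field, is:

```markdown
---
description: Submit feedback about this session
---

Collect feedback from the user:

1. Ask for a rating: positive or negative.
2. Ask for an optional short comment.
3. Call the `submit_feedback` MCP tool with the rating and comment,
   then confirm to the user that their feedback was recorded.
```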

Test plan

  • pytest tests/test_feedback_tool.py tests/test_commands.py — 20/20 passing
  • Build runner image: make build-runner CONTAINER_ENGINE=docker
  • Verify command in image: docker run --rm --entrypoint="" vteam_claude_runner:latest ls /app/commands/
  • Create session on kind cluster, verify /feedback appears as slash command
  • Submit feedback with and without Langfuse configured

Closes #678

🤖 Generated with Claude Code

Amber Agent and others added 2 commits February 24, 2026 22:37
Introduces the `submit_feedback` MCP tool registered in every session
so users can signal satisfaction or dissatisfaction with agent output.
The tool logs a boolean score to Langfuse and is exposed alongside the
existing `log_correction` and `restart_session` tools.

- Add `feedback_tool.py` with `create_feedback_mcp_tool` factory and
  `_log_feedback_to_langfuse` helper (follows corrections.py pattern)
- Register a `feedback` MCP server in `build_mcp_servers()` (mcp.py)
- Add `FEEDBACK_INSTRUCTIONS` prompt constant and include it in the
  workspace context prompt when Langfuse is enabled (prompts.py)
- Add 15 unit tests covering schema, tool creation, Langfuse logging,
  trace ID fallback, comment truncation, and error paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add shared feedback.md command file so users can type /feedback in any
session. The command collects a rating and comment, then calls the
submit_feedback MCP tool (added in prior commit). When Langfuse is not
configured, the tool now logs feedback to stdout instead of returning an
error, so feedback always succeeds from the user's perspective.

- Create components/runners/commands/feedback.md (shared across runners)
- COPY commands/ into runner image via Dockerfile
- Add platform/commands.py helper to inject commands at startup
- Call inject_platform_commands() from bridge._setup_platform()
- Remove LANGFUSE_ENABLED gate from prompt instructions
- Update tests for fallback behavior

Closes #678

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Feb 25, 2026

Claude Code Review

Summary

This PR introduces a /feedback slash command and submit_feedback MCP tool available in every session, with graceful Langfuse fallback to stdout. The architecture is clean and the feedback tool itself is well-implemented. However, there is one blocker: the simultaneous removal of the Langfuse gate from CORRECTION_DETECTION_INSTRUCTIONS without updating corrections.py to have matching graceful degradation creates a regression on non-Langfuse deployments.


Issues by Severity

🚫 Blocker Issues

1. Corrections tool regression: log_correction returns isError: True on non-Langfuse deployments

corrections.py:_log_correction_to_langfuse returns False, "Langfuse not enabled." when LANGFUSE_ENABLED is unset — it does not fall back to stdout like the new feedback_tool.py does:

# corrections.py:407-408
if not langfuse_enabled:
    return False, "Langfuse not enabled."

This causes create_correction_mcp_tool to propagate isError: True back to Claude. But this PR's prompts.py now always includes CORRECTION_DETECTION_INSTRUCTIONS ("BEFORE acting on user feedback... ALWAYS... Call log_correction FIRST"). On non-Langfuse deployments:

  1. Claude receives a CRITICAL instruction to call log_correction
  2. Claude calls the tool
  3. Tool returns isError: True with "Langfuse not enabled."
  4. Claude stalls, reports the error to the user, or retries

feedback_tool.py correctly uses _log_feedback_fallback() to log to stdout and return True, None. corrections.py was not updated to match. The PR description says "tools handle degradation themselves" but corrections does not.

Fix options:

  • Option A (preferred): Add a stdout fallback to _log_correction_to_langfuse — replace return False, "Langfuse not enabled." with a _log_correction_fallback(reason, ...) -> (True, None) helper matching the feedback pattern.
  • Option B: Keep the Langfuse gate specifically for CORRECTION_DETECTION_INSTRUCTIONS (restore the if langfuse_enabled: block) while keeping FEEDBACK_INSTRUCTIONS always active.

🔴 Critical Issues

2. Rating value not validated before boolean conversion

feedback_tool.py:221-224 extracts rating but never checks it against FEEDBACK_RATINGS:

rating = args.get("rating", "")
# ...
value = rating == "positive"  # "thumbs_up", "ok", "" all silently map to False

The JSON schema enum constrains the model, but defensive validation in the tool itself is standard practice. An unexpected value silently records a False/negative score with no log warning.

Fix: Add a warning before the conversion:

if rating not in FEEDBACK_RATINGS:
    logger.warning("Unexpected rating value %r; treating as negative", rating)

🟡 Major Issues

3. Tautological assertion in test_tool_description_content always passes

test_feedback_tool.py:834-837:

assert (
    "submit_feedback" not in description or FEEDBACK_TOOL_DESCRIPTION in description
)

description is mock_decorator.call_args[0][1], which is literally FEEDBACK_TOOL_DESCRIPTION. So the second clause (FEEDBACK_TOOL_DESCRIPTION in FEEDBACK_TOOL_DESCRIPTION) is always True, making the whole expression always True regardless of implementation.

Fix: Replace with direct content assertions:

assert "positive" in description
assert "rating" in description
assert "comment" in description

4. No test invokes the returned async tool function end-to-end

All 15 test_feedback_tool.py tests either check schema constants, assert decorator call args, or call _log_feedback_to_langfuse directly. None actually calls the async submit_feedback_tool(args) returned by create_feedback_mcp_tool. A regression in the MCP response structure (missing content key, wrong format) would go undetected.

Fix: Add at least one async test that calls the returned tool function and validates content[0]["type"] == "text" and absence of isError.
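Such a test could be sketched like this. The factory below is a hypothetical stand-in for illustration only — the real create_feedback_mcp_tool lives in feedback_tool.py and takes observability/session arguments — but the assertions show the end-to-end shape the review is asking for:

```python
import asyncio


def create_feedback_mcp_tool():
    """Hypothetical stand-in for the factory described in this PR."""
    async def submit_feedback_tool(args: dict) -> dict:
        rating = args.get("rating", "")
        comment = args.get("comment", "")
        # ... Langfuse logging or stdout fallback would happen here ...
        text = f"Feedback recorded ({rating}): {comment}"
        return {"content": [{"type": "text", "text": text}]}
    return submit_feedback_tool


def test_submit_feedback_end_to_end():
    tool = create_feedback_mcp_tool()
    result = asyncio.run(tool({"rating": "positive", "comment": "great"}))
    # Validate the MCP response structure, not just decorator wiring
    assert result["content"][0]["type"] == "text"
    assert "isError" not in result
```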

5. inject_platform_commands copies all file types despite docstring promising .md only

The module header says "Copy bundled command .md files" but the implementation has no extension filter. A .sh or .txt accidentally added to commands/ would be silently injected into .claude/commands/. Minor today, potential footgun as the directory grows.
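A minimal extension filter would keep the implementation honest to the docstring; sketched here as a standalone helper (name and factoring are assumptions, not the actual commands.py code):

```python
from pathlib import Path
from typing import Iterator


def iter_command_files(commands_dir: Path) -> Iterator[Path]:
    """Yield only Markdown command files, skipping stray .sh/.txt files
    that might land in commands/ by accident."""
    for path in sorted(commands_dir.iterdir()):
        if path.is_file() and path.suffix == ".md":
            yield path
```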


🔵 Minor Issues

6. Redundant if feedback_tool: guard

mcp.py:107: create_feedback_mcp_tool always returns a decorated function, never None. The if feedback_tool: guard is always True. The corrections block at line 94 has the same pattern. Both are unnecessarily defensive without a comment explaining why.

7. Raw exception string in isError response

feedback_tool.py:249: f"Feedback noted but could not be recorded: {error}" where error = str(e) from a bare except Exception. A Langfuse API error could embed endpoint URLs or auth detail visible to Claude (and potentially surfaced to the user). Consider a generic user-facing message here.


Positive Highlights

  • Excellent graceful fallback design in feedback_tool.py — three-level fallback (obs client → ad-hoc Langfuse client → stdout) is robust and covers all degradation scenarios.
  • Strong test coverage — 20 tests across two files, covering: no obs, no credentials, no trace ID, last_trace_id fallback, comment truncation, ImportError.
  • Clean separation of concerns — commands.py is a standalone, testable utility with its own test file; zero coupling to the bridge.
  • feedback.md slash command is well-crafted — clear frontmatter metadata, simple 3-step user flow, correctly references the MCP tool by name.
  • Follows established corrections.py patterns — factory function, _obs/_session_id closure captures, Langfuse client resolution, create_score + flush — all consistent.
  • Extensible architecture — new platform commands only require dropping a .md file in components/runners/commands/; zero code changes.
  • Dockerfile build context is consistent — COPY commands/ /app/commands/ aligns with the existing COPY claude-code-runner /app/claude-runner pattern; the build context was already components/runners/.

Recommendations

  1. [Blocker] Fix corrections.py:_log_correction_to_langfuse to fall back to stdout (True, None) instead of False, "Langfuse not enabled." — or restore the prompt gate for correction instructions only.
  2. [Critical] Add rating validation with a logger.warning before value = rating == "positive".
  3. [Major] Replace the tautological assertion in test_tool_description_content with meaningful content checks.
  4. [Major] Add an async end-to-end test that invokes the returned tool function and validates the MCP response structure.
  5. [Minor] Add .md extension filter in inject_platform_commands or update the docstring to match actual behavior.

🤖 Reviewed by Claude Code (claude-sonnet-4-6)



What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

