
feat(runner): add global /feedback slash command for all sessions#687

Open
jeremyeder wants to merge 2 commits into main from
amber/issue-678-add-global-feedback-sdk-tool-available-in-every-se

Conversation

@jeremyeder
Collaborator

Summary

  • Adds a /feedback slash command available in every session, backed by the submit_feedback MCP tool that logs to Langfuse
  • When Langfuse is not configured, feedback gracefully falls back to stdout (pod logs) instead of returning an error
  • Shared commands/ directory at the runner level so future runners get /feedback for free

Changes

  • components/runners/commands/feedback.md — shared slash command file with frontmatter metadata; instructs Claude to collect rating + comment and call submit_feedback
  • Dockerfile — COPY commands/ /app/commands/ to bake commands into the runner image
  • platform/commands.py — reusable helper that copies bundled commands into .claude/commands/ at startup
  • bridge.py — calls inject_platform_commands() during _setup_platform() before Claude Code launches
  • feedback_tool.py — MCP tool with graceful fallback: logs to stdout when Langfuse is unavailable (not enabled, missing creds, or package not installed)
  • prompts.py — removed LANGFUSE_ENABLED gate from feedback/correction prompt instructions (tools handle degradation themselves)
  • Tests — 20 tests covering command injection (5) and feedback tool behavior (15)

Architecture

components/runners/commands/feedback.md   ← shared across runners
       ↓ (COPY in Dockerfile)
/app/commands/feedback.md                 ← baked into image
       ↓ (inject_platform_commands at startup)
/app/.claude/commands/feedback.md         ← Claude Code picks up as /feedback
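For reference, a Claude Code slash-command file of this kind pairs optional YAML frontmatter with plain-markdown instructions. The actual feedback.md is not reproduced in this PR description; a plausible shape, assuming only the standard `description` frontmatter field, is:

```markdown
---
description: Submit feedback about this session
---

Collect feedback from the user:

1. Ask for a rating: positive or negative.
2. Ask for an optional short comment.
3. Call the `submit_feedback` MCP tool with the rating and comment,
   then confirm to the user that their feedback was recorded.
```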

Test plan

  • pytest tests/test_feedback_tool.py tests/test_commands.py — 20/20 passing
  • Build runner image: make build-runner CONTAINER_ENGINE=docker
  • Verify command in image: docker run --rm --entrypoint="" vteam_claude_runner:latest ls /app/commands/
  • Create session on kind cluster, verify /feedback appears as slash command
  • Submit feedback with and without Langfuse configured

Closes #678

🤖 Generated with Claude Code

Amber Agent and others added 2 commits February 24, 2026 22:37
Introduces the `submit_feedback` MCP tool registered in every session
so users can signal satisfaction or dissatisfaction with agent output.
The tool logs a boolean score to Langfuse and is exposed alongside the
existing `log_correction` and `restart_session` tools.

- Add `feedback_tool.py` with `create_feedback_mcp_tool` factory and
  `_log_feedback_to_langfuse` helper (follows corrections.py pattern)
- Register a `feedback` MCP server in `build_mcp_servers()` (mcp.py)
- Add `FEEDBACK_INSTRUCTIONS` prompt constant and include it in the
  workspace context prompt when Langfuse is enabled (prompts.py)
- Add 15 unit tests covering schema, tool creation, Langfuse logging,
  trace ID fallback, comment truncation, and error paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add shared feedback.md command file so users can type /feedback in any
session. The command collects a rating and comment, then calls the
submit_feedback MCP tool (added in prior commit). When Langfuse is not
configured, the tool now logs feedback to stdout instead of returning an
error, so feedback always succeeds from the user's perspective.

- Create components/runners/commands/feedback.md (shared across runners)
- COPY commands/ into runner image via Dockerfile
- Add platform/commands.py helper to inject commands at startup
- Call inject_platform_commands() from bridge._setup_platform()
- Remove LANGFUSE_ENABLED gate from prompt instructions
- Update tests for fallback behavior

Closes #678

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Feb 25, 2026

Claude Code Review

Summary

This PR introduces a /feedback slash command and submit_feedback MCP tool available in every session, with graceful Langfuse fallback to stdout. The architecture is clean and the feedback tool itself is well-implemented. However, there is one blocker: the simultaneous removal of the Langfuse gate from CORRECTION_DETECTION_INSTRUCTIONS without updating corrections.py to have matching graceful degradation creates a regression on non-Langfuse deployments.


Issues by Severity

🚫 Blocker Issues

1. Corrections tool regression: log_correction returns isError: True on non-Langfuse deployments

corrections.py:_log_correction_to_langfuse returns False, "Langfuse not enabled." when LANGFUSE_ENABLED is unset — it does not fall back to stdout like the new feedback_tool.py does:

# corrections.py:407-408
if not langfuse_enabled:
    return False, "Langfuse not enabled."

This causes create_correction_mcp_tool to propagate isError: True back to Claude. But this PR's prompts.py now always includes CORRECTION_DETECTION_INSTRUCTIONS ("BEFORE acting on user feedback... ALWAYS... Call log_correction FIRST"). On non-Langfuse deployments:

  1. Claude receives a CRITICAL instruction to call log_correction
  2. Claude calls the tool
  3. Tool returns isError: True with "Langfuse not enabled."
  4. Claude stalls, reports the error to the user, or retries

feedback_tool.py correctly uses _log_feedback_fallback() to log to stdout and return True, None. corrections.py was not updated to match. The PR description says "tools handle degradation themselves" but corrections does not.

Fix options:

  • Option A (preferred): Add a stdout fallback to _log_correction_to_langfuse — replace return False, "Langfuse not enabled." with a _log_correction_fallback(reason, ...) -> (True, None) helper matching the feedback pattern.
  • Option B: Keep the Langfuse gate specifically for CORRECTION_DETECTION_INSTRUCTIONS (restore the if langfuse_enabled: block) while keeping FEEDBACK_INSTRUCTIONS always active.

🔴 Critical Issues

2. Rating value not validated before boolean conversion

feedback_tool.py:221-224 extracts rating but never checks it against FEEDBACK_RATINGS:

rating = args.get("rating", "")
# ...
value = rating == "positive"  # "thumbs_up", "ok", "" all silently map to False

The JSON schema enum constrains the model, but defensive validation in the tool itself is standard practice. An unexpected value silently records a False/negative score with no log warning.

Fix: Add a warning before the conversion:

if rating not in FEEDBACK_RATINGS:
    logger.warning("Unexpected rating value %r; treating as negative", rating)

🟡 Major Issues

3. Tautological assertion in test_tool_description_content always passes

test_feedback_tool.py:834-837:

assert (
    "submit_feedback" not in description or FEEDBACK_TOOL_DESCRIPTION in description
)

description is mock_decorator.call_args[0][1], which is literally FEEDBACK_TOOL_DESCRIPTION. So the second clause (FEEDBACK_TOOL_DESCRIPTION in FEEDBACK_TOOL_DESCRIPTION) is always True, making the whole expression always True regardless of implementation.

Fix: Replace with direct content assertions:

assert "positive" in description
assert "rating" in description
assert "comment" in description

4. No test invokes the returned async tool function end-to-end

All 15 test_feedback_tool.py tests either check schema constants, assert decorator call args, or call _log_feedback_to_langfuse directly. None actually calls the async submit_feedback_tool(args) returned by create_feedback_mcp_tool. A regression in the MCP response structure (missing content key, wrong format) would go undetected.

Fix: Add at least one async test that calls the returned tool function and validates content[0]["type"] == "text" and absence of isError.
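Such a test could be sketched like this. The factory below is a hypothetical stand-in for illustration only — the real create_feedback_mcp_tool lives in feedback_tool.py and takes observability/session arguments — but the assertions show the end-to-end shape the review is asking for:

```python
import asyncio


def create_feedback_mcp_tool():
    """Hypothetical stand-in for the factory described in this PR."""
    async def submit_feedback_tool(args: dict) -> dict:
        rating = args.get("rating", "")
        comment = args.get("comment", "")
        # ... Langfuse logging or stdout fallback would happen here ...
        text = f"Feedback recorded ({rating}): {comment}"
        return {"content": [{"type": "text", "text": text}]}
    return submit_feedback_tool


def test_submit_feedback_end_to_end():
    tool = create_feedback_mcp_tool()
    result = asyncio.run(tool({"rating": "positive", "comment": "great"}))
    # Validate the MCP response structure, not just decorator wiring
    assert result["content"][0]["type"] == "text"
    assert "isError" not in result
```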

5. inject_platform_commands copies all file types despite docstring promising .md only

The module header says "Copy bundled command .md files" but the implementation has no extension filter. A .sh or .txt accidentally added to commands/ would be silently injected into .claude/commands/. Minor today, potential footgun as the directory grows.
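A minimal extension filter would keep the implementation honest to the docstring; sketched here as a standalone helper (name and factoring are assumptions, not the actual commands.py code):

```python
from pathlib import Path
from typing import Iterator


def iter_command_files(commands_dir: Path) -> Iterator[Path]:
    """Yield only Markdown command files, skipping stray .sh/.txt files
    that might land in commands/ by accident."""
    for path in sorted(commands_dir.iterdir()):
        if path.is_file() and path.suffix == ".md":
            yield path
```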


🔵 Minor Issues

6. Redundant if feedback_tool: guard

mcp.py:107: create_feedback_mcp_tool always returns a decorated function, never None. The if feedback_tool: guard is always True. The corrections block at line 94 has the same pattern. Both are unnecessarily defensive without a comment explaining why.

7. Raw exception string in isError response

feedback_tool.py:249: f"Feedback noted but could not be recorded: {error}" where error = str(e) from a bare except Exception. A Langfuse API error could embed endpoint URLs or auth detail visible to Claude (and potentially surfaced to the user). Consider a generic user-facing message here.


Positive Highlights

  • Excellent graceful fallback design in feedback_tool.py — three-level fallback (obs client → ad-hoc Langfuse client → stdout) is robust and covers all degradation scenarios.
  • Strong test coverage — 20 tests across two files, covering: no obs, no credentials, no trace ID, last_trace_id fallback, comment truncation, ImportError.
  • Clean separation of concerns — commands.py is a standalone, testable utility with its own test file; zero coupling to the bridge.
  • feedback.md slash command is well-crafted — clear frontmatter metadata, simple 3-step user flow, correctly references the MCP tool by name.
  • Follows established corrections.py patterns — factory function, _obs/_session_id closure captures, Langfuse client resolution, create_score + flush — all consistent.
  • Extensible architecture — new platform commands only require dropping a .md file in components/runners/commands/; zero code changes.
  • Dockerfile build context is consistent — COPY commands/ /app/commands/ aligns with the existing COPY claude-code-runner /app/claude-runner pattern; the build context was already components/runners/.

Recommendations

  1. [Blocker] Fix corrections.py:_log_correction_to_langfuse to fall back to stdout (True, None) instead of False, "Langfuse not enabled." — or restore the prompt gate for correction instructions only.
  2. [Critical] Add rating validation with a logger.warning before value = rating == "positive".
  3. [Major] Replace the tautological assertion in test_tool_description_content with meaningful content checks.
  4. [Major] Add an async end-to-end test that invokes the returned tool function and validates the MCP response structure.
  5. [Minor] Add .md extension filter in inject_platform_commands or update the docstring to match actual behavior.

🤖 Reviewed by Claude Code (claude-sonnet-4-6)



What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

