Skip to content

feat(runner): add workflow context to Langfuse traces#726

Open
bobbravo2 wants to merge 4 commits intomainfrom
feat/langfuse-workflow-context
Open

feat(runner): add workflow context to Langfuse traces#726
bobbravo2 wants to merge 4 commits intomainfrom
feat/langfuse-workflow-context

Conversation

@bobbravo2
Copy link
Contributor

@bobbravo2 bobbravo2 commented Feb 27, 2026

Customers want to be able to get data about which workflows are being used the most. This PR adds support for tracking which workflows are being used, and emits data into Langfuse.

Summary

  • Adds workflow_url, workflow_branch, and workflow_path parameters to ObservabilityManager.initialize() in observability.py
  • Wires the three ACTIVE_WORKFLOW_* env vars (already set by the operator on runner pods) through ClaudeBridge._setup_observability() into Langfuse trace metadata and tags
  • Derives a human-readable workflow name from the git URL and adds it as a workflow:{name} tag for easy filtering in the Langfuse UI

Motivation

The operator already propagates spec.activeWorkflow from the AgenticSession CR into env vars on runner pods, and workspace.py reads them to set up the working directory. However, the observability layer never consumed these values, so Langfuse traces could not be filtered or grouped by workflow.

Changes

File Change
ambient_runner/observability.py Extended initialize() signature; added workflow metadata + tag
ambient_runner/bridges/claude/bridge.py Reads ACTIVE_WORKFLOW_* env vars and passes to obs.initialize()

Result

Langfuse traces now include:

  • metadata: { ..., workflow_name, workflow_url, workflow_branch, workflow_path }
  • tags: ["...", "workflow:bug-fix-workflow"]

Test plan

  • Verify no regressions when no workflow is active (env vars empty/unset) -- fields simply omitted
  • Verify workflow metadata and tag appear in Langfuse when ACTIVE_WORKFLOW_GIT_URL is set
  • Verify derived name correctly strips .git suffix and trailing slashes
  • Confirm existing Langfuse functionality (turns, tools, privacy masking) is unaffected

Made with Cursor

The operator already sets ACTIVE_WORKFLOW_GIT_URL, ACTIVE_WORKFLOW_BRANCH,
and ACTIVE_WORKFLOW_PATH env vars on runner pods, but the observability
layer never read them. This adds workflow_url, workflow_branch, and
workflow_path parameters to ObservabilityManager.initialize() and wires
them from ClaudeBridge._setup_observability().

Langfuse traces now include:
- metadata: workflow_name, workflow_url, workflow_branch, workflow_path
- tags: workflow:{derived_name}

This enables filtering and grouping traces by which concrete workflow
was invoked.

Made-with: Cursor
@github-actions

This comment has been minimized.

Covers the new workflow metadata and tag propagation:
- Full workflow URL/branch/path/derived name in metadata and tags
- No workflow fields when workflow_url is empty or whitespace
- Name derivation: strips .git suffix, handles trailing slashes
- Optional fields (branch, path) omitted when empty

Made-with: Cursor
@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

Claude Code Review

Summary

PR #726 adds workflow context to Langfuse traces by extending ObservabilityManager.initialize() with three new parameters and wiring the already-present ACTIVE_WORKFLOW_* env vars through ClaudeBridge._setup_observability(). The change is small (30 new lines of production code), well-motivated, and ships with solid test coverage (214 lines, 6 scenarios). Overall this is a clean, focused addition, but there is one security consistency issue worth addressing before merge.


Issues by Severity

Blocker Issues

None.

Critical Issues

1. Workflow fields written to Langfuse without sanitization, inconsistent with model

The existing code explicitly sanitizes the model name before metadata/tag storage (observability.py:257-260):

# SECURITY: Model name is sanitized via sanitize_model_name() to prevent injection attacks.
# Only alphanumeric chars and allowed separators (-, _, :, @, ., /) are permitted.
# This prevents malicious tag values from disrupting Langfuse API or metadata storage.

The new workflow fields skip this step (observability.py:278-288):

  • metadata["workflow_url"] = workflow_url — unsanitized
  • metadata["workflow_branch"] = workflow_branch.strip() — unsanitized
  • metadata["workflow_path"] = workflow_path.strip() — unsanitized
  • tags.append(f"workflow:{derived_name}") — unsanitized tag value

The values originate from operator-set env vars (limited attack surface), but the existing security comment targets Langfuse API/metadata disruption — that concern applies equally here. A repo named with special characters would produce an unsanitized tag.

Recommendation: Apply sanitize_model_name() or equivalent to derived_name before tag append. Strip newlines from the metadata strings to prevent log injection (per security-standards.md).

Major Issues

2. URL-to-name derivation logic is duplicated

bridge.py:223 (existing get_mcp_status):

workflow_name = active_workflow_url.split("/")[-1].removesuffix(".git")

observability.py:278 (new code, more correct):

derived_name = workflow_url.rstrip("/").split("/")[-1].removesuffix(".git").strip()

Both exist independently and can diverge. Recommend extracting a shared workflow_name_from_url(url: str) -> str helper in ambient_runner/platform/.

Minor Issues

3. Test boilerplate repeated across all 6 new test cases

Every test in TestWorkflowContextTracking constructs the same mock context manager. This belongs in a @pytest.fixture.

4. Missing edge-case test: empty derived_name

The code has metadata["workflow_name"] = derived_name or "unknown" and if derived_name: tags.append(...) — but no test exercises the empty-derived-name branch (e.g., workflow_url="https://github.com/org/"). Both the "unknown" fallback and tag-omission are untested.

5. model: str = None type annotation (pre-existing)

observability.py:158 should be model: str | None = None. Pre-existing, but the signature is being modified here.


Positive Highlights

  • Correctly uses self._context.get_env() in the bridge — consistent with how all other env vars are read there.
  • Graceful degradation — empty/whitespace workflow_url cleanly omits all workflow fields from metadata and tags.
  • Well-placed in the call chain — env vars → bridge params → initialize() → Langfuse; observability.py stays env-var-agnostic.
  • .rstrip("/").split("/")[-1].removesuffix(".git").strip() correctly handles trailing slashes and .git, confirmed by dedicated tests.
  • Strong test coverage for happy path and key negative cases (no workflow, empty URL, absent optional fields).
  • Removed stale inline comment cleanly without losing intent.

Recommendations

  1. (Critical) Sanitize derived_name before tag append; strip newlines from workflow_branch/workflow_path metadata values — consistent with existing security posture.
  2. (Major) Extract workflow_name_from_url() shared utility used by both observability.py and bridge.py:get_mcp_status().
  3. (Minor) Factor mock-propagate setup into a @pytest.fixture in TestWorkflowContextTracking.
  4. (Minor) Add test for derived_name == "" code paths.

Reviewed by Claude Code (claude-sonnet-4-6)


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

…cement

Address review feedback from PR #726:

1. Sanitize derived_name through sanitize_model_name() before using it
   in the Langfuse tag, consistent with how model names are handled.
2. Sanitize workflow_url, workflow_branch, and workflow_path through
   validate_and_sanitize_for_logging() to strip control characters
   and prevent log injection.
3. Move log message inside the if-block so it only claims a tag was
   added when derived_name is non-empty; log a different message for
   the fallback case.
4. Add tests for empty derived_name edge case and control character
   sanitization.

Made-with: Cursor
@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

Claude Code Review

Summary

This PR wires the three ACTIVE_WORKFLOW_* env vars (already populated by the operator onto runner pods) into Langfuse trace metadata and tags, enabling filtering/grouping of traces by the active workflow. The changes are minimal and focused: 3 lines in bridge.py, ~40 lines in observability.py, and 288 lines of new tests.

The implementation follows existing patterns well, applies proper sanitization throughout, and handles all the key edge cases. The iterative commit history shows good responsiveness to review feedback.

Issues by Severity

Blocker Issues

None.

Critical Issues

None.

Major Issues

1. bridge.py changes have no test coverage

The 3-line diff in bridge.py -- reading ACTIVE_WORKFLOW_GIT_URL, ACTIVE_WORKFLOW_BRANCH, and ACTIVE_WORKFLOW_PATH from the runner context and forwarding them to obs.initialize() -- is entirely untested. All 288 new test lines cover observability.py directly. A test for ClaudeBridge._setup_observability() that mocks ObservabilityManager.initialize and asserts the three env vars are passed through via context.get_env() would close this gap and guard against future regressions in the wiring.

Minor Issues

1. sanitize_model_name() reused for workflow name -- semantic mismatch

sanitize_model_name() was designed for model identifiers, but is reused here to sanitize the derived workflow name before adding it to a Langfuse tag. This works correctly in practice (the allowed charset a-zA-Z0-9@.:/_- covers all typical GitHub/GitLab repo names), and the comment explains the intent. However, the function name creates a readability gap for future maintainers. Consider an inline comment (e.g. # reuse model-name sanitizer: same allowed charset covers repo names) or extracting a sanitize_tag_value() alias when this pattern appears a third time.

2. Workflow metadata fields have no explicit length cap

initial_prompt is truncated to 200 chars before entering metadata (prompt[:200] if len(prompt) > 200). The three workflow fields rely on validate_and_sanitize_for_logging()'s default 1000-char cap. Not a bug, but inconsistent with the existing truncation pattern. Worth aligning the caps or adding a note to the docstring.

3. Mock setup boilerplate repeated across 8 test methods

Every test in TestWorkflowContextTracking reconstructs the same mock_ctx with __enter__/__exit__ mocks. Extracting this into an autouse fixture on the class would halve the repetition without affecting coverage.

4. SSH-style URLs stored as-is in workflow_url metadata

ACTIVE_WORKFLOW_GIT_URL could be set to an SSH URL like git@github.com:org/repo.git. Name derivation handles this correctly (splits on /, strips .git => repo). However, the sanitized value is stored verbatim in the workflow_url metadata field where HTTP URLs are expected. This is not a security issue, but Langfuse UI users may find it surprising. Consider documenting the accepted formats or renaming the field to workflow_source.

Positive Highlights

  • Minimal, well-scoped diff -- 3 lines in bridge.py and ~40 lines in observability.py. No over-engineering.
  • Proper reuse of security utilities -- sanitize_model_name() for the tag value and validate_and_sanitize_for_logging() for metadata fields. Defense-in-depth against control character injection applied to all three workflow fields.
  • Excellent graceful degradation -- when no workflow is active, zero workflow fields appear in metadata or tags. No sentinel values pollute traces.
  • Strong edge-case test coverage -- the 8 new test cases exercise: no URL, empty/whitespace URL, URL without .git, trailing slash, .git-only segment (produces workflow_name='unknown' and no tag), optional fields omitted when empty, and control character injection in branch/path.
  • Iterative improvement -- the third commit shows the author incorporated prior review feedback cleanly (sanitization, log placement fix), with a detailed commit message.
  • context.get_env() usage in bridge.py -- consistent with the rest of _setup_observability() and the broader runner pattern.

Recommendations

  1. (Major) Add a unit test for ClaudeBridge._setup_observability() that mocks ObservabilityManager.initialize and asserts the three ACTIVE_WORKFLOW_* env vars are forwarded. This closes the only real coverage gap.
  2. (Minor) Add a brief inline comment on the sanitize_model_name(raw_name) call explaining why this sanitizer applies to workflow names.
  3. (Minor) Align the workflow field length cap with the 200-char limit used for initial_prompt, or document the 1000-char cap.
  4. (Style) Refactor the repeated mock setup in TestWorkflowContextTracking into a shared fixture.

Overall: clean, low-risk feature addition that follows the project's patterns. The Major gap (missing bridge.py test) is the only item worth addressing before merge.


Review generated by Claude Code -- claude-sonnet-4-6


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Tests ClaudeBridge._setup_observability() to verify the three
ACTIVE_WORKFLOW_* env vars are read from RunnerContext and forwarded
to ObservabilityManager.initialize(). Covers both the populated and
absent cases.

Made-with: Cursor
@bobbravo2 bobbravo2 marked this pull request as ready for review February 27, 2026 22:21
@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

Claude Code Review

Summary

PR #726 adds workflow context (ACTIVE_WORKFLOW_GIT_URL, ACTIVE_WORKFLOW_BRANCH, ACTIVE_WORKFLOW_PATH) to Langfuse traces in the Claude Code runner. Scoped entirely to the Python runner — no CRD, operator, backend, or frontend changes. Extends ObservabilityManager.initialize() with three new optional parameters and wires them from ClaudeBridge._setup_observability(). The commit history shows good iterative development: initial feature → tests → security review fixes → bridge-level tests.

Verdict: Approved with minor notes. This is a clean, well-tested observability enhancement.


Issues by Severity

🚫 Blocker Issues

None.

🔴 Critical Issues

None.

🟡 Major Issues

1. Credential leakage risk in workflow_url metadata storage (observability.py:282)

validate_and_sanitize_for_logging strips control characters only (lenient validation). If a workflow URL ever contains embedded credentials — https://user:token@github.com/org/repo.git — the token would be stored verbatim in Langfuse metadata.

ACTIVE_WORKFLOW_GIT_URL is operator-set and unlikely to carry embedded credentials in practice, but Langfuse metadata is visible in the UI/API. Worth an explicit credential-scrub before storage. Note: urlparse is already imported at line 49, so this is a ~3-line fix:

parsed = urlparse(workflow_url)
safe_url = urlunparse(parsed._replace(netloc=parsed.hostname or ""))
metadata["workflow_url"] = validate_and_sanitize_for_logging(safe_url)

2. sanitize_model_name repurposed for workflow name sanitization (observability.py:280)

sanitize_model_name was designed for model identifiers (allows @, :, ., /). Reusing it as a tag sanitizer for git repo names works in practice (common repo name chars [a-zA-Z0-9_.-] are all preserved), but the semantic mismatch will confuse future readers. A short comment noting the intentional reuse would make the intent clear:

# sanitize_model_name() is reused here: its allowed charset is a superset
# of valid git repo-name characters, so it works as a tag sanitizer.
derived_name = sanitize_model_name(raw_name) or ""

🔵 Minor Issues

3. Test setup boilerplate repeated seven times (test_observability.py:578-863)

The mock_langfuse_class / mock_ctx / env_vars / patch.dict pattern is copy-pasted into every test in TestWorkflowContextTracking. A shared pytest fixture would reduce ~50 lines of duplication and make individual tests more readable.

4. model: str = None type annotation (pre-existing, not introduced here)

The existing function signature uses model: str = None instead of model: str | None = None. The PR correctly uses str = "" defaults for the new params, but worth a follow-up cleanup on the existing annotation.

5. Test partial-match assertion (test_observability.py:617)

assert "Workflow 'bug-fix-workflow' added to session metadata" in caplog.text

This is a substring match that happens to pass but doesn't verify "and tags". Low risk — just worth noting if the log message ever changes.


Positive Highlights

  • Excellent test coverage ratio: 358 lines of new tests for 40 lines of new production code. Edge cases covered: empty URL, whitespace-only URL, trailing slash, missing .git suffix, .git-only segment, missing optional fields, control-character injection via branch/path.
  • Security-first approach: All three new values are sanitized before use — validate_and_sanitize_for_logging for URLs/paths, sanitize_model_name for the Langfuse tag. Correctly reuses the existing security_utils.py patterns.
  • Fully backward compatible: New initialize() params default to "", all existing callers are unaffected.
  • Follows project patterns precisely: Env var reads use self._context.get_env(...) with empty defaults; the observability module structure is preserved; the operator's existing ACTIVE_WORKFLOW_* convention is respected without any operator changes.
  • Focused commit history: Later commits respond directly to review feedback (sanitization + log placement fixes, bridge-level tests) — this is exactly what good iterative PR development looks like.
  • Zero scope creep: CRD, operator, backend, frontend are untouched. Correct blast radius for an observability-only feature.

Recommendations

  1. (Recommended before merge) Strip URL userinfo before storing workflow_url in Langfuse metadata — urlparse/urlunparse are already imported.
  2. (Optional / follow-up) Add a clarifying comment on the sanitize_model_name reuse for workflow name sanitization.
  3. (Optional / follow-up) Extract the repeated Langfuse env/mock setup in TestWorkflowContextTracking into a shared pytest fixture.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant