Skip to content

test "kustomize build" for overlays before merge#841

Open
ktdreyer wants to merge 2 commits intomainfrom
kdreyer/kustomize-ci
Open

test "kustomize build" for overlays before merge#841
ktdreyer wants to merge 2 commits intomainfrom
kdreyer/kustomize-ci

Conversation

@ktdreyer
Copy link
Contributor

@ktdreyer ktdreyer commented Mar 6, 2026

Three of our five overlays (kind, kind-local, local-dev) had no CI validation. They can easily break silently if we change something in shared base resources.

Add basic syntax checking.

Closes #840

@ktdreyer ktdreyer force-pushed the kdreyer/kustomize-ci branch from cc9062b to 33f30f1 Compare March 6, 2026 22:06
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@ktdreyer
Copy link
Contributor Author

ktdreyer commented Mar 6, 2026

I decided to go with a hard-coded overlay list because we should not be changing those without notice. Several things depend on them (ie make files, other GHA workflows, etc)

@ktdreyer ktdreyer changed the title catch broken kustomize overlays before merge test "kustomize build" for overlays before merge Mar 6, 2026
@ktdreyer ktdreyer force-pushed the kdreyer/kustomize-ci branch 3 times, most recently from 5b354e1 to d2aceb6 Compare March 9, 2026 12:17
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

PR 841 adds kustomize-check.yml CI workflow. Review below.

BLOCKER: None. CRITICAL: None. MAJOR: None.

MINOR 1 - Non-deterministic oc version (.github/workflows/kustomize-check.yml:18): oc: latest picks up new releases silently on each run. A future oc version changing kustomize behavior could break CI without any manifest change. Pin to a specific version (e.g. 4.17) for reproducible builds.

MINOR 2 - Floating action tag (.github/workflows/kustomize-check.yml:16): redhat-actions/openshift-tools-installer@v1 is a mutable tag. Pin to an immutable commit SHA for supply-chain hygiene. Standard: security-standards.md.

POSITIVE HIGHLIGHTS:
Three overlays (kind, kind-local, local-dev) had zero prior CI coverage -- this fills that gap directly (closes 840). The failed=0 accumulator pattern runs all overlays before exiting so authors see the full picture, not just the first failure. Path filter (components/manifests/**) avoids noise on unrelated PRs. 39 lines, minimal, focused.

RECOMMENDATIONS: (1) Pin oc to a specific version. (2) Pin installer action to a commit SHA. Both minor. PR is ready to merge.

-- Review by Claude Code (Amber review system)

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

test body no special chars

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude Code Review -- PR 841 kustomize-check.yml

SUMMARY: PR 841 adds a GitHub Actions workflow to validate kustomize build for all five deployment overlays (production, e2e, kind, kind-local, local-dev). This closes a real CI gap where three overlays had no automated syntax checking. Functionally correct, but several inconsistencies with repo conventions should be addressed.

BLOCKER ISSUES: None

CRITICAL ISSUES: None

MAJOR ISSUES:

  1. actions/checkout version inconsistency (.github/workflows/kustomize-check.yml line 14)
    The new workflow uses actions/checkout@v4, but every other workflow in this repo (lint.yml, unit-tests.yml, e2e.yml, etc.) has standardized on actions/checkout@v6. Suggested fix: change to uses: actions/checkout@v6

  2. Unpinned oc tool version (.github/workflows/kustomize-check.yml line 18)
    Using oc: latest means a silent version bump could cause valid kustomize configs to fail without any code change in this repo. The production and e2e overlays are now also under this workflow, adding unnecessary flake risk to previously stable checks. Suggested fix: pin to a specific release, e.g. oc: 4.17

  3. Missing concurrency block (.github/workflows/kustomize-check.yml workflow level)
    Every other PR-targeting workflow in this repo defines a concurrency group to cancel stale in-progress runs when a PR is updated. Without it, rapid pushes to a PR branch can queue up redundant runs. All other workflows use a pattern like: concurrency: group: -PR-OR-REF, cancel-in-progress: true

MINOR ISSUES:

  1. Missing workflow_dispatch trigger: All other workflows include workflow_dispatch so maintainers can manually re-run checks without pushing a new commit. Consider adding it for consistency.

  2. Workflow not self-referencing in paths filter: Other workflows include their own file path in the trigger paths so changes to the workflow file also trigger CI. Suggested addition to the paths list: .github/workflows/kustomize-check.yml

POSITIVE HIGHLIGHTS:

  • Fail-at-end design: the shell script collects all overlay failures before calling exit 1, surfacing all broken overlays in a single run rather than stopping at the first failure.
  • Precise path scoping: triggering only on components/manifests/** avoids unnecessary CI runs for unrelated changes.
  • Complete overlay coverage: all five overlays are validated, not just the three previously missing.
  • Readable output: per-overlay OK/FAILED banners make the CI log easy to scan.

RECOMMENDATIONS (priority order):

  1. [Major] Change actions/checkout@v4 to @v6 to match the repo convention.
  2. [Major] Pin oc to a specific version instead of latest to avoid flaky failures from silent tool upgrades.
  3. [Major] Add a concurrency block to cancel stale in-progress runs on PR updates.
  4. [Minor] Add workflow_dispatch trigger and self-reference in paths filter to match repo conventions.

Review performed by Claude Code using repository standards from the Amber memory system.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Ambient Code Platform

Kubernetes-native AI automation platform that orchestrates agentic sessions through containerized microservices. Built with Go (backend, operator), NextJS + Shadcn (frontend), Python (runner), and Kubernetes CRDs.

Technical artifacts still use "vteam" for backward compatibility.

Structure

  • components/backend/ - Go REST API (Gin), manages K8s Custom Resources with multi-tenant project isolation
  • components/frontend/ - NextJS web UI for session management and monitoring
  • components/operator/ - Go Kubernetes controller, watches CRDs and creates Jobs
  • components/runners/ambient-runner/ - Python runner executing Claude Code CLI in Job pods
  • components/ambient-cli/ - Go CLI (acpctl), manages agentic sessions from the command line
  • components/public-api/ - Stateless HTTP gateway, proxies to backend (no direct K8s access)
  • components/manifests/ - Kustomize-based deployment manifests and overlays
  • e2e/ - Cypress end-to-end tests
  • docs/ - Astro Starlight documentation site

Key Files

  • CRD definitions: components/manifests/base/crds/agenticsessions-crd.yaml, projectsettings-crd.yaml
  • Session lifecycle: components/backend/handlers/sessions.go, components/operator/internal/handlers/sessions.go
  • Auth & RBAC middleware: components/backend/handlers/middleware.go
  • K8s client init: components/operator/internal/config/config.go
  • Runner entry point: components/runners/ambient-runner/main.py
  • Route registration: components/backend/routes.go
  • Frontend API layer: components/frontend/src/services/api/, src/services/queries/

Session Flow

User Creates Session → Backend Creates CR → Operator Spawns Job →
Pod Runs Claude CLI → Results Stored in CR → UI Displays Progress

Commands

make build-all                # Build all container images
make deploy                   # Deploy to cluster
make test                     # Run tests
make lint                     # Lint code
make kind-up                  # Start local Kind cluster
make test-e2e-local           # Run E2E tests against Kind

Per-Component

# Backend / Operator (Go)
cd components/backend && gofmt -l . && go vet ./... && golangci-lint run
cd components/operator && gofmt -l . && go vet ./... && golangci-lint run

# Frontend
cd components/frontend && npm run build  # Must pass with 0 errors, 0 warnings

# Runner (Python)
cd components/runners/ambient-runner && uv venv && uv pip install -e .

# Docs
cd docs && npm run dev  # http://localhost:4321

Critical Context

  • User token auth required: All user-facing API ops use GetK8sClientsForRequest(c), never the backend service account
  • OwnerReferences on all child resources: Jobs, Secrets, PVCs must have controller owner refs
  • No panic() in production: Return explicit fmt.Errorf with context
  • No any types in frontend: Use proper types, unknown, or generic constraints
  • Conventional commits: Squashed on merge to main

Pre-commit Hooks

The project uses the pre-commit framework to run linters locally before every commit. Configuration lives in .pre-commit-config.yaml.

Install

make setup-hooks

What Runs

On every git commit:

Hook Scope
trailing-whitespace, end-of-file-fixer, check-yaml, check-added-large-files, check-merge-conflict, detect-private-key All files
ruff-format, ruff (check + fix) Python (runners, scripts)
gofmt, go vet, golangci-lint Go (backend, operator, public-api — per-module)
eslint Frontend TypeScript/JavaScript
branch-protection Blocks commits to main/master/production

On every git push:

Hook Scope
push-protection Blocks pushes to main/master/production

Run Manually

make lint                                    # All hooks, all files
pre-commit run gofmt-check --all-files       # Single hook
pre-commit run --files path/to/file.go       # Single file

Skip Hooks

git commit --no-verify    # Skip pre-commit hooks
git push --no-verify      # Skip pre-push hooks

Notes

  • Go and ESLint wrappers (scripts/pre-commit/) skip gracefully if the toolchain is not installed
  • tsc --noEmit and npm run build are not included (slow; CI gates on them)
  • Branch/push protection scripts remain in scripts/git-hooks/ and are invoked by pre-commit

Testing

  • Frontend unit tests: cd components/frontend && npx vitest run --coverage (466 tests, ~74% coverage). See components/frontend/README.md.
  • E2E tests: cd e2e && npx cypress run --browser chrome (58 tests, mock SDK). See e2e/README.md.
  • Runner tests: cd components/runners/ambient-runner && python -m pytest tests/
  • Backend tests: cd components/backend && make test. See components/backend/TEST_GUIDE.md.

More Info

See BOOKMARKS.md for architecture decisions, development context, code patterns, and component-specific guides.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Deleting test comments - please ignore the previous 3 comments on this PR.

@ktdreyer ktdreyer requested a review from kahowell March 9, 2026 18:17
ktdreyer and others added 2 commits March 9, 2026 14:17
Three of our five overlays (kind, kind-local, local-dev) had no CI
validation. They can easily break silently if we change something in
shared base resources.

Add basic syntax checking.

Closes #840

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Match patterns used by other workflows: checkout@v6,
concurrency group, workflow_dispatch trigger, and
self-referencing paths filter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ktdreyer ktdreyer force-pushed the kdreyer/kustomize-ci branch from af25ec3 to d748c83 Compare March 9, 2026 18:17
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude Code Review

Summary

This PR adds a new GitHub Actions workflow to validate kustomize build for all five manifest overlays (production, e2e, kind, kind-local, local-dev) on every manifests change. It closes a real coverage gap where three overlays had no CI validation. The overall structure is clean, consistent with existing workflows, and solves the stated problem well. One usability issue is worth addressing before merge.

Issues by Severity

Blocker Issues

None

Critical Issues

None

Major Issues

Kustomize error output is suppressed, making failures undiagnosable in CI

.github/workflows/kustomize-check.yml — line 34

if oc kustomize "components/manifests/overlays/$overlay" > /dev/null; then

Redirecting stdout to /dev/null means when an overlay fails to build, the CI log only shows FAILED with no indication of what went wrong. A developer will have to reproduce the failure locally to diagnose it. The entire point of running this in CI is to surface breakage early — hiding the diagnostic output defeats that goal.

Suggested fix:

for overlay in production e2e kind kind-local local-dev; do
  echo "=== $overlay ==="
  if oc kustomize "components/manifests/overlays/$overlay"; then
    echo "OK"
  else
    echo "FAILED"
    failed=1
  fi
  echo ""
done

Removing the redirect lets kustomize output stream directly to the CI log, which is suppressed on success anyway (GitHub Actions collapses passing step output) and visible when the step fails.

Minor Issues

oc: latest is unpinned

.github/workflows/kustomize-check.yml — line 27

oc: latest

Using latest for the oc version means the workflow will silently adopt new major versions of the OpenShift CLI. While oc kustomize is a stable subcommand, pinning to a known-good version (e.g., oc: "4.18") is consistent with the project's general preference for reproducible builds and avoids surprise breakage from upstream changes.

Positive Highlights

  • All five overlays are covered: The overlay list (production e2e kind kind-local local-dev) exactly matches the directories that exist in components/manifests/overlays/. No overlays are missed or invented.
  • Concurrency control is correct: The cancel-in-progress: true pattern and the PR-number-aware group key exactly match the convention used in lint.yml and other workflows — no drift.
  • Path filters are precise: The workflow only triggers on actual manifest or workflow-file changes, keeping unrelated PRs from accumulating noise.
  • workflow_dispatch included: Allows engineers to manually re-run the check for debugging, consistent with the rest of the repo's workflow style.
  • oc kustomize instead of kustomize build: Using the OpenShift CLI's built-in kustomize avoids installing a separate binary and ensures parity with how the manifests are applied in production.

Recommendations

  1. Fix the output suppression (Major) — remove the > /dev/null redirect so error messages are visible in failing CI runs.
  2. Pin the oc version (Minor) — replace oc: latest with a specific version string to prevent silent upgrades from breaking the workflow.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

Claude Code Review

Summary

This PR adds a GitHub Actions workflow to validate kustomize build for all five manifest overlays (production, e2e, kind, kind-local, local-dev) before merge. The implementation is clean, the scope is well-targeted, and it closes a real CI coverage gap. Two issues worth addressing before merge: a non-pinned tool version and a hardcoded overlay list that will silently miss future additions.

Issues by Severity

Blocker Issues

None.

Critical Issues

None.

Major Issues

1. oc: latest is unpinned — non-deterministic builds
File: .github/workflows/kustomize-check.yml, line 28

Using latest for the oc installer means any new oc release can silently change behavior or break the workflow without notice. Violates the reproducibility principle for CI.

Suggested fix:

- name: Install oc
  uses: redhat-actions/openshift-tools-installer@v1
  with:
    oc: "4.17"   # pin to the version the project currently targets

2. Hardcoded overlay list will silently miss future overlays
File: .github/workflows/kustomize-check.yml, line 33

The loop hardcodes production e2e kind kind-local local-dev. If a new overlay is added under components/manifests/overlays/, this check will not cover it — the exact silent-failure scenario this PR is trying to prevent.

Suggested fix — discover overlays dynamically:

for overlay in components/manifests/overlays/*/; do
  overlay_name=$(basename "$overlay")
  echo "=== $overlay_name ==="
  if oc kustomize "$overlay" > /dev/null; then

Minor Issues

1. No explicit permissions block
File: .github/workflows/kustomize-check.yml (workflow level)

GitHub Actions security best practice is to declare minimal explicit permissions. A read-only kustomize check only needs contents: read.

permissions:
  contents: read

2. Double-run on PR branches

With both pull_request and push using the same path filters, the workflow runs twice for each push to a PR branch. Consider restricting the push trigger to main:

push:
  branches:
    - main
  paths:
    - components/manifests/**
    - .github/workflows/kustomize-check.yml

Positive Highlights

  • Closes a real gap: three of five overlays (kind, kind-local, local-dev) had zero CI validation — this is a meaningful improvement to merge safety.
  • Shell script logic is correct: The failed=0 accumulator pattern properly runs all overlays before exiting, giving a full report rather than stopping on first failure.
  • Excellent path filtering: The paths: trigger ensures the workflow only fires on relevant changes — no wasted CI cycles.
  • cancel-in-progress: true: Good practice; prevents pile-up of stale runs when manifest changes are pushed rapidly.

Recommendations

  1. (Major) Pin the oc version to a specific release for reproducible builds.
  2. (Major) Replace the hardcoded overlay list with dynamic discovery so future overlays are covered automatically.
  3. (Minor) Add a top-level permissions: contents: read block.
  4. (Minor) Scope the push trigger to main to avoid double-running on PR branches.

🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add CI coverage for kustomize changes

1 participant