diff --git a/COMMIT_MSG.txt b/COMMIT_MSG.txt
new file mode 100644
index 000000000..04716dfe8
--- /dev/null
+++ b/COMMIT_MSG.txt
@@ -0,0 +1,43 @@
+feat(manifests): production overlay for OpenShift (LVM, SCC, credentials, DB init)
+
+Add and wire production overlay resources and patches for deploying the
+Ambient Code Platform on OpenShift (e.g. chadsno2026) with LVM Storage,
+restricted SCC, and shared PostgreSQL.
+
+Storage (LVM)
+- Add PVC patches for backend, postgresql, minio, and ambient-api-server-db
+  to use storageClassName: lvms-vg1 (LVM Storage) instead of default.
+- See README-storage.md for LVMCluster and volume setup.
+
+Credentials
+- postgresql-credentials: db.host/postgresql, postgres/postgres123, db.name postgres.
+- minio-credentials: MinIO credentials for state sync.
+- unleash-credentials: DATABASE_URL to shared postgresql/unleash, API tokens,
+  default admin password, database-ssl.
+
+ServiceAccounts and SCC (nonroot)
+- postgresql-sa, minio-sa, ambient-api-server-db-sa: dedicated SAs for stateful
+  workloads so they can use nonroot SCC (no seccomp in patch; nonroot forbids it).
+- postgresql-fsgroup-patch, minio-fsgroup-patch: fsGroup and runAsUser for RHEL
+  compatibility and volume permissions.
+- ambient-api-server-db-scc-patch: runAsUser and securityContext for
+  ambient-api-server-db to run under nonroot.
+- README-SCC.md: oc adm policy add-scc-to-user nonroot and rollout restart steps.
+
+ambient-api-server
+- ambient-api-server-wait-db-patch: add wait-for-db init container (pg_isready
+  loop) and keep migration init so API server starts only after DB is ready.
+
+Unleash
+- unleash-init-db-patch: init container (RHEL postgresql-16 image) that waits
+  for shared PostgreSQL, creates database "unleash" if missing, then verifies
+  connectivity to the unleash database (retries) before main container starts.
+- kustomization: register credentials resources and all patches (PVC, SA, SCC,
+  wait-db, fsgroup, unleash-init-db); postgresql-json-patch removed (use
+  postgres:16 with fsGroup/runAsUser instead of RHEL image for shared Postgres).
+
+Docs
+- README-vertex.md: Vertex/Google Cloud notes (unchanged from existing content).
+
+Files: 18 changed (README-SCC, README-vertex, *-sa, *-scc-patch, *-wait-db-patch,
+kustomization, *-credentials, *-fsgroup-patch, pvc-patch-*, unleash-init-db-patch).
diff --git a/components/manifests/overlays/production/README-SCC.md b/components/manifests/overlays/production/README-SCC.md
new file mode 100644
index 000000000..79cc3d0a0
--- /dev/null
+++ b/components/manifests/overlays/production/README-SCC.md
@@ -0,0 +1,19 @@
+# OpenShift Security Context Constraints (SCC)
+
+The **postgresql**, **minio**, and **ambient-api-server-db** deployments run as fixed UIDs (999 or 1000) for their data volumes. OpenShift's default **restricted-v2** SCC only allows UIDs in the namespace range, so these pods must use the **nonroot** SCC.
+
+## One-time grant (cluster-admin)
+
+After deploying, grant the nonroot SCC to all three service accounts:
+
+```bash
+oc adm policy add-scc-to-user nonroot -z postgresql -n ambient-code
+oc adm policy add-scc-to-user nonroot -z minio -n ambient-code
+oc adm policy add-scc-to-user nonroot -z ambient-api-server-db -n ambient-code
+```
+
+Then restart the deployments so pods are recreated with the new SCC:
+
+```bash
+oc rollout restart deployment/postgresql deployment/minio deployment/ambient-api-server-db -n ambient-code
+```
diff --git a/components/manifests/overlays/production/README-vertex.md b/components/manifests/overlays/production/README-vertex.md
new file mode 100644
index 000000000..444d0a554
--- /dev/null
+++ b/components/manifests/overlays/production/README-vertex.md
@@ -0,0 +1,76 @@
+# Vertex AI (ANTHROPIC_VERTEX_PROJECT_ID) on OpenShift
+
+The production overlay uses Vertex AI by default (`USE_VERTEX=1`). You must set your GCP project ID and provide credentials.
+
+## 1. Set your project ID and region
+
+Patch the operator ConfigMap (replace with your values):
+
+```bash
+export KUBECONFIG=~/.kube/kubeconfig-noingress
+oc patch configmap operator-config -n ambient-code --type merge -p '{
+  "data": {
+    "ANTHROPIC_VERTEX_PROJECT_ID": "YOUR_GCP_PROJECT_ID",
+    "CLOUD_ML_REGION": "us-central1"
+  }
+}'
+```
+
+- **ANTHROPIC_VERTEX_PROJECT_ID**: Your Google Cloud project ID where Claude is enabled on Vertex AI.
+- **CLOUD_ML_REGION**: Vertex AI region (e.g. `us-central1`, `europe-west1`, or `global` for some setups).
+
+## 2. Create the GCP credentials secret
+
+The operator and runner need a GCP service account key (or Application Default Credentials file) to call Vertex AI.
+
+**Option A – Application Default Credentials (e.g. from your laptop):**
+
+```bash
+gcloud auth application-default login
+oc create secret generic ambient-vertex -n ambient-code \
+  --from-file=ambient-code-key.json="$HOME/.config/gcloud/application_default_credentials.json" \
+  --dry-run=client -o yaml | oc apply -f -
+```
+
+**Option B – Service account key file:**
+
+```bash
+oc create secret generic ambient-vertex -n ambient-code \
+  --from-file=ambient-code-key.json=/path/to/your-service-account-key.json
+```
+
+The key file must be the JSON for a service account that has Vertex AI (and optionally Model Garden) permissions.
+
+## 3. Restart the operator
+
+So it picks up the updated ConfigMap and uses the new project ID for new sessions:
+
+```bash
+oc rollout restart deployment/agentic-operator -n ambient-code
+oc rollout status deployment/agentic-operator -n ambient-code --timeout=120s
+```
+
+## 4. Verify
+
+- In the UI, create or open a session; it should use Vertex AI for that project.
+- Check operator logs:
+  `oc logs -l app=agentic-operator -n ambient-code --tail=50 | grep -i vertex`
+
+## Optional: change the default in the overlay
+
+To bake your project ID into the overlay (e.g. for Git-managed deploys), edit:
+
+- `overlays/production/operator-config-openshift.yaml`
+
+Set `ANTHROPIC_VERTEX_PROJECT_ID` and `CLOUD_ML_REGION` to your values, then re-apply the overlay.
+
+## Disable Vertex AI (use direct Anthropic API)
+
+To use an Anthropic API key instead of Vertex:
+
+```bash
+oc patch configmap operator-config -n ambient-code --type merge -p '{"data":{"USE_VERTEX":"0"}}'
+oc rollout restart deployment/agentic-operator -n ambient-code
+```
+
+Then configure `ANTHROPIC_API_KEY` in the UI (Settings → Runner Secrets) per project.
diff --git a/components/manifests/overlays/production/ambient-api-server-db-sa.yaml b/components/manifests/overlays/production/ambient-api-server-db-sa.yaml
new file mode 100644
index 000000000..2dd48f836
--- /dev/null
+++ b/components/manifests/overlays/production/ambient-api-server-db-sa.yaml
@@ -0,0 +1,10 @@
+# ServiceAccount for ambient-api-server-db (PostgreSQL sidecar).
+# Grant nonroot SCC so it can run as UID 999 (postgres):
+# oc adm policy add-scc-to-user nonroot -z ambient-api-server-db -n ambient-code
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: ambient-api-server-db
+  labels:
+    app: ambient-api-server
+    component: database
diff --git a/components/manifests/overlays/production/ambient-api-server-db-scc-patch.yaml b/components/manifests/overlays/production/ambient-api-server-db-scc-patch.yaml
new file mode 100644
index 000000000..0a44bc45a
--- /dev/null
+++ b/components/manifests/overlays/production/ambient-api-server-db-scc-patch.yaml
@@ -0,0 +1,22 @@
+# Use dedicated SA (granted nonroot SCC) and add seccomp for PodSecurity.
+# After deploy: oc adm policy add-scc-to-user nonroot -z ambient-api-server-db -n ambient-code
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ambient-api-server-db
+spec:
+  template:
+    spec:
+      serviceAccountName: ambient-api-server-db
+      securityContext:
+        runAsNonRoot: true
+        fsGroup: 999
+      containers:
+      - name: postgresql
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL
+          runAsNonRoot: true
+          runAsUser: 999
diff --git a/components/manifests/overlays/production/ambient-api-server-wait-db-patch.yaml b/components/manifests/overlays/production/ambient-api-server-wait-db-patch.yaml
new file mode 100644
index 000000000..2cf9cec79
--- /dev/null
+++ b/components/manifests/overlays/production/ambient-api-server-wait-db-patch.yaml
@@ -0,0 +1,43 @@
+# Add init container to wait for ambient-api-server-db before running migration.
+# Uses postgres image for pg_isready; DB user from secret (default: ambient).
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: ambient-api-server
+spec:
+  template:
+    spec:
+      initContainers:
+      - name: wait-for-db
+        image: postgres:16
+        command:
+        - sh
+        - -c
+        - |
+          until pg_isready -h ambient-api-server-db -p 5432 -U ambient 2>/dev/null; do
+            echo "waiting for ambient-api-server-db..."
+            sleep 2
+          done
+          echo "db ready"
+      - name: migration
+        image: quay.io/ambient_code/vteam_api_server:latest
+        imagePullPolicy: IfNotPresent
+        command:
+        - /usr/local/bin/ambient-api-server
+        - migrate
+        - --db-host-file=/secrets/db/db.host
+        - --db-port-file=/secrets/db/db.port
+        - --db-user-file=/secrets/db/db.user
+        - --db-password-file=/secrets/db/db.password
+        - --db-name-file=/secrets/db/db.name
+        - --alsologtostderr
+        - -v=10
+        volumeMounts:
+        - name: db-secrets
+          mountPath: /secrets/db
+        securityContext:
+          allowPrivilegeEscalation: false
+          readOnlyRootFilesystem: false
+          capabilities:
+            drop:
+            - ALL
diff --git a/components/manifests/overlays/production/kustomization.yaml b/components/manifests/overlays/production/kustomization.yaml
index 07b5829ef..7d64b3960 100644
--- a/components/manifests/overlays/production/kustomization.yaml
+++ b/components/manifests/overlays/production/kustomization.yaml
@@ -12,14 +12,21 @@ namespace: ambient-code
 # Manage this secret separately: oc apply -f github-app-secret.yaml -n ambient-code
 resources:
 - ../../base
+- postgresql-credentials.yaml
+- minio-credentials.yaml
+- unleash-credentials.yaml
 - ambient-api-server-route.yaml
 - route.yaml
 - backend-route.yaml
 - public-api-route.yaml
 - unleash-route.yaml
 - operator-config-openshift.yaml
+- ambient-api-server-db-sa.yaml
+- postgresql-sa.yaml
+- minio-sa.yaml
 
 # Patches for production environment
+# Storage: PVC patches use OpenShift local storage (lvms-vg1). See README-storage.md.
 # Unleash: init container to create database (RHEL doesn't support init scripts)
 # PostgreSQL: use RHEL image with proper env vars and mount paths (JSON patch)
 patches:
@@ -27,6 +34,34 @@ patches:
   target:
     kind: Namespace
     name: ambient-code
+- path: pvc-patch-backend.yaml
+  target:
+    kind: PersistentVolumeClaim
+    name: backend-state-pvc
+- path: pvc-patch-postgresql.yaml
+  target:
+    kind: PersistentVolumeClaim
+    name: postgresql-data
+- path: pvc-patch-minio.yaml
+  target:
+    kind: PersistentVolumeClaim
+    name: minio-data
+- path: pvc-patch-ambient-api-server-db.yaml
+  target:
+    kind: PersistentVolumeClaim
+    name: ambient-api-server-db-data
+- path: ambient-api-server-db-scc-patch.yaml
+  target:
+    group: apps
+    kind: Deployment
+    name: ambient-api-server-db
+    version: v1
+- path: ambient-api-server-wait-db-patch.yaml
+  target:
+    group: apps
+    kind: Deployment
+    name: ambient-api-server
+    version: v1
 - path: frontend-oauth-deployment-patch.yaml
   target:
     kind: Deployment
@@ -35,12 +70,19 @@ patches:
   target:
     kind: Service
     name: frontend-service
-- path: postgresql-json-patch.yaml
+# RHEL postgresql-json-patch removed: RHEL image needs /var/lib/pgsql created as root; use standard postgres:16 (runAsUser 999 + fsGroup 999) instead.
+- path: postgresql-fsgroup-patch.yaml
   target:
     group: apps
     kind: Deployment
     name: postgresql
     version: v1
+- path: minio-fsgroup-patch.yaml
+  target:
+    group: apps
+    kind: Deployment
+    name: minio
+    version: v1
 - path: unleash-init-db-patch.yaml
   target:
     group: apps
diff --git a/components/manifests/overlays/production/minio-credentials.yaml b/components/manifests/overlays/production/minio-credentials.yaml
new file mode 100644
index 000000000..f50d8e727
--- /dev/null
+++ b/components/manifests/overlays/production/minio-credentials.yaml
@@ -0,0 +1,12 @@
+# MinIO credentials for production
+# Change root-password and secret-key in production
+apiVersion: v1
+kind: Secret
+metadata:
+  name: minio-credentials
+type: Opaque
+stringData:
+  root-user: "admin"
+  root-password: "changeme123"
+  access-key: "admin"
+  secret-key: "changeme123"
diff --git a/components/manifests/overlays/production/minio-fsgroup-patch.yaml b/components/manifests/overlays/production/minio-fsgroup-patch.yaml
new file mode 100644
index 000000000..933889091
--- /dev/null
+++ b/components/manifests/overlays/production/minio-fsgroup-patch.yaml
@@ -0,0 +1,21 @@
+# OpenShift: SA (nonroot SCC) + fsGroup + PodSecurity (seccomp, allowPrivilegeEscalation, capabilities, runAsNonRoot)
+# Grant nonroot SCC: oc adm policy add-scc-to-user nonroot -z minio -n ambient-code
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: minio
+spec:
+  template:
+    spec:
+      serviceAccountName: minio
+      securityContext:
+        fsGroup: 1000
+      containers:
+      - name: minio
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL
+          runAsNonRoot: true
+          runAsUser: 1000
diff --git a/components/manifests/overlays/production/minio-sa.yaml b/components/manifests/overlays/production/minio-sa.yaml
new file mode 100644
index 000000000..4b955ccb7
--- /dev/null
+++ b/components/manifests/overlays/production/minio-sa.yaml
@@ -0,0 +1,8 @@
+# ServiceAccount for MinIO (runs as UID 1000).
+# Grant nonroot SCC: oc adm policy add-scc-to-user nonroot -z minio -n ambient-code
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: minio
+  labels:
+    app: minio
diff --git a/components/manifests/overlays/production/postgresql-credentials.yaml b/components/manifests/overlays/production/postgresql-credentials.yaml
new file mode 100644
index 000000000..2a08e31c2
--- /dev/null
+++ b/components/manifests/overlays/production/postgresql-credentials.yaml
@@ -0,0 +1,16 @@
+# PostgreSQL credentials for production
+# Change db.password and ensure unleash-credentials database-url matches
+apiVersion: v1
+kind: Secret
+metadata:
+  name: postgresql-credentials
+  labels:
+    app: postgresql
+    app.kubernetes.io/name: postgresql
+type: Opaque
+stringData:
+  db.host: "postgresql"
+  db.port: "5432"
+  db.name: "postgres"
+  db.user: "postgres"
+  db.password: "postgres123"
diff --git a/components/manifests/overlays/production/postgresql-fsgroup-patch.yaml b/components/manifests/overlays/production/postgresql-fsgroup-patch.yaml
new file mode 100644
index 000000000..ccf8509fd
--- /dev/null
+++ b/components/manifests/overlays/production/postgresql-fsgroup-patch.yaml
@@ -0,0 +1,21 @@
+# OpenShift: SA (nonroot SCC) + fsGroup + PodSecurity (seccomp, allowPrivilegeEscalation, capabilities, runAsNonRoot)
+# Grant nonroot SCC: oc adm policy add-scc-to-user nonroot -z postgresql -n ambient-code
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: postgresql
+spec:
+  template:
+    spec:
+      serviceAccountName: postgresql
+      securityContext:
+        fsGroup: 999
+      containers:
+      - name: postgresql
+        securityContext:
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop:
+            - ALL
+          runAsNonRoot: true
+          runAsUser: 999
diff --git a/components/manifests/overlays/production/postgresql-sa.yaml b/components/manifests/overlays/production/postgresql-sa.yaml
new file mode 100644
index 000000000..4495ce21c
--- /dev/null
+++ b/components/manifests/overlays/production/postgresql-sa.yaml
@@ -0,0 +1,9 @@
+# ServiceAccount for shared PostgreSQL (runs as UID 999).
+# Grant nonroot SCC: oc adm policy add-scc-to-user nonroot -z postgresql -n ambient-code
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: postgresql
+  labels:
+    app: postgresql
+    app.kubernetes.io/name: postgresql
diff --git a/components/manifests/overlays/production/pvc-patch-ambient-api-server-db.yaml b/components/manifests/overlays/production/pvc-patch-ambient-api-server-db.yaml
new file mode 100644
index 000000000..19c1f69e5
--- /dev/null
+++ b/components/manifests/overlays/production/pvc-patch-ambient-api-server-db.yaml
@@ -0,0 +1,6 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: ambient-api-server-db-data
+spec:
+  storageClassName: lvms-vg1
diff --git a/components/manifests/overlays/production/pvc-patch-backend.yaml b/components/manifests/overlays/production/pvc-patch-backend.yaml
new file mode 100644
index 000000000..0a2502918
--- /dev/null
+++ b/components/manifests/overlays/production/pvc-patch-backend.yaml
@@ -0,0 +1,6 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: backend-state-pvc
+spec:
+  storageClassName: lvms-vg1
diff --git a/components/manifests/overlays/production/pvc-patch-minio.yaml b/components/manifests/overlays/production/pvc-patch-minio.yaml
new file mode 100644
index 000000000..e7e1e6c9b
--- /dev/null
+++ b/components/manifests/overlays/production/pvc-patch-minio.yaml
@@ -0,0 +1,6 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: minio-data
+spec:
+  storageClassName: lvms-vg1
diff --git a/components/manifests/overlays/production/pvc-patch-postgresql.yaml b/components/manifests/overlays/production/pvc-patch-postgresql.yaml
new file mode 100644
index 000000000..634dd7cdb
--- /dev/null
+++ b/components/manifests/overlays/production/pvc-patch-postgresql.yaml
@@ -0,0 +1,6 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: postgresql-data
+spec:
+  storageClassName: lvms-vg1
diff --git a/components/manifests/overlays/production/unleash-credentials.yaml b/components/manifests/overlays/production/unleash-credentials.yaml
new file mode 100644
index 000000000..986922321
--- /dev/null
+++ b/components/manifests/overlays/production/unleash-credentials.yaml
@@ -0,0 +1,22 @@
+# Unleash credentials for production overlay
+# Uses shared PostgreSQL; database-url password must match postgresql-credentials
+# Change default-admin-password and API tokens in production
+apiVersion: v1
+kind: Secret
+metadata:
+  name: unleash-credentials
+  labels:
+    app: unleash
+    app.kubernetes.io/name: unleash
+type: Opaque
+stringData:
+  database-url: "postgres://postgres:postgres123@postgresql:5432/unleash"
+  database-ssl: "false"
+  unleash-url: "http://unleash:4242/api"
+  unleash-admin-url: "http://unleash:4242"
+  admin-api-token: "*:*.unleash-admin-token"
+  client-api-token: "default:development.unleash-client-token"
+  frontend-api-token: "default:development.unleash-frontend-token"
+  default-admin-password: "unleash-dev-password"
+  unleash-project: "default"
+  unleash-environment: "development"
diff --git a/components/manifests/overlays/production/unleash-init-db-patch.yaml b/components/manifests/overlays/production/unleash-init-db-patch.yaml
index e716e5949..a3cf3758d 100644
--- a/components/manifests/overlays/production/unleash-init-db-patch.yaml
+++ b/components/manifests/overlays/production/unleash-init-db-patch.yaml
@@ -37,6 +37,21 @@ spec:
             psql -h "$PGHOST" -U "$PGUSER" -c "CREATE DATABASE unleash;"
             echo "Database 'unleash' created successfully"
           fi
+
+          echo "Verifying database 'unleash' is visible..."
+          for i in 1 2 3 4 5; do
+            if psql -h "$PGHOST" -U "$PGUSER" -d unleash -c "SELECT 1" >/dev/null 2>&1; then
+              echo "Database 'unleash' verified (attempt $i)"
+              break
+            fi
+            echo "Database not yet visible, retry $i/5..."
+            sleep 2
+          done
+          if ! psql -h "$PGHOST" -U "$PGUSER" -d unleash -c "SELECT 1" >/dev/null 2>&1; then
+            echo "ERROR: Could not connect to database 'unleash' after retries"
+            exit 1
+          fi
+          echo "Init-db complete"
         env:
         - name: PGHOST
           valueFrom: