## Problem
Adding a new model to ACP requires code changes across 5+ files in 3 components (frontend dropdown, runner model map, tests), a CI build, and a release. This has happened repeatedly:
- feat: add Claude Opus 4.6 model support #581 — Added Opus 4.6 (frontend + runner + tests)
- fix: map Opus 4.6 to @default for Vertex AI #611 — Fixed Opus 4.6 Vertex ID mapping
- fix: [Amber] add opus-4.5 to the list of models available to ACP users #398 — Added Opus 4.5 (via Amber bot, same pattern)
IT controls which models are available in our Vertex AI project. When they enable a new model, we shouldn't need a code change to surface it to users. Model availability should be a runtime configuration concern, not a build artifact.
## Requirements
### 1. Automated model discovery
A GitHub Action (daily cron + manual trigger) that probes our Vertex AI project to determine which models are currently accessible. Google does not provide a "list available publisher models" API — the only reliable method is probing inference endpoints with minimal requests:
| Publisher | Endpoint | Available | Not available |
|---|---|---|---|
| Anthropic (Claude) | `publishers/anthropic/models/{id}:rawPredict` | HTTP 200 or 400 | HTTP 404 |
| Google (Gemini) | `publishers/google/models/{id}:generateContent` | HTTP 200 or 400 | HTTP 404 |
| Google (Imagen) | `publishers/google/models/{id}:predict` | HTTP 200 or 400 | HTTP 404 |
This technique was validated against project gcp-jboyer-san-gemini / region us-east5. Cost is negligible (~$0.001 per full scan). The probe payloads are minimal (5 tokens max output).
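The probe logic can be sketched roughly as follows. The project ID, token handling, and request payloads below are illustrative assumptions, not the actual implementation; only the status-code interpretation (200/400 = available, 404 = not available) comes from the table above.

```python
# Sketch of a single availability probe against a Vertex AI publisher model.
# PROJECT and the bearer token source are placeholders (e.g. the token could
# come from `gcloud auth print-access-token` in the GHA).
import json
import urllib.error
import urllib.request

REGION = "us-east5"          # single configurable region (see Scope)
PROJECT = "my-gcp-project"   # placeholder, not the real project ID

# Minimal probe payloads, capped at a few output tokens to keep cost negligible.
PROBE_BODIES = {
    "rawPredict": {
        "anthropic_version": "vertex-2023-10-16",
        "max_tokens": 5,
        "messages": [{"role": "user", "content": "hi"}],
    },
    "generateContent": {
        "contents": [{"role": "user", "parts": [{"text": "hi"}]}],
        "generationConfig": {"maxOutputTokens": 5},
    },
}

def is_available(status: int) -> bool:
    """A model counts as available on HTTP 200 or 400, unavailable on 404."""
    return status in (200, 400)

def probe(token: str, publisher: str, model_id: str, verb: str) -> bool:
    """Send one minimal request and classify the model's availability."""
    url = (
        f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
        f"/locations/{REGION}/publishers/{publisher}/models/{model_id}:{verb}"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(PROBE_BODIES[verb]).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return is_available(resp.status)
    except urllib.error.HTTPError as e:
        # A 400 "bad request" still proves the model endpoint exists.
        return is_available(e.code)
```

The GHA would loop this over every entry in the probe manifest and emit an availability report.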
The list of model IDs to probe should be maintained in a single manifest file. When new model families are announced, this file is the only thing that needs updating — and it does not require a release.
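One possible shape for that manifest (the filename, keys, and model slugs here are illustrative, not a committed spec):

```yaml
# probe-manifest.yaml — hypothetical name and layout.
# Each publisher section lists the probe verb and the model IDs to check.
anthropic:
  verb: rawPredict
  models:
    - claude-opus-4-5
    - claude-opus-4-6
google-gemini:
  verb: generateContent
  models:
    - gemini-2.5-pro
google-imagen:
  verb: predict
  models:
    - imagen-3.0-generate-001
```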
### 2. Feature-flag-gated model availability
Discovered models should be tied to Unleash feature flags following our existing convention (#653):
- Flag naming: `models.<model-slug>.enabled`
- Tagged `scope: workspace` so workspace admins can enable/disable per workspace
- Newly discovered models should be disabled by default (opt-in, not opt-out)
- The GHA should auto-create flags for newly discovered models via the Unleash Admin API
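A sketch of the auto-creation step using the Unleash Admin API's create-feature endpoint (`POST /api/admin/projects/{projectId}/features`). The Unleash URL, token handling, and project ID are placeholders. Note that newly created Unleash flags start disabled in every environment, which gives the opt-in default for free:

```python
# Sketch: create a disabled Unleash flag for a newly discovered model.
# UNLEASH_URL, UNLEASH_TOKEN, and PROJECT_ID are placeholders (the real
# values would come from GHA secrets / repo configuration).
import json
import urllib.request

UNLEASH_URL = "https://unleash.example.com"
UNLEASH_TOKEN = "admin-api-token"
PROJECT_ID = "default"

def flag_name(model_slug: str) -> str:
    """Follow the models.<model-slug>.enabled naming convention."""
    return f"models.{model_slug}.enabled"

def create_flag(model_slug: str) -> None:
    body = {
        "name": flag_name(model_slug),
        "type": "release",
        "description": f"Auto-created by model discovery for {model_slug}",
    }
    req = urllib.request.Request(
        f"{UNLEASH_URL}/api/admin/projects/{PROJECT_ID}/features",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": UNLEASH_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # New flags are disabled in all environments until an admin enables them.
    urllib.request.urlopen(req)
```

The GHA would call `create_flag` only for models that are newly available and have no existing flag (a GET on the feature first, or tolerating a 409 conflict, would make the job idempotent).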
### 3. Dynamic model serving (no hardcoded lists)
The frontend model dropdown and runner model-to-Vertex-ID mapping must be driven by runtime configuration, not hardcoded arrays. The backend should serve available models by combining:
- A model registry (metadata: display name, Vertex AI ID, publisher, sort order)
- Unleash flag evaluation (is this model enabled for this workspace?)
The frontend and runner consume this API instead of maintaining their own static lists.
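The backend combination of registry and flag evaluation could look roughly like this. The dataclass fields mirror the metadata listed above; the registry entries and the `is_enabled` callback (a thin wrapper over the Unleash SDK's per-workspace flag check) are illustrative:

```python
# Sketch: serve only the models whose Unleash flag is enabled for the
# requesting workspace, in registry sort order. Entries are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ModelEntry:
    slug: str           # stable identifier, used in the flag name
    display_name: str   # what the frontend dropdown shows
    vertex_id: str      # what the runner sends to Vertex AI
    publisher: str
    sort_order: int

REGISTRY: List[ModelEntry] = [
    ModelEntry("claude-opus-4-6", "Claude Opus 4.6",
               "claude-opus-4-6@default", "anthropic", 10),
    ModelEntry("gemini-2.5-pro", "Gemini 2.5 Pro",
               "gemini-2.5-pro", "google", 20),
]

def available_models(
    registry: List[ModelEntry],
    is_enabled: Callable[[str], bool],
) -> List[ModelEntry]:
    """Filter the registry by flag state and return it in display order."""
    enabled = [m for m in registry
               if is_enabled(f"models.{m.slug}.enabled")]
    return sorted(enabled, key=lambda m: m.sort_order)
```

An API handler would call `available_models(REGISTRY, lambda name: unleash.is_enabled(name, workspace))` and serialize the result for the frontend and runner.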
### 4. Zero-release model enablement
After this is implemented, the lifecycle for a new model should be:
- IT enables model in Vertex AI → nothing happens yet
- GHA runs (daily) → discovers model → creates disabled Unleash flag + updates registry
- Admin enables flag in Unleash → model appears in ACP
- No code changes. No releases. No PRs.
## Scope
- In scope: Claude models (runner supports these today), plus discovery of Gemini/Imagen/embeddings for future use
- Out of scope: Runner support for non-Claude models (Gemini, Llama, Mistral) — these should be discovered but flagged off by default
- Out of scope: Multi-region probing — single configurable region for now
- Runner constraint: Only Claude models work via the Claude Agent SDK. Non-Claude models in the dropdown without runner support would confuse users. Use `defaultEnabled: false` for non-Claude models.
## Key Architecture Decisions to Make
- Where to store model metadata (Vertex ID, display name, publisher): ConfigMap, database, Unleash variants, or a committed JSON file? The registry needs to be updatable without a release.
- How the GHA authenticates to GCP (service account), Unleash (admin token), and optionally K8s (if updating a ConfigMap directly).
- How the runner resolves Vertex IDs dynamically — it currently uses a hardcoded dict. Options: fetch from backend API at startup, read from a mounted ConfigMap, or accept the full Vertex ID from the operator.
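For the runner question, the first and third options compose naturally: fetch the mapping from the backend at startup, and treat any incoming ID that already looks like a full Vertex ID as a pass-through. The endpoint URL and fallback map below are hypothetical placeholders:

```python
# Sketch: runner-side dynamic resolution of model slug -> Vertex AI ID.
# MODELS_URL and FALLBACK are placeholders; the real endpoint would be the
# backend API described in requirement 3.
import json
import urllib.request

MODELS_URL = "https://acp-backend.internal/api/models"

# Last-known-good mapping, used only if the backend is unreachable at startup,
# so existing models keep working during the migration.
FALLBACK = {"claude-opus-4-6": "claude-opus-4-6@default"}

def load_vertex_ids() -> dict:
    """Fetch the slug -> Vertex ID map from the backend, with a fallback."""
    try:
        with urllib.request.urlopen(MODELS_URL, timeout=5) as resp:
            models = json.load(resp)
        return {m["slug"]: m["vertex_id"] for m in models}
    except OSError:
        return FALLBACK

def resolve(mapping: dict, model: str) -> str:
    """Resolve a model identifier to the ID sent to Vertex AI."""
    # Pass-through: a value containing "@" is already a full Vertex ID
    # supplied by the operator (option 3 in the decision list above).
    if "@" in model:
        return model
    return mapping[model]
```

This keeps the hardcoded dict only as a degraded-mode fallback rather than the source of truth.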
## References
- docs: add Unleash feature flags documentation and deployment scripts #653 — Unleash feature flags infrastructure
- Claude/daily sdk update action #661 — Daily GHA pattern (Claude Agent SDK update) — good template for the workflow structure
- feat: add Claude Opus 4.6 model support #581, fix: map Opus 4.6 to @default for Vertex AI #611, fix: [Amber] add opus-4.5 to the list of models available to ACP users #398 — Examples of the current manual model-addition process
- Vertex AI model versions docs — canonical list of model IDs and version suffixes
## Acceptance Criteria
- A GHA runs daily, probes Vertex AI, and reports which models are available
- New models discovered by the GHA get Unleash flags created automatically (disabled by default)
- Frontend model dropdown is populated from backend API, not hardcoded
- Runner resolves model-to-Vertex-ID mapping dynamically, not from a hardcoded dict
- Enabling a model flag in Unleash makes it appear in the frontend without any code change or deployment
- Disabling a model flag removes it from the frontend
- Existing models continue to work throughout the migration (backward compatible)
- The probe manifest (list of model IDs to check) is a single maintainable file