
feat: automated Vertex AI model discovery with feature-flag-gated availability #667

@jeremyeder

Description


Problem

Adding a new model to ACP requires code changes across 5+ files in 3 components (frontend dropdown, runner model map, tests), plus a CI build and a release. This has happened repeatedly.

IT controls which models are available in our Vertex AI project. When they enable a new model, we shouldn't need a code change to surface it to users. Model availability should be a runtime configuration concern, not a build artifact.

Requirements

1. Automated model discovery

A GitHub Action (daily cron + manual trigger) that probes our Vertex AI project to determine which models are currently accessible. Google does not provide a "list available publisher models" API — the only reliable method is probing inference endpoints with minimal requests:

| Publisher | Endpoint | Available | Not available |
| --- | --- | --- | --- |
| Anthropic (Claude) | `publishers/anthropic/models/{id}:rawPredict` | HTTP 200 or 400 | HTTP 404 |
| Google (Gemini) | `publishers/google/models/{id}:generateContent` | HTTP 200 or 400 | HTTP 404 |
| Google (Imagen) | `publishers/google/models/{id}:predict` | HTTP 200 or 400 | HTTP 404 |

This technique was validated against project gcp-jboyer-san-gemini / region us-east5. Cost is negligible (~$0.001 per full scan). The probe payloads are minimal (5 tokens max output).
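The probe logic above can be sketched roughly as follows. This is a hypothetical sketch, not the GHA implementation: the URL template, payload, and function names are assumptions; only the status-code semantics (200/400 = available, 404 = not enabled) come from the table above.

```python
# Hypothetical sketch of a single-model availability probe.
import json
import urllib.error
import urllib.request

VERTEX_BASE = "https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}"


def probe_url(project: str, region: str, publisher: str, model_id: str, verb: str) -> str:
    """Build the inference endpoint used to probe one model."""
    base = VERTEX_BASE.format(project=project, region=region)
    return f"{base}/publishers/{publisher}/models/{model_id}:{verb}"


def classify(status: int) -> bool:
    """Per the table above: 200/400 mean the model is accessible, 404 means it is not."""
    if status in (200, 400):
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")


def probe(url: str, token: str, payload: dict) -> bool:
    """POST a minimal request (e.g. max 5 output tokens) and classify the response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as e:  # 400 and 404 surface as HTTPError
        return classify(e.code)
```

The GHA would iterate this over the probe manifest and emit an availability report.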

The list of model IDs to probe should be maintained in a single manifest file. When new model families are announced, this file is the only thing that needs updating — and it does not require a release.
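One possible shape for that manifest (fields and model IDs are illustrative, not a committed schema):

```json
{
  "region": "us-east5",
  "models": [
    {"publisher": "anthropic", "id": "claude-sonnet-4", "verb": "rawPredict"},
    {"publisher": "google", "id": "gemini-2.0-flash", "verb": "generateContent"},
    {"publisher": "google", "id": "imagen-3.0-generate-002", "verb": "predict"}
  ]
}
```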

2. Feature-flag-gated model availability

Discovered models should be tied to Unleash feature flags following our existing convention (#653):

  • Flag naming: models.<model-slug>.enabled
  • Tagged scope: workspace so workspace admins can enable/disable per-workspace
  • Newly discovered models should be disabled by default (opt-in, not opt-out)
  • The GHA should auto-create flags for newly discovered models via the Unleash Admin API
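A rough sketch of the auto-creation step. The Admin API path, project name, and tagging shape are assumptions to verify against our Unleash version; the naming convention is the one above from #653.

```python
# Hypothetical sketch: create a disabled flag for a newly discovered model.
import json
import urllib.request


def flag_name(model_slug: str) -> str:
    """Naming convention from #653: models.<model-slug>.enabled"""
    return f"models.{model_slug}.enabled"


def create_flag_payload(model_slug: str) -> dict:
    # Unleash creates new flags disabled, which satisfies the opt-in requirement.
    return {
        "name": flag_name(model_slug),
        "type": "release",
        "description": f"Availability gate for {model_slug} (auto-created by discovery GHA)",
    }


def create_flag(unleash_url: str, admin_token: str, model_slug: str) -> None:
    # Project segment ("default") and header format are assumptions.
    req = urllib.request.Request(
        f"{unleash_url}/api/admin/projects/default/features",
        data=json.dumps(create_flag_payload(model_slug)).encode(),
        headers={"Authorization": admin_token, "Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).close()
```

Tagging the flag with the workspace scope would be a follow-up call, since tags are managed separately from flag creation.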

3. Dynamic model serving (no hardcoded lists)

The frontend model dropdown and runner model-to-Vertex-ID mapping must be driven by runtime configuration, not hardcoded arrays. The backend should serve available models by combining:

  • A model registry (metadata: display name, Vertex AI ID, publisher, sort order)
  • Unleash flag evaluation (is this model enabled for this workspace?)

The frontend and runner consume this API instead of maintaining their own static lists.
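The backend combination step is simple to sketch. Registry fields follow the metadata list above; the `is_enabled` callback stands in for per-workspace Unleash evaluation and is an assumption about the eventual interface.

```python
# Minimal sketch: registry entries filtered by per-workspace flag evaluation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModelEntry:
    slug: str          # e.g. "claude-sonnet-4"
    display_name: str
    vertex_id: str     # illustrative; real IDs come from the registry
    publisher: str
    sort_order: int


def available_models(
    registry: List[ModelEntry],
    is_enabled: Callable[[str], bool],  # evaluates a flag for the requesting workspace
) -> List[ModelEntry]:
    """Return only models whose models.<slug>.enabled flag is on, in display order."""
    enabled = [m for m in registry if is_enabled(f"models.{m.slug}.enabled")]
    return sorted(enabled, key=lambda m: m.sort_order)
```

The frontend dropdown and the runner both consume the output of this one code path, so enabling a flag is sufficient to surface a model everywhere.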

4. Zero-release model enablement

After this is implemented, the lifecycle for a new model should be:

  1. IT enables model in Vertex AI → nothing happens yet
  2. GHA runs (daily) → discovers model → creates disabled Unleash flag + updates registry
  3. Admin enables flag in Unleash → model appears in ACP
  4. No code changes. No releases. No PRs.

Scope

  • In scope: Claude models (runner supports these today), plus discovery of Gemini/Imagen/embeddings for future use
  • Out of scope: Runner support for non-Claude models (Gemini, Llama, Mistral) — these should be discovered but flagged off by default
  • Out of scope: Multi-region probing — single configurable region for now
  • Runner constraint: Only Claude models work via Claude Agent SDK. Non-Claude models in the dropdown without runner support would confuse users. Use defaultEnabled: false for non-Claude.

Key Architecture Decisions to Make

  • Where to store model metadata (Vertex ID, display name, publisher): ConfigMap, database, Unleash variants, or a committed JSON file? The registry needs to be updatable without a release.
  • How the GHA authenticates to GCP (service account), Unleash (admin token), and optionally K8s (if updating a ConfigMap directly).
  • How the runner resolves Vertex IDs dynamically — it currently uses a hardcoded dict. Options: fetch from backend API at startup, read from a mounted ConfigMap, or accept the full Vertex ID from the operator.
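To make the first option concrete, here is a sketch of the runner fetching the mapping from the backend at startup, with the current hardcoded dict kept as a backward-compatible fallback. The endpoint path, response shape, and model IDs are assumptions.

```python
# Hypothetical sketch: runner resolves Vertex IDs from the backend at startup.
import json
import urllib.request

# Stand-in for today's hardcoded mapping, retained only as a fallback.
FALLBACK = {"claude-sonnet-4": "claude-sonnet-4@20250514"}  # illustrative IDs


def load_model_map(backend_url: str) -> dict:
    """Fetch slug -> Vertex ID from the backend; fall back if unreachable."""
    try:
        with urllib.request.urlopen(f"{backend_url}/api/models") as resp:
            models = json.load(resp)
        return {m["slug"]: m["vertex_id"] for m in models}
    except OSError:
        return dict(FALLBACK)


def resolve_vertex_id(model_map: dict, slug: str) -> str:
    if slug not in model_map:
        raise KeyError(f"model {slug!r} not in registry")
    return model_map[slug]
```

The ConfigMap-mount option would replace `load_model_map` with a file read; the rest is unchanged, which keeps the decision reversible.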

Acceptance Criteria

  • A GHA runs daily, probes Vertex AI, and reports which models are available
  • New models discovered by the GHA get Unleash flags created automatically (disabled by default)
  • Frontend model dropdown is populated from backend API, not hardcoded
  • Runner resolves model-to-Vertex-ID mapping dynamically, not from a hardcoded dict
  • Enabling a model flag in Unleash makes it appear in the frontend without any code change or deployment
  • Disabling a model flag removes it from the frontend
  • Existing models continue to work throughout the migration (backward compatible)
  • The probe manifest (list of model IDs to check) is a single maintainable file
