feat(go/plugins/vertexai/modelgarden): add Claude Opus 4.5, Llama, shared scaffolding#5296

Draft
cabljac wants to merge 3 commits into main from feat/go-modelgarden-compat-oai

Conversation

@cabljac (Contributor) commented May 11, 2026

Extends the Vertex AI Model Garden Go plugin with new Anthropic Claude models, a new Meta Llama plugin, and shared scaffolding used by all modelgarden plugins.

This PR is the first half of the work originally proposed in #5175. The Mistral plugin (which requires a different transport approach because Vertex does not serve Mistral via the OpenAI-compatible endpoint) is stacked on top of this PR in a follow-up.

Models

  • Anthropic (existing plugin, more models): claude-opus-4-5@20251101, claude-opus-4-1-20250805, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, plus the previously supported Claude 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
  • Meta Llama (MaaS, new plugin): meta/llama-4-maverick-17b-128e-instruct-maas, meta/llama-4-scout-17b-16e-instruct-maas, meta/llama-3.3-70b-instruct-maas. Llama 4 variants register as Multimodal; Llama 3.3 70B is text-only. Routed through compat_oai against the Vertex MaaS OpenAI-compatible endpoint.

Plugin scaffolding

  • New Llama plugin type alongside the existing Anthropic, with Init, Name, DefineModel(name, opts), and a top-level LlamaModel(g, id) lookup mirroring the Anthropic shape.
  • Shared resolveVertexMaasEnv helper for GOOGLE_CLOUD_PROJECT / GCLOUD_PROJECT / GOOGLE_CLOUD_LOCATION / GOOGLE_CLOUD_REGION resolution.
  • Shared provider = "vertexai" constant so every modelgarden plugin registers models under the same vertexai/<model> namespace.
  • Anthropic.DefineModel now uses the same mutex and initted guard as Llama.DefineModel, so calling it before Init returns an explicit "not initialized" error instead of using a zero-value client.

Sample

go/samples/modelgarden registers two flows for Dev UI smoke testing:

  • opus45Flow — claude-opus-4-5@20251101
  • llamaFlow — meta/llama-3.3-70b-instruct-maas

Each plugin is constructed with an explicit Location because Vertex MaaS regional availability differs per publisher (Anthropic in us-east5, Llama in us-central1).

Tests

  • define_model_test.go — pre-Init error path for both plugins.
  • internal_test.go — nil ai.ModelOptions branch of DefineModel for both plugins.
  • models_test.go — resolveVertexMaasEnv covered for explicit args, primary / secondary env fallback, and panic paths.
  • llama_live_test.go — basic + streaming generation against Llama 3.3 70B (credential-gated).
  • anthropic_live_test.go — Opus 4.5 subtest (credential-gated).

Test plan

  • go test ./plugins/vertexai/modelgarden/... (unit, no creds)
  • Live: GOOGLE_CLOUD_PROJECT=… GOOGLE_CLOUD_LOCATION=… go test -v -run 'TestAnthropicLive|TestLlamaLive' ./plugins/vertexai/modelgarden/...
  • Dev UI smoke: cd go/samples/modelgarden && GOOGLE_CLOUD_PROJECT=<project> genkit start -- go run ., then run each flow from Dev UI.

…ared scaffolding

Extends the Vertex AI Model Garden Go plugin with new Anthropic Claude
models and a new Meta Llama plugin, plus shared scaffolding used by all
modelgarden plugins.

Models:
- Anthropic: claude-opus-4-5@20251101, claude-opus-4-1-20250805,
  claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, plus the
  previously supported 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
- Meta Llama (MaaS, new plugin): meta/llama-4-maverick-17b-128e-
  instruct-maas, meta/llama-4-scout-17b-16e-instruct-maas,
  meta/llama-3.3-70b-instruct-maas. Llama 4 variants register as
  Multimodal; Llama 3.3 70B is text-only. Routed through compat_oai
  against the Vertex MaaS OpenAI-compatible endpoint.

Plugin scaffolding:
- New Llama plugin alongside the existing Anthropic, with Init, Name,
  DefineModel(name, opts), and a top-level LlamaModel(g, id) lookup
  mirroring the Anthropic shape.
- Shared resolveVertexMaasEnv helper for GOOGLE_CLOUD_PROJECT /
  GCLOUD_PROJECT / GOOGLE_CLOUD_LOCATION / GOOGLE_CLOUD_REGION
  resolution.
- Shared provider = "vertexai" constant so every modelgarden plugin
  registers models under the same vertexai/<model> namespace.
- Anthropic.DefineModel now takes the same mutex and initted guard as
  Llama.DefineModel, so calling it before Init returns an explicit
  "not initialized" error instead of using a zero-value client.

Sample:
- go/samples/modelgarden registers an Anthropic and a Llama flow for
  Dev UI smoke testing. Each plugin is constructed with an explicit
  Location because Vertex MaaS regional availability differs per
  publisher (Anthropic in us-east5, Llama in us-central1).

Tests:
- define_model_test.go covers the pre-Init error path for both plugins.
- internal_test.go covers the nil ai.ModelOptions branch of DefineModel
  for both plugins.
- models_test.go covers resolveVertexMaasEnv for explicit args, primary
  and secondary env fallback, and panic paths.
- llama_live_test.go exercises basic + streaming generation against
  meta/llama-3.3-70b-instruct-maas (credential-gated).
- anthropic_live_test.go adds a claude-opus-4-5@20251101 subtest
  (credential-gated).

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a new Llama plugin for Vertex AI Model Garden, which leverages OpenAI-compatible endpoints and Google OAuth2 authentication. It also refactors environment variable resolution into a shared helper function used by both the Anthropic and Llama plugins. Additionally, the Anthropic plugin was updated with initialization checks in DefineModel and support for a new model version. Comprehensive unit, white-box, and live tests were added to ensure the reliability of the new features. Feedback was provided regarding the Llama plugin's initialization, specifically suggesting the use of context.Background() for the OAuth2 client to prevent issues with token refreshes if the initial context is short-lived.

Comment thread go/plugins/vertexai/modelgarden/llama.go Outdated
Match the existing per-plugin convention seen in go/samples/anthropic
and go/samples/compat_oai/{anthropic,custom,openai}. The existing
go/samples/modelgarden sample stays single-flow Anthropic-only; a new
go/samples/modelgarden-llama sample exercises the new Llama plugin in
isolation.
@cabljac cabljac marked this pull request as draft May 11, 2026 16:25
…a oauth2 client

The oauth2.NewClient and TokenSource outlive Init's ctx because every
later generate call goes through them. If a caller passes a short-lived
ctx to Init (e.g. one with a timeout, or one cancelled after plugin
setup), token refresh on later calls would fail with the original ctx
cancelled. Bind both to context.Background().

Reported by gemini-code-assist on #5296.
@cabljac (Contributor, Author) commented May 11, 2026

/gemini review


@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a new Llama plugin for Vertex AI Model Garden, leveraging OpenAI-compatible endpoints, and refactors the Anthropic plugin to share environment variable resolution logic. It also adds initialization guards and mutex locking to model definition methods. Feedback highlights a potential issue with the Llama baseURL construction regarding missing publisher segments and suggests using safe type assertions when registering models to prevent potential panics.

Comment thread go/plugins/vertexai/modelgarden/llama.go
Comment thread go/plugins/vertexai/modelgarden/llama.go
@cabljac (Contributor, Author) commented May 12, 2026

/gemini review


@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a new Llama plugin for Vertex AI Model Garden, leveraging OpenAI-compatible endpoints, and refactors environment variable resolution into a shared utility used by both Llama and Anthropic plugins. Additionally, it updates the Anthropic model list with Claude 4.5 Opus and adds several test suites and a usage sample. Feedback suggests improving the Llama plugin's initialization by using context.WithoutCancel to preserve context values and wrapping errors with more descriptive information for better diagnosability.

Comment on lines +74 to +78
ts, err := google.DefaultTokenSource(context.Background(), "https://www.googleapis.com/auth/cloud-platform")
if err != nil {
	panic(fmt.Errorf("modelgarden llama: obtaining default Google token source: %w", err))
}
httpClient := oauth2.NewClient(context.Background(), ts)

Severity: medium

Using a non-cancelled context for the DefaultTokenSource and oauth2.NewClient is appropriate as these objects must outlive the Init context. To better align with project standards, consider using context.WithoutCancel(ctx) instead of context.Background(). This ensures that while the context is not cancelled, it still carries potentially useful values (like trace IDs) from the parent context. Additionally, ensure any errors from these calls are wrapped with contextual information (e.g., 'failed to initialize vertex modelgarden client') to improve diagnosability.

References
  1. When performing critical background operations that must survive client disconnection, use a context that is not cancelled (e.g., context.WithoutCancel) and wrap resulting errors with contextual information.
