feat(go/plugins/vertexai/modelgarden): add Claude Opus 4.5, Llama, shared scaffolding #5296
cabljac wants to merge 3 commits into
Conversation
…ared scaffolding

Extends the Vertex AI Model Garden Go plugin with new Anthropic Claude models and a new Meta Llama plugin, plus shared scaffolding used by all modelgarden plugins.

Models:
- Anthropic: claude-opus-4-5@20251101, claude-opus-4-1-20250805, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, plus the previously supported 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
- Meta Llama (MaaS, new plugin): meta/llama-4-maverick-17b-128e-instruct-maas, meta/llama-4-scout-17b-16e-instruct-maas, meta/llama-3.3-70b-instruct-maas. Llama 4 variants register as Multimodal; Llama 3.3 70B is text-only. Routed through compat_oai against the Vertex MaaS OpenAI-compatible endpoint.

Plugin scaffolding:
- New Llama plugin alongside the existing Anthropic, with Init, Name, DefineModel(name, opts), and a top-level LlamaModel(g, id) lookup mirroring the Anthropic shape.
- Shared resolveVertexMaasEnv helper for GOOGLE_CLOUD_PROJECT / GCLOUD_PROJECT / GOOGLE_CLOUD_LOCATION / GOOGLE_CLOUD_REGION resolution.
- Shared provider = "vertexai" constant so every modelgarden plugin registers models under the same vertexai/<model> namespace.
- Anthropic.DefineModel now takes the same mutex and initted guard as Llama.DefineModel, so calling it before Init returns an explicit "not initialized" error instead of using a zero-value client.

Sample:
- go/samples/modelgarden registers an Anthropic and a Llama flow for Dev UI smoke testing. Each plugin is constructed with an explicit Location because Vertex MaaS regional availability differs per publisher (Anthropic in us-east5, Llama in us-central1).

Tests:
- define_model_test.go covers the pre-Init error path for both plugins.
- internal_test.go covers the nil ai.ModelOptions branch of DefineModel for both plugins.
- models_test.go covers resolveVertexMaasEnv for explicit args, primary and secondary env fallback, and panic paths.
- llama_live_test.go exercises basic + streaming generation against meta/llama-3.3-70b-instruct-maas (credential-gated).
- anthropic_live_test.go adds a claude-opus-4-5@20251101 subtest (credential-gated).
Code Review
This pull request introduces a new Llama plugin for Vertex AI Model Garden, which leverages OpenAI-compatible endpoints and Google OAuth2 authentication. It also refactors environment variable resolution into a shared helper function used by both the Anthropic and Llama plugins. Additionally, the Anthropic plugin was updated with initialization checks in DefineModel and support for a new model version. Comprehensive unit, white-box, and live tests were added to ensure the reliability of the new features. Feedback was provided regarding the Llama plugin's initialization, specifically suggesting the use of context.Background() for the OAuth2 client to prevent issues with token refreshes if the initial context is short-lived.
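The initialization checks mentioned above follow a common Go pattern: `DefineModel` takes the plugin mutex and checks an `initted` flag so a call before `Init` fails loudly instead of using a zero-value client. This is an illustrative sketch of that pattern under assumed field and method shapes, not the plugin's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Llama sketches the guard the PR adds to both plugins: a mutex plus an
// initted flag consulted by every method that needs a live client.
type Llama struct {
	mu      sync.Mutex
	initted bool
}

func (l *Llama) Init() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	// A real plugin would build its HTTP/OAuth2 client here.
	l.initted = true
	return nil
}

func (l *Llama) DefineModel(name string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.initted {
		// Explicit error instead of a nil-pointer panic deep in a request.
		return errors.New("modelgarden llama: plugin not initialized")
	}
	// A real plugin would register the model with the Genkit registry here.
	return nil
}

func main() {
	var l Llama
	fmt.Println(l.DefineModel("meta/llama-3.3-70b-instruct-maas")) // not-initialized error
	l.Init()
	fmt.Println(l.DefineModel("meta/llama-3.3-70b-instruct-maas")) // <nil>
}
```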
Match the existing per-plugin convention seen in go/samples/anthropic
and go/samples/compat_oai/{anthropic,custom,openai}. The existing
go/samples/modelgarden sample stays single-flow Anthropic-only; a new
go/samples/modelgarden-llama sample exercises the new Llama plugin in
isolation.
…a oauth2 client

The oauth2.NewClient and TokenSource outlive Init's ctx because every later generate call goes through them. If a caller passes a short-lived ctx to Init (e.g. one with a timeout, or one cancelled after plugin setup), token refresh on later calls would fail because the original ctx had been cancelled. Bind both to context.Background(). Reported by gemini-code-assist on #5296.
/gemini review
Code Review
This pull request introduces a new Llama plugin for Vertex AI Model Garden, leveraging OpenAI-compatible endpoints, and refactors the Anthropic plugin to share environment variable resolution logic. It also adds initialization guards and mutex locking to model definition methods. Feedback highlights a potential issue with the Llama baseURL construction regarding missing publisher segments and suggests using safe type assertions when registering models to prevent potential panics.
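The "safe type assertions" suggestion refers to Go's comma-ok form: a bare assertion like `v.(T)` panics at runtime when the dynamic type differs, while `v, ok := x.(T)` lets the caller return an error instead. A generic sketch with stand-in types (not the plugin's real ones):

```go
package main

import "fmt"

// modelRef is a stand-in for whatever concrete type the registry hands
// back when a model is defined; the real plugin's types differ.
type modelRef struct{ name string }

// register uses the comma-ok assertion so an unexpected dynamic type
// surfaces as an error rather than a panic during plugin setup.
func register(v any) error {
	ref, ok := v.(modelRef) // comma-ok form: never panics
	if !ok {
		return fmt.Errorf("modelgarden: unexpected model type %T", v)
	}
	fmt.Println("registered", ref.name)
	return nil
}

func main() {
	fmt.Println(register(modelRef{name: "meta/llama-3.3-70b-instruct-maas"})) // <nil>
	fmt.Println(register(42)) // wrong type: descriptive error, not a panic
}
```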
/gemini review
Code Review
This pull request introduces a new Llama plugin for Vertex AI Model Garden, leveraging OpenAI-compatible endpoints, and refactors environment variable resolution into a shared utility used by both Llama and Anthropic plugins. Additionally, it updates the Anthropic model list with Claude 4.5 Opus and adds several test suites and a usage sample. Feedback suggests improving the Llama plugin's initialization by using context.WithoutCancel to preserve context values and wrapping errors with more descriptive information for better diagnosability.
```go
ts, err := google.DefaultTokenSource(context.Background(), "https://www.googleapis.com/auth/cloud-platform")
if err != nil {
	panic(fmt.Errorf("modelgarden llama: obtaining default Google token source: %w", err))
}
httpClient := oauth2.NewClient(context.Background(), ts)
```
Using a non-cancelled context for the DefaultTokenSource and oauth2.NewClient is appropriate as these objects must outlive the Init context. To better align with project standards, consider using context.WithoutCancel(ctx) instead of context.Background(). This ensures that while the context is not cancelled, it still carries potentially useful values (like trace IDs) from the parent context. Additionally, ensure any errors from these calls are wrapped with contextual information (e.g., 'failed to initialize vertex modelgarden client') to improve diagnosability.
References
- When performing critical background operations that must survive client disconnection, use a context that is not cancelled (e.g., context.WithoutCancel) and wrap resulting errors with contextual information.
Extends the Vertex AI Model Garden Go plugin with new Anthropic Claude models, a new Meta Llama plugin, and shared scaffolding used by all modelgarden plugins.
This PR is the first half of the work originally proposed in #5175. The Mistral plugin (which requires a different transport approach because Vertex does not serve Mistral via the OpenAI-compatible endpoint) is stacked on top of this PR in a follow-up.
Models

- Anthropic: `claude-opus-4-5@20251101`, `claude-opus-4-1-20250805`, `claude-sonnet-4-5-20250929`, `claude-haiku-4-5-20251001`, plus the previously supported Claude 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
- Meta Llama (MaaS): `meta/llama-4-maverick-17b-128e-instruct-maas`, `meta/llama-4-scout-17b-16e-instruct-maas`, `meta/llama-3.3-70b-instruct-maas`. Llama 4 variants register as `Multimodal`; Llama 3.3 70B is text-only. Routed through `compat_oai` against the Vertex MaaS OpenAI-compatible endpoint.

Plugin scaffolding

- New `Llama` plugin type alongside the existing `Anthropic`, with `Init`, `Name`, `DefineModel(name, opts)`, and a top-level `LlamaModel(g, id)` lookup mirroring the Anthropic shape.
- Shared `resolveVertexMaasEnv` helper for `GOOGLE_CLOUD_PROJECT` / `GCLOUD_PROJECT` / `GOOGLE_CLOUD_LOCATION` / `GOOGLE_CLOUD_REGION` resolution.
- Shared `provider = "vertexai"` constant so every modelgarden plugin registers models under the same `vertexai/<model>` namespace.
- `Anthropic.DefineModel` now takes the same mutex and `initted` guard as `Llama.DefineModel`, so calling it before `Init` returns an explicit "not initialized" error instead of using a zero-value client.

Sample

- `go/samples/modelgarden` registers two flows for Dev UI smoke testing:
  - `opus45Flow` — `claude-opus-4-5@20251101`
  - `llamaFlow` — `meta/llama-3.3-70b-instruct-maas`

Each plugin is constructed with an explicit `Location` because Vertex MaaS regional availability differs per publisher (Anthropic in `us-east5`, Llama in `us-central1`).

Tests

- `define_model_test.go` — pre-`Init` error path for both plugins.
- `internal_test.go` — nil `ai.ModelOptions` branch of `DefineModel` for both plugins.
- `models_test.go` — `resolveVertexMaasEnv` covered for explicit args, primary / secondary env fallback, and panic paths.
- `llama_live_test.go` — basic + streaming generation against Llama 3.3 70B (credential-gated).
- `anthropic_live_test.go` — Opus 4.5 subtest (credential-gated).

Test plan

- `go test ./plugins/vertexai/modelgarden/...` (unit, no creds)
- `GOOGLE_CLOUD_PROJECT=… GOOGLE_CLOUD_LOCATION=… go test -v -run 'TestAnthropicLive|TestLlamaLive' ./plugins/vertexai/modelgarden/...`
- `cd go/samples/modelgarden && GOOGLE_CLOUD_PROJECT=<project> genkit start -- go run .`, then run each flow from Dev UI.