feat(go/plugins/vertexai/modelgarden): Support Claude Opus 4.5, Llama, Mistral #5175
cabljac wants to merge 10 commits into
Code Review
This pull request introduces support for Meta Llama and Mistral/Codestral models in the Vertex AI Model Garden plugin by implementing new Llama and Mistral structures that utilize OpenAI-compatible endpoints. It also updates the Anthropic model list, including the addition of Claude 4.5 Opus and the deprecation of older versions. Key feedback includes addressing ineffective pointer receiver assignments in Init methods, resolving undefined provider variables that may cause compilation errors, and mitigating potential panics from risky type assertions when registering models.
1f71e45 to
93d4ed3
/gemini review
Code Review
This pull request introduces support for Meta Llama and Mistral/Codestral models to the Vertex AI Model Garden plugin. It includes new plugin implementations for Llama and Mistral that utilize OpenAI-compatible endpoints, along with corresponding live tests and model definitions. Additionally, the Anthropic model list was updated to include Claude 4.5 Opus and mark Claude 3 Haiku as deprecated. Review feedback identified a missing definition for the provider variable in the new files and a naming inconsistency between the plugin provider and the model namespace that could affect model discovery.
/gemini review
Code Review
This pull request expands the Vertex AI Model Garden plugin by adding support for Meta Llama and Mistral/Codestral models, implemented via OpenAI-compatible endpoints. It also updates the Anthropic model definitions to include Claude 4.5 Opus and marks Claude 3 Haiku as deprecated. Review feedback suggests correcting the capability flags for the Llama 3.3 70B model, which is currently incorrectly marked as multimodal, and refactoring the environment variable resolution logic to eliminate duplication across the new plugin files.
… Mistral support
- Add claude-opus-4-5@20251101 to the existing Anthropic Model Garden list.
- Add a new Llama plugin covering the three Meta Llama MaaS models (llama-4-maverick, llama-4-scout, llama-3.3-70b). The plugin reuses compat_oai.OpenAICompatible with an oauth2-wrapped HTTP client against the Vertex MaaS OpenAI-compatible endpoint.
- Add a new Mistral plugin covering mistral-medium-3, mistral-ocr-2505, mistral-small-2503 and codestral-2. The JS parity path uses the native @mistralai/mistralai-gcp SDK, which has no Go equivalent, so this plugin also routes through the Vertex MaaS OpenAI-compatible endpoint.
- Live tests for each new provider skip unless GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION are set.
… check
Reassigning a pointer receiver inside Init does not update the caller's
reference, so the block was a no-op that misleadingly implied nil-safety.
Plugins are always registered via a non-nil pointer
(genkit.WithPlugins(&Llama{...})), so removing it is safe.
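The no-op described in this commit can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: `Plugin` and `initReassign` are hypothetical names standing in for a plugin type and the removed nil check.

```go
package main

import "fmt"

// Plugin is a hypothetical stand-in for a plugin type such as Llama.
type Plugin struct {
	Location string
}

// initReassign mimics the removed block: it reassigns the receiver when it
// is nil. The receiver is a copy of the caller's pointer, so the assignment
// only changes the method's local variable.
func (p *Plugin) initReassign() {
	if p == nil {
		p = &Plugin{Location: "us-central1"}
	}
	_ = p // the new value is visible only inside this method
}

func main() {
	var p *Plugin
	p.initReassign()
	// The caller's pointer is still nil after the call.
	fmt.Println(p == nil) // prints "true"
}
```

Because the caller never sees the new value, the check gave no real nil-safety, which matches the reasoning for removing it.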
… compat_oai namespaces
Move the shared provider = "vertexai" constant out of anthropic.go, where it was coupled to an unrelated plugin, and into models.go alongside the other package-level shared state. Expand the comment to document the shared namespace design, the collision with googlegenai.VertexAI, and the next-major-version TODO.
Also set l.oai.Provider / m.oai.Provider to the shared provider value rather than each plugin's own Name(). The embedded compat_oai is never registered directly, so changing its Provider has no external effect, but it makes compat_oai.ListActions / ResolveAction consistent with the static DefineModel(provider, ...) registrations.
a8e4f88 to
f41ad13
…nv helper, guard Anthropic.DefineModel
- Llama 3.3 70B Instruct is text-only on Vertex MaaS; switch its Supports to BasicText. Llama 4 Maverick/Scout remain Multimodal.
- Centralise resolveVertexMaasEnv in models.go and use it from anthropic.go, llama.go, and mistral.go (Anthropic's inline ~20-line block collapses to one call).
- Anthropic.DefineModel now acquires the mutex and checks initted, matching Llama/Mistral. Calling DefineModel before Init returned a zero-valued client; it now returns a "not initialized" error.
- Add define_model_test.go covering DefineModel-before-Init for all three plugins.
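The mutex-plus-initted guard can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's implementation: the `plugin` type, the string return value, and the exact error text are hypothetical, and the real DefineModel takes model options and returns a model handle.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in for Anthropic, Llama, or Mistral.
type plugin struct {
	mu      sync.Mutex
	initted bool
}

// errNotInitialized is an assumed error value; the real message may differ.
var errNotInitialized = errors.New("plugin not initialized")

// Init marks the plugin as ready. The real Init also builds the client.
func (p *plugin) Init() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.initted = true
}

// DefineModel refuses to run before Init instead of using a zero-valued client.
func (p *plugin) DefineModel(name string) (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.initted {
		return "", errNotInitialized
	}
	// Models are registered under the shared "vertexai" provider namespace.
	return "vertexai/" + name, nil
}

func main() {
	p := &plugin{}
	_, err := p.DefineModel("claude-opus-4-5@20251101")
	fmt.Println(err != nil) // an error is returned before Init

	p.Init()
	name, _ := p.DefineModel("claude-opus-4-5@20251101")
	fmt.Println(name)
}
```

Holding the same mutex in Init and DefineModel keeps the initted check consistent when the two are called from different goroutines.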
…redict + Dev UI sample
Vertex AI does not serve Mistral via the OpenAI-compatible /endpoints/openapi/chat/completions endpoint that compat_oai targets; that endpoint returns 400 FAILED_PRECONDITION even with a subscribed project. Mistral lives at /publishers/mistralai/models/<id>:rawPredict (or :streamRawPredict for SSE) and the rawPredict response is already OpenAI-shaped, so only the URL needs to change.
Add mistralVertexTransport, an http.RoundTripper wrapping the oauth2 transport. It intercepts /chat/completions requests, reads the model field and stream flag from the body, and rewrites the URL to the per-model rawPredict / streamRawPredict path while leaving the body bytes intact. The inner oauth2 transport still adds the Bearer token, and compat_oai's existing OpenAI-shaped request building, SSE parsing, tool-call handling, and response conversion all keep working.
Re-key MistralModels with bare ids (mistral-small-2503, codestral-2, mistral-medium-3, mistral-ocr-2505) so the body the openai-go SDK emits already carries the bare id that rawPredict expects. MistralModel and Mistral.DefineModel strip an optional "mistralai/" prefix so callers used to publisher-qualified ids keep working without code changes.
Add a Dev UI sample under go/samples/modelgarden that exercises Claude 3.5 Sonnet v2, Claude Opus 4.5, Llama 3.3 70B, Mistral Small 2503, and Codestral 2. The Anthropic flows now pass MaxTokens (required by the Anthropic plugin), and the Claude 3.5 model id includes its required @20241022 stamp.
Tests:
- mistral_transport_test.go: rawPredict and streamRawPredict URL rewrite, publisher-prefix strip, non-chat passthrough, missing-model error, body field preservation (tools, response_format), and GetBody population so openai-go retries replay the request.
- mistral_live_test.go: add bare-id lookup subtest and a streaming subtest covering :streamRawPredict end-to-end.
…rawPredict path
A model id from the request body lands directly in the rawPredict URL path. Without escaping, an id containing "/", "?", or "#" could inject extra path segments, query strings, or fragments into the outbound request. Pass it through url.PathEscape and add a test pinning that behavior.
…k edge cases, and nil opts
Adds three cheap unit tests that push package coverage from 32.6% to 48.8% without touching live (credential-gated) paths:
- models_test.go: resolveVertexMaasEnv now exercised for explicit args, primary env fallback, secondary env fallback, and the panic paths when neither project nor location can be resolved. (0% to 100%.)
- mistral_transport_test.go: peekModelAndStream now covered for empty bodies and malformed JSON. (66.7% to 100%.)
- internal_test.go: each plugin's DefineModel exercised on the nil ai.ModelOptions branch. (50-57% to 75-85%.)
Remaining 0% functions (Init, Name, AnthropicModel/LlamaModel/MistralModel) require GCP credentials and are covered by the live tests.
…e Anthropic flow
Vertex AI MaaS regional availability differs per publisher: Claude models live in us-east5 / europe-west4 while Llama and Mistral live in us-central1. The old sample assumed a single GOOGLE_CLOUD_LOCATION environment variable, which forced users to pick one region. Pass each plugin its own Location so all four flows work simultaneously with just GOOGLE_CLOUD_PROJECT set.
Drop the second Anthropic flow (claude-sonnet-4-5-20250929) since most projects ship with only Claude Opus 4.5 enabled. A comment documents how to add more variants once they are enabled.
Happy to split out the Mistral stuff if needed
…ead error
Previously a failure from io.ReadAll in mistralVertexTransport.RoundTrip returned the error without closing req.Body, leaking the underlying reader. The caller had no way to recover the body because GetBody was not yet set on the rewritten request. Close unconditionally, and add a test using a Reader that returns an error to verify Close runs in that path.
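The close-on-every-path behaviour, together with a failing reader of the kind the test describes, can be sketched as follows. `readAndClose` and `failingBody` are hypothetical names and do not come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// failingBody is a test double: Read always fails, Close records the call.
type failingBody struct{ closed bool }

func (b *failingBody) Read([]byte) (int, error) { return 0, errors.New("read failed") }
func (b *failingBody) Close() error             { b.closed = true; return nil }

// readAndClose reads the whole body and closes it whether or not the read
// succeeded, so a failed read cannot leak the underlying reader.
func readAndClose(body io.ReadCloser) ([]byte, error) {
	data, err := io.ReadAll(body)
	closeErr := body.Close() // runs on the error path too
	if err != nil {
		return nil, err
	}
	if closeErr != nil {
		return nil, closeErr
	}
	return data, nil
}

func main() {
	b := &failingBody{}
	_, err := readAndClose(b)
	fmt.Println(err != nil, b.closed) // prints "true true"
}
```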
Closing in favour of a split into two stacked PRs for easier review.
Extends the Vertex AI Model Garden Go plugin with new Anthropic, Meta Llama, and Mistral / Codestral models, plus a Dev UI sample that exercises them end-to-end.
Models
- claude-opus-4-5@20251101, claude-opus-4-1-20250805, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, plus the previously supported Claude 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
- meta/llama-4-maverick-17b-128e-instruct-maas, meta/llama-4-scout-17b-16e-instruct-maas, meta/llama-3.3-70b-instruct-maas. Llama 4 variants register as Multimodal; Llama 3.3 70B is text-only. Routed through compat_oai against the Vertex MaaS OpenAI-compatible endpoint.
- mistral-small-2503, mistral-medium-3, mistral-ocr-2505, codestral-2. No maintained Mistral-GCP Go SDK exists, and Vertex does not serve Mistral via the OpenAI-compatible chat completions endpoint; see Routing notes below.
Plugin scaffolding
- Llama and Mistral plugin types alongside the existing Anthropic, each with Init, Name, DefineModel(name, opts), and a top-level LlamaModel(g, id) / MistralModel(g, id) lookup, mirroring the existing Anthropic shape.
- resolveVertexMaasEnv helper for GOOGLE_CLOUD_PROJECT / GCLOUD_PROJECT / GOOGLE_CLOUD_LOCATION / GOOGLE_CLOUD_REGION resolution.
- provider = "vertexai" constant so every modelgarden plugin registers models under the same vertexai/<model> namespace.
- Anthropic.DefineModel now takes the same mutex and initted guard as Llama.DefineModel / Mistral.DefineModel, so calling it before Init returns an explicit "not initialized" error instead of using a zero-value client.
Routing notes
Vertex's /endpoints/openapi/chat/completions works for Meta Llama but not for Mistral: Vertex returns 400 FAILED_PRECONDITION even with a subscribed project. Mistral is only served at per-model …/publishers/mistralai/models/<id>:rawPredict (and :streamRawPredict for SSE). The rawPredict response is already OpenAI-shaped JSON, so only the URL needs to change.
- mistral_transport.go introduces an http.RoundTripper that wraps the oauth2 transport, intercepts outbound /chat/completions requests, reads model + stream from the body, rewrites the URL to the per-model rawPredict (or streamRawPredict) path, and delegates back. The body bytes are preserved (incl. tools, response_format, stream_options), GetBody is set for openai-go retries, and the model id is url.PathEscape-encoded to prevent path/query injection.
- MistralModels keys are bare ids (e.g. mistral-small-2503), so the openai-go SDK already serialises the form rawPredict expects. MistralModel(g, id) and Mistral.DefineModel(name, opts) strip an optional mistralai/ prefix as a convenience so callers used to publisher-qualified ids keep working.
Llama is unchanged from the OpenAI-compat path; only Mistral takes the rawPredict detour.
Sample
go/samples/modelgarden registers four flows for Dev UI smoke testing:
- opus45Flow: claude-opus-4-5@20251101
- llamaFlow: meta/llama-3.3-70b-instruct-maas
- mistralFlow: mistral-small-2503
- codestralFlow: codestral-2
Each plugin is constructed with an explicit Location because Vertex MaaS regional availability differs per publisher (Anthropic in us-east5, Llama / Mistral in us-central1), so all four flows work simultaneously with just GOOGLE_CLOUD_PROJECT set.
Tests
- mistral_transport_test.go: pure in-process tests covering rawPredict / streamRawPredict URL rewrites, publisher-prefix strip, non-chat passthrough, missing-model error, body field preservation (tools, response_format), GetBody population, and URL escaping of malformed model ids.
- models_test.go: resolveVertexMaasEnv exercised for explicit args, primary / secondary env fallback, and panic paths.
- internal_test.go: DefineModel on Anthropic / Llama / Mistral covered for the nil ai.ModelOptions branch (in addition to the existing pre-Init error path in define_model_test.go).
- mistral_transport.RoundTrip 89.7%, peekModelAndStream 100%, resolveVertexMaasEnv 100%. Package coverage rose from 32.6% to 48.8%; remaining 0% functions (Init, Name, model lookups) require GCP credentials and are exercised by the live tests.