feat(go/plugins/vertexai/modelgarden): Support Claude Opus 4.5, Llama, Mistral #5175

Closed
cabljac wants to merge 10 commits into main from feat/go-modelgarden-new-models

@cabljac cabljac commented Apr 23, 2026

Contributor

Extends the Vertex AI Model Garden Go plugin with new Anthropic, Meta Llama, and Mistral / Codestral models, plus a Dev UI sample that exercises them end-to-end.

Models

  • Anthropic (existing plugin, more models): claude-opus-4-5@20251101, claude-opus-4-1-20250805, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, plus the previously supported Claude 3.5 / 3.7 / Opus 4 / Sonnet 4 entries.
  • Meta Llama (MaaS, new plugin): meta/llama-4-maverick-17b-128e-instruct-maas, meta/llama-4-scout-17b-16e-instruct-maas, meta/llama-3.3-70b-instruct-maas. Llama 4 variants register as Multimodal; Llama 3.3 70B is text-only. Routed through compat_oai against the Vertex MaaS OpenAI-compatible endpoint.
  • Mistral / Codestral (MaaS, new plugin): mistral-small-2503, mistral-medium-3, mistral-ocr-2505, codestral-2. No maintained Mistral-GCP Go SDK exists, and Vertex does not serve Mistral via the OpenAI-compatible chat completions endpoint — see Routing notes below.

Plugin scaffolding

  • New Llama and Mistral plugin types alongside the existing Anthropic, each with Init, Name, DefineModel(name, opts), and a top-level LlamaModel(g, id) / MistralModel(g, id) lookup, mirroring the existing Anthropic shape.
  • Shared resolveVertexMaasEnv helper for GOOGLE_CLOUD_PROJECT / GCLOUD_PROJECT / GOOGLE_CLOUD_LOCATION / GOOGLE_CLOUD_REGION resolution.
  • Shared provider = "vertexai" constant so every modelgarden plugin registers models under the same vertexai/<model> namespace.
  • Anthropic.DefineModel now takes the same mutex and initted guard as Llama.DefineModel / Mistral.DefineModel, so calling it before Init returns an explicit "not initialized" error instead of using a zero-value client.
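
The environment resolution described for resolveVertexMaasEnv could look roughly like the sketch below. It is not the PR's code: resolveEnvSketch, firstNonEmpty, and the injected getenv parameter are hypothetical names, the precedence (explicit argument, then the first variable listed, then the second) is an assumption, and it returns an error where the PR's tests indicate the real helper panics.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty value, or "" if all are empty.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// resolveEnvSketch is a hypothetical stand-in for resolveVertexMaasEnv.
// Explicit arguments win; otherwise each value falls back to a primary and
// then a secondary environment variable (assumed order).
func resolveEnvSketch(project, location string, getenv func(string) string) (string, string, error) {
	p := firstNonEmpty(project, getenv("GOOGLE_CLOUD_PROJECT"), getenv("GCLOUD_PROJECT"))
	l := firstNonEmpty(location, getenv("GOOGLE_CLOUD_LOCATION"), getenv("GOOGLE_CLOUD_REGION"))
	if p == "" {
		return "", "", errors.New("project not set: pass it explicitly or set GOOGLE_CLOUD_PROJECT")
	}
	if l == "" {
		return "", "", errors.New("location not set: pass it explicitly or set GOOGLE_CLOUD_LOCATION")
	}
	return p, l, nil
}

func main() {
	// An explicit location overrides whatever the environment holds.
	p, l, err := resolveEnvSketch("", "us-central1", os.Getenv)
	fmt.Println(p, l, err)
}
```

Injecting getenv keeps the fallback order testable without mutating the process environment.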

Routing notes

Vertex's /endpoints/openapi/chat/completions works for Meta Llama but not for Mistral — Vertex returns 400 FAILED_PRECONDITION even with a subscribed project. Mistral is only served at per-model …/publishers/mistralai/models/<id>:rawPredict (and :streamRawPredict for SSE). The rawPredict response is already OpenAI-shaped JSON, so only the URL needs to change.

mistral_transport.go introduces an http.RoundTripper that wraps the oauth2 transport, intercepts outbound /chat/completions requests, reads model + stream from the body, rewrites the URL to the per-model rawPredict (or streamRawPredict) path, and delegates back. The body bytes are preserved (incl. tools, response_format, stream_options), GetBody is set for openai-go retries, and the model id is url.PathEscape-encoded to prevent path/query injection.
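
A minimal sketch of the URL-rewriting transport described above, written against the standard library only. The type rewriteTransport, its base field, the roundTripFunc adapter, and the rewrittenURL helper are illustrative assumptions, not the contents of mistral_transport.go; the PREFIX in the base URL is a placeholder for the part of the path that the description elides.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// rewriteTransport is a hypothetical stand-in for the PR's transport.
type rewriteTransport struct {
	inner http.RoundTripper // in the PR this would be the oauth2 transport
	base  string            // assumed URL prefix that precedes /publishers/...
}

func (t *rewriteTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Anything that is not a chat completion passes through untouched.
	if req.Body == nil || !strings.HasSuffix(req.URL.Path, "/chat/completions") {
		return t.inner.RoundTrip(req)
	}
	body, err := io.ReadAll(req.Body)
	req.Body.Close() // closed whether or not the read succeeded
	if err != nil {
		return nil, fmt.Errorf("reading request body: %w", err)
	}
	// Peek at the two fields that decide the target URL.
	var peek struct {
		Model  string `json:"model"`
		Stream bool   `json:"stream"`
	}
	if err := json.Unmarshal(body, &peek); err != nil {
		return nil, fmt.Errorf("parsing request body: %w", err)
	}
	if peek.Model == "" {
		return nil, errors.New("request body has no model field")
	}
	verb := "rawPredict"
	if peek.Stream {
		verb = "streamRawPredict"
	}
	target, err := url.Parse(t.base + "/publishers/mistralai/models/" +
		url.PathEscape(peek.Model) + ":" + verb)
	if err != nil {
		return nil, err
	}
	// Clone so the caller's request is not mutated, then replay the same bytes.
	out := req.Clone(req.Context())
	out.URL = target
	out.Host = ""
	out.Body = io.NopCloser(bytes.NewReader(body))
	out.ContentLength = int64(len(body))
	out.GetBody = func() (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(body)), nil
	}
	return t.inner.RoundTrip(out)
}

// roundTripFunc adapts a function to http.RoundTripper for the demo below.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// rewrittenURL sends body through the transport and reports the URL the inner
// transport received. No network traffic takes place.
func rewrittenURL(body string) (string, error) {
	var seen string
	t := &rewriteTransport{
		base: "https://example.invalid/PREFIX",
		inner: roundTripFunc(func(r *http.Request) (*http.Response, error) {
			seen = r.URL.String()
			return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody}, nil
		}),
	}
	req, err := http.NewRequest(http.MethodPost,
		"https://example.invalid/v1/chat/completions", strings.NewReader(body))
	if err != nil {
		return "", err
	}
	resp, err := t.RoundTrip(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return seen, nil
}

func main() {
	fmt.Println(rewrittenURL(`{"model":"mistral-small-2503","stream":true}`))
}
```

Because the body is buffered once and replayed from memory, fields such as tools and response_format reach the inner transport byte for byte.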

MistralModels keys are bare ids (e.g. mistral-small-2503), so the openai-go SDK already serialises the form rawPredict expects. MistralModel(g, id) and Mistral.DefineModel(name, opts) strip an optional mistralai/ prefix as a convenience so callers used to publisher-qualified ids keep working.
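
The prefix stripping mentioned above can be as small as a single strings.TrimPrefix call. The helper name bareModelID and the constant publisherPrefix are hypothetical; the PR may structure this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// publisherPrefix is the optional qualifier callers may put in front of an id.
const publisherPrefix = "mistralai/"

// bareModelID is a hypothetical helper: it drops the publisher prefix when
// present and returns the id unchanged otherwise.
func bareModelID(id string) string {
	return strings.TrimPrefix(id, publisherPrefix)
}

func main() {
	fmt.Println(bareModelID("mistralai/mistral-small-2503")) // mistral-small-2503
	fmt.Println(bareModelID("codestral-2"))                  // codestral-2
}
```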

Llama is unchanged from the OpenAI-compat path; only Mistral takes the rawPredict detour.

Sample

go/samples/modelgarden registers four flows for Dev UI smoke testing:

  • opus45Flow: claude-opus-4-5@20251101
  • llamaFlow: meta/llama-3.3-70b-instruct-maas
  • mistralFlow: mistral-small-2503
  • codestralFlow: codestral-2

Each plugin is constructed with an explicit Location because Vertex MaaS regional availability differs per publisher (Anthropic in us-east5, Llama / Mistral in us-central1), so all four flows work simultaneously with just GOOGLE_CLOUD_PROJECT set.

Tests

  • mistral_transport_test.go: pure in-process tests covering rawPredict / streamRawPredict URL rewrites, publisher-prefix strip, non-chat passthrough, missing-model error, body field preservation (tools, response_format), GetBody population, and URL escaping of malformed model ids.
  • models_test.go: resolveVertexMaasEnv exercised for explicit args, primary / secondary env fallback, and panic paths.
  • internal_test.go: DefineModel on Anthropic / Llama / Mistral covered for the nil ai.ModelOptions branch (in addition to the existing pre-Init error path in define_model_test.go).
  • Coverage on the new code: mistral_transport.RoundTrip 89.7%, peekModelAndStream 100%, resolveVertexMaasEnv 100%. Package coverage rose from 32.6% to 48.8%; remaining 0% functions (Init, Name, model lookups) require GCP credentials and are exercised by the live tests.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for Meta Llama and Mistral/Codestral models in the Vertex AI Model Garden plugin by implementing new Llama and Mistral structures that utilize OpenAI-compatible endpoints. It also updates the Anthropic model list, including the addition of Claude 4.5 Opus and the deprecation of older versions. Key feedback includes addressing ineffective pointer receiver assignments in Init methods, resolving undefined provider variables that may cause compilation errors, and mitigating potential panics from risky type assertions when registering models.

Comment thread go/plugins/vertexai/modelgarden/llama.go Outdated
Comment thread go/plugins/vertexai/modelgarden/llama.go
Comment thread go/plugins/vertexai/modelgarden/mistral.go Outdated
Comment thread go/plugins/vertexai/modelgarden/mistral.go
Comment thread go/plugins/vertexai/modelgarden/llama.go
cabljac force-pushed the feat/go-modelgarden-new-models branch from 1f71e45 to 93d4ed3 on April 23, 2026 10:19
cabljac commented Apr 23, 2026

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for Meta Llama and Mistral/Codestral models to the Vertex AI Model Garden plugin. It includes new plugin implementations for Llama and Mistral that utilize OpenAI-compatible endpoints, along with corresponding live tests and model definitions. Additionally, the Anthropic model list was updated to include Claude 4.5 Opus and mark Claude 3 Haiku as deprecated. Review feedback identified a missing definition for the provider variable in the new files and a naming inconsistency between the plugin provider and the model namespace that could affect model discovery.

Comment thread go/plugins/vertexai/modelgarden/llama.go
Comment thread go/plugins/vertexai/modelgarden/llama.go Outdated
cabljac commented May 8, 2026

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request expands the Vertex AI Model Garden plugin by adding support for Meta Llama and Mistral/Codestral models, implemented via OpenAI-compatible endpoints. It also updates the Anthropic model definitions to include Claude 4.5 Opus and marks Claude 3 Haiku as deprecated. Review feedback suggests correcting the capability flags for the Llama 3.3 70B model, which is currently incorrectly marked as multimodal, and refactoring the environment variable resolution logic to eliminate duplication across the new plugin files.

Comment thread go/plugins/vertexai/modelgarden/models.go
Comment thread go/plugins/vertexai/modelgarden/llama.go Outdated
cabljac added 3 commits May 11, 2026 12:40
… Mistral support

- Add claude-opus-4-5@20251101 to the existing Anthropic Model Garden list.
- Add a new Llama plugin covering the three Meta Llama MaaS models
  (llama-4-maverick, llama-4-scout, llama-3.3-70b). The plugin reuses
  compat_oai.OpenAICompatible with an oauth2-wrapped HTTP client against
  the Vertex MaaS OpenAI-compatible endpoint.
- Add a new Mistral plugin covering mistral-medium-3, mistral-ocr-2505,
  mistral-small-2503 and codestral-2. The JS parity path uses the
  native @mistralai/mistralai-gcp SDK, which has no Go equivalent, so
  this plugin also routes through the Vertex MaaS OpenAI-compatible
  endpoint.
- Live tests for each new provider skip unless GOOGLE_CLOUD_PROJECT and
  GOOGLE_CLOUD_LOCATION are set.
… check

Reassigning a pointer receiver inside Init does not update the caller's
reference, so the block was a no-op that misleadingly implied nil-safety.
Plugins are always registered via a non-nil pointer
(genkit.WithPlugins(&Llama{...})), so removing it is safe.
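
Why the removed block was a no-op can be shown in a few lines. The plugin type and both method names below are hypothetical and exist only to contrast reassigning a pointer receiver with writing through it.

```go
package main

import "fmt"

type plugin struct{ name string }

// initReassign mimics the removed pattern: the receiver is a local copy of the
// caller's pointer, so pointing it at a new value never reaches the caller.
func (p *plugin) initReassign() {
	if p == nil {
		p = &plugin{name: "default"} // visible only inside this method
	}
	_ = p
}

// initMutate writes through the pointer, which the caller does observe.
func (p *plugin) initMutate() {
	p.name = "ready"
}

func main() {
	var nilPlugin *plugin
	nilPlugin.initReassign()
	fmt.Println(nilPlugin == nil) // true: the caller's pointer is still nil

	p := &plugin{}
	p.initMutate()
	fmt.Println(p.name) // ready
}
```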
… compat_oai namespaces

Move the shared provider = "vertexai" constant out of anthropic.go, where
it was coupled to an unrelated plugin, and into models.go alongside the
other package-level shared state. Expand the comment to document the
shared namespace design, the collision with googlegenai.VertexAI, and
the next-major-version TODO.

Also set l.oai.Provider / m.oai.Provider to the shared provider value
rather than each plugin's own Name(). The embedded compat_oai is never
registered directly, so changing its Provider has no external effect,
but it makes compat_oai.ListActions / ResolveAction consistent with
the static DefineModel(provider, ...) registrations.
cabljac force-pushed the feat/go-modelgarden-new-models branch from a8e4f88 to f41ad13 on May 11, 2026 11:41
cabljac added 6 commits May 11, 2026 12:58
…nv helper, guard Anthropic.DefineModel

- Llama 3.3 70B Instruct is text-only on Vertex MaaS; switch its
  Supports to BasicText. Llama 4 Maverick/Scout remain Multimodal.
- Centralise resolveVertexMaasEnv in models.go and use it from
  anthropic.go, llama.go, and mistral.go (Anthropic's inline ~20-line
  block collapses to one call).
- Anthropic.DefineModel now acquires the mutex and checks initted,
  matching Llama/Mistral. Previously, calling DefineModel before Init
  used a zero-valued client; it now returns a "not initialized" error.
- Add define_model_test.go covering DefineModel-before-Init for all
  three plugins.
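
The mutex-plus-initted guard from this commit follows a common shape, sketched below. sketchPlugin and its fields are hypothetical, and the real DefineModel takes model options and registers with Genkit, which this sketch replaces with a plain map.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// sketchPlugin is a hypothetical plugin with the same guard fields the commit
// message describes: a mutex and an initted flag.
type sketchPlugin struct {
	mu      sync.Mutex
	initted bool
	models  map[string]bool
}

// Init prepares the plugin state and marks it ready.
func (p *sketchPlugin) Init() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.models = make(map[string]bool)
	p.initted = true
}

// DefineModel refuses to run before Init instead of touching zero-value state.
func (p *sketchPlugin) DefineModel(name string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.initted {
		return errors.New("sketchPlugin.DefineModel: plugin not initialized")
	}
	p.models[name] = true
	return nil
}

func main() {
	p := &sketchPlugin{}
	fmt.Println(p.DefineModel("codestral-2")) // error: plugin not initialized
	p.Init()
	fmt.Println(p.DefineModel("codestral-2")) // <nil>
}
```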
…redict + Dev UI sample

Vertex AI does not serve Mistral via the OpenAI-compatible
/endpoints/openapi/chat/completions endpoint that compat_oai targets;
that endpoint returns 400 FAILED_PRECONDITION even with a subscribed
project. Mistral lives at /publishers/mistralai/models/<id>:rawPredict
(or :streamRawPredict for SSE) and the rawPredict response is already
OpenAI-shaped, so only the URL needs to change.

Add mistralVertexTransport, an http.RoundTripper wrapping the oauth2
transport. It intercepts /chat/completions requests, reads the model
field and stream flag from the body, and rewrites the URL to the
per-model rawPredict / streamRawPredict path while leaving the body
bytes intact. The inner oauth2 transport still adds the Bearer token,
and compat_oai's existing OpenAI-shaped request building, SSE parsing,
tool-call handling, and response conversion all keep working.

Re-key MistralModels with bare ids (mistral-small-2503, codestral-2,
mistral-medium-3, mistral-ocr-2505) so the body the openai-go SDK
emits already carries the bare id that rawPredict expects. MistralModel
and Mistral.DefineModel strip an optional "mistralai/" prefix so
callers used to publisher-qualified ids keep working without code
changes.

Add a Dev UI sample under go/samples/modelgarden that exercises Claude
3.5 Sonnet v2, Claude Opus 4.5, Llama 3.3 70B, Mistral Small 2503, and
Codestral 2. The Anthropic flows now pass MaxTokens (required by the
Anthropic plugin), and the Claude 3.5 model id includes its required
@20241022 stamp.

Tests:
- mistral_transport_test.go: rawPredict and streamRawPredict URL
  rewrite, publisher-prefix strip, non-chat passthrough, missing-model
  error, body field preservation (tools, response_format), and GetBody
  population so openai-go retries replay the request.
- mistral_live_test.go: add bare-id lookup subtest and a streaming
  subtest covering :streamRawPredict end-to-end.
…rawPredict path

A model id from the request body lands directly in the rawPredict URL
path. Without escaping, an id containing "/", "?", or "#" could inject
extra path segments, query strings, or fragments into the outbound
request. Pass it through url.PathEscape and add a test pinning that
behavior.
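
The effect of url.PathEscape on a hostile id can be checked in isolation. The helper names rawPredictSuffix and hasDelimiter are hypothetical, and only the tail of the path is built because the leading part of the URL is elided in the PR description.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rawPredictSuffix builds the tail of the per-model path with the id escaped
// as a single path segment.
func rawPredictSuffix(modelID string) string {
	return "/publishers/mistralai/models/" + url.PathEscape(modelID) + ":rawPredict"
}

// hasDelimiter reports whether s still contains a raw "/", "?", or "#".
func hasDelimiter(s string) bool {
	return strings.ContainsAny(s, "/?#")
}

func main() {
	hostile := "x/../other?alt=sse#frag"
	fmt.Println(hasDelimiter(hostile))                 // true
	fmt.Println(hasDelimiter(url.PathEscape(hostile))) // false
	fmt.Println(rawPredictSuffix("a/b"))
}
```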
…k edge cases, and nil opts

Adds three cheap unit tests that push package coverage from 32.6% to
48.8% without touching live (credential-gated) paths:

- models_test.go: resolveVertexMaasEnv now exercised for explicit args,
  primary env fallback, secondary env fallback, and the panic paths
  when neither project nor location can be resolved. (0% to 100%.)
- mistral_transport_test.go: peekModelAndStream now covered for empty
  bodies and malformed JSON. (66.7% to 100%.)
- internal_test.go: each plugin's DefineModel exercised on the nil
  ai.ModelOptions branch. (50-57% to 75-85%.)

Remaining 0% functions (Init, Name, AnthropicModel/LlamaModel/
MistralModel) require GCP credentials and are covered by the live
tests.
…e Anthropic flow

Vertex AI MaaS regional availability differs per publisher: Claude
models live in us-east5 / europe-west4 while Llama and Mistral live in
us-central1. The old sample assumed a single GOOGLE_CLOUD_LOCATION env
which forced users to pick one region. Pass each plugin its own
Location so all four flows work simultaneously with just
GOOGLE_CLOUD_PROJECT set.

Drop the second Anthropic flow (claude-sonnet-4-5-20250929) since most
projects ship with only Claude Opus 4.5 enabled. Comment documents how
to add more variants once they are enabled.
cabljac commented May 11, 2026

Contributor Author

Happy to split out the mistral stuff if needed

…ead error

Previously a failure from io.ReadAll in mistralVertexTransport.RoundTrip
returned the error without closing req.Body, leaking the underlying
reader. The caller had no way to recover the body because GetBody was
not yet set on the rewritten request. Close unconditionally, and add a
test using a Reader that returns an error to verify Close runs in that
path.
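
One way to guarantee the close on every path is a deferred Close ahead of the read, tested with a reader that always fails. failingBody and readAndClose are hypothetical names; the PR's actual test and transport code are not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// failingBody is a test double whose Read always fails and whose Close
// records that it was called.
type failingBody struct{ closed bool }

func (b *failingBody) Read([]byte) (int, error) { return 0, errors.New("read failed") }

func (b *failingBody) Close() error {
	b.closed = true
	return nil
}

// readAndClose drains body and closes it on success and on failure alike.
func readAndClose(body io.ReadCloser) ([]byte, error) {
	defer body.Close()
	data, err := io.ReadAll(body)
	if err != nil {
		return nil, fmt.Errorf("reading request body: %w", err)
	}
	return data, nil
}

func main() {
	b := &failingBody{}
	_, err := readAndClose(b)
	fmt.Println(err != nil, b.closed) // true true
}
```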
cabljac commented May 11, 2026

Contributor Author

Closing in favour of a split into two stacked PRs for easier review:

Branch feat/go-modelgarden-new-models stays around as the historical record (it is the source the two new branches were cut from).
