
Conversation


@joein joein commented Dec 15, 2025

Add support for embeddinggemma-300m

There is a version of the Gemma embedding model whose weights are specifically tuned for Sentence Transformers and retrieval:
https://huggingface.co/google/embeddinggemma-300m#usage

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)

It applies two projection layers after pooling and then writes the pooled result to sentence_embedding instead of last_hidden_state. sentence_embedding is the second output of the ONNX model, whereas we usually take the first one.

I expect more models to adopt this design, hence the new class introduced here.
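The output selection described above can be sketched as follows. FakeSession is a stand-in for a real onnxruntime.InferenceSession, and the tensor values are made up; only the idea of the base class taking output [0] and the subclass taking output [1] comes from this PR:

```python
class FakeSession:
    """Stand-in for onnxruntime.InferenceSession with two outputs."""

    def run(self, output_names, onnx_input):
        last_hidden_state = [[0.1, 0.2], [0.3, 0.4]]  # output 0: per-token states
        sentence_embedding = [[0.5, 0.6]]             # output 1: pooled + projected
        return [last_hidden_state, sentence_embedding]


class OnnxTextModel:
    def __init__(self, session):
        self.session = session

    def _run_model(self, onnx_input):
        # Default behavior: take the first ONNX output (last_hidden_state).
        return self.session.run(None, onnx_input)[0]


class BuiltinSentenceEmbedding(OnnxTextModel):
    def _run_model(self, onnx_input):
        # Pooling, projections, and normalization already ran inside the
        # ONNX graph, so take the second output (sentence_embedding).
        return self.session.run(None, onnx_input)[1]


session = FakeSession()
print(OnnxTextModel(session)._run_model({}))            # per-token tensor
print(BuiltinSentenceEmbedding(session)._run_model({}))  # pooled sentence vector
```

The subclass changes nothing else: batching, workers, and post-processing stay in the shared pipeline.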

@joein joein requested a review from tbung December 15, 2025 12:29

coderabbitai bot commented Dec 15, 2025

📝 Walkthrough

This PR introduces a new builtin sentence embedding model implementation for Google's EmbeddingGemma-300m. It adds a new module fastembed/text/builtin_sentence_embedding.py containing BuiltinSentenceEmbedding and BuiltinSentenceEmbeddingWorker classes that extend existing ONNX-based embedding infrastructure. The base OnnxTextModel is refactored with a new _run_model internal method to encapsulate ONNX model execution. The new embedding class is registered in the embeddings registry, and corresponding test data and a new test_query_embedding test function are added to validate query-specific embeddings.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • The changes span multiple files with different concerns: new class definitions with overridden methods, a method extraction refactor, registry updates, and new test logic
  • The BuiltinSentenceEmbedding class overrides _run_model to return index [1] instead of [0] — verify this output indexing aligns with the actual ONNX model structure
  • The _post_process_onnx_output method returns raw output without casting — confirm this matches expected behavior for this model
  • Test additions include conditional skip logic and vector validation against canonical values — verify test data accuracy for google/embeddinggemma-300m

Possibly related PRs

Suggested reviewers

  • hh-space-invader

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Title check (⚠️ Warning): The title 'add gemmaembedding-300m' contains a minor typo; it should be 'embeddinggemma-300m', per the actual model name (google/embeddinggemma-300m) used throughout the codebase and PR description. Resolution: update the title to 'add embeddinggemma-300m' to match the implementation and standard naming conventions.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 9.09%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Description check (✅ Passed): The description clearly explains the motivation for adding embeddinggemma-300m support, the model's architecture, and why a new class was introduced, directly addressing the changeset.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
fastembed/text/onnx_text_model.py (1)

95-103: Good abstraction of ONNX execution into _run_model

Routing onnx_embed through _run_model and defaulting to the first ONNX output keeps existing behavior while giving subclasses a clean override point for multi-output models (like the new built-in sentence embeddings). The OnnxOutputContext wiring still exposes both attention_mask and input_ids after preprocessing, which is useful for downstream pooling logic. A brief docstring on _run_model (e.g., that it must return the batch-major tensor used as model_output) would make the contract clearer for future subclasses, but the current implementation looks correct.

Also applies to: 105-109
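To illustrate the suggested docstring, here is a sketch of what the _run_model contract might look like; the session stub, signature, and return values are assumptions for illustration, not code from the PR:

```python
from typing import Any


class _FakeSession:
    """Stand-in for onnxruntime.InferenceSession (two-output model)."""

    def run(self, output_names: Any, onnx_input: dict[str, Any]) -> list[Any]:
        return [[[1.0, 2.0]], [[3.0, 4.0]]]


class OnnxTextModel:
    def __init__(self) -> None:
        self.session = _FakeSession()

    def _run_model(self, onnx_input: dict[str, Any]) -> Any:
        """Execute the ONNX session and return the tensor used as model_output.

        The base implementation returns the first ONNX output. Subclasses
        wrapping multi-output models (e.g. built-in sentence embeddings) may
        override this to select a different output, such as a pooled
        sentence embedding, as long as they return the batch-major tensor
        consumed downstream as OnnxOutputContext.model_output.
        """
        return self.session.run(None, onnx_input)[0]
```

A docstring along these lines makes the override point explicit without changing behavior.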

fastembed/text/builtin_sentence_embedding.py (1)

46-49: Silence Ruff ARG002 for unused kwargs in _post_process_onnx_output

The kwargs parameter is required to match the base signature, but it’s unused here, which triggers Ruff’s ARG002. You can keep the interface and satisfy Ruff by discarding or renaming the argument, e.g.:

-    def _post_process_onnx_output(
-        self, output: OnnxOutputContext, **kwargs: Any
-    ) -> Iterable[NumpyArray]:
-        return output.model_output
+    def _post_process_onnx_output(
+        self, output: OnnxOutputContext, **_kwargs: Any
+    ) -> Iterable[NumpyArray]:
+        return output.model_output

(or add an explicit del kwargs inside the method).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 685fd9b and e14914a.

📒 Files selected for processing (4)
  • fastembed/text/builtin_sentence_embedding.py (1 hunks)
  • fastembed/text/onnx_text_model.py (1 hunks)
  • fastembed/text/text_embedding.py (2 hunks)
  • tests/test_text_onnx_embeddings.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
fastembed/text/onnx_text_model.py (1)
fastembed/common/onnx_model.py (1)
  • OnnxOutputContext (20-23)
tests/test_text_onnx_embeddings.py (4)
fastembed/text/text_embedding.py (3)
  • TextEmbedding (17-230)
  • _list_supported_models (38-42)
  • query_embed (191-202)
fastembed/late_interaction/late_interaction_text_embedding.py (2)
  • _list_supported_models (45-49)
  • query_embed (141-153)
fastembed/text/onnx_embedding.py (1)
  • _list_supported_models (190-197)
tests/utils.py (1)
  • should_test_model (42-67)
fastembed/text/text_embedding.py (1)
fastembed/text/builtin_sentence_embedding.py (1)
  • BuiltinSentenceEmbedding (30-54)
🪛 Ruff (0.14.8)
fastembed/text/builtin_sentence_embedding.py

47-47: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
🔇 Additional comments (4)
fastembed/text/text_embedding.py (1)

11-11: Registry wiring for BuiltinSentenceEmbedding looks correct

Importing BuiltinSentenceEmbedding and adding it to EMBEDDINGS_REGISTRY before CustomTextEmbedding ensures google/embeddinggemma-300m is resolved to the new backend without affecting existing models or user-defined custom models.

Also applies to: 24-26

tests/test_text_onnx_embeddings.py (2)

71-87: Canonical vectors and prefixes for google/embeddinggemma-300m are consistent

Adding a CANONICAL_VECTOR_VALUES entry plus DOC_PREFIXES/QUERY_PREFIXES and CANONICAL_QUERY_VECTOR_VALUES for google/embeddinggemma-300m keeps the test suite aligned with the model’s documented prompt format. Using "title: none | text: " and "task: search result | query: " here mirrors the description in the model metadata, so both document and query tests exercise the recommended usage.
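The prompt templates can be applied as plain string prefixes. This sketch uses the literal prefixes quoted above; the dict shape of DOC_PREFIXES/QUERY_PREFIXES and the apply_prefix helper are assumptions about the test wiring, not code from the PR:

```python
# Prompt templates for google/embeddinggemma-300m, as quoted above.
DOC_PREFIXES = {"google/embeddinggemma-300m": "title: none | text: "}
QUERY_PREFIXES = {"google/embeddinggemma-300m": "task: search result | query: "}


def apply_prefix(prefixes: dict[str, str], model_name: str, text: str) -> str:
    """Prepend the model-specific prompt prefix, if one is registered."""
    return prefixes.get(model_name, "") + text


doc = apply_prefix(DOC_PREFIXES, "google/embeddinggemma-300m", "hello world")
query = apply_prefix(QUERY_PREFIXES, "google/embeddinggemma-300m", "hello world")
print(doc)    # title: none | text: hello world
print(query)  # task: search result | query: hello world
```

Models without an entry fall through unchanged, so other models in the suite are unaffected.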


138-140: Embedding and query tests correctly respect model-specific prefixes and env gating

Conditionally prefixing docs and queries based on DOC_PREFIXES/QUERY_PREFIXES ensures the new model is tested with its required prompt templates without impacting other models. Reusing should_test_model in test_query_embedding matches the existing CI/manual/local split and avoids pulling in the heavy google/embeddinggemma-300m model except when explicitly allowed, while still validating shape and canonical values when it is tested. The structure is consistent with test_embedding and looks solid.

Also applies to: 151-181

fastembed/text/builtin_sentence_embedding.py (1)

10-27: Builtin sentence embedding integration is structurally sound

The DenseModelDescription for google/embeddinggemma-300m plus the BuiltinSentenceEmbedding / BuiltinSentenceEmbeddingWorker pair cleanly plug this model into the existing ONNX text stack. Overriding _run_model to take the second ONNX output and letting _post_process_onnx_output pass output.model_output through means the built-in pooling/normalization stays inside the ONNX graph while the rest of the pipeline (batching, workers, registry) remains unchanged. Using threads=1 in the worker is also consistent with other single-session worker patterns.

Also applies to: 30-55, 57-69

