new: add gemmaembedding-300m #592
base: main
Conversation
📝 Walkthrough

This PR introduces a new builtin sentence embedding model implementation for Google's EmbeddingGemma-300m. It adds a new module.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
fastembed/text/onnx_text_model.py (1)
95-103: Good abstraction of ONNX execution into `_run_model`

Routing `onnx_embed` through `_run_model` and defaulting to the first ONNX output keeps existing behavior while giving subclasses a clean override point for multi-output models (like the new built-in sentence embeddings). The `OnnxOutputContext` wiring still exposes both `attention_mask` and `input_ids` after preprocessing, which is useful for downstream pooling logic. A brief docstring on `_run_model` (e.g., that it must return the batch-major tensor used as `model_output`) would make the contract clearer for future subclasses, but the current implementation looks correct.

Also applies to: 105-109
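The override pattern being praised here can be sketched in miniature (class and output names below are illustrative stand-ins, not fastembed's actual API):

```python
from typing import Any


class OnnxTextModelSketch:
    """Toy stand-in for the ONNX text model base class."""

    def _run_model(self, outputs: list[Any]) -> Any:
        # Default contract: return the batch-major tensor used as
        # model_output -- by convention the first ONNX output.
        return outputs[0]


class MultiOutputModelSketch(OnnxTextModelSketch):
    def _run_model(self, outputs: list[Any]) -> Any:
        # A multi-output model overrides the same hook to select
        # a different graph output, leaving the rest of the
        # pipeline untouched.
        return outputs[1]


raw_outputs = ["last_hidden_state", "sentence_embedding"]
print(OnnxTextModelSketch()._run_model(raw_outputs))
print(MultiOutputModelSketch()._run_model(raw_outputs))
```

The key design point is that subclasses change only which tensor is selected, not how batching or preprocessing works.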
fastembed/text/builtin_sentence_embedding.py (1)
46-49: Silence Ruff ARG002 for unused `kwargs` in `_post_process_onnx_output`

The `kwargs` parameter is required to match the base signature, but it's unused here, which triggers Ruff's ARG002. You can keep the interface and satisfy Ruff by discarding or renaming the argument, e.g.:

```diff
-    def _post_process_onnx_output(
-        self, output: OnnxOutputContext, **kwargs: Any
-    ) -> Iterable[NumpyArray]:
-        return output.model_output
+    def _post_process_onnx_output(
+        self, output: OnnxOutputContext, **_kwargs: Any
+    ) -> Iterable[NumpyArray]:
+        return output.model_output
```

(or add an explicit `del kwargs` inside the method).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
fastembed/text/builtin_sentence_embedding.py (1 hunks)
fastembed/text/onnx_text_model.py (1 hunks)
fastembed/text/text_embedding.py (2 hunks)
tests/test_text_onnx_embeddings.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
fastembed/text/onnx_text_model.py (1)
fastembed/common/onnx_model.py (1)
OnnxOutputContext(20-23)
tests/test_text_onnx_embeddings.py (4)
fastembed/text/text_embedding.py (3)
`TextEmbedding` (17-230)
`_list_supported_models` (38-42)
`query_embed` (191-202)

fastembed/late_interaction/late_interaction_text_embedding.py (2)
`_list_supported_models` (45-49)
`query_embed` (141-153)

fastembed/text/onnx_embedding.py (1)
`_list_supported_models` (190-197)

tests/utils.py (1)
`should_test_model` (42-67)
fastembed/text/text_embedding.py (1)
fastembed/text/builtin_sentence_embedding.py (1)
BuiltinSentenceEmbedding(30-54)
🪛 Ruff (0.14.8)
fastembed/text/builtin_sentence_embedding.py
47-47: Unused method argument: kwargs
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Python 3.13.x on ubuntu-latest test
- GitHub Check: Python 3.11.x on ubuntu-latest test
- GitHub Check: Python 3.13.x on windows-latest test
- GitHub Check: Python 3.10.x on ubuntu-latest test
- GitHub Check: Python 3.13.x on macos-latest test
- GitHub Check: Python 3.12.x on ubuntu-latest test
🔇 Additional comments (4)
fastembed/text/text_embedding.py (1)
11-11: Registry wiring for `BuiltinSentenceEmbedding` looks correct

Importing `BuiltinSentenceEmbedding` and adding it to `EMBEDDINGS_REGISTRY` before `CustomTextEmbedding` ensures `google/embeddinggemma-300m` is resolved to the new backend without affecting existing models or user-defined custom models.

Also applies to: 24-26
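Why registry order matters can be illustrated with a minimal resolver sketch (the class names and `handles` helper are simplified assumptions, not fastembed's actual code):

```python
class BuiltinSentenceEmbeddingSketch:
    # Specific backend: handles an explicit allow-list of model names.
    supported_models = {"google/embeddinggemma-300m"}

    @classmethod
    def handles(cls, name: str) -> bool:
        return name in cls.supported_models


class CustomTextEmbeddingSketch:
    # Catch-all backend: would accept any user-registered model.
    @classmethod
    def handles(cls, name: str) -> bool:
        return True


# Order matters: the specific builtin must be listed before the
# catch-all, otherwise the custom backend would shadow it.
REGISTRY = [BuiltinSentenceEmbeddingSketch, CustomTextEmbeddingSketch]


def resolve(name: str) -> type:
    return next(cls for cls in REGISTRY if cls.handles(name))


print(resolve("google/embeddinggemma-300m").__name__)
print(resolve("some-user/custom-model").__name__)
```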
tests/test_text_onnx_embeddings.py (2)
71-87: Canonical vectors and prefixes for `google/embeddinggemma-300m` are consistent

Adding a CANONICAL_VECTOR_VALUES entry plus DOC_PREFIXES/QUERY_PREFIXES and CANONICAL_QUERY_VECTOR_VALUES for `google/embeddinggemma-300m` keeps the test suite aligned with the model's documented prompt format. Using `"title: none | text: "` and `"task: search result | query: "` here mirrors the description in the model metadata, so both document and query tests exercise the recommended usage.
138-140: Embedding and query tests correctly respect model-specific prefixes and env gating

Conditionally prefixing `docs` and `queries` based on DOC_PREFIXES/QUERY_PREFIXES ensures the new model is tested with its required prompt templates without impacting other models. Reusing `should_test_model` in `test_query_embedding` matches the existing CI/manual/local split and avoids pulling in the heavy `google/embeddinggemma-300m` model except when explicitly allowed, while still validating shape and canonical values when it is tested. The structure is consistent with `test_embedding` and looks solid.

Also applies to: 151-181
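The conditional prefixing can be sketched as follows (the prefix values mirror those quoted in this review; the `apply_prefixes` helper is hypothetical, not the test suite's actual code):

```python
# Prompt prefixes as quoted above, applied only to models that define
# them -- all other models pass through unchanged.
DOC_PREFIXES = {"google/embeddinggemma-300m": "title: none | text: "}
QUERY_PREFIXES = {"google/embeddinggemma-300m": "task: search result | query: "}


def apply_prefixes(model_name: str, docs: list[str], queries: list[str]):
    doc_prefix = DOC_PREFIXES.get(model_name, "")
    query_prefix = QUERY_PREFIXES.get(model_name, "")
    return (
        [doc_prefix + d for d in docs],
        [query_prefix + q for q in queries],
    )


docs, queries = apply_prefixes(
    "google/embeddinggemma-300m", ["hello world"], ["greeting"]
)
print(docs[0])     # title: none | text: hello world
print(queries[0])  # task: search result | query: greeting

# A model without registered prefixes is left untouched.
docs2, _ = apply_prefixes("BAAI/bge-small-en-v1.5", ["hello world"], [])
print(docs2[0])    # hello world
```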
fastembed/text/builtin_sentence_embedding.py (1)
10-27: Builtin sentence embedding integration is structurally sound

The DenseModelDescription for `google/embeddinggemma-300m` plus the `BuiltinSentenceEmbedding`/`BuiltinSentenceEmbeddingWorker` pair cleanly plug this model into the existing ONNX text stack. Overriding `_run_model` to take the second ONNX output and letting `_post_process_onnx_output` pass `output.model_output` through means the built-in pooling/normalization stays inside the ONNX graph while the rest of the pipeline (batching, workers, registry) remains unchanged. Using `threads=1` in the worker is also consistent with other single-session worker patterns.

Also applies to: 30-55, 57-69
Add support for gemmaembedding-300m
There is a version of Gemma embedding containing weights specifically designed for sentence transformers and retrieval.
https://huggingface.co/google/embeddinggemma-300m#usage
It applies 2 projection layers after the pooling, and then puts the pooled result into `sentence_embedding` instead of `last_hidden_state`. That is the second output of the ONNX model, and we usually take the first one. I expect more models to come up with this design, hence the new class introduced here.
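A shape-level sketch of the distinction (dummy numpy arrays standing in for the real ONNX session outputs; 768 is the model's documented embedding dimension, the other sizes are arbitrary):

```python
import numpy as np

batch, seq_len, dim = 2, 8, 768

# Output 0: per-token hidden states, which most models pool themselves
# on the Python side after the session run.
last_hidden_state = np.zeros((batch, seq_len, dim), dtype=np.float32)

# Output 1: pooling plus the two projection layers already applied
# inside the exported graph, yielding one vector per document.
sentence_embedding = np.zeros((batch, dim), dtype=np.float32)

outputs = [last_hidden_state, sentence_embedding]

# The usual convention takes outputs[0]; for this model the final
# vectors live in outputs[1], hence the dedicated builtin class.
final = outputs[1]
print(final.shape)  # (2, 768)
```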