-
Notifications
You must be signed in to change notification settings - Fork 168
feat: Add Jina Embeddings v3 with task-specific LoRA support #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add support for jinaai/jina-embeddings-v3, a multilingual embedding model with 1024 dimensions supporting 89+ languages and task-specific LoRA adapters. Features: - Task-specific embeddings via LoRA adapters (retrieval.query, retrieval.passage, classification, text-matching, separation) - Automatic task_id handling for ONNX inference - Default to text-matching task for general purpose use - query_embed() and passage_embed() methods for retrieval tasks - Matryoshka dimensions support (32-1024) - 8,192 token context window Model specs: - 570M parameters - 2.29 GB ONNX model - Apache 2.0 license Implementation: - Added model configuration with additional_files for model.onnx_data - Load lora_adaptations from config.json - Preprocess ONNX input to add task_id parameter - Override query_embed/passage_embed for automatic task selection - Added comprehensive multi-task test with canonical vectors Following the pattern from PR qdrant#561 but using task_id instead of text prefixes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Jina Embeddings v3 Implementation SummaryOverviewSuccessfully added support for jinaai/jina-embeddings-v3 to fastembed, following the pattern from PR #561. Model Specifications
Files Modified1.
|
Enhance the Jina v3 model configuration to expose all available LoRA tasks: - Add 'available_tasks' list with all 5 LoRA adapters - Add 'default_task' for explicit default behavior - Update _preprocess_onnx_input to use default_task from model description - Maintain backward compatibility with existing task selection logic This makes the model's capabilities more discoverable and allows users to see all available task types via list_supported_models(). Available tasks: - retrieval.query (for search queries) - retrieval.passage (for documents/passages) - separation (for clustering) - classification (for text classification) - text-matching (for semantic similarity, default) Co-Authored-By: Claude <[email protected]>
Update: Added Comprehensive Task MetadataEnhanced the model description to expose all available LoRA tasks in the tasks={
"query_task": "retrieval.query",
"passage_task": "retrieval.passage",
"default_task": "text-matching",
"available_tasks": [
"retrieval.query",
"retrieval.passage",
"separation",
"classification",
"text-matching",
],
}Benefits✅ Users can discover all available tasks via Usage Examplemodels = TextEmbedding.list_supported_models()
jina_v3 = [m for m in models if 'jina-embeddings-v3' in m['model']][0]
print(jina_v3['tasks']['available_tasks'])
# ['retrieval.query', 'retrieval.passage', 'separation', 'classification', 'text-matching']
print(jina_v3['tasks']['default_task'])
# 'text-matching'All tests still passing ✅ |
|
Warning Rate limit exceeded@aaronspring has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 18 minutes and 33 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR introduces task-aware embedding support with LoRA adapters, primarily for the Jina v3 embeddings model. Changes include loading LoRA adapter configurations from config.json, injecting task_id into preprocessing based on task type, adding query_embed and passage_embed methods to route embeddings through task-specific paths, and registering jinaai/jina-embeddings-v3 as a new supported model with task mappings. Tests validate multi-task embedding behavior and ensure query and passage embeddings differ appropriately for the same input. Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes The changes introduce new logic for LoRA adapter configuration loading and task-aware embedding routing across multiple areas of the codebase. While the changes are somewhat focused, they involve: structural modifications to model initialization and preprocessing, new public methods with task-aware routing, a new model registry entry with specific configuration, and test coverage with a duplicate test function definition that requires clarification. The heterogeneous nature of logic changes (config loading, task routing, registry updates) alongside the test duplication concern warrants careful verification of correctness and integration. Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
fastembed/text/onnx_embedding.py (1)
331-350: Ensure task_id matches batch shape and reject unknown task types.Scalar task_id may not match ONNX input shape; unknown task_type is silently ignored. Build a [batch]-shaped vector and raise on invalid task_type.
Apply this diff:
- # Handle task-specific embeddings for models with LoRA adapters - if self.lora_adaptations: - task_type = kwargs.get("task_type") - - # If no task specified, use default (text-matching for general purpose) - if not task_type: - # Default to text-matching if available, otherwise first task - task_type = "text-matching" if "text-matching" in self.lora_adaptations else self.lora_adaptations[0] - - if task_type in self.lora_adaptations: - task_id = np.array(self.lora_adaptations.index(task_type), dtype=np.int64) - onnx_input["task_id"] = task_id + # Handle task-specific embeddings for models with LoRA adapters + if self.lora_adaptations: + task_type = kwargs.get("task_type") + + # Default to text-matching if available, otherwise first task + if not task_type: + task_type = ( + "text-matching" + if "text-matching" in self.lora_adaptations + else self.lora_adaptations[0] + ) + + # Map to index or fail fast + try: + idx = self.lora_adaptations.index(task_type) + except ValueError as e: + raise ValueError( + f"Unsupported task_type '{task_type}'. " + f"Valid: {self.lora_adaptations}" + ) from e + + # Match ONNX batch dimension + batch_size = None + for k in ("input_ids", "attention_mask"): + arr = onnx_input.get(k) + if arr is not None and hasattr(arr, "shape") and len(arr.shape) >= 1: + batch_size = int(arr.shape[0]) + break + if batch_size is None: + batch_size = 1 + + onnx_input["task_id"] = np.full((batch_size,), idx, dtype=np.int64)
🧹 Nitpick comments (3)
fastembed/text/onnx_embedding.py (2)
375-394: Minor: reduce duplication by delegating to base after setting task_type.Set the task_type and yield from super().query_embed to keep behavior centralized.
Apply this diff:
- # Use task-specific embedding for models with LoRA adapters - if self.model_description.tasks and "query_task" in self.model_description.tasks: - kwargs["task_type"] = self.model_description.tasks["query_task"] - - if isinstance(query, str): - yield from self.embed([query], **kwargs) - else: - yield from self.embed(query, **kwargs) + if self.model_description.tasks and "query_task" in self.model_description.tasks: + kwargs.setdefault("task_type", self.model_description.tasks["query_task"]) + yield from super().query_embed(query, **kwargs)
395-414: Minor: mirror the refactor for passage_embed.Same simplification as query_embed.
Apply this diff:
- # Use task-specific embedding for models with LoRA adapters - if self.model_description.tasks and "passage_task" in self.model_description.tasks: - kwargs["task_type"] = self.model_description.tasks["passage_task"] - - if isinstance(texts, str): - yield from self.embed([texts], **kwargs) - else: - yield from self.embed(texts, **kwargs) + if self.model_description.tasks and "passage_task" in self.model_description.tasks: + kwargs.setdefault("task_type", self.model_description.tasks["passage_task"]) + yield from super().passage_embed(texts, **kwargs)tests/test_text_onnx_embeddings.py (1)
181-239: Strengthen query vs passage difference check and cover parallel workers.Use cosine similarity to avoid flakiness and add a parallel path to ensure task routing works with workers.
Apply this diff:
- query_emb = np.array(list(model.query_embed([test_text]))) - passage_emb = np.array(list(model.passage_embed([test_text]))) - - # They should not be identical (different task adapters) - assert not np.allclose(query_emb, passage_emb, atol=1e-6), \ - f"Query and passage embeddings should differ for {model_name}" + query_emb = np.stack(list(model.query_embed([test_text])), axis=0) # (1, dim) + passage_emb = np.stack(list(model.passage_embed([test_text])), axis=0) + # cosine similarity + def _cos(a, b): + a = a[0]; b = b[0] + return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)) + cos_sim = _cos(query_emb, passage_emb) + assert cos_sim < 0.999, f"Adapters should produce distinct vectors (cos={cos_sim:.6f}) for {model_name}" + + # Parallel path to verify task propagation works in worker processes + query_emb_p = np.stack(list(model.query_embed([test_text], parallel=2)), axis=0) + passage_emb_p = np.stack(list(model.passage_embed([test_text], parallel=2)), axis=0) + cos_sim_p = _cos(query_emb_p, passage_emb_p) + assert cos_sim_p < 0.999, f"[parallel] Adapters should produce distinct vectors (cos={cos_sim_p:.6f}) for {model_name}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fastembed/text/onnx_embedding.py(5 hunks)tests/test_text_onnx_embeddings.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/test_text_onnx_embeddings.py (4)
fastembed/text/text_embedding.py (5)
TextEmbedding(16-214)_list_supported_models(36-40)query_embed(189-200)passage_embed(202-214)embed(165-187)fastembed/text/onnx_embedding.py (4)
_list_supported_models(212-219)query_embed(375-393)passage_embed(395-413)embed(291-325)fastembed/text/multitask_embedding.py (4)
_list_supported_models(59-60)query_embed(86-87)passage_embed(89-90)embed(73-84)tests/utils.py (1)
delete_model_cache(11-39)
fastembed/text/onnx_embedding.py (3)
fastembed/common/model_description.py (1)
DenseModelDescription(35-40)fastembed/text/text_embedding_base.py (3)
query_embed(46-61)embed(22-29)passage_embed(31-44)fastembed/text/multitask_embedding.py (3)
query_embed(86-87)embed(73-84)passage_embed(89-90)
🔇 Additional comments (2)
fastembed/text/onnx_embedding.py (1)
187-204: Jina v3 model registration looks good; confirm no duplicate registration path.Entry is consistent (extra onnx_data listed, tasks mapping provided). Please verify that no other embedding class (e.g., JinaEmbeddingV3) also lists "jinaai/jina-embeddings-v3"; otherwise TextEmbedding may pick a different implementation depending on registry order.
tests/test_text_onnx_embeddings.py (1)
70-71: Canonical vector: pin provider or relax tolerance to avoid ORT/provider drift.Embedding numerics can differ across onnxruntime versions/providers. Consider pinning CPUExecutionProvider for canonical checks or widening atol slightly for v3.
| # Load LoRA adaptations for models that support task-specific embeddings (e.g., Jina v3) | ||
| self.lora_adaptations: Optional[list[str]] = None | ||
| config_path = Path(self._model_dir) / "config.json" | ||
| if config_path.exists(): | ||
| with open(config_path, "r") as f: | ||
| config = json.load(f) | ||
| self.lora_adaptations = config.get("lora_adaptations") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate lora_adaptations from config.json and fail fast for Jina v3 if missing.
Currently, non-list/empty values silently pass, which can lead to runtime shape/key errors later. Add minimal validation and a clear error for this model.
Apply this diff:
self.lora_adaptations: Optional[list[str]] = None
config_path = Path(self._model_dir) / "config.json"
if config_path.exists():
with open(config_path, "r") as f:
config = json.load(f)
- self.lora_adaptations = config.get("lora_adaptations")
+ la = config.get("lora_adaptations")
+ if isinstance(la, list) and all(isinstance(x, str) for x in la):
+ self.lora_adaptations = la
+ else:
+ self.lora_adaptations = None
+
+ # Fail fast when Jina v3 is selected but LoRA metadata is unavailable
+ if (
+ self.model_description.model.lower() == "jinaai/jina-embeddings-v3"
+ and not self.lora_adaptations
+ ):
+ raise ValueError(
+ "Missing or invalid 'lora_adaptations' in config.json for jinaai/jina-embeddings-v3."
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Load LoRA adaptations for models that support task-specific embeddings (e.g., Jina v3) | |
| self.lora_adaptations: Optional[list[str]] = None | |
| config_path = Path(self._model_dir) / "config.json" | |
| if config_path.exists(): | |
| with open(config_path, "r") as f: | |
| config = json.load(f) | |
| self.lora_adaptations = config.get("lora_adaptations") | |
| # Load LoRA adaptations for models that support task-specific embeddings (e.g., Jina v3) | |
| self.lora_adaptations: Optional[list[str]] = None | |
| config_path = Path(self._model_dir) / "config.json" | |
| if config_path.exists(): | |
| with open(config_path, "r") as f: | |
| config = json.load(f) | |
| la = config.get("lora_adaptations") | |
| if isinstance(la, list) and all(isinstance(x, str) for x in la): | |
| self.lora_adaptations = la | |
| else: | |
| self.lora_adaptations = None | |
| # Fail fast when Jina v3 is selected but LoRA metadata is unavailable | |
| if ( | |
| self.model_description.model.lower() == "jinaai/jina-embeddings-v3" | |
| and not self.lora_adaptations | |
| ): | |
| raise ValueError( | |
| "Missing or invalid 'lora_adaptations' in config.json for jinaai/jina-embeddings-v3." | |
| ) |
🤖 Prompt for AI Agents
In fastembed/text/onnx_embedding.py around lines 280 to 287, the code reads
lora_adaptations from config.json but doesn't validate it; add validation to
ensure config.get("lora_adaptations") is a non-empty list of strings and, if
not, raise a clear ValueError (fail fast) when this model requires task-specific
LoRA (e.g., Jina v3); specifically: after loading config, verify the key exists,
is a list, and each item is a string; set self.lora_adaptations to the validated
list, and if validation fails for a model that requires it, raise a descriptive
error explaining that lora_adaptations in config.json must be a non-empty list
of strings.
Add robust validation for lora_adaptations loaded from config.json to fail fast with clear error messages: Validation checks: - Verify lora_adaptations is a list (not string, dict, etc.) - Ensure list is non-empty - Validate each item is a string - Raise ValueError if model requires LoRA but config is missing/invalid Benefits: - Fail fast with descriptive errors instead of cryptic failures later - Clear error messages guide users to fix config issues - Protects against malformed config files - Validates contract between model description and config.json Error examples: - "'lora_adaptations' must be a list, got str" - "'lora_adaptations' must be a non-empty list" - "'lora_adaptations[1]' must be a string, got int" - "Model requires task-specific LoRA adapters, but 'lora_adaptations' is missing" Addresses CodeRabbit review feedback on PR qdrant#563. Co-Authored-By: Claude <[email protected]>
Fix: Added Comprehensive Validation for
|
Summary
Add support for jinaai/jina-embeddings-v3, a state-of-the-art multilingual embedding model with task-specific LoRA adapters.
Model Specifications
Key Features
✅ Task-Specific Embeddings via 5 LoRA adapters:
retrieval.query- For search queriesretrieval.passage- For documents/passagesclassification- For text classificationtext-matching- For semantic similarityseparation- For clustering✅ Automatic Task Handling:
Implementation Details
Following the pattern from PR #561, but using task_id parameter instead of text prefixes:
Changes
Model Configuration (
fastembed/text/onnx_embedding.py):additional_filesformodel.onnx_datalora_adaptationsfromconfig.jsontask_idparameterquery_embed()andpassage_embed()for automatic task selectiontext-matchingtask for general purpose useTests (
tests/test_text_onnx_embeddings.py):test_multi_task_embeddingtestTest Results
pytest tests/test_text_onnx_embeddings.py::test_multi_task_embedding -v # ===== 1 passed in 5.97s =====✅ All tests passing
✅ No regressions in existing tests
✅ Multilingual support confirmed (English, French, Spanish, Chinese tested)
Why Jina v3 vs v2 or v4?
Jina v3 is the perfect middle ground: modern features with official ONNX support.
Related
🤖 Generated with Claude Code
Co-Authored-By: Claude [email protected]