
Conversation

@kacperlukawski
Member

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  • Does your submission pass the existing tests?
  • Have you added tests for your feature?
  • Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

New models submission:

  • Have you added an explanation of why it's important to include this model?
  • Have you added tests for the new model? Were canonical values for tests computed via the original model?
  • Have you added the code snippet for how canonical values were computed?
  • Have you successfully run tests with your changes locally?

@kacperlukawski kacperlukawski force-pushed the feature/colmodernvbert-support branch 2 times, most recently from bf215e1 to 2f0e85a Compare December 8, 2025 12:04
@kacperlukawski kacperlukawski requested a review from joein December 8, 2025 12:46
@kacperlukawski kacperlukawski marked this pull request as ready for review December 8, 2025 12:46
Copilot AI left a comment

Pull request overview

This PR adds support for the ColModernVBERT model (Qdrant/colmodernvbert), a late-interaction multimodal embedding model that uses bidirectional attention for retrieval tasks. The implementation extends the existing late-interaction multimodal embedding infrastructure with support for image splitting and dynamic patch handling.

Key Changes:

  • Introduced ColModernVBERT class with support for image splitting and patch-based processing
  • Added new image transformation operators (ImageSplitter, ResizeForVisionEncoder, SquareResize, ResizeLongestEdge) to handle Idefics3-style image preprocessing
  • Enhanced ONNX multimodal model to handle nested image structures with attention masks and metadata
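
Late interaction here refers to ColBERT-style scoring: each query token embedding is matched against its best document token, and the per-token maxima are summed (MaxSim). A minimal sketch of that scoring idea, independent of this PR's actual code:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    # ColBERT-style MaxSim: for each query token, take the best dot-product
    # similarity against any document token, then sum over query tokens.
    sims = query_vecs @ doc_vecs.T  # (num_query_tokens, num_doc_tokens)
    return float(sims.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
score = maxsim_score(q, d)
```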

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

  • fastembed/late_interaction_multimodal/colmodernvbert.py: New model implementation with text/image embedding support, patch computation logic, and worker classes
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py: Added ColModernVBERT to the embeddings registry
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py: Updated image embedding to support nested structures with patch-level attention masks
  • fastembed/image/transform/operators.py: Added transforms for longest edge resizing, vision encoder resizing, image splitting, and square resizing
  • fastembed/image/transform/functional.py: Added utility functions for resizing longest edge, cropping ndarrays, and resizing ndarrays
  • fastembed/common/preprocessor_utils.py: Added conditional check to avoid re-enabling padding if already configured
  • fastembed/common/onnx_model.py: Added optional metadata field to OnnxOutputContext
  • tests/test_late_interaction_multimodal.py: Added canonical values and test cases for the new model


Comment on lines +159 to +166
Post-process the ONNX model output to convert it into a usable format.
Args:
output (OnnxOutputContext): The raw output from the ONNX model.
Returns:
Iterable[NumpyArray]: Post-processed output as NumPy arrays.

Copilot AI Dec 8, 2025


The docstring describes post-processing output, but this method actually preprocesses text input. The docstring should describe what the method does: adding empty image placeholders to text input for the ONNX model. The Args should describe onnx_input (not output), and the Returns should describe the updated onnx_input dict.

Suggested change
Post-process the ONNX model output to convert it into a usable format.
Args:
output (OnnxOutputContext): The raw output from the ONNX model.
Returns:
Iterable[NumpyArray]: Post-processed output as NumPy arrays.
Preprocesses the text input for the ONNX model by adding empty image placeholders.
Args:
onnx_input (dict[str, NumpyArray]): The text input dictionary for the ONNX model.
Returns:
dict[str, NumpyArray]: The updated input dictionary with empty image placeholders added.

@coderabbitai

coderabbitai bot commented Dec 8, 2025

📝 Walkthrough

Walkthrough

Adds ColModernVBERT, a late-interaction multimodal ONNX embedding model and registers it. Extends ONNX image handling to support patch-aware 4D/5D inputs and to return an OnnxOutputContext containing model_output, attention_mask, and metadata. Adds numpy/PIL image utilities (resize_longest_edge, crop_ndarray, resize_ndarray) and new image transforms (ResizeLongestEdge, ResizeForVisionEncoder, ImageSplitter, SquareResize); updates Normalize/Rescale to accept nested patch lists. Makes tokenizer padding conditional. Adds tests and a model-caching fixture for the new model. Also adds an optional metadata field to OnnxOutputContext.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • tbung
  • dancixx
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 22.95%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title directly addresses the main change: adding support for a new model ModernVBERT/colmodernvbert, which is the primary objective of this pull request.
  • Description check ✅ Passed: The description is related to the changeset, detailing checklist items completed for the new model submission including tests, explanations, and local test verification.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (5)
fastembed/image/transform/functional.py (1)

194-223: Verify float input range assumption.

The function assumes float arrays are normalized to [0, 1] range (line 209: img_hwc * 255). Values outside this range will clip or overflow. Also, float64 inputs are converted to float32 on output (line 212).

If this is intentional for the pipeline (where rescale happens later), consider adding a docstring note about the expected input range.
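
One way to make the [0, 1] assumption explicit is to clip before scaling, so out-of-range floats neither wrap nor overflow when cast. The helper below is a hypothetical sketch, not the PR's actual `resize_ndarray` code:

```python
import numpy as np

def to_uint8(img_hwc: np.ndarray) -> np.ndarray:
    # Hypothetical helper: assumes float inputs are normalized to [0, 1].
    # Clipping guards against out-of-range values before scaling to [0, 255].
    if np.issubdtype(img_hwc.dtype, np.floating):
        return (np.clip(img_hwc, 0.0, 1.0) * 255.0).round().astype(np.uint8)
    return img_hwc.astype(np.uint8)

sample = to_uint8(np.array([1.5, -0.2, 1.0, 0.0]))
```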

fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)

16-19: Consider annotating with ClassVar for type safety.

The EMBEDDINGS_REGISTRY is a mutable class attribute. Annotating it with ClassVar makes the intent explicit and prevents accidental instance-level assignment.

+from typing import Any, ClassVar, Iterable, Optional, Sequence, Type, Union
-from typing import Any, Iterable, Optional, Sequence, Type, Union
-    EMBEDDINGS_REGISTRY: list[Type[LateInteractionMultimodalEmbeddingBase]] = [
+    EMBEDDINGS_REGISTRY: ClassVar[list[Type[LateInteractionMultimodalEmbeddingBase]]] = [
         ColPali,
         ColModernVBERT,
     ]
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

173-177: Unused context manager.

The ExitStack is created but never used to manage any resources. If this was intended for managing image file handles, consider either implementing that pattern or removing the unused context manager.

fastembed/late_interaction_multimodal/colmodernvbert.py (2)

136-153: Consider adding error handling for missing config files.

If any of the config files (processor_config.json, preprocessor_config.json, config.json) are missing, the error won't clearly indicate which model configuration is incomplete. Consider wrapping in try-except with a clearer message.

         # Load image processing configuration
         processor_config_path = self._model_dir / "processor_config.json"
-        with open(processor_config_path) as f:
-            processor_config = json.load(f)
-            self.image_seq_len = processor_config.get("image_seq_len", 64)
+        try:
+            with open(processor_config_path) as f:
+                processor_config = json.load(f)
+                self.image_seq_len = processor_config.get("image_seq_len", 64)
+        except FileNotFoundError:
+            raise FileNotFoundError(
+                f"processor_config.json not found at {processor_config_path}. "
+                "Ensure the model was downloaded correctly."
+            )

167-172: Add guard for self.image_size being None.

self.image_size is Optional[int] and initialized to None. While load_onnx_model should be called before this method, adding an assertion would provide a clearer error message and satisfy type checking.

     def _preprocess_onnx_text_input(
         self, onnx_input: dict[str, NumpyArray], **kwargs: Any
     ) -> dict[str, NumpyArray]:
-        """
-        Post-process the ONNX model output to convert it into a usable format.
-
-        Args:
-            output (OnnxOutputContext): The raw output from the ONNX model.
-
-        Returns:
-            Iterable[NumpyArray]: Post-processed output as NumPy arrays.
-        """
+        """Preprocess text input by adding empty image placeholders."""
+        assert self.image_size is not None, "image_size not initialized. Call load_onnx_model first."
         batch_size, seq_length = onnx_input["input_ids"].shape
         empty_image_placeholder: NumpyArray = np.zeros(
-            (batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32  # type: ignore[type-var,arg-type,assignment]
+            (batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32
         )
         onnx_input["pixel_values"] = empty_image_placeholder
         return onnx_input

Note: The docstring also appears to be copy-pasted from a post-process method — updated it to match the actual purpose.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 428381c and e462bf0.

📒 Files selected for processing (8)
  • fastembed/common/onnx_model.py (1 hunks)
  • fastembed/common/preprocessor_utils.py (1 hunks)
  • fastembed/image/transform/functional.py (1 hunks)
  • fastembed/image/transform/operators.py (9 hunks)
  • fastembed/late_interaction_multimodal/colmodernvbert.py (1 hunks)
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2 hunks)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1 hunks)
  • tests/test_late_interaction_multimodal.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

fastembed/late_interaction_multimodal/colmodernvbert.py

156-156: Unused method argument: kwargs

(ARG002)


189-189: Unused method argument: kwargs

(ARG002)


194-194: Unused method argument: kwargs

(ARG002)

fastembed/image/transform/operators.py

390-392: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Unused static method argument: config

(ARG004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (18)
fastembed/common/preprocessor_utils.py (1)

53-56: LGTM! Conditional padding avoids overwriting existing configuration.

The guard prevents re-enabling padding when it's already configured, which is useful when tokenizer configs already include padding settings.

fastembed/image/transform/operators.py (9)

44-55: LGTM! Safe handling of both flat and nested image structures.

The images and isinstance(images[0], list) pattern correctly short-circuits on empty input, preventing IndexError.
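
The idiom can be illustrated in isolation (hypothetical helper name, not the PR's code):

```python
def is_nested(images: list) -> bool:
    # The truthiness check on the left short-circuits before images[0]
    # is evaluated, so an empty list can never raise IndexError.
    return bool(images and isinstance(images[0], list))

flat = ["img_a", "img_b"]
nested = [["patch_1"], ["patch_1", "patch_2"]]
```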


75-86: Consistent with Normalize pattern.

Same safe short-circuit evaluation for nested structure detection.


109-121: Clean implementation delegating to functional utility.


124-167: Logic correctly aligns dimensions to multiples of max_size.

The aspect ratio calculation and rounding ensure proper vision encoder compatibility. PIL's (width, height) convention is correctly used in the resize call.


170-242: Patch splitting logic handles boundary patches correctly.

Edge patches are properly clamped via min(start_x + optimal_width, width), ensuring no out-of-bounds access. Note that boundary patches may have smaller dimensions than interior patches, which is expected behavior for grid-based splitting.
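
The clamping behavior described above can be sketched as follows (hypothetical function, simplified from the grid-split pattern; not the PR's exact code):

```python
def patch_boxes(width: int, height: int, patch_w: int, patch_h: int):
    # Grid-based split with min() clamping: boundary boxes may be smaller
    # than interior ones but never extend past the image bounds.
    boxes = []
    for start_y in range(0, height, patch_h):
        for start_x in range(0, width, patch_w):
            boxes.append(
                (start_x, start_y,
                 min(start_x + patch_w, width),
                 min(start_y + patch_h, height))
            )
    return boxes

boxes = patch_boxes(10, 10, 4, 4)  # 3x3 grid; edge boxes are only 2 px wide/tall
```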


245-267: Output format correctly matches ImageSplitter for pipeline compatibility.

Wrapping single resized image in inner list ensures downstream transforms (Normalize, Rescale) work uniformly.


386-404: Proper handling of Idefics3ImageProcessor configuration.

The resample parameter conversion from int to Image.Resampling handles configs that serialize enums as integers.


426-429: Intentional no-op for Idefics3ImageProcessor center crop.

The unused config parameter (ARG004 hint) is required by the method signature pattern. Safely ignore this static analysis warning.


435-455: Clean conditional pipeline configuration for image splitting.

The @classmethod decorator is appropriate here since it follows the pattern of other _get_* methods. The chained .get() calls provide safe defaults.

fastembed/image/transform/functional.py (2)

152-175: Even dimension enforcement is useful for vision encoder compatibility.

The aspect ratio calculation and dimension adjustment logic is correct. Ensuring even dimensions (lines 170-173) is a common requirement for certain encoder architectures.
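
The general shape of that computation looks like this (a sketch under stated assumptions; the PR's exact rounding rules may differ):

```python
def longest_edge_dims(width: int, height: int, longest_edge: int) -> tuple[int, int]:
    # Scale so the longer side equals longest_edge, preserving aspect
    # ratio, then round both sides down to the nearest even number.
    scale = longest_edge / max(width, height)
    new_w = max(2, int(width * scale) // 2 * 2)
    new_h = max(2, int(height * scale) // 2 * 2)
    return new_w, new_h
```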


178-191: Simple and correct crop implementation.

Relies on caller for bounds validation, which is appropriate since ImageSplitter already clamps coordinates before calling this function.

tests/test_late_interaction_multimodal.py (4)

24-34: Test data added for colmodernvbert image embeddings.

Per the PR objectives, the code snippet showing how canonical test values were computed was not added. Consider adding this to the PR description or as a comment for reproducibility.


49-59: Query embedding canonical values added.

Consistent with image embedding test data structure.


115-117: Embedding size verification for colmodernvbert.

Confirms the model reports 128-dimensional embeddings, matching the existing colpali model.


130-133: Instance-based embedding size check added.

Tests lazy-loaded model instance reports correct embedding size.

fastembed/common/onnx_model.py (1)

19-24: Clean backward-compatible extension of OnnxOutputContext.

The optional metadata field allows carrying additional context (e.g., patch counts for image splitting) without breaking existing code that doesn't use it.
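
The backward-compatibility property follows from the defaulted field. A simplified stand-in for `OnnxOutputContext` (field names follow the walkthrough above, but this is not the library's exact definition):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class OutputContext:
    # Adding an optional field with a None default means constructor calls
    # written before the change remain valid unchanged.
    model_output: Any
    attention_mask: Optional[Any] = None
    metadata: Optional[dict] = None

old_style = OutputContext(model_output=[0.1, 0.2])  # pre-change call shape
new_style = OutputContext(model_output=[0.1], metadata={"patch_counts": [3]})
```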

fastembed/late_interaction_multimodal/colmodernvbert.py (1)

35-45: Clean implementation following existing patterns.

The ColModernVBERT class properly inherits from both LateInteractionMultimodalEmbeddingBase and OnnxMultimodalModel, follows the established patterns from ColPali, and implements the required abstract methods. The visual prompt prefix and overall structure look correct.

Comment on lines 186 to 194
patch_counts = [len(patches) for patches in processed]
max_patches = max(patch_counts)

# Get dimensions from first patch
C, H, W = processed[0][0].shape

# Create padded array
batch_size = len(processed)
encoded = np.zeros((batch_size, max_patches, C, H, W), dtype=processed[0][0].dtype)

⚠️ Potential issue | 🟡 Minor

Potential IndexError if an image produces no patches.

If processed[0] is an empty list, accessing processed[0][0] will raise an IndexError. Consider adding a guard or handling the edge case where an image might yield zero patches.

             patch_counts = [len(patches) for patches in processed]
             max_patches = max(patch_counts)
+            
+            if max_patches == 0:
+                raise ValueError("All images produced zero patches")

             # Get dimensions from first patch
-            C, H, W = processed[0][0].shape
+            # Find first non-empty patch list to get dimensions
+            first_patch = next(
+                (patches[0] for patches in processed if len(patches) > 0), None
+            )
+            if first_patch is None:
+                raise ValueError("No valid patches found in processed images")
+            C, H, W = first_patch.shape

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In fastembed/late_interaction_multimodal/onnx_multimodal_model.py around lines
186-194, the code assumes processed[0] contains at least one patch and uses
processed[0][0].shape which will IndexError if any image produced zero patches;
change it to first find a non-empty entry (e.g., iterate processed to locate the
first patches list with length>0) and use that patch's shape for C,H,W, and
handle the case where no images have patches (max_patches == 0) by either
returning an appropriately shaped empty encoded array or raising a clear
exception; ensure subsequent logic that pads/fills encoded handles the
zero-patch case.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
fastembed/late_interaction_multimodal/colmodernvbert.py (2)

37-41: Clarify model identifier in docstring.

The docstring references ModernVBERT/colmodernvbert but the registered model identifier is Qdrant/colmodernvbert. Consider clarifying that this is the ONNX-converted version of the original model.


158-166: Incorrect docstring: describes post-processing but method preprocesses.

This docstring describes post-processing output, but the method actually preprocesses text input by adding empty image placeholders.

🧹 Nitpick comments (1)
fastembed/late_interaction_multimodal/colmodernvbert.py (1)

136-153: Add error handling for config file loading.

The config file loading lacks error handling for missing or malformed JSON files. Consider wrapping in try-except to provide clearer error messages when config files are missing or corrupted.

         # Load image processing configuration
         processor_config_path = self._model_dir / "processor_config.json"
-        with open(processor_config_path) as f:
-            processor_config = json.load(f)
-            self.image_seq_len = processor_config.get("image_seq_len", 64)
+        try:
+            with open(processor_config_path) as f:
+                processor_config = json.load(f)
+                self.image_seq_len = processor_config.get("image_seq_len", 64)
+        except FileNotFoundError:
+            raise FileNotFoundError(
+                f"Required config file not found: {processor_config_path}"
+            )
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON in {processor_config_path}: {e}")

Apply similar handling to the other config file reads.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e462bf0 and 35b9026.

📒 Files selected for processing (1)
  • fastembed/late_interaction_multimodal/colmodernvbert.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
fastembed/late_interaction_multimodal/colmodernvbert.py

156-156: Unused method argument: kwargs

(ARG002)


189-189: Unused method argument: kwargs

(ARG002)


194-194: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.9.x on macos-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
  • GitHub Check: Python 3.9.x on windows-latest test
  • GitHub Check: Python 3.9.x on ubuntu-latest test
🔇 Additional comments (7)
fastembed/late_interaction_multimodal/colmodernvbert.py (7)

1-32: LGTM!

Imports and model description are well-structured and follow the existing patterns in the codebase.


174-191: LGTM!

These methods are straightforward. The unused kwargs parameters are likely intentional for interface compatibility with base class method signatures.


258-280: LGTM!

The grid dimension computation logic is sound with appropriate fallback handling for edge cases.


330-346: LGTM!

The post-processing logic correctly reshapes the output tensor using the model's embedding dimension.


348-418: LGTM!

Both embedding methods follow the established patterns and correctly delegate to internal implementation methods.


420-426: LGTM!

Worker class references are correct.


429-446: LGTM!

Worker classes correctly implement init_embedding with appropriate defaults for parallel processing contexts.

Comment on lines +106 to +111
self.mask_token_id = None
self.pad_token_id = None
self.image_seq_len: Optional[int] = None
self.max_image_size: Optional[int] = None
self.image_size: Optional[int] = None

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check the class definition and its base classes
head -50 fastembed/late_interaction_multimodal/colmodernvbert.py | grep -A 5 "^class "

Repository: qdrant/fastembed

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Search for mask_token_id and pad_token_id usage across the codebase
rg -n "mask_token_id|pad_token_id" --type py

Repository: qdrant/fastembed

Length of output: 1452


🏁 Script executed:

#!/bin/bash
# Check if the file has any references to these attributes
cat -n fastembed/late_interaction_multimodal/colmodernvbert.py | grep -E "(mask_token_id|pad_token_id)"

Repository: qdrant/fastembed

Length of output: 332


🏁 Script executed:

#!/bin/bash
# Find the LateInteractionMultimodalEmbeddingBase class definition
fd -e py | xargs grep -l "class LateInteractionMultimodalEmbeddingBase"

Repository: qdrant/fastembed

Length of output: 143


🏁 Script executed:

#!/bin/bash
# Check the base class for any requirements on mask_token_id and pad_token_id
rg -A 30 "class LateInteractionMultimodalEmbeddingBase" --type py

Repository: qdrant/fastembed

Length of output: 3890


Remove unused instance attributes mask_token_id and pad_token_id.

These attributes are initialized to None at lines 106-107 but are never assigned values or used anywhere in the class. The base class LateInteractionMultimodalEmbeddingBase does not require them, and the class logic uses local variables instead (e.g., pad_token_id is obtained directly from self.tokenizer.padding["pad_id"] at line 235). These are dead code and should be removed.

🤖 Prompt for AI Agents
In fastembed/late_interaction_multimodal/colmodernvbert.py around lines 106 to
110, remove the dead instance attributes mask_token_id and pad_token_id which
are initialized to None but never used; delete those two attribute
initializations and ensure no other code depends on self.mask_token_id or
self.pad_token_id (use local tokenizer/padding lookups already present), then
run tests/linters to confirm no references remain.

Comment on lines 167 to 173
batch_size, seq_length = onnx_input["input_ids"].shape
empty_image_placeholder: NumpyArray = np.zeros(
(batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32 # type: ignore[type-var,arg-type,assignment]
)
onnx_input["pixel_values"] = empty_image_placeholder
return onnx_input

⚠️ Potential issue | 🟠 Major

Guard against None value for self.image_size.

self.image_size is Optional[int] initialized to None, and is only set during load_onnx_model(). If this method is called before the model is loaded (e.g., with lazy_load=True), using self.image_size will produce an invalid array shape.

+        if self.image_size is None:
+            raise RuntimeError(
+                "Model not loaded. Call load_onnx_model() first or initialize with lazy_load=False"
+            )
         batch_size, seq_length = onnx_input["input_ids"].shape
-        empty_image_placeholder: NumpyArray = np.zeros(
-            (batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32  # type: ignore[type-var,arg-type,assignment]
-        )
+        empty_image_placeholder: NumpyArray = np.zeros(
+            (batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32
+        )
🤖 Prompt for AI Agents
In fastembed/late_interaction_multimodal/colmodernvbert.py around lines 167-172,
guard against self.image_size being None before using it to build the
empty_image_placeholder; if self.image_size is None either call
self.load_onnx_model() (if safe here) or raise a clear ValueError indicating the
model must be loaded (or lazy_load disabled) before creating pixel_values, and
ensure you cast/check the value to an int before using it to construct the numpy
shape so an invalid shape is never created.

Comment on lines +233 to +241
# Get padding config from tokenizer
padding_direction = self.tokenizer.padding["direction"] # type: ignore[index,union-attr]
pad_token_id = self.tokenizer.padding["pad_id"] # type: ignore[index,union-attr]

# Initialize with pad token
padded_input_ids = np.full((batch_size, max_len), pad_token_id, dtype=np.int64)

⚠️ Potential issue | 🟡 Minor

Add guard for uninitialized tokenizer.

self.tokenizer could be None if the model isn't fully loaded. The type: ignore comments suppress the error but don't prevent a runtime TypeError.

+        if self.tokenizer is None or self.tokenizer.padding is None:
+            raise RuntimeError("Tokenizer not initialized. Ensure model is loaded.")
+
         # Get padding config from tokenizer
-        padding_direction = self.tokenizer.padding["direction"]  # type: ignore[index,union-attr]
-        pad_token_id = self.tokenizer.padding["pad_id"]  # type: ignore[index,union-attr]
+        padding_direction = self.tokenizer.padding["direction"]
+        pad_token_id = self.tokenizer.padding["pad_id"]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get padding config from tokenizer
padding_direction = self.tokenizer.padding["direction"] # type: ignore[index,union-attr]
pad_token_id = self.tokenizer.padding["pad_id"] # type: ignore[index,union-attr]
# Initialize with pad token
padded_input_ids = np.full((batch_size, max_len), pad_token_id, dtype=np.int64)
if self.tokenizer is None or self.tokenizer.padding is None:
raise RuntimeError("Tokenizer not initialized. Ensure model is loaded.")
# Get padding config from tokenizer
padding_direction = self.tokenizer.padding["direction"]
pad_token_id = self.tokenizer.padding["pad_id"]
# Initialize with pad token
padded_input_ids = np.full((batch_size, max_len), pad_token_id, dtype=np.int64)
🤖 Prompt for AI Agents
fastembed/late_interaction_multimodal/colmodernvbert.py lines 233-238: the code
assumes self.tokenizer is set and immediately accesses its padding attributes,
which can raise a TypeError if tokenizer is None; add a guard that checks if
self.tokenizer is None and either (a) raise a clear RuntimeError/ValueError
stating the model/tokenizer is not initialized and instructing how to load it,
or (b) attempt to lazy-initialize or retrieve a fallback tokenizer before
accessing padding; after the guard, safely read padding_direction and
pad_token_id and remove reliance on the type: ignore suppression so the runtime
can fail loudly or recover with a proper fallback.

Comment on lines +282 to +314
def _create_single_image_prompt_string(self) -> str:
return (
"<fake_token_around_image>"
+ "<global-img>"
+ "<image>" * self.image_seq_len # type: ignore[operator]
+ "<fake_token_around_image>"
)

def _create_split_image_prompt_string(self, rows: int, cols: int) -> str:
text_split_images = ""

# Add tokens for each patch in the grid
for n_h in range(rows):
for n_w in range(cols):
text_split_images += (
"<fake_token_around_image>"
+ f"<row_{n_h + 1}_col_{n_w + 1}>"
+ "<image>" * self.image_seq_len # type: ignore[operator]
)
text_split_images += "\n"

# Add global image at the end
text_split_images += (
"\n<fake_token_around_image>"
+ "<global-img>"
+ "<image>" * self.image_seq_len # type: ignore[operator]
+ "<fake_token_around_image>"
)

return text_split_images

⚠️ Potential issue | 🟡 Minor

Guard against None value for self.image_seq_len.

Similar to self.image_size, self.image_seq_len could be None if the model isn't loaded, causing string multiplication to fail with TypeError.

Consider adding a runtime check at the start of methods that depend on model configuration:

if self.image_seq_len is None:
    raise RuntimeError("Model not loaded. Call load_onnx_model() first.")
🤖 Prompt for AI Agents
In fastembed/late_interaction_multimodal/colmodernvbert.py around lines 282 to
311, both _create_single_image_prompt_string and
_create_split_image_prompt_string assume self.image_seq_len is not None and will
raise a TypeError when it's None; add a runtime guard at the start of each
method that checks if self.image_seq_len is None and raises a RuntimeError with
a clear message like "Model not loaded. Call load_onnx_model() first." so
callers get a meaningful error instead of a TypeError from string
multiplication.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (4)
fastembed/late_interaction_multimodal/colmodernvbert.py (4)

159-165: Fix incorrect docstring.

The docstring describes post-processing output, but this method actually preprocesses text input by adding empty image placeholders. The Args describe output when the parameter is onnx_input, and Returns describe the updated input dictionary.

-        Post-process the ONNX model output to convert it into a usable format.
+        Preprocesses the text input for the ONNX model by adding empty image placeholders.
 
         Args:
-            output (OnnxOutputContext): The raw output from the ONNX model.
+            onnx_input (dict[str, NumpyArray]): The text input dictionary for the ONNX model.
 
         Returns:
-            Iterable[NumpyArray]: Post-processed output as NumPy arrays.
+            dict[str, NumpyArray]: The updated input dictionary with empty image placeholders added.

233-238: Add guard for uninitialized tokenizer.

self.tokenizer is Optional[Tokenizer] and could be None if the model isn't fully loaded. Accessing self.tokenizer.padding at lines 234-235 will raise an AttributeError if called with lazy_load=True before the model is loaded. The type: ignore comments suppress type errors but don't prevent the runtime exception.

+        if self.tokenizer is None or self.tokenizer.padding is None:
+            raise RuntimeError("Tokenizer not initialized. Ensure model is loaded.")
+
         # Get padding config from tokenizer
         padding_direction = self.tokenizer.padding["direction"]  # type: ignore[index,union-attr]
         pad_token_id = self.tokenizer.padding["pad_id"]  # type: ignore[index,union-attr]

167-170: Guard against None value for self.image_size.

self.image_size is Optional[int] initialized to None (line 110) and only set during load_onnx_model(). If this method is called with lazy_load=True before the model is loaded, using self.image_size at line 169 will create an invalid array shape or raise a TypeError.

+        if self.image_size is None:
+            raise RuntimeError(
+                "Model not loaded. Call load_onnx_model() first or initialize with lazy_load=False"
+            )
         batch_size, seq_length = onnx_input["input_ids"].shape
         empty_image_placeholder: NumpyArray = np.zeros(
             (batch_size, seq_length, 3, self.image_size, self.image_size), dtype=np.float32  # type: ignore[type-var,arg-type,assignment]

282-311: Guard against None value for self.image_seq_len.

Both _create_single_image_prompt_string and _create_split_image_prompt_string use self.image_seq_len for string multiplication (lines 286, 299, 307). self.image_seq_len is Optional[int] initialized to None (line 108) and only set during load_onnx_model(). If called with lazy_load=True before the model is loaded, these will raise TypeError: can't multiply sequence by NoneType.

Add a runtime check at the start of both methods:

 def _create_single_image_prompt_string(self) -> str:
+    if self.image_seq_len is None:
+        raise RuntimeError("Model not loaded. Call load_onnx_model() first.")
     return (
 def _create_split_image_prompt_string(self, rows: int, cols: int) -> str:
+    if self.image_seq_len is None:
+        raise RuntimeError("Model not loaded. Call load_onnx_model() first.")
     text_split_images = ""
🧹 Nitpick comments (2)
fastembed/late_interaction_multimodal/colmodernvbert.py (2)

37-40: Clarify the relationship between original and ONNX models in docstring.

The docstring references "ModernVBERT/colmodernvbert" but the actual model identifier used is "Qdrant/colmodernvbert" (line 23). Consider clarifying that Qdrant hosts the ONNX-converted version of the original ModernVBERT model to help users understand which model files are being loaded.

-    The ModernVBERT/colmodernvbert model implementation. This model uses
-    bidirectional attention, which proves to work better for retrieval.
-
-    See: https://huggingface.co/ModernVBERT/colmodernvbert
+    The ColModernVBERT model implementation using ONNX runtime. This model uses
+    bidirectional attention, which proves to work better for retrieval.
+
+    This is the ONNX-converted version hosted at https://huggingface.co/Qdrant/colmodernvbert
+    Original model: https://huggingface.co/ModernVBERT/colmodernvbert

106-107: Remove unused instance attributes.

Based on learnings, mask_token_id and pad_token_id are initialized here but never assigned values or used anywhere in the class. The padding logic at lines 234-235 retrieves pad_token_id directly from self.tokenizer.padding["pad_id"] instead. These lines are dead code and can be safely removed.

-        self.mask_token_id = None
-        self.pad_token_id = None
         self.image_seq_len: Optional[int] = None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35b9026 and e0acfe7.

📒 Files selected for processing (1)
  • fastembed/late_interaction_multimodal/colmodernvbert.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
fastembed/late_interaction_multimodal/colmodernvbert.py

156-156: Unused method argument: kwargs

(ARG002)


189-189: Unused method argument: kwargs

(ARG002)


194-194: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Python 3.9.x on windows-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.9.x on macos-latest test
  • GitHub Check: Python 3.11.x on ubuntu-latest test
🔇 Additional comments (3)
fastembed/late_interaction_multimodal/colmodernvbert.py (3)

330-346: LGTM!

The post-processing logic correctly reshapes the output and includes a proper assertion for the model dimension.
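The reshape-with-assertion pattern this comment approves of can be sketched roughly as below; `post_process` is an illustrative stand-in, and the assumption that the model emits a flattened `(batch, seq * dim)` array is mine, not confirmed by the diff (the real output layout may differ):

```python
import numpy as np

def post_process(model_output: np.ndarray, dim: int = 128) -> np.ndarray:
    # Guard against a misconfigured model before reshaping
    assert model_output.shape[-1] % dim == 0, "unexpected model dimension"
    batch = model_output.shape[0]
    # Flattened (batch, seq * dim) -> (batch, seq, dim) token embeddings
    return model_output.reshape(batch, -1, dim)
```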


348-418: LGTM!

Both embed_text and embed_image methods properly delegate to inherited base methods with all configuration parameters correctly passed through.


420-446: LGTM!

Worker class methods and class definitions follow the standard pattern. Both text and image workers correctly initialize ColModernVBERT instances with single-threaded configuration.

@kacperlukawski kacperlukawski marked this pull request as draft December 9, 2025 12:42
@kacperlukawski kacperlukawski marked this pull request as ready for review December 9, 2025 15:15
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (2)

217-225: Guard against empty patch lists in _process_nested_patches.

If processed[0] contains zero patches (or the first few images yield no patches), processed[0][0] will raise an IndexError, even though later images may have valid patches. Similarly, if all images produce zero patches, max(patch_counts) is 0 and you have no patch to infer C, H, W from.

Consider locating the first non-empty patch list and handling the all-empty case explicitly:

-        patch_counts = [len(patches) for patches in processed]
-        max_patches = max(patch_counts)
-
-        # Get dimensions from first patch
-        C, H, W = processed[0][0].shape
+        patch_counts = [len(patches) for patches in processed]
+        max_patches = max(patch_counts)
+
+        if max_patches == 0:
+            raise ValueError("All images produced zero patches")
+
+        # Get dimensions from the first non-empty patch list
+        first_patch = next((patches[0] for patches in processed if patches), None)
+        if first_patch is None:
+            raise ValueError("No valid patches found in processed images")
+        C, H, W = first_patch.shape

This avoids hard failures on edge cases where some or all images contribute no patches.


187-190: Use encoded.shape[0] as the batch size source in _process_flat_images.

_process_flat_images currently takes num_images from the caller and uses it to size attention_mask and patch_counts. If a processor ever returns a different number of encoded images than the original images list (e.g., filtering, de-duplication), this will cause shape or length mismatches.

You can make the function self-consistent by deriving the batch size from encoded and dropping the num_images argument:

-            else:
-                encoded, attention_mask, metadata = self._process_flat_images(
-                    processed,  # type: ignore[arg-type]
-                    len(images),
-                )
+            else:
+                encoded, attention_mask, metadata = self._process_flat_images(
+                    processed,  # type: ignore[arg-type]
+                )

And in _process_flat_images:

-    def _process_flat_images(
-        self, processed: list[NumpyArray], num_images: int
-    ) -> tuple[NumpyArray, NumpyArray, dict[str, Any]]:
+    def _process_flat_images(
+        self, processed: list[NumpyArray]
+    ) -> tuple[NumpyArray, NumpyArray, dict[str, Any]]:
@@
-        encoded = np.array(processed)
+        encoded = np.array(processed)
+        batch_size = encoded.shape[0]
@@
-        if len(encoded.shape) == 5:
-            # 5D tensor: attention_mask shape is (batch, num_patches)
-            attention_mask = np.ones((num_images, encoded.shape[1]), dtype=np.int64)
-            metadata = {"patch_counts": [encoded.shape[1]] * num_images}
+        if len(encoded.shape) == 5:
+            # 5D tensor: attention_mask shape is (batch, num_patches)
+            attention_mask = np.ones((batch_size, encoded.shape[1]), dtype=np.int64)
+            metadata = {"patch_counts": [encoded.shape[1]] * batch_size}
         else:
-            # 4D tensor: attention_mask shape is (batch, 1)
-            attention_mask = np.ones((num_images, 1), dtype=np.int64)
-            metadata = {"patch_counts": [1] * num_images}
+            # 4D tensor: attention_mask shape is (batch, 1)
+            attention_mask = np.ones((batch_size, 1), dtype=np.int64)
+            metadata = {"patch_counts": [1] * batch_size}

This keeps attention masks and metadata consistent with the actual encoded tensor regardless of how the processor behaves.

Also applies to: 258-275

🧹 Nitpick comments (2)
tests/test_late_interaction_multimodal.py (1)

24-34: Canonical values for ColModernVBERT look consistent; please keep derivation reproducible.

The added CANONICAL_IMAGE_VALUES and CANONICAL_QUERY_VALUES entries for "Qdrant/colmodernvbert" integrate cleanly with the existing tests (shapes match the token×dim slicing logic used below). To keep these stable over time and aid future debugging, it would be helpful to either:

  • add a short note or link to the script/notebook used to generate these canonical values, or
  • include the missing “how canonical values were computed” snippet in the test docs/checklist.

This isn’t blocking, but it will make future updates to the model/tests much easier to validate.

Also applies to: 49-56

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

179-191: Propagate preprocessed attention_mask into OnnxOutputContext for image path.

After _preprocess_onnx_image_input, you always pass the original attention_mask into OnnxOutputContext. If a subclass overrides _preprocess_onnx_image_input and adjusts or replaces the mask, the context will become inconsistent with what was actually fed to the model.

You can mirror the text path by reading from onnx_input first:

-        onnx_input = {"pixel_values": encoded, "attention_mask": attention_mask}
-        onnx_input = self._preprocess_onnx_image_input(onnx_input, **kwargs)
-        model_output = self.model.run(None, onnx_input)  # type: ignore[union-attr]
-
-        return OnnxOutputContext(
-            model_output=model_output[0],
-            attention_mask=attention_mask,  # type: ignore[arg-type]
-            metadata=metadata,
-        )
+        onnx_input = {"pixel_values": encoded, "attention_mask": attention_mask}
+        onnx_input = self._preprocess_onnx_image_input(onnx_input, **kwargs)
+        model_output = self.model.run(None, onnx_input)  # type: ignore[union-attr]
+
+        return OnnxOutputContext(
+            model_output=model_output[0],
+            attention_mask=onnx_input.get("attention_mask", attention_mask),  # type: ignore[arg-type]
+            metadata=metadata,
+        )

This keeps the context aligned with any preprocessing logic while remaining backward compatible.

Also applies to: 196-200

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0acfe7 and f4b17fa.

📒 Files selected for processing (2)
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1 hunks)
  • tests/test_late_interaction_multimodal.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_late_interaction_multimodal.py (1)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (3)
  • LateInteractionMultimodalEmbedding (15-189)
  • get_embedding_size (98-122)
  • embedding_size (91-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.9.x on macos-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.9.x on windows-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
🔇 Additional comments (2)
tests/test_late_interaction_multimodal.py (1)

112-113: Embedding size tests for ColModernVBERT match registry expectations.

The new assertions for "Qdrant/colmodernvbert" in both test_get_embedding_size and test_embedding_size correctly expect a dim of 128 and are aligned with the registry-based lookup logic. No issues here.

Also applies to: 127-129

fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

277-299: Patch-dimension detection via ONNX input shape looks sound.

_needs_patch_dimension cleanly inspects the pixel_values input shape to distinguish 4D vs 5D models and defaults to False when the model is not yet loaded or the input is missing. This matches the intended behavior for ColModernVBERT (5D) vs earlier 4D models and should be robust against dynamic dimension names.
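The signature check described here can be approximated with a toy sketch; `InputMeta` below is a stand-in for onnxruntime's `NodeArg` (whose `shape` may mix ints and dynamic dimension names), and `needs_patch_dimension` is illustrative rather than the reviewed implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InputMeta:
    # Mimics onnxruntime's NodeArg: a name plus a shape that may mix
    # concrete ints and dynamic dimension names (strings).
    name: str
    shape: list

def needs_patch_dimension(inputs: Optional[list[InputMeta]]) -> bool:
    # Default to False when the model is not loaded yet
    if not inputs:
        return False
    for meta in inputs:
        if meta.name == "pixel_values":
            # 5D signature (batch, patches, C, H, W) => patch dimension needed
            return len(meta.shape) == 5
    return False
```

Because only the rank of the shape is inspected, dynamic dimension names do not affect the result.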

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
fastembed/late_interaction_multimodal/colmodernvbert.py (6)

37-41: Docstring references original model, but code uses Qdrant ONNX version.

The docstring mentions ModernVBERT/colmodernvbert and links to that HuggingFace page, but the actual model identifier in supported_colmodernvbert_models is Qdrant/colmodernvbert. Consider clarifying this distinction.


107-108: Remove unused instance attributes mask_token_id and pad_token_id.

These attributes are initialized but never assigned meaningful values or used anywhere in the class. This was previously flagged.


159-166: Incorrect docstring: describes post-processing but method preprocesses input.

The docstring says "Post-process the ONNX model output" but this method actually preprocesses text input by adding empty image placeholders.


168-172: Guard against None value for self.image_size.

With lazy_load=True, self.image_size remains None until load_onnx_model() is called, causing a runtime error when creating the placeholder array. This was previously flagged.


236-238: Add guard for uninitialized tokenizer.

self.tokenizer could be None if the model isn't loaded. This was previously flagged.


285-314: Guard against None value for self.image_seq_len.

self.image_seq_len could be None if the model isn't loaded, causing TypeError on string multiplication. This was previously flagged.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4b17fa and 94366ea.

📒 Files selected for processing (2)
  • fastembed/late_interaction_multimodal/colmodernvbert.py
  • tests/test_late_interaction_multimodal.py
🧰 Additional context used
🪛 Ruff (0.14.10)
fastembed/late_interaction_multimodal/colmodernvbert.py

46-46: Possible hardcoded password assigned to: "QUERY_AUGMENTATION_TOKEN"

(S105)


157-157: Unused method argument: kwargs

(ARG002)


190-190: Unused method argument: kwargs

(ARG002)


197-197: Unused method argument: kwargs

(ARG002)

🔇 Additional comments (11)
fastembed/late_interaction_multimodal/colmodernvbert.py (7)

21-32: Model description looks well-structured.

The model descriptor includes all necessary fields and the additional file processor_config.json is correctly specified for the image processing configuration.


244-258: Padding logic correctly handles both left and right directions.

The implementation properly places tokens based on padding_direction and constructs appropriate attention masks.
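Direction-aware padding of this kind can be sketched as follows; `pad_sequences` is a hypothetical helper illustrating the idea, not the code under review:

```python
import numpy as np

def pad_sequences(seqs: list[list[int]], pad_id: int, direction: str = "right"):
    # Pad a batch of token-id lists to equal length and build matching
    # attention masks (1 = real token, 0 = padding).
    max_len = max(len(s) for s in seqs)
    ids = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    mask = np.zeros((len(seqs), max_len), dtype=np.int64)
    for i, s in enumerate(seqs):
        if direction == "left":
            ids[i, max_len - len(s):] = s
            mask[i, max_len - len(s):] = 1
        else:
            ids[i, :len(s)] = s
            mask[i, :len(s)] = 1
    return ids, mask
```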


261-283: Grid computation logic handles edge cases appropriately.

The method correctly handles single patches (patch_count <= 1), attempts square grid detection first, and falls back to factor enumeration for non-square grids. The final fallback to (0, 0) ensures graceful degradation.
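The strategy described above can be sketched like this; `compute_patch_grid` is an illustrative stand-in for the actual method, under the assumption that a single patch means no split grid:

```python
import math

def compute_patch_grid(patch_count: int) -> tuple[int, int]:
    # A single patch (or none) means the image was not split
    if patch_count <= 1:
        return (0, 0)
    # Try a square grid first
    root = math.isqrt(patch_count)
    if root * root == patch_count:
        return (root, root)
    # Fall back to enumerating factor pairs (rows, cols)
    for rows in range(2, patch_count + 1):
        if patch_count % rows == 0:
            return (rows, patch_count // rows)
    return (0, 0)  # graceful degradation for degenerate counts
```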


316-331: Prompt construction and tokenization logic is correct.

The method correctly selects the appropriate prompt format based on grid dimensions and properly tokenizes the expanded prompt.


333-349: Post-processing correctly reshapes output to expected dimensions.

The assertion guards against misconfigured models, and the reshape operation produces the expected (batch, n, dim) output format.


351-421: Embedding methods correctly delegate to base class utilities.

Both embed_text and embed_image properly pass through all configuration options and use yield from for efficient lazy iteration.


432-449: Worker classes are correctly implemented.

Both ColModernVBERTTextEmbeddingWorker and ColModernVBERTImageEmbeddingWorker properly initialize the ColModernVBERT model with threads=1 for parallel processing scenarios.

tests/test_late_interaction_multimodal.py (4)

24-34: Canonical image embedding values added for new model.

The test data follows the established pattern with abridged and rounded values. Note that the PR checklist indicates the code snippet showing how these canonical values were computed was not added - consider adding this for reproducibility.


49-59: Canonical query embedding values added for new model.

The test data is consistent with the existing format.


115-117: Embedding size test added for new model.

The assertion correctly validates the 128-dimensional embedding size matching the model description.


130-133: Lazy-load embedding size test added for new model.

The test correctly validates that the embedding size is accessible even with lazy_load=True.

Comment on lines +142 to +147
preprocessor_config_path = self._model_dir / "preprocessor_config.json"
with open(preprocessor_config_path) as f:
preprocessor_config = json.load(f)
self.max_image_size = preprocessor_config.get("max_image_size", {}).get(
"longest_edge", 512
)

⚠️ Potential issue | 🟡 Minor

Potential AttributeError if max_image_size is not a dict.

If preprocessor_config.get("max_image_size", {}) returns a non-dict value (e.g., an integer), the subsequent .get("longest_edge", 512) call will fail with an AttributeError.

🔎 Suggested defensive handling
         preprocessor_config_path = self._model_dir / "preprocessor_config.json"
         with open(preprocessor_config_path) as f:
             preprocessor_config = json.load(f)
-            self.max_image_size = preprocessor_config.get("max_image_size", {}).get(
-                "longest_edge", 512
-            )
+            max_image_size_config = preprocessor_config.get("max_image_size", {})
+            if isinstance(max_image_size_config, dict):
+                self.max_image_size = max_image_size_config.get("longest_edge", 512)
+            else:
+                self.max_image_size = max_image_size_config if isinstance(max_image_size_config, int) else 512
🤖 Prompt for AI Agents
In fastembed/late_interaction_multimodal/colmodernvbert.py around lines 142 to
147, the code assumes preprocessor_config.get("max_image_size", {}) returns a
dict and calls .get("longest_edge", 512), which will raise AttributeError if
it's an int or other non-dict; change this to read the value into a variable,
check isinstance(value, dict) and if so use value.get("longest_edge", 512), elif
isinstance(value, int) use the int directly, otherwise fall back to 512, and
assign that result to self.max_image_size so the code is robust to different
config shapes.

@kacperlukawski kacperlukawski force-pushed the feature/colmodernvbert-support branch from 94366ea to 5a6d884 Compare January 2, 2026 20:11
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (7)
fastembed/late_interaction_multimodal/colmodernvbert.py (6)

35-46: Class docstring references original model but deployed model differs.

The docstring references ModernVBERT/colmodernvbert but the actual model identifier is Qdrant/colmodernvbert (line 23). Consider clarifying that this is the ONNX-converted version.


107-108: Remove unused instance attributes mask_token_id and pad_token_id.

These attributes are initialized to None but never assigned values or used anywhere in the class. The tokenizer padding is accessed directly via self.tokenizer.padding["pad_id"] at line 238. These are dead code.


142-147: Potential AttributeError if max_image_size is not a dict.

If preprocessor_config.get("max_image_size", {}) returns a non-dict value (e.g., an integer), the subsequent .get("longest_edge", 512) call will fail.

🔎 Proposed defensive handling
         preprocessor_config_path = self._model_dir / "preprocessor_config.json"
         with open(preprocessor_config_path) as f:
             preprocessor_config = json.load(f)
-            self.max_image_size = preprocessor_config.get("max_image_size", {}).get(
-                "longest_edge", 512
-            )
+            max_image_size_config = preprocessor_config.get("max_image_size", {})
+            if isinstance(max_image_size_config, dict):
+                self.max_image_size = max_image_size_config.get("longest_edge", 512)
+            else:
+                self.max_image_size = max_image_size_config if isinstance(max_image_size_config, int) else 512

156-173: Docstring describes post-processing but method preprocesses input; also missing None guard for self.image_size.

Two issues:

  1. The docstring incorrectly describes post-processing output when this method preprocesses text input by adding image placeholders.
  2. self.image_size is Optional[int] and only set during load_onnx_model(). With lazy_load=True, using it before model load will produce an invalid array shape.
🔎 Proposed fix
     def _preprocess_onnx_text_input(
         self, onnx_input: dict[str, NumpyArray], **kwargs: Any
     ) -> dict[str, NumpyArray]:
         """
-        Post-process the ONNX model output to convert it into a usable format.
+        Preprocess text input by adding empty image placeholders for the ONNX model.

         Args:
-            output (OnnxOutputContext): The raw output from the ONNX model.
+            onnx_input (dict[str, NumpyArray]): The text input dictionary for the ONNX model.

         Returns:
-            Iterable[NumpyArray]: Post-processed output as NumPy arrays.
+            dict[str, NumpyArray]: Updated input dictionary with empty image placeholders.
         """
+        if self.image_size is None:
+            raise RuntimeError(
+                "Model not loaded. Call load_onnx_model() first or initialize with lazy_load=False"
+            )
         batch_size, seq_length = onnx_input["input_ids"].shape

236-238: Add guard for uninitialized tokenizer.

self.tokenizer could be None if the model isn't fully loaded. The type: ignore comments suppress the error but don't prevent a runtime TypeError.

🔎 Proposed fix
+        if self.tokenizer is None or self.tokenizer.padding is None:
+            raise RuntimeError("Tokenizer not initialized. Ensure model is loaded.")
+
         # Get padding config from tokenizer
-        padding_direction = self.tokenizer.padding["direction"]  # type: ignore[index,union-attr]
-        pad_token_id = self.tokenizer.padding["pad_id"]  # type: ignore[index,union-attr]
+        padding_direction = self.tokenizer.padding["direction"]
+        pad_token_id = self.tokenizer.padding["pad_id"]

285-314: Guard against None value for self.image_seq_len.

self.image_seq_len could be None if the model isn't loaded, causing string multiplication ("<image>" * self.image_seq_len) to fail with TypeError.

🔎 Proposed fix

Add a runtime check at the start of methods that use image_seq_len:

if self.image_seq_len is None:
    raise RuntimeError("Model not loaded. Call load_onnx_model() first.")
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (1)

217-237: Potential IndexError if an image produces zero patches.

The past review comment about this issue was not fully addressed. If processed[0] is an empty list, accessing processed[0][0].shape at line 221 will raise an IndexError. Additionally, if processed itself is empty or all images produce zero patches, max(patch_counts) will be 0, and the dimension extraction will fail.

🔎 Proposed defensive handling
     def _process_nested_patches(
         self, processed: list[list[NumpyArray]]
     ) -> tuple[NumpyArray, NumpyArray, dict[str, Any]]:
         ...
         patch_counts = [len(patches) for patches in processed]
         max_patches = max(patch_counts)
+        
+        if max_patches == 0:
+            raise ValueError("All images produced zero patches")

         # Get dimensions from first patch
-        C, H, W = processed[0][0].shape
+        # Find first non-empty patch list to get dimensions
+        first_patch = next(
+            (patches[0] for patches in processed if len(patches) > 0), None
+        )
+        if first_patch is None:
+            raise ValueError("No valid patches found in processed images")
+        C, H, W = first_patch.shape
         batch_size = len(processed)
🧹 Nitpick comments (3)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)

16-19: Registry extended to include ColModernVBERT.

The EMBEDDINGS_REGISTRY now includes both ColPali and ColModernVBERT, enabling model discovery and instantiation via the unified interface.

Per the static analysis hint (RUF012), consider annotating mutable class attributes with typing.ClassVar for clarity:

EMBEDDINGS_REGISTRY: ClassVar[list[Type[LateInteractionMultimodalEmbeddingBase]]] = [...]

This is a minor improvement and not blocking.

fastembed/image/transform/functional.py (1)

192-221: Consider validating the normalization assumption for float inputs.

For float32/float64 dtypes, the code assumes values are normalized to [0, 1] range (line 206). If values fall outside this range, they will be clipped during uint8 conversion, potentially causing silent data quality issues.

Consider adding validation:

🔎 Add range validation for float inputs
     # Handle different dtypes
     if img_hwc.dtype == np.float32 or img_hwc.dtype == np.float64:
-        # Assume normalized, scale to 0-255 for PIL
+        # Validate normalized range and scale to 0-255 for PIL
+        if img_hwc.min() < 0 or img_hwc.max() > 1:
+            raise ValueError(
+                f"Float images must be normalized to [0, 1] range, got [{img_hwc.min()}, {img_hwc.max()}]"
+            )
         img_hwc_scaled = (img_hwc * 255).astype(np.uint8)
fastembed/image/transform/operators.py (1)

386-404: Optional: Extract error message to exception class.

Static analysis suggests moving the long error message to a custom exception class for better maintainability.

🔎 Refactor suggestion

Define a custom exception:

class InvalidImageProcessorConfigError(ValueError):
    def __init__(self, processor_type: str, missing_key: str):
        super().__init__(
            f"Size dictionary must contain '{missing_key}' key for {processor_type}"
        )

Then use it:

                 if "longest_edge" not in size:
-                    raise ValueError(
-                        "Size dictionary must contain 'longest_edge' key for Idefics3ImageProcessor"
-                    )
+                    raise InvalidImageProcessorConfigError("Idefics3ImageProcessor", "longest_edge")
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94366ea and 5a6d884.

📒 Files selected for processing (8)
  • fastembed/common/onnx_model.py
  • fastembed/common/preprocessor_utils.py
  • fastembed/image/transform/functional.py
  • fastembed/image/transform/operators.py
  • fastembed/late_interaction_multimodal/colmodernvbert.py
  • fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py
  • tests/test_late_interaction_multimodal.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • fastembed/common/preprocessor_utils.py
  • fastembed/common/onnx_model.py
🧰 Additional context used
🧬 Code graph analysis (5)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)
fastembed/late_interaction_multimodal/colmodernvbert.py (1)
  • ColModernVBERT (35-429)
tests/test_late_interaction_multimodal.py (1)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (2)
  • get_embedding_size (98-122)
  • embedding_size (91-95)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (3)
fastembed/late_interaction_multimodal/colmodernvbert.py (1)
  • _preprocess_onnx_image_input (196-259)
fastembed/late_interaction_multimodal/colpali.py (1)
  • _preprocess_onnx_image_input (210-227)
fastembed/common/onnx_model.py (1)
  • OnnxOutputContext (20-24)
fastembed/image/transform/operators.py (2)
fastembed/image/transform/functional.py (5)
  • crop_ndarray (176-189)
  • normalize (63-92)
  • resize (95-111)
  • resize_longest_edge (150-173)
  • resize_ndarray (192-221)
fastembed/common/utils.py (1)
  • normalize (18-23)
fastembed/late_interaction_multimodal/colmodernvbert.py (6)
fastembed/common/model_description.py (2)
  • DenseModelDescription (35-40)
  • ModelSource (7-20)
fastembed/common/onnx_model.py (2)
  • OnnxOutputContext (20-24)
  • _select_exposed_session_options (128-137)
fastembed/common/utils.py (1)
  • define_cache_dir (48-59)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding_base.py (1)
  • LateInteractionMultimodalEmbeddingBase (10-86)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4)
  • OnnxMultimodalModel (20-356)
  • load_onnx_model (82-83)
  • _load_onnx_model (59-80)
  • tokenize (85-86)
fastembed/common/model_management.py (2)
  • _get_model_description (67-84)
  • download_model (372-471)
🪛 GitHub Actions: Tests
fastembed/image/transform/functional.py

[error] 153-153: NameError: name 'Union' is not defined. Possibly missing 'from typing import Union' in the module.

🪛 GitHub Actions: type-checkers
fastembed/image/transform/functional.py

[error] 153-153: Name 'Union' is not defined. Did you forget to import it from 'typing'?


[error] 195-195: Name 'Union' is not defined. Did you forget to import it from 'typing'?

fastembed/image/transform/operators.py

[error] 45-45: Name 'Union' is not defined. Did you forget to import it from 'typing'?


[error] 46-46: Name 'Union' is not defined. Did you forget to import it from 'typing'?


[error] 76-76: Name 'Union' is not defined. Did you forget to import it from 'typing'?


[error] 77-77: Name 'Union' is not defined. Did you forget to import it from 'typing'?

🪛 Ruff (0.14.10)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py

16-19: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

fastembed/image/transform/functional.py

153-153: Undefined name Union

(F821)


195-195: Undefined name Union

(F821)

fastembed/image/transform/operators.py

45-45: Undefined name Union

(F821)


46-46: Undefined name Union

(F821)


76-76: Undefined name Union

(F821)


77-77: Undefined name Union

(F821)


390-392: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Unused static method argument: config

(ARG004)

fastembed/late_interaction_multimodal/colmodernvbert.py

46-46: Possible hardcoded password assigned to: "QUERY_AUGMENTATION_TOKEN"

(S105)


157-157: Unused method argument: kwargs

(ARG002)


190-190: Unused method argument: kwargs

(ARG002)


197-197: Unused method argument: kwargs

(ARG002)

🔇 Additional comments (19)
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (3)

179-200: Refactored image embedding flow with proper dispatch logic looks good.

The dispatch between nested patches and flat images based on isinstance(processed[0], list) is clear and appropriate for distinguishing ColModernVBERT's patch-based processing from standard processors.


239-275: Flat image processing handles both 4D and 5D tensor shapes correctly.

The logic to conditionally add a patch dimension based on ONNX model signature is sound. The attention mask and metadata shapes are derived consistently from the encoded tensor dimensions.


277-299: ONNX signature inspection for patch dimension detection is well-implemented.

Checking len(input_meta.shape) == 5 for the pixel_values input is a reliable way to determine model expectations. The fallback to False ensures backward compatibility.
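The inspection described in this comment can be sketched as a standalone helper. The function name and the hard-coded "pixel_values" input name are assumptions for illustration, not the shipped API:

```python
def expects_patch_dimension(session) -> bool:
    """Return True if the ONNX model's pixel_values input is 5D
    (batch, patches, channels, height, width); hypothetical sketch."""
    if session is None:
        return False  # model not loaded yet: default to flat 4D input
    for input_meta in session.get_inputs():
        if input_meta.name == "pixel_values":
            return len(input_meta.shape) == 5
    return False  # no pixel_values input: backward-compatible default
```

Returning False on any uncertainty is what preserves backward compatibility with older 4D-only models.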

tests/test_late_interaction_multimodal.py (4)

24-34: Canonical image values for ColModernVBERT added correctly.

The test data follows the same structure and precision as the existing ColPali values.


49-59: Canonical query values for ColModernVBERT added correctly.

Consistent with the existing test pattern for query embeddings.


115-116: Embedding size test for ColModernVBERT.

The expected embedding size of 128 matches the model description defined in colmodernvbert.py.


130-132: Embedding size property test for ColModernVBERT with lazy_load.

Good addition to ensure the embedding_size property works correctly with lazy loading for the new model.

fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (1)

7-7: Import for ColModernVBERT added correctly.

The import follows the existing pattern for ColPali.

fastembed/late_interaction_multimodal/colmodernvbert.py (6)

21-32: Model description for ColModernVBERT is well-defined.

The model registry entry includes all required fields and correctly specifies the HuggingFace source.


261-283: Patch grid computation logic is correct.

The algorithm to compute rows/cols from patch count handles both square and non-square grids, with appropriate fallback to unsplit (0, 0) when computation fails.


316-331: Input ID building logic for image prompts is well-structured.

The method correctly dispatches between single and split image prompt strings and tokenizes the expanded prompt.


333-349: Image output post-processing correctly reshapes to expected dimensions.

The reshape using model_description.dim ensures consistent embedding output format.


351-421: embed_text and embed_image methods delegate correctly to base implementation.

Both methods properly pass all configuration parameters including providers, device settings, and session options.


423-449: Worker classes correctly instantiate ColModernVBERT with single thread.

The worker implementations follow the established pattern for parallel processing with threads=1.

fastembed/image/transform/operators.py (5)

44-55: LGTM: Clean backward-compatible nested list support.

The modifications to Normalize and Rescale elegantly handle both flat and nested list structures for patch-based processing while maintaining backward compatibility.

Also applies to: 75-86


124-167: Verify that significant aspect ratio changes are acceptable.

The implementation rounds both width and height up to the nearest multiple of max_size independently (lines 149-156), which can significantly alter the aspect ratio. For example, a 1000×700 image with max_size=512 becomes 1024×1024 (aspect ratio changes from ~1.43 to 1.0).

While the docstring mentions "approximately" preserving aspect ratio, please confirm this behavior aligns with the ColModernVBERT model's requirements.
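The rounding behavior flagged here can be reproduced with a short sketch (the function name is assumed for illustration):

```python
import math

def resize_dims_for_vision_encoder(height: int, width: int, max_size: int) -> tuple[int, int]:
    # Round each dimension up to the nearest multiple of max_size,
    # independently, so the splitter can cut uniform patches. The
    # independent rounding is what lets the aspect ratio drift.
    new_height = math.ceil(height / max_size) * max_size
    new_width = math.ceil(width / max_size) * max_size
    return new_height, new_width

# The 1000x700 example above: both sides snap to 1024, so the
# aspect ratio changes from ~1.43 to 1.0.
```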


170-242: LGTM: Robust image splitting logic.

The ImageSplitter implementation correctly handles grid-based patch generation with optimal patch sizing and appropriately includes a global view only when splitting is necessary.


245-267: LGTM: Consistent return structure with ImageSplitter.

The SquareResize class correctly returns list[list[NumpyArray]] to maintain API consistency with ImageSplitter, enabling seamless switching between splitting modes.


435-455: LGTM: Well-structured conditional image splitting logic.

The _get_image_splitting method cleanly orchestrates the choice between patch-based splitting and square resizing based on configuration. The docstring appropriately documents the ordering dependency with PILtoNDarray.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
fastembed/image/transform/operators.py (1)

170-242: Implicit dependency on ResizeForVisionEncoder for uniform patches.

The patch extraction logic produces uniform max_size × max_size patches only when input dimensions are exact multiples of max_size—which is guaranteed by ResizeForVisionEncoder in the pipeline. If ImageSplitter is used standalone on arbitrary inputs, patches may have varying dimensions.

Consider adding a note in the docstring about this prerequisite, or adding a runtime assertion/warning.

🔎 Optional: Add docstring clarification
 class ImageSplitter(Transform):
     """
     Split images into grid of patches plus a global view.
 
     If image dimensions exceed max_size:
     - Divide into ceil(H/max_size) x ceil(W/max_size) patches
     - Each patch is cropped from the image
     - Add a global view (original resized to max_size x max_size)
 
     If image is smaller than max_size:
     - Return single image unchanged
 
     Works on numpy arrays in (C, H, W) format.
+
+    Note: For uniform max_size x max_size patches, input dimensions should
+    be multiples of max_size. Use ResizeForVisionEncoder before this transform.
     """
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a6d884 and 2dc7de8.

📒 Files selected for processing (2)
  • fastembed/image/transform/functional.py
  • fastembed/image/transform/operators.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • fastembed/image/transform/functional.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-12T10:48:30.978Z
Learnt from: joein
Repo: qdrant/fastembed PR: 574
File: fastembed/sparse/sparse_embedding_base.py:2-2
Timestamp: 2025-11-12T10:48:30.978Z
Learning: In fastembed codebase, when using numpy NDArray types in dataclass fields, keep Union syntax instead of PEP 604 pipe operator (|) because dataclasses evaluate annotations at runtime and NDArray types don't support the __or__ operator. Add a comment explaining the constraint.

Applied to files:

  • fastembed/image/transform/operators.py
🧬 Code graph analysis (1)
fastembed/image/transform/operators.py (2)
fastembed/image/transform/functional.py (9)
  • center_crop (15-60)
  • convert_to_rgb (7-12)
  • crop_ndarray (176-189)
  • normalize (63-92)
  • pil2ndarray (118-121)
  • rescale (114-115)
  • resize (95-111)
  • resize_longest_edge (150-173)
  • resize_ndarray (192-221)
fastembed/common/utils.py (1)
  • normalize (18-23)
🪛 Ruff (0.14.10)
fastembed/image/transform/operators.py

390-392: Avoid specifying long messages outside the exception class

(TRY003)


429-429: Avoid specifying long messages outside the exception class

(TRY003)


432-432: Unused static method argument: config

(ARG004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Python 3.13.x on windows-latest test
🔇 Additional comments (11)
fastembed/image/transform/operators.py (11)

1-17: LGTM!

The new imports (math, crop_ndarray, resize_longest_edge, resize_ndarray) are correctly added to support the new transform classes.


44-55: LGTM!

The polymorphic handling of flat vs nested image structures is implemented correctly. The images and isinstance(images[0], list) check safely handles empty lists via short-circuit evaluation.


75-86: LGTM!

Consistent with the Normalize implementation, correctly handling both flat and nested structures.


109-121: LGTM!

Clean wrapper around the resize_longest_edge functional helper, following the established Transform pattern.


139-167: LGTM!

The resize logic correctly ensures both dimensions become multiples of max_size. The aspect ratio is "approximately" preserved as documented, since independent ceiling operations on each dimension can introduce slight distortion—this is acceptable for vision encoder compatibility.


245-267: LGTM!

The singleton-list output format (list[list[NumpyArray]]) correctly matches ImageSplitter's output shape, enabling uniform downstream handling regardless of which path is taken.


306-306: LGTM!

Documentation correctly updated to reflect the new longest_edge size key support.


317-317: LGTM!

Correct pipeline ordering: _get_image_splitting is called after _get_pil2ndarray (requires numpy arrays) and before _get_rescale/_get_normalize.


386-404: LGTM!

The Idefics3ImageProcessor handling follows the established pattern. The resample int-to-enum conversion correctly handles cases where configuration provides the numeric value instead of the enum.


426-427: LGTM!

No-op for center crop is correct; Idefics3ImageProcessor uses image splitting instead.


435-455: LGTM!

The conditional transform wiring is correct:

  • Splitting enabled: ResizeForVisionEncoder → ImageSplitter (ensures proper grid extraction)
  • Splitting disabled: SquareResize (simple square output)

The nested dict access with chained .get() calls provides safe fallbacks.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_late_interaction_multimodal.py (1)

92-95: Unused loop variable name.

As flagged by the linter, the loop variable name is not used within the loop body.

♻️ Suggested fix
     if is_ci:
-        for name, model in cache.items():
+        for _, model in cache.items():
             delete_model_cache(model.model._model_dir)
     cache.clear()

Alternatively, use .values():

     if is_ci:
-        for name, model in cache.items():
-            delete_model_cache(model.model._model_dir)
+        for model in cache.values():
+            delete_model_cache(model.model._model_dir)
     cache.clear()
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2dc7de8 and 46453a2.

📒 Files selected for processing (4)
  • fastembed/image/onnx_image_model.py
  • fastembed/late_interaction_multimodal/colmodernvbert.py
  • fastembed/late_interaction_multimodal/onnx_multimodal_model.py
  • tests/test_late_interaction_multimodal.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_late_interaction_multimodal.py (2)
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py (6)
  • LateInteractionMultimodalEmbedding (15-189)
  • embed_image (147-168)
  • embed_text (124-145)
  • get_embedding_size (98-122)
  • embedding_size (91-95)
  • token_count (170-189)
tests/utils.py (1)
  • delete_model_cache (11-39)
🪛 Ruff (0.14.10)
tests/test_late_interaction_multimodal.py

93-93: Loop control variable name not used within loop body

Rename unused name to _name

(B007)

fastembed/late_interaction_multimodal/colmodernvbert.py

46-46: Possible hardcoded password assigned to: "QUERY_AUGMENTATION_TOKEN"

(S105)


157-157: Unused method argument: kwargs

(ARG002)


191-191: Unused method argument: kwargs

(ARG002)


202-202: Unused method argument: kwargs

(ARG002)


215-215: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Python 3.11.x on ubuntu-latest test
  • GitHub Check: Python 3.12.x on ubuntu-latest test
  • GitHub Check: Python 3.13.x on windows-latest test
  • GitHub Check: Python 3.13.x on macos-latest test
  • GitHub Check: Python 3.13.x on ubuntu-latest test
  • GitHub Check: Python 3.10.x on ubuntu-latest test
🔇 Additional comments (17)
fastembed/image/onnx_image_model.py (1)

78-92: LGTM! Good resource management with ExitStack.

The ExitStack properly manages image file handles, ensuring they're closed after processing. The image processing occurs within the context block before the handles are released, which is correct.

tests/test_late_interaction_multimodal.py (1)

151-163: Good test coverage for token counting.

The test properly validates that:

  1. Individual token counts sum correctly
  2. Batching doesn't affect total count
  3. Extension tokens are counted when include_extension=True
fastembed/late_interaction_multimodal/onnx_multimodal_model.py (4)

172-202: LGTM! Clean dispatch logic for nested vs flat image processing.

The method correctly:

  1. Uses ExitStack for resource management
  2. Dispatches based on processor output structure
  3. Returns a complete OnnxOutputContext with attention mask and metadata

204-239: Well-structured patch processing with proper attention masking.

The method correctly handles variable patch counts per image by:

  1. Finding max patches for batch padding
  2. Zero-padding shorter sequences
  3. Using attention mask to distinguish real patches from padding
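The padding scheme summarized in the three points above can be sketched as follows; the helper name is assumed, and the real method additionally maps the arrays onto the model's ONNX input names:

```python
import numpy as np

def pad_patch_batch(patch_stacks: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Zero-pad per-image patch stacks of shape (n_i, C, H, W) to the batch
    maximum and build an attention mask marking real patches vs padding."""
    max_patches = max(stack.shape[0] for stack in patch_stacks)
    padded, mask = [], []
    for stack in patch_stacks:
        n = stack.shape[0]
        pad = np.zeros((max_patches - n, *stack.shape[1:]), dtype=stack.dtype)
        padded.append(np.concatenate([stack, pad], axis=0))
        mask.append([1] * n + [0] * (max_patches - n))
    return np.stack(padded), np.asarray(mask, dtype=np.int64)
```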

241-277: LGTM! Proper handling of 4D/5D tensor requirements.

The method adapts to model input requirements by checking the ONNX signature and appropriately reshaping tensors for backward compatibility.


279-301: LGTM! Good defensive programming.

The method safely handles uninitialized models and defaults to False for backward compatibility.

fastembed/late_interaction_multimodal/colmodernvbert.py (11)

35-46: LGTM! Clean class structure.

The class constants are appropriately defined. The static analysis warning about QUERY_AUGMENTATION_TOKEN being a "hardcoded password" is a false positive—it's clearly a prompt formatting token, not a credential.


48-114: LGTM! Comprehensive initialization with good device handling.

The constructor properly handles:

  • Lazy loading for parallel processing scenarios
  • Device ID fallback (explicit → first from list → None)
  • Model download and configuration setup

191-195: LGTM! Query augmentation follows colpali-engine pattern.

The 10 augmentation tokens match the upstream process_queries logic as noted in the comment. The unused **kwargs parameter maintains API consistency with the base class.


197-212: LGTM! Clean token counting implementation.

The method properly:

  1. Ensures the model/tokenizer is loaded before counting
  2. Supports both single strings and iterables
  3. Uses appropriate tokenization based on include_extension flag

214-277: LGTM! Well-implemented dynamic input construction.

The method correctly:

  1. Counts real patches from attention mask
  2. Computes grid dimensions from patch count
  3. Builds input_ids with appropriate padding direction from tokenizer config

279-302: LGTM! Correct grid dimension computation.

The method properly handles:

  • Single/no patches (returns 0, 0)
  • Square and near-square grids
  • Non-square grid fallback search
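The cases listed above can be sketched as a standalone helper. Treat this as a hedged paraphrase of the reviewed logic, not the shipped code; the name and the exact fallback search are assumptions:

```python
import math

def compute_patch_grid(n_patches: int) -> tuple[int, int]:
    """Derive (rows, cols) from a patch count that includes one global view;
    (0, 0) means the image was not split."""
    if n_patches <= 1:
        return (0, 0)  # single image or no patches: unsplit
    n = n_patches - 1  # drop the global view
    root = math.isqrt(n)
    if root * root == n:
        return (root, root)  # square grid
    for rows in range(root, 0, -1):  # non-square fallback search
        if n % rows == 0:
            return (rows, n // rows)
    return (0, 0)
```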

304-351: LGTM! Clean prompt construction following Idefics3 format.

The methods properly construct prompt strings with appropriate special tokens for both single and split image scenarios.


352-368: LGTM! Correct output reshaping for late-interaction format.

The method properly reshapes model output to the expected (batch, n, dim) format for late-interaction retrieval.


370-440: LGTM! Clean delegation to base class methods.

Both embedding methods properly forward all configuration options to the underlying implementation.


451-467: LGTM! Standard worker implementations.

The workers correctly initialize the model with threads=1, which is appropriate for parallel processing where each worker handles a subset of the workload.


136-155: Add error handling for JSON config file loading.

Lines 137-154 open three JSON config files without error handling. While these files are guaranteed to exist (config.json and preprocessor_config.json are part of HuggingFace's standard model files, and processor_config.json is explicitly listed in additional_files), adding try/except blocks would improve debuggability if a file is corrupted or contains invalid JSON.
