Conversation

@jverre jverre commented Nov 3, 2025

Details

This PR adds comprehensive multimodal content support across the Opik Optimizer SDK, enabling optimization of prompts that include both text and images. The changes ensure that multimodal message structures are preserved throughout the optimization process.

Key Changes:

  • OptimizationResult Model: Updated to support multimodal content by changing prompt and initial_prompt fields from list[dict[str, str]] to list[MessageDict], which properly supports content as either a string or a list of text/image parts.
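The new message shape can be sketched with TypedDicts. The names below mirror the PR description, but the SDK's actual MessageDict definition may differ:

```python
from typing import Literal, TypedDict, Union

# Hypothetical part types (names assumed, not taken from the SDK source):
class TextPart(TypedDict):
    type: Literal["text"]
    text: str

class ImageURL(TypedDict):
    url: str

class ImagePart(TypedDict):
    type: Literal["image_url"]
    image_url: ImageURL

class MessageDict(TypedDict):
    role: str
    # content is either a plain string or a list of text/image parts
    content: Union[str, list[Union[TextPart, ImagePart]]]

# Both a plain-text and a multimodal message satisfy the same type:
text_msg: MessageDict = {"role": "user", "content": "Describe the scene."}
multi_msg: MessageDict = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What hazard is visible?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/road.jpg"}},
    ],
}
```

The old list[dict[str, str]] type could not represent multi_msg at all, since its content value is a list rather than a string.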

  • Hierarchical Reflective Optimizer:

    • Fixed an issue where multimodal content was converted to a string representation after the first optimization run
    • Updated PromptMessage model to use MessageDict type for consistency with existing codebase
    • Implemented structured outputs using Pydantic models for more robust LLM responses
    • Changed template formatting to use JSON serialization to preserve multimodal structure
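The motivation for JSON serialization can be illustrated in isolation (this is not the optimizer's actual template code): formatting multimodal content with str() yields a Python repr that downstream parsing cannot reliably round-trip, while json.dumps keeps the structure recoverable.

```python
import json

content = [
    {"type": "text", "text": "Identify the hazard."},
    {"type": "image_url", "image_url": {"url": "https://example.com/img.jpg"}},
]

# str() produces a Python repr (single quotes), which is not valid JSON and
# loses the guarantee that the structure can be parsed back:
flattened = str(content)

# json.dumps keeps the multimodal structure round-trippable:
serialized = json.dumps(content)
restored = json.loads(serialized)
assert restored == content
```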
  • Meta Prompt Optimizer:

    • Replaced manual JSON parsing with structured outputs using Pydantic models
    • Created dedicated types.py file with PromptCandidate, CandidatePromptsResponse, ToolDescriptionCandidate, and ToolDescriptionsResponse models
    • Removed brittle regex-based JSON extraction logic
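The structured-output pattern can be sketched with Pydantic v2. The class names below follow the PR description's types.py, but the fields are illustrative assumptions:

```python
from pydantic import BaseModel

# Field names here are assumptions; only the class names come from the PR's
# description of types.py.
class PromptCandidate(BaseModel):
    prompt: str
    rationale: str

class CandidatePromptsResponse(BaseModel):
    candidates: list[PromptCandidate]

# With structured outputs, the raw LLM response is validated directly against
# the schema instead of being scraped out of free-form text with regexes:
raw = '{"candidates": [{"prompt": "Describe the image.", "rationale": "Shorter."}]}'
response = CandidatePromptsResponse.model_validate_json(raw)
```

When the response does not match the schema, model_validate_json raises pydantic.ValidationError immediately, replacing the silent failure modes of regex-based JSON extraction.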
  • Reporting Utilities:

    • Updated display_optimized_prompt_diff to handle multimodal content in prompt diffs
    • Added a _content_to_string helper for converting multimodal content to a string representation for diffing
    • Enhanced display methods to properly format multimodal content using existing _format_message_content utility
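A helper along these lines can flatten multimodal content for difflib; this is a guess at the behavior of _content_to_string, not the PR's implementation:

```python
import difflib

def content_to_string(content):
    """Flatten message content to plain text for diffing. Hypothetical
    stand-in for the PR's _content_to_string helper; image parts become
    bracketed placeholders so text changes still diff cleanly."""
    if isinstance(content, str):
        return content
    lines = []
    for part in content:
        if part.get("type") == "text":
            lines.append(part.get("text", ""))
        else:
            lines.append(f"[{part.get('type', 'non-text')}]")
    return "\n".join(lines)

before = [
    {"type": "text", "text": "Describe the scene."},
    {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
]
after = [
    {"type": "text", "text": "Describe the road hazard."},
    {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
]

diff_lines = list(difflib.unified_diff(
    content_to_string(before).splitlines(),
    content_to_string(after).splitlines(),
    lineterm="",
))
```

Because the unchanged image part flattens to the same placeholder line on both sides, the diff highlights only the text edit.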
  • New Dataset: Added driving_hazard_50 dataset for multimodal evaluation scenarios

  • Example Script: Added multimodal_example.py demonstrating multimodal prompt optimization

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves #
  • OPIK-000

Testing

  • Updated test_optimization_result.py to reflect new type definitions
  • Manual testing with multimodal example script confirms:
    • Multimodal content structure is preserved through optimization rounds
    • Display methods correctly format multimodal content
    • Structured outputs work correctly with Pydantic validation

Documentation

  • Added multimodal example script demonstrating usage
  • Updated type hints and docstrings to reflect multimodal support
  • No breaking changes to the public API; existing string-based prompts continue to work

@vincentkoc vincentkoc changed the base branch from main to feat/imagebased-optimizer November 5, 2025 15:46
@vincentkoc vincentkoc changed the base branch from feat/imagebased-optimizer to main November 8, 2025 02:55
@vincentkoc (Member) commented:

Cherry-picked commits into #3926; closing this branch and #3488. The biggest issue is that the Pydantic type changes are extensive and would need a large-scale refactor, so I will adjust and combine my EO changes into a new PR. I will raise additional issues/tickets for the missing parts.

@vincentkoc vincentkoc closed this Nov 10, 2025