@yhyang201

Motivation

Add tests tomorrow

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@yhyang201 yhyang201 requested a review from mickqian as a code owner December 1, 2025 17:12
@github-actions github-actions bot added the diffusion SGLang Diffusion label Dec 1, 2025
@gemini-code-assist

Summary of Changes

Hello @yhyang201, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational changes to enable the processing of multiple input images within the diffusion model pipelines. It modifies how image sizes are calculated, images are resized, and how the VAE encoding stage handles multiple image inputs, laying the groundwork for more complex multi-image generation and editing capabilities, such as image-to-image and image-to-video tasks. A new specialized pipeline for Qwen-Image-Edit is also added to leverage these multi-image features.

Highlights

  • Multi-Image Input Support: Core pipeline configurations (base.py, flux.py, qwen_image.py) are updated to handle lists of images for size calculation and resizing, enabling multi-image input for diffusion models.
  • New Pipeline for Qwen-Image-Edit: A QwenImageEditPlusPipelineConfig and corresponding sampling parameters are introduced, specifically tailored for advanced multi-image editing tasks within the Qwen-Image-Edit framework.
  • API and Internal Data Structure Updates: The OpenAI-compatible image API now supports uploading multiple images, and the internal Req object and ImageVAEEncodingStage are modified to correctly process and store multiple image inputs and their latents.
  • Image Preprocessing Integration: A new preprocess_image method is added to pipeline configurations and integrated into the image encoding stage, allowing for custom preprocessing logic for input images.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for multiple image inputs in the diffusion pipeline, which is a significant feature enhancement. The changes are mostly well-implemented, refactoring several methods to handle lists of images instead of single images. However, I've identified several issues that need attention. There are multiple incorrect type hints across different files, instances of code duplication that affect maintainability, and a critical bug in QwenImageEditPlusPipelineConfig.prepare_image_processor_kwargs that could lead to a NameError. Additionally, the new qwen-image-edit-2509 model is registered with incorrect sampling parameters. Addressing these points will improve the correctness and robustness of the new functionality.

Comment on lines +371 to +391
    def prepare_image_processor_kwargs(self, batch) -> dict:
        prompt = batch.prompt
        prompt_list = [prompt] if isinstance(prompt, str) else prompt
        image_list = batch.condition_image

        prompt_template_encode = (
            "<|im_start|>system\nDescribe the key features of the input image "
            "(color, shape, size, texture, objects, background), then explain how "
            "the user's text instruction should alter or modify the image. Generate "
            "a new image that meets the user's requirements while maintaining "
            "consistency with the original input where appropriate.<|im_end|>\n"
            "<|im_start|>user\n{}<|im_end|>\n"
            "<|im_start|>assistant\n"
        )
        img_prompt_template = "Picture {}: <|vision_start|><|image_pad|><|vision_end|>"
        if isinstance(image_list, list):
            base_img_prompt = ""
            for i, img in enumerate(image_list):
                base_img_prompt += img_prompt_template.format(i + 1)
        txt = [prompt_template_encode.format(base_img_prompt + p) for p in prompt_list]
        return dict(text=txt, padding=True)

critical

There is a potential NameError in this method. If batch.condition_image (assigned to image_list) is not a list (e.g., a single image object), the if isinstance(image_list, list): block is skipped, so base_img_prompt is never assigned. The line txt = [prompt_template_encode.format(base_img_prompt + p) for p in prompt_list] then raises UnboundLocalError, a subclass of NameError. To fix this, initialize base_img_prompt before the if block and also handle the case where image_list is a single image.

    def prepare_image_processor_kwargs(self, batch) -> dict:
        prompt = batch.prompt
        prompt_list = [prompt] if isinstance(prompt, str) else prompt
        image_list = batch.condition_image

        prompt_template_encode = (
            "<|im_start|>system\nDescribe the key features of the input image "
            "(color, shape, size, texture, objects, background), then explain how "
            "the user's text instruction should alter or modify the image. Generate "
            "a new image that meets the user's requirements while maintaining "
            "consistency with the original input where appropriate.<|im_end|>\n"
            "<|im_start|>user\n{}<|im_end|>\n"
            "<|im_start|>assistant\n"
        )
        img_prompt_template = "Picture {}: <|vision_start|><|image_pad|><|vision_end|>"
        base_img_prompt = ""
        if image_list:
            if not isinstance(image_list, list):
                image_list = [image_list]
            for i, img in enumerate(image_list):
                base_img_prompt += img_prompt_template.format(i + 1)
        txt = [prompt_template_encode.format(base_img_prompt + p) for p in prompt_list]
        return dict(text=txt, padding=True)
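
The failure mode and the fix can be reproduced outside the pipeline. The sketch below is a minimal stand-in, not the SGLang implementation: `build_image_prefix_buggy` and `build_image_prefix_fixed` are hypothetical helper names, and any Python object can stand in for an image.

```python
IMG_PROMPT_TEMPLATE = "Picture {}: <|vision_start|><|image_pad|><|vision_end|>"


def build_image_prefix_buggy(condition_image):
    # Mirrors the reviewed code: the prefix is only bound inside the branch.
    if isinstance(condition_image, list):
        base_img_prompt = ""
        for i, _ in enumerate(condition_image):
            base_img_prompt += IMG_PROMPT_TEMPLATE.format(i + 1)
    # Unbound here whenever the branch above was skipped.
    return base_img_prompt


def build_image_prefix_fixed(condition_image):
    # Bind the prefix first, then normalize a single image to a one-item list.
    base_img_prompt = ""
    if condition_image:
        if not isinstance(condition_image, list):
            condition_image = [condition_image]
        for i, _ in enumerate(condition_image):
            base_img_prompt += IMG_PROMPT_TEMPLATE.format(i + 1)
    return base_img_prompt
```

Passing a single non-list object to the buggy helper raises `UnboundLocalError`, which `except NameError` also catches. The fixed helper returns an empty prefix when no image is given and one "Picture N" marker per image otherwise.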

-    def calculate_condition_image_size(self, image, width, height) -> tuple[int, int]:
-        vae_scale_factor = self.vae_config.arch_config.spatial_compression_ratio
-        height, width = get_default_height_width(image, vae_scale_factor, height, width)
+    def calculate_condition_image_size(self, images) -> tuple[int, int]:

medium

The return type hint for calculate_condition_image_size is incorrect. The function returns two lists of integers (width and height), but the type hint is tuple[int, int]. It should be tuple[list[int], list[int]] to match the actual return value.

Suggested change
-    def calculate_condition_image_size(self, images) -> tuple[int, int]:
+    def calculate_condition_image_size(self, images) -> tuple[list[int], list[int]]:
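
As a small illustration of the suggested annotation, the sketch below returns two parallel lists and annotates them accordingly. `condition_image_sizes` is a hypothetical free function, and `(width, height)` pairs stand in for image objects; neither is taken from the PR.

```python
def condition_image_sizes(
    sizes: list[tuple[int, int]],
) -> tuple[list[int], list[int]]:
    """Return per-image widths and heights as two parallel lists."""
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    return widths, heights


# Callers unpack two lists, one entry per input image.
widths, heights = condition_image_sizes([(640, 480), (512, 512)])
```

With the old `tuple[int, int]` hint, a type checker would flag any caller that iterates over `widths` or `heights`, even though that is exactly what the multi-image code path needs to do.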

-    def calculate_condition_image_size(
-        self, image, width, height
-    ) -> Optional[tuple[int, int]]:
+    def calculate_condition_image_size(self, images) -> Optional[tuple[int, int]]:

medium

The return type hint for calculate_condition_image_size is incorrect. The function now returns two lists of integers, but the hint is Optional[tuple[int, int]]. It should be updated to tuple[list[int], list[int]]. Since the function no longer returns None, the Optional is also unnecessary.

Suggested change
-    def calculate_condition_image_size(self, images) -> Optional[tuple[int, int]]:
+    def calculate_condition_image_size(self, images) -> tuple[list[int], list[int]]:

Comment on lines +468 to +474
    def resize_condition_image(self, images, target_width, target_height):
        new_images = []
        for image, width, height in zip(images, target_width, target_height):
            new_images.append(
                image.resize((width, height), PIL.Image.Resampling.LANCZOS)
            )
        return new_images

medium

This resize_condition_image method is an exact copy of the implementation in the base class PipelineConfig. This override is redundant and can be removed to avoid code duplication and improve maintainability.
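
A minimal sketch of why the override can be dropped, assuming the subclass body really is identical to the base one. `BaseConfig` and `ChildConfig` are hypothetical stand-ins for the real config classes, and the "resize" here only pairs each item with its target size instead of calling PIL.

```python
class BaseConfig:
    def resize_condition_image(self, images, target_width, target_height):
        # Stand-in for the real resize: pair each image with its target size.
        return [
            (image, (width, height))
            for image, width, height in zip(images, target_width, target_height)
        ]


class ChildConfig(BaseConfig):
    # No override: attribute lookup falls through to BaseConfig.
    pass


resized = ChildConfig().resize_condition_image(["a", "b"], [64, 128], [32, 96])
```

Because the subclass resolves to the very same function object, any later fix to the base method applies to both classes without a second edit.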

Comment on lines +401 to +413
    def calculate_condition_image_size(self, images) -> tuple[int, int]:
        calculated_widths = []
        calculated_heights = []

        for img in images:
            image_width, image_height = img.size
            calculated_width, calculated_height, _ = calculate_dimensions(
                VAE_IMAGE_SIZE, image_width / image_height
            )
            calculated_widths.append(calculated_width)
            calculated_heights.append(calculated_height)

        return calculated_widths, calculated_heights

medium

This method has two issues:

  1. The return type hint is tuple[int, int], but it returns two lists of integers. It should be tuple[list[int], list[int]].
  2. This method is a near-identical copy of the implementation in its parent class QwenImageEditPipelineConfig. This creates unnecessary code duplication.

Since the logic is the same, this method can be removed from QwenImageEditPlusPipelineConfig to inherit the implementation from the parent class, which would fix both issues and improve maintainability.
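
To make the quoted loop concrete, here is a self-contained sketch of a per-image size calculation. The body of `calculate_dimensions` and the value of `VAE_IMAGE_SIZE` are assumptions for illustration (aspect-ratio-preserving, roughly constant pixel area, sides rounded to multiples of 32); the PR excerpt shows neither. `calculate_sizes` is a hypothetical free function that takes `(width, height)` pairs in place of image objects.

```python
import math

VAE_IMAGE_SIZE = 1024 * 1024  # assumed target pixel area, not shown in the PR


def calculate_dimensions(target_area, ratio):
    # Assumed behaviour: keep the aspect ratio, aim for target_area pixels,
    # and round each side to the nearest multiple of 32.
    width = math.sqrt(target_area * ratio)
    height = width / ratio
    width = round(width / 32) * 32
    height = round(height / 32) * 32
    return width, height, None


def calculate_sizes(image_sizes) -> tuple[list[int], list[int]]:
    widths, heights = [], []
    for image_width, image_height in image_sizes:
        w, h, _ = calculate_dimensions(VAE_IMAGE_SIZE, image_width / image_height)
        widths.append(w)
        heights.append(h)
    return widths, heights


widths, heights = calculate_sizes([(512, 512), (1920, 1080)])
```

Each input image gets its own target size, which is why the result is a pair of lists rather than a single `(width, height)` tuple.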
