
Conversation


@Brain97 Brain97 commented Dec 1, 2025

Motivation

Support cache-dit #13753

Modifications

Accuracy Tests

Benchmarking and Profiling

Baseline:

    sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A Logo With Bold Large Text: SGL Diffusion" --save-output --perf-dump-path sglang_no_cache.json

With cache:

    sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A Logo With Bold Large Text: SGL Diffusion" --save-output --enable-cache-dit --cache-dit-warmup 8 --cache-dit-rdt 0.35 --perf-dump-path sglang_with_cache.json

Performance Comparison Report

1. High-level Summary

| Metric | Baseline | New | Diff | Status |
| --- | --- | --- | --- | --- |
| E2E Latency | 29920.22 ms | 13750.82 ms | -16169.40 ms (-54.0%) | |
| Throughput | 0.03 req/s | 0.07 req/s | - | - |

2. Stage Breakdown

| Stage Name | Baseline (ms) | New (ms) | Diff (ms) | Diff (%) | Status |
| --- | --- | --- | --- | --- | --- |
| InputValidationStage | 0.04 | 0.05 | +0.01 | +18.8% | ⚪️ |
| TextEncodingStage | 441.19 | 532.33 | +91.14 | +20.7% | ⚪️ |
| ConditioningStage | 0.02 | 0.02 | +0.00 | +3.7% | ⚪️ |
| TimestepPreparationStage | 46.94 | 46.68 | -0.26 | -0.6% | ⚪️ |
| LatentPreparationStage | 5.72 | 5.71 | -0.01 | -0.2% | ⚪️ |
| DenoisingStage | 28049.23 | 11611.66 | -16437.57 | -58.6% | 🟢 |
| DecodingStage | 1189.57 | 1366.76 | +177.18 | +14.9% | 🔴 |
Metadata
  • Baseline Commit: N/A
  • New Commit: N/A
  • Timestamp: 2025-12-02T17:12:10.214027

Checklist

@Brain97 Brain97 requested a review from mickqian as a code owner December 1, 2025 16:03
@github-actions github-actions bot added the diffusion SGLang Diffusion label Dec 1, 2025
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Brain97, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for cache-dit acceleration within SGLang's DiT (Diffusion Transformer) pipelines. By integrating cache-dit, the system can leverage caching mechanisms to significantly improve inference speed for DiT models, with reported speedups ranging from 1.5x to 3.5x. The changes include a dedicated integration module, modifications to the denoising stage for dynamic activation, and new server arguments for granular control over the caching behavior.

Highlights

  • Cache-dit Integration Module: A new Python module cache_dit_integration.py has been added to encapsulate all cache-dit related functionality, including configuration, enabling, and summary retrieval (a rough sketch follows after this list).
  • Denoising Stage Enhancement: The DenoisingStage now includes logic to conditionally enable cache-dit on the transformer model, ensuring it's applied after loading and before compilation, and handling idempotent activation.
  • Configurable via Server Arguments: New command-line arguments and ServerArgs attributes have been introduced to allow users to enable and fine-tune cache-dit parameters like block computation, warmup steps, and residual difference thresholds.
  • Wrapped Function Handling: The prepare_extra_func_kwargs utility has been updated to correctly inspect the original function signature when cache-dit or other wrappers modify the function object.
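For reviewers who want to see the overall shape, below is a minimal sketch of an integration module along these lines. The names is_cache_dit_available, enable_cache_on_transformer, enable_cache_dit, and cache_dit_Fn come from the hunks quoted later in this thread; the config field names and the cache_dit.enable_cache call are assumptions about the cache-dit API, not the code in this PR.

```python
# Hypothetical sketch of a cache_dit_integration-style module.
# Only the function names mirror the review hunks below; the cache-dit
# call and config keywords are assumptions, not this PR's actual code.
import importlib.util
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def is_cache_dit_available() -> bool:
    """Lazily check whether the optional cache-dit package can be imported."""
    return importlib.util.find_spec("cache_dit") is not None


def build_cache_config(server_args: Any) -> Optional[dict]:
    """Translate server arguments into a plain dict of caching parameters.

    Field names follow the CLI flags used in the benchmark commands above
    (--cache-dit-warmup, --cache-dit-rdt); their mapping onto cache-dit's
    own config object is an assumption.
    """
    if not getattr(server_args, "enable_cache_dit", False):
        return None
    return {
        "Fn_compute_blocks": getattr(server_args, "cache_dit_Fn", 1),
        "max_warmup_steps": getattr(server_args, "cache_dit_warmup", 8),
        "residual_diff_threshold": getattr(server_args, "cache_dit_rdt", 0.35),
    }


def enable_cache_on_transformer(transformer, config: Optional[dict], model_name: str = "transformer"):
    """Enable cache-dit on an already-loaded transformer module (idempotent)."""
    if not config or getattr(transformer, "_cache_dit_enabled", False):
        return transformer  # nothing to do, or already wrapped
    import cache_dit  # lazy import keeps the dependency optional

    # Assumed entry point; consult cache-dit's documentation for the real
    # signature and keyword names before relying on this.
    cache_dit.enable_cache(transformer, **config)
    transformer._cache_dit_enabled = True
    logger.info("cache-dit enabled on %s", model_name)
    return transformer
```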
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for cache-dit to accelerate DiT-based model inference. The changes are well-structured, adding a dedicated integration module (cache_dit_integration.py) that handles the lazy loading and configuration of cache-dit. The core denoising pipeline is updated to enable caching on the transformer model, with appropriate safety checks for distributed environments and correct handling of wrapped functions. Configuration options are also exposed through server arguments. Overall, this is a solid implementation that thoughtfully integrates a new performance optimization.
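As a rough illustration of that flow (not this PR's actual code), the denoising-stage hook could be structured like the sketch below. The method name _maybe_enable_cache_dit and the attributes _cache_dit_enabled / _cached_num_steps match the hunks quoted further down; the distributed-safety check and the import path of the helpers are assumptions.

```python
# Hypothetical sketch of the denoising-stage guard; the helpers are the ones
# sketched earlier in this thread, and the import path is illustrative.
import logging

import torch.distributed as dist

from cache_dit_integration import (  # assumed module location
    build_cache_config,
    enable_cache_on_transformer,
    is_cache_dit_available,
)

logger = logging.getLogger(__name__)


class DenoisingStageSketch:
    def __init__(self, transformer, server_args):
        self.transformer = transformer
        self.server_args = server_args
        self._cache_dit_enabled = False
        self._cached_num_steps = None

    def _maybe_enable_cache_dit(self, num_inference_steps: int) -> None:
        # Idempotent: skip if caching is already active for this step count.
        if self._cache_dit_enabled and self._cached_num_steps == num_inference_steps:
            return
        if not getattr(self.server_args, "enable_cache_dit", False):
            return
        if not is_cache_dit_available():
            logger.warning(
                "cache-dit is not installed. Please install it with: pip install cache-dit"
            )
            return
        # Assumed safety check: only enable on single-rank runs, since caching
        # may interact badly with tensor/sequence parallelism.
        if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
            logger.warning("cache-dit disabled for multi-rank run")
            return
        config = build_cache_config(self.server_args)
        self.transformer = enable_cache_on_transformer(
            self.transformer, config, model_name="transformer"
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
```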

Comment on lines 225 to 240


        self.transformer = enable_cache_on_transformer(
            self.transformer,
            config,
            model_name="transformer",
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
        logger.info(
            "cache-dit enabled successfully on transformer (steps=%d)",
            num_inference_steps,
        )


    @lru_cache(maxsize=8)

Severity: medium

These extra blank lines can be removed to improve code compactness.

        )

        self.transformer = enable_cache_on_transformer(
            self.transformer,
            config,
            model_name="transformer",
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
        logger.info(
            "cache-dit enabled successfully on transformer (steps=%d)",
            num_inference_steps,
        )

    @lru_cache(maxsize=8)

Collaborator:

please solve it

Comment on lines 1033 to 1034
if hasattr(transformer_instance, '_original_forward'):
    target_func = transformer_instance._original_forward

Severity: medium

This logic relies on an internal attribute _original_forward of the transformer_instance from cache-dit. This creates a tight coupling and could break if cache-dit changes its internal implementation. It would be good to add a comment here to note this dependency, or investigate if cache-dit provides a more stable API for unwrapping functions.
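One way to address that coupling, offered here as a sketch of the reviewer's suggestion rather than the PR's actual fix, is to isolate the unwrapping in a small helper that documents the dependency and falls back to the wrapped forward when the private attribute is absent:

```python
import inspect
from typing import Callable


def resolve_signature_target(transformer_instance) -> Callable:
    """Return the function whose signature prepare_extra_func_kwargs should inspect.

    NOTE: this relies on cache-dit's internal `_original_forward` attribute
    (an implementation detail that may change between releases). If the
    attribute is missing, fall back to the (possibly wrapped) forward.
    """
    return getattr(transformer_instance, "_original_forward", None) or transformer_instance.forward


def original_signature(transformer_instance) -> inspect.Signature:
    """Convenience wrapper used when filtering keyword arguments."""
    return inspect.signature(resolve_signature_target(transformer_instance))
```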


@mickqian mickqian left a comment


Follow-up TODOs:

  1. add to CI
  2. add docs

        self._cache_dit_enabled = False
        self._cached_num_steps = None

    def _maybe_enable_cache_dit(
Collaborator:

should we use it in denoising_dmd.py?

return

# Check if cache-dit is enabled in config
if not getattr(server_args, "enable_cache_dit", False):
Collaborator:

nit: why not server_args.enable_cache_dit?

# Check if cache-dit is available
if not is_cache_dit_available():
    logger.warning(
        "cache-dit is not installed. Please install it with: pip install cache-dit"
Collaborator:

should we add it as a requirement?


@@ -0,0 +1,190 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator:

move this file to runtime/utils


# cache-dit acceleration parameters
enable_cache_dit: bool = False
cache_dit_Fn: int = 1 # Number of first blocks to always compute
Collaborator:

Should we consider supporting these in env vars?
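If env-var overrides are added, one minimal pattern (purely illustrative; the variable names SGLANG_ENABLE_CACHE_DIT and SGLANG_CACHE_DIT_FN are made up here, not existing SGLang settings) is to read defaults at dataclass-field definition time:

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: bool) -> bool:
    """Treat '1', 'true', and 'yes' (any case) as true."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")


def _env_int(name: str, default: int) -> int:
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default


@dataclass
class CacheDitArgsSketch:
    # cache-dit acceleration parameters, overridable via environment variables
    enable_cache_dit: bool = field(
        default_factory=lambda: _env_bool("SGLANG_ENABLE_CACHE_DIT", False)
    )
    cache_dit_Fn: int = field(  # number of first blocks to always compute
        default_factory=lambda: _env_int("SGLANG_CACHE_DIT_FN", 1)
    )
```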


mickqian commented Dec 2, 2025

Also, could you dump the perf report according to contributing.md?


Brain97 commented Dec 2, 2025

> Also, could you dump the perf report according to contributing.md?

done


mickqian commented Dec 2, 2025

@Brain97 Great, would you solve the issues?


Brain97 commented Dec 2, 2025

> @Brain97 Great, would you solve the issues?

sure

@fy1214 fy1214 requested a review from yhyang201 as a code owner December 2, 2025 16:11

fy1214 commented Dec 2, 2025

update the quantizer method to commit
