
Conversation


@Brain97 Brain97 commented Dec 1, 2025

Motivation

Support cache-dit #13753

Modifications

Accuracy Tests

Benchmarking and Profiling

Baseline:

    sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A Logo With Bold Large Text: SGL Diffusion" --save-output --perf-dump-path sglang_no_cache.json

With cache:

    sglang generate --model-path black-forest-labs/FLUX.1-dev --prompt "A Logo With Bold Large Text: SGL Diffusion" --save-output --enable-cache-dit --cache-dit-warmup 8 --cache-dit-rdt 0.35 --perf-dump-path sglang_with_cache.json

Performance Comparison Report

1. High-level Summary

| Metric | Baseline | New | Diff | Status |
| --- | --- | --- | --- | --- |
| E2E Latency | 29920.22 ms | 13750.82 ms | -16169.40 ms (-54.0%) | |
| Throughput | 0.03 req/s | 0.07 req/s | - | - |

2. Stage Breakdown

| Stage Name | Baseline (ms) | New (ms) | Diff (ms) | Diff (%) | Status |
| --- | --- | --- | --- | --- | --- |
| InputValidationStage | 0.04 | 0.05 | +0.01 | +18.8% | ⚪️ |
| TextEncodingStage | 441.19 | 532.33 | +91.14 | +20.7% | ⚪️ |
| ConditioningStage | 0.02 | 0.02 | +0.00 | +3.7% | ⚪️ |
| TimestepPreparationStage | 46.94 | 46.68 | -0.26 | -0.6% | ⚪️ |
| LatentPreparationStage | 5.72 | 5.71 | -0.01 | -0.2% | ⚪️ |
| DenoisingStage | 28049.23 | 11611.66 | -16437.57 | -58.6% | 🟢 |
| DecodingStage | 1189.57 | 1366.76 | +177.18 | +14.9% | 🔴 |
Metadata
  • Baseline Commit: N/A
  • New Commit: N/A
  • Timestamp: 2025-12-02T17:12:10.214027

Checklist

@Brain97 Brain97 requested a review from mickqian as a code owner December 1, 2025 16:03
@github-actions github-actions bot added the diffusion SGLang Diffusion label Dec 1, 2025
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Brain97, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for cache-dit acceleration within SGLang's DiT (Diffusion Transformer) pipelines. By integrating cache-dit, the system can leverage caching mechanisms to significantly improve inference speed for DiT models, with reported speedups ranging from 1.5x to 3.5x. The changes include a dedicated integration module, modifications to the denoising stage for dynamic activation, and new server arguments for granular control over the caching behavior.

Highlights

  • Cache-dit Integration Module: A new Python module cache_dit_integration.py has been added to encapsulate all cache-dit related functionality, including configuration, enabling, and summary retrieval (a rough sketch follows after this list).
  • Denoising Stage Enhancement: The DenoisingStage now includes logic to conditionally enable cache-dit on the transformer model, ensuring it's applied after loading and before compilation, and handling idempotent activation.
  • Configurable via Server Arguments: New command-line arguments and ServerArgs attributes have been introduced to allow users to enable and fine-tune cache-dit parameters like block computation, warmup steps, and residual difference thresholds.
  • Wrapped Function Handling: The prepare_extra_func_kwargs utility has been updated to correctly inspect the original function signature when cache-dit or other wrappers modify the function object.
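For reviewers who want to see the overall shape, below is a minimal sketch of an integration module along these lines. The names is_cache_dit_available, enable_cache_on_transformer, enable_cache_dit, and cache_dit_Fn come from the hunks quoted later in this thread; the config field names and the cache_dit.enable_cache call are assumptions about the cache-dit API, not the code in this PR.

```python
# Hypothetical sketch of a cache_dit_integration-style module.
# Only the function names mirror the review hunks below; the cache-dit
# call and config keywords are assumptions, not this PR's actual code.
import importlib.util
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def is_cache_dit_available() -> bool:
    """Lazily check whether the optional cache-dit package can be imported."""
    return importlib.util.find_spec("cache_dit") is not None


def build_cache_config(server_args: Any) -> Optional[dict]:
    """Translate server arguments into a plain dict of caching parameters.

    Field names follow the CLI flags used in the benchmark commands above
    (--cache-dit-warmup, --cache-dit-rdt); their mapping onto cache-dit's
    own config object is an assumption.
    """
    if not getattr(server_args, "enable_cache_dit", False):
        return None
    return {
        "Fn_compute_blocks": getattr(server_args, "cache_dit_Fn", 1),
        "max_warmup_steps": getattr(server_args, "cache_dit_warmup", 8),
        "residual_diff_threshold": getattr(server_args, "cache_dit_rdt", 0.35),
    }


def enable_cache_on_transformer(transformer, config: Optional[dict], model_name: str = "transformer"):
    """Enable cache-dit on an already-loaded transformer module (idempotent)."""
    if not config or getattr(transformer, "_cache_dit_enabled", False):
        return transformer  # nothing to do, or already wrapped
    import cache_dit  # lazy import keeps the dependency optional

    # Assumed entry point; consult cache-dit's documentation for the real
    # signature and keyword names before relying on this.
    cache_dit.enable_cache(transformer, **config)
    transformer._cache_dit_enabled = True
    logger.info("cache-dit enabled on %s", model_name)
    return transformer
```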
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for cache-dit to accelerate DiT-based model inference. The changes are well-structured, adding a dedicated integration module (cache_dit_integration.py) that handles the lazy loading and configuration of cache-dit. The core denoising pipeline is updated to enable caching on the transformer model, with appropriate safety checks for distributed environments and correct handling of wrapped functions. Configuration options are also exposed through server arguments. Overall, this is a solid implementation that thoughtfully integrates a new performance optimization.
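As a rough illustration of that flow (not this PR's actual code), the denoising-stage hook could be structured like the sketch below. The method name _maybe_enable_cache_dit and the attributes _cache_dit_enabled / _cached_num_steps match the hunks quoted further down; the distributed-safety check and the import path of the helpers are assumptions.

```python
# Hypothetical sketch of the denoising-stage guard; the helpers are the ones
# sketched earlier in this thread, and the import path is illustrative.
import logging

import torch.distributed as dist

from cache_dit_integration import (  # assumed module location
    build_cache_config,
    enable_cache_on_transformer,
    is_cache_dit_available,
)

logger = logging.getLogger(__name__)


class DenoisingStageSketch:
    def __init__(self, transformer, server_args):
        self.transformer = transformer
        self.server_args = server_args
        self._cache_dit_enabled = False
        self._cached_num_steps = None

    def _maybe_enable_cache_dit(self, num_inference_steps: int) -> None:
        # Idempotent: skip if caching is already active for this step count.
        if self._cache_dit_enabled and self._cached_num_steps == num_inference_steps:
            return
        if not getattr(self.server_args, "enable_cache_dit", False):
            return
        if not is_cache_dit_available():
            logger.warning(
                "cache-dit is not installed. Please install it with: pip install cache-dit"
            )
            return
        # Assumed safety check: only enable on single-rank runs, since caching
        # may interact badly with tensor/sequence parallelism.
        if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
            logger.warning("cache-dit disabled for multi-rank run")
            return
        config = build_cache_config(self.server_args)
        self.transformer = enable_cache_on_transformer(
            self.transformer, config, model_name="transformer"
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
```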

Comment on lines 225 to 240


        self.transformer = enable_cache_on_transformer(
            self.transformer,
            config,
            model_name="transformer",
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
        logger.info(
            "cache-dit enabled successfully on transformer (steps=%d)",
            num_inference_steps,
        )


    @lru_cache(maxsize=8)

Severity: medium

These extra blank lines can be removed to improve code compactness.

        )

        self.transformer = enable_cache_on_transformer(
            self.transformer,
            config,
            model_name="transformer",
        )
        self._cache_dit_enabled = True
        self._cached_num_steps = num_inference_steps
        logger.info(
            "cache-dit enabled successfully on transformer (steps=%d)",
            num_inference_steps,
        )

    @lru_cache(maxsize=8)

Collaborator:

please solve it

Comment on lines 1033 to 1034
if hasattr(transformer_instance, '_original_forward'):
    target_func = transformer_instance._original_forward

Severity: medium

This logic relies on an internal attribute _original_forward of the transformer_instance from cache-dit. This creates a tight coupling and could break if cache-dit changes its internal implementation. It would be good to add a comment here to note this dependency, or investigate if cache-dit provides a more stable API for unwrapping functions.
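One way to address that coupling, offered here as a sketch of the reviewer's suggestion rather than the PR's actual fix, is to isolate the unwrapping in a small helper that documents the dependency and falls back to the wrapped forward when the private attribute is absent:

```python
import inspect
from typing import Callable


def resolve_signature_target(transformer_instance) -> Callable:
    """Return the function whose signature prepare_extra_func_kwargs should inspect.

    NOTE: this relies on cache-dit's internal `_original_forward` attribute
    (an implementation detail that may change between releases). If the
    attribute is missing, fall back to the (possibly wrapped) forward.
    """
    return getattr(transformer_instance, "_original_forward", None) or transformer_instance.forward


def original_signature(transformer_instance) -> inspect.Signature:
    """Convenience wrapper used when filtering keyword arguments."""
    return inspect.signature(resolve_signature_target(transformer_instance))
```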


@mickqian mickqian left a comment


Follow-up TODOs:

  1. add to CI
  2. add docs

        self._cache_dit_enabled = False
        self._cached_num_steps = None

    def _maybe_enable_cache_dit(
Collaborator:

should we use it in denoising_dmd.py?

return

# Check if cache-dit is enabled in config
if not getattr(server_args, "enable_cache_dit", False):
Collaborator:

nit: why not server_args.enable_cache_dit?

# Check if cache-dit is available
if not is_cache_dit_available():
    logger.warning(
        "cache-dit is not installed. Please install it with: pip install cache-dit"
Collaborator:

should we add it as a requirement?


@@ -0,0 +1,190 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator:

move this file to runtime/utils


# cache-dit acceleration parameters
enable_cache_dit: bool = False
cache_dit_Fn: int = 1 # Number of first blocks to always compute
Collaborator:

Should we consider supporting these in env vars?
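If env-var overrides are added, one minimal pattern (purely illustrative; the variable names SGLANG_ENABLE_CACHE_DIT and SGLANG_CACHE_DIT_FN are made up here, not existing SGLang settings) is to read defaults at dataclass-field definition time:

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: bool) -> bool:
    """Treat '1', 'true', and 'yes' (any case) as true."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes")


def _env_int(name: str, default: int) -> int:
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default


@dataclass
class CacheDitArgsSketch:
    # cache-dit acceleration parameters, overridable via environment variables
    enable_cache_dit: bool = field(
        default_factory=lambda: _env_bool("SGLANG_ENABLE_CACHE_DIT", False)
    )
    cache_dit_Fn: int = field(  # number of first blocks to always compute
        default_factory=lambda: _env_int("SGLANG_CACHE_DIT_FN", 1)
    )
```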


mickqian commented Dec 2, 2025

Also, could you dump the perf report according to contributing.md?


Brain97 commented Dec 2, 2025

> Also, could you dump the perf report according to contributing.md?

done


mickqian commented Dec 2, 2025

@Brain97 Great, would you solve the issues?


Brain97 commented Dec 2, 2025

> @Brain97 Great, would you solve the issues?

sure

@fy1214 fy1214 requested a review from yhyang201 as a code owner December 2, 2025 16:11

fy1214 commented Dec 2, 2025

update the quantizer method to commit
