Fix cache sizing and cache block layout edge cases by grimoire · Pull Request #4552 · InternLM/lmdeploy

grimoire · 2026-04-23T10:04:08Z

Summary

This PR fixes several cache sizing and cache block layout edge cases in the PyTorch engine, and improves readability around cache configuration/update logic.

Changes

- Refactor ExecutorBase.update_configs() into smaller helper methods to make cache memory estimation easier to follow.
- Use integer byte budgets when computing available KV cache memory and num_gpu_blocks.
- Reserve state cache memory and runtime memory before applying cache_max_entry_count to pageable KV cache memory.
- Improve speculative decoding cache sizing:
  - spec ranks are estimated with target_cache_block_size + spec_cache_block_size;
  - non-spec ranks are estimated with only target_cache_block_size;
  - final num_gpu_blocks is the minimum capacity across ranks.
- Keep target/spec cache block layout aligned after executor-side block size adjustment.
- Add validation for block_size and kernel_block_size:
  - block_size >= kernel_block_size;
  - block_size % kernel_block_size == 0.
- Temporarily reject block_size != kernel_block_size for PD migration, both in engine checker and migration runtime path.
- Preserve zero initialization for cache allocation to avoid dirty cache data producing invalid outputs.
- Remove unused custom cache allocation helper code.
- Add comments around cache sizing, rank-specific speculative cache memory, CUDA device path assumptions, and PD migration limitations.
- Fix config propagation details:
  - CAMB auto block-size adjustment now also updates kernel_block_size;
  - speculative cache config now copies target kernel_block_size.
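
The sizing and validation rules listed above can be illustrated with a small standalone sketch. This is not the code from this PR: `RankBudget`, `pageable_kv_bytes`, `estimate_num_gpu_blocks`, and `validate_block_sizes` are hypothetical names, and the sketch assumes that the target and spec cache block sizes are expressed in bytes per block and that `cache_max_entry_count` is a fraction of the remaining memory.

```python
from dataclasses import dataclass
from typing import List


def validate_block_sizes(block_size: int, kernel_block_size: int) -> None:
    """Check the two constraints named in the PR description."""
    if block_size < kernel_block_size:
        raise ValueError("block_size must be >= kernel_block_size")
    if block_size % kernel_block_size != 0:
        raise ValueError("block_size must be a multiple of kernel_block_size")


@dataclass
class RankBudget:
    """Hypothetical per-rank memory figures, all in bytes."""
    free_bytes: int         # free device memory on this rank
    state_cache_bytes: int  # reserved for state cache
    runtime_bytes: int      # reserved for runtime memory
    is_spec_rank: bool      # whether this rank also hosts the spec cache


def pageable_kv_bytes(rank: RankBudget, cache_max_entry_count: float) -> int:
    """Reserve state/runtime memory first, then apply the ratio."""
    remaining = rank.free_bytes - rank.state_cache_bytes - rank.runtime_bytes
    # Truncate to an integer byte budget and never go below zero.
    return max(int(remaining * cache_max_entry_count), 0)


def estimate_num_gpu_blocks(
    ranks: List[RankBudget],
    target_block_bytes: int,
    spec_block_bytes: int,
    cache_max_entry_count: float,
) -> int:
    """Return the minimum block capacity across all ranks."""
    capacities = []
    for rank in ranks:
        # Spec ranks pay for a target block plus a spec block per entry;
        # non-spec ranks pay for the target block only.
        block_bytes = target_block_bytes
        if rank.is_spec_rank:
            block_bytes += spec_block_bytes
        budget = pageable_kv_bytes(rank, cache_max_entry_count)
        capacities.append(budget // block_bytes)
    return min(capacities)
```

Taking the minimum across ranks keeps every rank able to hold the same number of blocks, so the rank with the most expensive per-block footprint (here, a spec rank) sets the limit. Integer floor division avoids fractional block counts that could otherwise round up past the available memory.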

Tests

python -m pytest \
  tests/pytorch/engine/test_engine_checker.py \
  tests/pytorch/engine/test_cache_engine.py \
  tests/pytorch/engine/test_executor_base.py \
  -q

grimoire added 4 commits April 23, 2026 16:51

fix num_gpu_blocks for spec decoding

10885ef

update cache engine

1916034

update config and message

3f8c4d9

fix ut

8f8d0b1

grimoire wants to merge 4 commits into InternLM:main from grimoire:fix-spec-num-blocks
