Fix cache sizing and cache block layout edge cases #4552

Open
grimoire wants to merge 4 commits into InternLM:main from grimoire:fix-spec-num-blocks
grimoire (Collaborator) commented Apr 23, 2026

Summary

This PR fixes several cache sizing and cache block layout edge cases in the PyTorch engine, and improves readability around cache configuration/update logic.

Changes

  • Refactor ExecutorBase.update_configs() into smaller helper methods to make cache memory estimation easier to follow.
  • Use integer byte budgets when computing available KV cache memory and num_gpu_blocks.
  • Reserve state cache memory and runtime memory before applying cache_max_entry_count to pageable KV cache memory.
  • Improve speculative decoding cache sizing:
    • spec ranks are estimated with target_cache_block_size + spec_cache_block_size;
    • non-spec ranks are estimated with only target_cache_block_size;
    • final num_gpu_blocks is the minimum capacity across ranks.
  • Keep target/spec cache block layout aligned after executor-side block size adjustment.
  • Add validation for block_size and kernel_block_size:
    • block_size >= kernel_block_size;
    • block_size % kernel_block_size == 0.
  • Temporarily reject block_size != kernel_block_size for PD migration, in both the engine checker and the migration runtime path.
  • Preserve zero initialization for cache allocation to avoid dirty cache data producing invalid outputs.
  • Remove unused custom cache allocation helper code.
  • Add comments around cache sizing, rank-specific speculative cache memory, CUDA device path assumptions, and PD migration limitations.
  • Fix config propagation details:
    • CAMB auto block-size adjustment now also updates kernel_block_size;
    • speculative cache config now copies target kernel_block_size.
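
For readers who want the sizing rules above in concrete form, here is a minimal sketch. It is not the code in this PR: the function names, argument names, and the per-rank tuple layout are hypothetical, chosen only to illustrate the block-size validation, the order in which memory is reserved, and the minimum-across-ranks rule.

```python
from typing import Iterable, Tuple


def validate_block_sizes(block_size: int, kernel_block_size: int) -> None:
    """Illustrative check: block_size must be a positive multiple of kernel_block_size."""
    if kernel_block_size <= 0:
        raise ValueError("kernel_block_size must be positive")
    if block_size < kernel_block_size:
        raise ValueError("block_size must be >= kernel_block_size")
    if block_size % kernel_block_size != 0:
        raise ValueError("block_size must be a multiple of kernel_block_size")


def pageable_kv_bytes(free_bytes: int, state_cache_bytes: int,
                      runtime_bytes: int, cache_max_entry_count: float) -> int:
    """Reserve state cache and runtime memory first, then apply the ratio.

    The result is truncated to an integer byte budget.
    """
    available = max(free_bytes - state_cache_bytes - runtime_bytes, 0)
    return int(available * cache_max_entry_count)


def estimate_num_gpu_blocks(rank_budgets: Iterable[Tuple[int, bool]],
                            target_cache_block_size: int,
                            spec_cache_block_size: int) -> int:
    """Return the minimum block capacity across ranks.

    rank_budgets holds (kv_bytes, is_spec_rank) pairs (a hypothetical layout).
    Spec ranks pay for a target block plus a spec block; other ranks pay
    for the target block only.
    """
    capacities = []
    for kv_bytes, is_spec_rank in rank_budgets:
        per_block = target_cache_block_size
        if is_spec_rank:
            per_block += spec_cache_block_size
        capacities.append(kv_bytes // per_block)  # integer division, no rounding up
    return min(capacities)


if __name__ == "__main__":
    validate_block_sizes(block_size=64, kernel_block_size=16)
    budget = pageable_kv_bytes(free_bytes=10_000, state_cache_bytes=1_000,
                               runtime_bytes=1_000, cache_max_entry_count=0.5)
    blocks = estimate_num_gpu_blocks([(budget, True), (budget, False)],
                                     target_cache_block_size=100,
                                     spec_cache_block_size=60)
    print(budget, blocks)
```

Taking the minimum means every rank can hold the agreed number of blocks, at the cost of leaving some memory unused on non-spec ranks.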

Tests

python -m pytest \
  tests/pytorch/engine/test_engine_checker.py \
  tests/pytorch/engine/test_cache_engine.py \
  tests/pytorch/engine/test_executor_base.py \
  -q
