Add support for chunked attention #560
base: releases/v0.11.0
Signed-off-by: Jan Kaniecki <[email protected]>
🚧 CI Blocked: The main CI workflow was not started.
Pull Request Overview
This PR adds support for chunked attention, a mechanism that restricts attention within fixed-size chunks during both prefill and decode phases. The implementation includes computing chunked attention biases, managing chunked block mappings, and integrating these features into the existing attention metadata infrastructure.
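To illustrate what "restricting attention within fixed-size chunks" means, here is a minimal sketch of an additive attention bias for the prefill phase. It is not the code from this PR: the function name `chunked_causal_bias` is hypothetical, the causal constraint is an assumption, and plain Python lists stand in for the HPU tensors the real implementation would use.

```python
import math


def chunked_causal_bias(seq_len: int, chunk_size: int) -> list[list[float]]:
    """Build an additive bias matrix for chunked causal attention.

    Hypothetical helper for illustration only. Entry [q][k] is 0.0 when
    query position q may attend to key position k, and -inf otherwise.
    A key is visible when it is in the same chunk as the query and does
    not lie in the future (the causal part is an assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    bias = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            same_chunk = (q // chunk_size) == (k // chunk_size)
            causal = k <= q
            row.append(0.0 if (same_chunk and causal) else -math.inf)
        bias.append(row)
    return bias


# With chunk_size=4, position 4 starts a new chunk and no longer
# sees positions 0..3, while position 3 still sees position 0.
bias = chunked_causal_bias(seq_len=6, chunk_size=4)
```

Adding such a bias to the attention scores before the softmax zeroes out the weight of every key outside the query's chunk.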
Key changes:
- Implements chunked attention bias computation for both prompt and decode phases
- Adds new metadata fields to track chunked block mappings, lists, groups, and usage
- Integrates chunked attention configuration detection and layer initialization
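For the decode phase, the idea behind a chunked block mapping can be sketched as selecting only the KV-cache blocks that belong to the chunk of the token being generated. The helper below is a hypothetical illustration, not the PR's implementation: the name `chunked_decode_blocks`, its parameters, and the assumption that the chunk size is a multiple of the block size are all introduced here for the example.

```python
def chunked_decode_blocks(
    block_table: list[int],
    seq_len: int,
    chunk_size: int,
    block_size: int,
) -> list[int]:
    """Return the KV-cache block ids visible to the token being decoded.

    Hypothetical helper for illustration only. `block_table` lists the
    block ids of one sequence in order, and `seq_len` counts the tokens
    including the one being decoded. Assumes chunk_size is a multiple
    of block_size, so chunk boundaries align with block boundaries.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    if chunk_size % block_size != 0:
        raise ValueError("chunk_size must be a multiple of block_size")
    last_pos = seq_len - 1
    # First token position of the chunk that contains the current token.
    chunk_start = (last_pos // chunk_size) * chunk_size
    first_block = chunk_start // block_size
    last_block = last_pos // block_size
    return block_table[first_block:last_block + 1]


# A 70-token sequence with 64-token chunks and 16-token blocks:
# the current token sits in the second chunk, which begins at block index 4.
visible = chunked_decode_blocks([10, 11, 12, 13, 14], seq_len=70,
                                chunk_size=64, block_size=16)
```

Under these assumptions, decode only needs to read the blocks of the current chunk rather than the full block list of the sequence.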
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| vllm_gaudi/v1/worker/hpu_worker.py | Adds empty pass statement without removing warmup call |
| vllm_gaudi/v1/worker/hpu_model_runner.py | Implements chunked attention bias computation, metadata updates, and model initialization |
| vllm_gaudi/v1/attention/backends/hpu_attn.py | Extends decode metadata creation with chunked attention parameters |
| vllm_gaudi/attention/backends/hpu_attn.py | Adds chunked attention metadata fields and attention backend logic |
Co-authored-by: Copilot <[email protected]> Signed-off-by: Jan Kaniecki <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Jan Kaniecki <[email protected]>
No description provided.