@JanCSEM JanCSEM commented Dec 2, 2025

Deepquant's MHA handling makes assumptions about the MHA configuration instead of reading it from the Brevitas object.

This PR adds support for the batch_first configuration.
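To illustrate the idea of reading the configuration from the module rather than assuming it, here is a minimal sketch. It is not the code in this PR: `MHAConfig`, `read_mha_config` and `to_seq_first` are hypothetical names, and it assumes the Brevitas module exposes `embed_dim`, `num_heads` and `batch_first` the same way `torch.nn.MultiheadAttention` does. A plain namespace object stands in for the module so the snippet runs without PyTorch or Brevitas.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Tuple


@dataclass
class MHAConfig:
    """Hypothetical container for the attributes read off the attention module."""
    embed_dim: int
    num_heads: int
    batch_first: bool


def read_mha_config(module) -> MHAConfig:
    """Read the MHA settings from the module instead of hard-coding them.

    Assumption: the module carries embed_dim, num_heads and batch_first
    attributes, as torch.nn.MultiheadAttention does. A missing batch_first
    falls back to False, which is the torch default.
    """
    return MHAConfig(
        embed_dim=module.embed_dim,
        num_heads=module.num_heads,
        batch_first=getattr(module, "batch_first", False),
    )


def to_seq_first(shape: Tuple[int, int, int], cfg: MHAConfig) -> Tuple[int, int, int]:
    """Return the (seq, batch, embed) shape for an input laid out per cfg."""
    if cfg.batch_first:
        batch, seq, embed = shape
        return (seq, batch, embed)
    return tuple(shape)


# Stand-in for a quantized MHA layer built with batch_first=True.
fake_mha = SimpleNamespace(embed_dim=64, num_heads=4, batch_first=True)
cfg = read_mha_config(fake_mha)
print(to_seq_first((8, 16, 64), cfg))  # (16, 8, 64)
```

With a real module, the same lookup would decide whether the first two input dimensions have to be swapped before and after the attention computation.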

I was unable to add support for grouped QKV, because the splitting operation it requires is not natively supported on Brevitas QuantTensor. Operating on QuantTensor.value instead results in a mismatch between the tracing and execution stages.
