Conversation

@federicobrancasi
Collaborator

Add transformer model support, fix Multi-Head Attention quantization and enhance the pipeline

Summary

  • Added comprehensive support for transformer-based architectures (CCT, ViT) to DeepQuant.
  • Fixed critical Multi-Head Attention (MHA) quantization issues that prevented proper handling of packed projections and batch-first layouts.
  • Enhanced the quantization pipeline with new utilities and improved error handling.

Key Changes

  1. New model support
  • Implemented Compact Convolutional Transformer (CCT) architecture
  • Added Vision Transformer (ViT-B/32) support
  • Created comprehensive test suites: TestCCT.py, TestCCTPretrained.py, TestVitB32.py, TestVitB32Pretrained.py
  • Added a ResNet-50 pretrained test to validate pipeline scalability on larger networks
  2. Multi-Head Attention fixes, needed to make ViT's QuantMultiheadAttention work (see the packed-projection sketch after this list)
  • Fixed MHA handling in CustomForwards/MultiHeadAttention.py to properly support:
    • Packed in_proj weights (common in transformers)
    • Batch-first tensor layouts with correct transposition
    • View operations that previously caused dimension mismatches
  • Added support for self-attention scenarios where Q, K, V use the same input
  3. Quantization improvements (see the rounding and equivalence-check sketch after this list)
  • Changed the rounding policy from round to floor in the quantization modules for a more accurate integer representation
  • Added a checkEquivalence flag to brevitasToTrueQuant() that automatically validates quantized models against the original model's outputs
  4. ONNX Runtime enhancements (see the session-setup sketch after this list)
  • Updated to ONNX opset version 18 so that LayerNormalization is folded properly, as required by Deeploy
  • Implemented deterministic session handling via a function that disables all graph optimizations, ensuring exact reproducibility for Deeploy verification
  5. Infrastructure improvements
  • Fixed .view(-1) operations in tensor recording that caused shape inference issues
  • Included CIFAR-10 dataset for CI testing of CCT models
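
A minimal sketch of the packed-projection handling described in item 2, assuming a standard torch.nn.MultiheadAttention module; names such as `mha`, `embed_dim`, and `x` are illustrative and not taken from CustomForwards/MultiHeadAttention.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim, num_heads = 64, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Packed projection: in_proj_weight stacks W_q, W_k, W_v into one
# (3 * embed_dim, embed_dim) tensor, so it has to be split before each
# projection can be handled (and quantized) on its own.
w_q, w_k, w_v = torch.chunk(mha.in_proj_weight, 3, dim=0)
b_q, b_k, b_v = torch.chunk(mha.in_proj_bias, 3, dim=0)

# Self-attention: Q, K and V are all projections of the same input tensor.
x = torch.randn(2, 10, embed_dim)   # (batch, seq, embed) because batch_first=True
q, k, v = F.linear(x, w_q, b_q), F.linear(x, w_k, b_k), F.linear(x, w_v, b_v)

# batch_first tensors are transposed to (seq, batch, embed) before the
# attention math, which expects the sequence dimension first.
q, k, v = (t.transpose(0, 1) for t in (q, k, v))
```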
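
The two pipeline changes in item 3, sketched with illustrative helpers; `quantize_floor` and `check_equivalence` are hypothetical names standing in for the DeepQuant internals and the checkEquivalence flag of brevitasToTrueQuant():

```python
import torch

def quantize_floor(x: torch.Tensor, scale: float, zero_point: int = 0) -> torch.Tensor:
    # floor() replaces round() so the integer codes match the arithmetic
    # expected by the true-quant export.
    return torch.floor(x / scale) + zero_point

def check_equivalence(original: torch.nn.Module, true_quant: torch.nn.Module,
                      example_input: torch.Tensor, atol: float = 1e-5) -> bool:
    # What a checkEquivalence-style flag boils down to: run both models on
    # the same input and compare their outputs element-wise.
    with torch.no_grad():
        return torch.allclose(original(example_input), true_quant(example_input), atol=atol)
```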
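
A sketch of the opset-18 export and deterministic session setup from item 4, using a toy LayerNorm model as a stand-in for the actual CCT/ViT exports:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# Tiny stand-in network containing a LayerNorm.
model = nn.Sequential(nn.Linear(16, 16), nn.LayerNorm(16)).eval()
example_input = torch.randn(1, 16)

# Opset 18 keeps LayerNormalization as a single node, as Deeploy expects.
torch.onnx.export(model, example_input, "model.onnx", opset_version=18)

# Disable every graph optimization so ONNX Runtime executes exactly the
# exported graph, making results reproducible for Deeploy verification.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession("model.onnx", opts, providers=["CPUExecutionProvider"])
outputs = session.run(None, {session.get_inputs()[0].name: example_input.numpy()})
```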

federicobrancasi and others added 30 commits April 19, 2025 10:48
* Initial commit fbrancasi/dev

* Working Resnet18

* Codebase Refactor

* update Resnet18 test

* Fix CI

* Minor Fixes
The current version of DeepQuant assumes that all linear
modules have a bias and otherwise skips unifying the Dequant
nodes.
This modification enables unification of Dequant blocks even
when there is no `biasDequantNode`.
The implementation is incomplete, as it assumes that the input
Dequant zeroPoint is 0.
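
As a minimal illustration of why the zeroPoint-is-0 assumption in the commit above matters (scales and shapes here are made up, not repository code): with both zero points at 0, the weight and input Dequant scales fold into a single output Dequant scale even when there is no `biasDequantNode`.

```python
import torch

s_w, s_x = 0.05, 0.1                          # weight / input Dequant scales
W_q = torch.randint(-8, 8, (4, 3)).float()    # integer weight codes
x_q = torch.randint(-8, 8, (3,)).float()      # integer input codes

# Separate Dequant nodes feeding a bias-free float linear op ...
y_separate = (s_w * W_q) @ (s_x * x_q)
# ... give the same result as one unified output Dequant with scale s_w * s_x,
# but only because both zero points are 0; a non-zero input zeroPoint would
# add an offset term that cannot be absorbed into a single scale.
y_unified = (s_w * s_x) * (W_q @ x_q)
assert torch.allclose(y_separate, y_unified)
```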
@Xeratec self-requested a review June 26, 2025 08:09