
Conversation


@bopeng1234 commented Apr 22, 2025

Description

Add 4-bit channel-wise quantization capability for the DequantizeLinear op for the Phi-3 model; it improves TPS on the Intel NPU.

JIRA - https://jira.devtools.intel.com/browse/EISW-163602

Motivation and Context

As Intel's NPU support for LLMs shows (https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/text_generation#npu-support), if we want to run an ONNX quantized model such as Phi-3 on the Intel NPU, the quantized model needs to meet two requirements:

  1. symmetric quantization, zero point = 0
  2. channel-wise quantization, block_size = K

So this PR enables symmetric, channel-wise quantization, as sketched below.
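For reference, here is a minimal numpy sketch of what these two requirements mean for a (K, N) MatMul weight. The function and variable names are illustrative only and are not part of this PR's implementation:

```python
import numpy as np

def quantize_per_channel_symmetric_int4(W):
    # W has shape (K, N). One scale per output channel (block_size = K),
    # so every element of a column shares the same scale.
    max_abs = np.max(np.abs(W), axis=0)                  # shape (N,)
    scales = np.where(max_abs == 0, 1.0, max_abs / 7.0)  # INT4 symmetric range is [-8, 7]
    q = np.clip(np.round(W / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_per_channel(q, scales):
    # What DequantizeLinear computes for this layout: W_hat = q * scale, zero_point = 0.
    return q.astype(np.float32) * scales

W = np.random.randn(3072, 3072).astype(np.float32)  # e.g. a Phi-3 MatMul weight
q, scales = quantize_per_channel_symmetric_int4(W)
W_hat = dequantize_per_channel(q, scales)
print("max abs reconstruction error:", np.max(np.abs(W - W_hat)))
```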

We tested it with onnxruntime-genai changes (we opened a PR to onnxruntime-genai as well to support the extra arguments, microsoft/onnxruntime-genai#1362) and with OpenVINO changes (openvinotoolkit/openvino#30265).

command:
python -m onnxruntime_genai.models.builder -o E:\download\onnx\Phi-3-mini-4k-instruct-onnx-channelwise-modified-QDQ-T -p int4 -e cpu -i E:\download\huggingface\Phi-3-mini-4k-instruct --extra_options use_channel_wised_quantization=1 use_qdq=1
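For anyone calling the quantizer directly rather than through the model builder, below is a rough sketch of the programmatic equivalent. channel_wised_quantize is the new argument this work adds to MatMulNBitsQuantizer (see the referenced commits), so treat its exact name, type, and defaults as assumptions:

```python
import onnx
from onnxruntime.quantization import QuantFormat
from onnxruntime.quantization.matmul_nbits_quantizer import MatMulNBitsQuantizer

model = onnx.load("phi-3-mini-4k-instruct.onnx")  # hypothetical input path

quantizer = MatMulNBitsQuantizer(
    model,
    block_size=32,                 # expected to be superseded when channel-wise is enabled (assumption)
    is_symmetric=True,             # zero point = 0, as the NPU requires
    quant_format=QuantFormat.QDQ,  # emit DequantizeLinear (QDQ) nodes rather than MatMulNBits
    channel_wised_quantize=True,   # new flag from this PR's companion changes (assumed usage)
)
quantizer.process()
quantizer.model.save_model_to_file(
    "phi-3-mini-4k-instruct-int4-cw.onnx", use_external_data_format=True
)
```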

Normally, without the channel-wise quantized model, Phi-3 with NPUW runs at about 4000 ms per token with the KV-cache model. With this PR applied, Phi-3 with NPUW runs at about 150 ms per token, a speedup of more than 20x (4000 / 150 ≈ 27x).

@bopeng1234 (Author) commented:

@ankitm3k, I created this new PR; it only adds the QDQ channel-wise (CW) changes and removes the QOperator-related code.

@ankitm3k commented:

@bopeng1234 kindly resolve conflicts

@bopeng1234 force-pushed the ovep-develop-dev branch 3 times, most recently from a435ea0 to 9194c4f on May 9, 2025, 04:40
@ankitm3k left a comment:


Reviewed & tested the changes, LGTM

@ankitm3k ankitm3k merged commit 8d2f3c4 into intel:ovep-develop May 14, 2025
3 of 5 checks passed
ankitm3k pushed a commit that referenced this pull request Jul 2, 2025
…669)

* add channel wise quantization option for QDQ, it optimize for intel NPU

* add channel_wised_quantize args to MatMulNBitsQuantizer
