# [Performance] Batched calibration (#2054) #
Commit 056ed3d
## Purpose ##
* Reduce calibration runtime by providing users with options to increase
performance
  * `batch_size` controls the batch size of calibration data
  * `offload_sequential_activations` controls whether intermediate
activations are offloaded to the CPU between layers
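A minimal usage sketch of the two new options, assuming the `oneshot` entrypoint; the model, dataset, and recipe values here are illustrative, not prescriptive:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",   # illustrative model
    dataset="ultrachat_200k",                      # illustrative dataset
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    num_calibration_samples=512,
    batch_size=32,                         # new: batch calibration samples
    offload_sequential_activations=False,  # new: keep activations on GPU
)
```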
## Prerequisites ##
* #2080
* #2081
## Changes ##
### Batched Calibration ###
* Add a `batch_size` argument
* Change the `data_collator` default from the standard data collator to a
`"truncation"` collator
  * The `data_collator_with_truncation` function truncates all samples in a
batch to the length of the shortest sample
  * Statistics on how many tokens are dropped by this method are shown in
the tables below
  * The data collator can also be set to `"padding"` to pad to the longest
sample in the batch
* To reduce excess truncation/padding, default to a `LengthAwareSampler`,
which batches samples of similar length together (see the sketch below)
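Below is a minimal sketch of what such a truncation collator and length-aware sampler could look like. These are illustrative implementations under assumed data shapes (samples as dicts of token-id lists), not the code added by this PR:

```python
import torch
from torch.utils.data import Sampler


def data_collator_with_truncation(batch: list[dict]) -> dict:
    # Truncate every sample to the shortest sequence in the batch so that
    # the fields stack into rectangular tensors without padding; tokens
    # beyond that length are dropped. Assumes each field is a list of ints,
    # as produced by a tokenizer with return_tensors=None.
    min_len = min(len(sample["input_ids"]) for sample in batch)
    return {
        key: torch.tensor([sample[key][:min_len] for sample in batch])
        for key in batch[0]
    }


class LengthAwareSampler(Sampler):
    # Yield dataset indices ordered by sequence length so that consecutive
    # batches contain samples of similar length, minimizing the amount of
    # truncation (or padding) the collator must apply
    def __init__(self, lengths: list[int]):
        self.order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    def __iter__(self):
        return iter(self.order)

    def __len__(self):
        return len(self.order)
```

The table below shows runtime and the fraction of tokens deleted by truncation at each batch size: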
Batch Size | Time | % Speedup | % Tokens Deleted
-- | -- | -- | --
Original (1) | 11m17 | N/A | 0.0
1 | 11m17 | 0.0 | 0.0
2 | 10m48 | 4.2 | 0.2
4 | 10m39 | 5.6 | 0.5
8 | 10m39 | 5.6 | 1.1
16 | 10m58 | 2.8 | 2.6
64 | 11m4 | 11.2 | 12.0
128 | 9m29 | 16.0 | 23.9
512 | 7m39 | 37.3 | 75.3
* The speedup is relatively meager up until you start deleting
significant portions of the dataset via truncation
### Disable Offloading ###
* Add an `offload_sequential_activations` argument, defaulting to `True`
(no behavior change)
* Disabling offloading increases throughput but also increases memory usage
(see the sketch below)
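A rough sketch of the control flow this flag governs between layers of the sequential pipeline (hypothetical helper, not the actual implementation):

```python
import torch


def store_activations(
    activations: dict[str, torch.Tensor],
    offload_sequential_activations: bool,
) -> dict[str, torch.Tensor]:
    # Between sequential layers, either move intermediate activations to
    # CPU memory (lower GPU memory usage, slower due to device transfers)
    # or keep them resident on the GPU (faster, more memory)
    if offload_sequential_activations:
        return {name: t.to("cpu") for name, t in activations.items()}
    return activations
```

The runtimes below were measured with offloading disabled (`offload_sequential_activations=False`):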
Batch Size | Time | % Speedup | % Tokens Deleted
-- | -- | -- | --
Original (1) | 11m17 | N/A | 0.0
1 | 10m14 | 9.3 | 0.0
2 | 9m46 | 13.4 | 0.2
4 | 9m36 | 14.9 | 0.5
8 | 9m48 | 13.1 | 1.1
16 | 9m26 | 16.3 | 2.6
32 | 9m27 | 16.2 | 5.8
128 | 8m34 | 24.0 | 23.9
512 | 6m40 | 40.9 | 75.3
* The memory requirement for 512 samples on Llama 8B is ~70 GB, equivalent
to that of batch size 128
* With offloading disabled and batch size 32, calibration runtime is less
than 1s per layer (down from ~11s)
  * This implies that the theoretical maximum speedup from reducing
calibration time alone is ~15% for this model + dataset
### Misc ###
* Fix examples
  * Fixed examples with mismatches between model dtypes and processor
dtypes (Mixtral, Pixtral, Whisper)
  * For multimodal models which use multimodal datasets, removed their data
collators, as batch unwrapping is now done by the `TextGenerationDataset`
* Remove `_mask_padding` from `IntermediatesCache`, as I do not believe
this method is effective at masking padding tokens from Hessian
calculations
* Fix AWQ, which was hard-coded to handle only batches of size 1 (sketched
below)
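As an illustration of the kind of change involved (hypothetical function, not the actual AWQ code): per-channel statistics that previously assumed activations of shape `(1, seq_len, hidden)` can flatten the leading dimensions so that any batch size works:

```python
import torch


def per_channel_mean_abs(activations: torch.Tensor) -> torch.Tensor:
    # (batch, seq_len, hidden) -> (batch * seq_len, hidden), then reduce
    # over all tokens; correct for any batch size, not just batch size 1
    return activations.flatten(end_dim=-2).abs().mean(dim=0)
```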
## Testing ##
### Evaluation Regression ###
Batch Size | Eval Score | Difference (pts) | % Tokens Deleted
-- | -- | -- | --
Original (1) | 0.6573 | 0.0 | 0.0
1 | 0.6513 | -0.6 | 0.0
2 | 0.6513 | -0.6 | 0.2
4 | 0.6657 | +0.8 | 0.5
8 | 0.6513 | -0.6 | 1.1
16 | 0.6672 | +1.0 | 2.6
64 | 0.6338 | -2.4 | 12.0
128 | 0.6603 | +0.3 | 23.9
512 | 0.6391 | -1.8 | 75.3
Deleting significant portions of the dataset (longer sequences are deleted
first) has a detrimental effect on recovery.
### Modifiers ###
* GPTQ
  * Ran full regression tests, as shown above
* AWQ
  * Ran AWQ with batch size 32 and checked output sanity
* Quantization Modifier
  * Ran NVFP4 with batch size 10 and checked output sanity
### Calibration Regression Testing ###
I ran calibration for the following models, but did not evaluate recovery.
The following model examples calibrate without issue:
* Llama3
* Gemma3
* Internvl3
* Mllama
* Llama4
The following models had mismatched processor and model dtypes, which this
PR fixes:
* Mistral3
* Pixtral
* Whisper
The following models have an accelerate device offloading bug:
* Idefics3
* Phi3 Vision
The following model examples have an MoE replacement bug:
* qwen3-vl-30b-a3b-Instruct
## Future Work ##
While these options are a great place to start, the next step for improving
runtime is to allow multi-GPU compression, likely via `torch.distributed`
tensor parallelism.
---------
Signed-off-by: Kyle Sayers <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
## File tree ##
17 files changed: +259 −203 lines

* examples
  * multimodal_audio
  * multimodal_vision
* src/llmcompressor
  * args
  * datasets
  * entrypoints
  * modifiers/awq
  * pipelines/sequential
  * transformers/data
* tests/llmcompressor
  * pipelines
  * transformers/data
  * sparsegpt