Open
122 commits
6a7ff53
Q3_HIFI added
Nov 27, 2025
431fa1e
Update Q3_HIFI outliers count for accuracy improvement
geoffmunn Nov 29, 2025
7b5e058
Refactor quantization with optional quant_weights
geoffmunn Nov 29, 2025
13184ab
Add quantize_row_q3_hifi_ref function declaration
geoffmunn Nov 29, 2025
a91b6c8
Fix syntax error in ggml.c
geoffmunn Nov 29, 2025
1fb4f16
Add GGML_TYPE_Q3_HIFI case to ops.cpp
geoffmunn Nov 29, 2025
739f7d6
Add quantize_row_q3_hifi function declaration
geoffmunn Nov 29, 2025
7d003b2
Add LLAMA_FTYPE_MOSTLY_Q3_HIFI to llama.h
geoffmunn Nov 29, 2025
d0dcce9
Add Q3_HIFI type support in llama model loader
geoffmunn Nov 29, 2025
2e8e69a
Add support for GGML_TYPE_Q3_HIFI in llama-quant
geoffmunn Nov 29, 2025
3cf3235
Add Q3_HIFI quantization option
geoffmunn Nov 29, 2025
2a23338
Add comparison of Q3 quantization formats
geoffmunn Nov 29, 2025
10b2019
Add complete guide for Importance Matrix (imatrix) files
geoffmunn Nov 29, 2025
11c85c4
Add high-fidelity quantization function
geoffmunn Nov 29, 2025
ac8003e
Implement Q3_HIFI type in ggml-cpu.c
geoffmunn Nov 29, 2025
f4b5ecb
Revise Q3 quantization formats comparison document
geoffmunn Nov 29, 2025
d302e6d
Add GGML_API qualifier to dequantize_row_q3_hifi
geoffmunn Dec 3, 2025
230ee25
Add NEON-optimized dequantization for Q3_HIFI
geoffmunn Dec 3, 2025
f2a2d97
Implement AVX2 dequantization for Q3_HIFI
geoffmunn Dec 3, 2025
7d6a887
Update dequantize.cuh
geoffmunn Dec 3, 2025
c2b5957
Update ggml-metal.metal
geoffmunn Dec 3, 2025
27e8f1b
Create dequant_q3_hifi.comp
geoffmunn Dec 3, 2025
2025109
First round of optimisations, speed is 5.6x slower
GeoffApples Dec 11, 2025
ae313c5
Results updated
GeoffApples Dec 11, 2025
cc7c51d
ql/qh block structure updated
GeoffApples Dec 11, 2025
31200f1
Speed improvements made. 84% of base model.
GeoffApples Dec 11, 2025
40181d8
Hybrid tensor speed improvements
GeoffApples Dec 11, 2025
560865f
More CPU architecture support
GeoffApples Dec 11, 2025
e54de2c
Loop unrolling for small speed improvement
GeoffApples Dec 11, 2025
eeada9d
float casts for more speed improvements
GeoffApples Dec 11, 2025
1fb41ec
HIFI names consolidated
GeoffApples Dec 11, 2025
07eab7b
More GPU support improvements
GeoffApples Dec 11, 2025
5e74059
CUDA support added
GeoffApples Dec 11, 2025
ee314fd
Apple metal support
GeoffApples Dec 11, 2025
530b372
More GPU support
GeoffApples Dec 11, 2025
d834494
Conversion script updated
GeoffApples Dec 11, 2025
a7d56ac
Q3_HIFI tests added
GeoffApples Dec 11, 2025
0ca15bd
Merge pull request #1 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 11, 2025
6ff0291
Vulkan shaders added
GeoffApples Dec 12, 2025
d7fb478
Merge pull request #2 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 12, 2025
0189dd8
Syntax error fixed
GeoffApples Dec 12, 2025
697e328
Merge pull request #3 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 12, 2025
8a4f2d4
Missing Q3_HIFI constants added
GeoffApples Dec 12, 2025
d8ae285
GPU disabled (bad results)
GeoffApples Dec 13, 2025
9344bfe
Latest speed improvements
GeoffApples Dec 13, 2025
c5bf27f
All 3 metrics now exceed Q3_K_M
GeoffApples Dec 13, 2025
1cf26dc
Documentation updated
GeoffApples Dec 13, 2025
9b58d82
Merge pull request #4 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 13, 2025
0baa2c8
Q3_HIFI_A now the official version
GeoffApples Dec 13, 2025
bc8ba8a
Merge pull request #5 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 13, 2025
2d4d0b3
Speed benchmark script added
GeoffApples Dec 14, 2025
a177f2c
Merge pull request #6 from GeoffApples/Q3_HIFI_1.7B_fast
geoffmunn Dec 14, 2025
bc3c5cf
Merge pull request #7 from ggml-org/master
geoffmunn Dec 14, 2025
0e6f3aa
Merge branch 'Q3_HIFI' into master
geoffmunn Dec 14, 2025
9971857
Merge pull request #8 from geoffmunn/master
geoffmunn Dec 14, 2025
42b6477
Old files removed
Dec 21, 2025
5792ab4
Cross-model documentation added
Dec 21, 2025
8b72146
Validation errors fixed
Dec 21, 2025
daf0e20
Whitespace fixed
Dec 21, 2025
bf0d021
Whitespace fixes
Dec 21, 2025
f79424e
Whitespace fixes
Dec 21, 2025
abcb4cc
Whitespace fixes
Dec 21, 2025
7724f7b
Whitespace changes
Dec 21, 2025
a6bb077
Whitespace fixes
Dec 21, 2025
9bae334
Whitespace fixes
Dec 21, 2025
dce3e67
Whitespace fixes
Dec 21, 2025
3e3f931
Whitespace fixes
Dec 21, 2025
972d662
Whitespace fixes
Dec 21, 2025
20390e2
Whitespace fixes
Dec 21, 2025
4851a00
print statements changed to logging()
Dec 21, 2025
9be1c3d
Extra blank line removed
Dec 21, 2025
c42d48f
Merge pull request #9 from geoffmunn/Q3_HIFI
geoffmunn Dec 21, 2025
dbf9a9a
Documentation moved
Dec 21, 2025
2c4049e
GGML_TYPE_Q3_HIFI now value 12
Dec 21, 2025
e4fd98f
GGML_TYPE_Q3_HIFI moved to end, numbers re-ordered
Dec 21, 2025
beb72af
Missing files added
GeoffApples Dec 27, 2025
c553255
Added new ftype enum
GeoffApples Dec 27, 2025
73dd524
Added tensor type selection logic
GeoffApples Dec 27, 2025
d44e778
Added CLI entry
GeoffApples Dec 27, 2025
b491e3e
Added description string
GeoffApples Dec 27, 2025
325d34e
Phase 2 improvements
GeoffApples Dec 27, 2025
62acf48
V3 changes
GeoffApples Dec 27, 2025
94b18a1
Build error fixed
GeoffApples Dec 27, 2025
969754f
Build warning fixed
GeoffApples Dec 27, 2025
30535c6
Missing type added
GeoffApples Dec 27, 2025
7c233ec
Quantisation error fixed
GeoffApples Dec 27, 2025
e7862d1
Outlier budget and early exit
GeoffApples Dec 27, 2025
13e8b25
Build error fixed
GeoffApples Dec 27, 2025
7249d24
Missing type added
GeoffApples Dec 27, 2025
2128f33
Q4_HIFI now standardised
GeoffApples Dec 27, 2025
e6ab6f6
INT8 residuals for size reduction
GeoffApples Dec 30, 2025
8543275
Unused variables removed
GeoffApples Dec 30, 2025
e443866
Add HIFI quantization support with layer-adaptive outlier allocation
GeoffApples Dec 31, 2025
131c0e7
Refactor thread-local storage declaration and improve tensor importan…
GeoffApples Dec 31, 2025
aaa3564
Add maximum outliers definition for Q6_K_HIFI_RES8 format
GeoffApples Dec 31, 2025
5411916
Update include path for HIFI quantization header
GeoffApples Dec 31, 2025
0e0830a
Enhance model parameter calculation and logging for HIFI quantization
GeoffApples Dec 31, 2025
8077951
Add model size parameter to HIFI quantization context
GeoffApples Dec 31, 2025
3b40c30
Enhance layer-adaptive outlier allocation for Q4_HIFI quantization
GeoffApples Dec 31, 2025
344495f
Parameter finetuning
GeoffApples Dec 31, 2025
1762085
Refine scale-dependent adjustments for outlier allocation in HIFI qua…
GeoffApples Dec 31, 2025
e9a5e7c
Refine scale-dependent adjustments for outlier allocation in HIFI qua…
GeoffApples Dec 31, 2025
a246d6e
Refine model parameter calculation and scale adjustments for HIFI qua…
GeoffApples Dec 31, 2025
eed04a7
Missing constants added
GeoffApples Jan 1, 2026
79e1751
Implement Q6_K_HIFI_RES8 kernel with residual corrections
GeoffApples Jan 1, 2026
0ed8ad0
Test to see what is happening in the GPU implementation
GeoffApples Jan 1, 2026
581f3ea
First round of size reductions
GeoffApples Jan 1, 2026
e30d855
Option2 of size reductions
GeoffApples Jan 1, 2026
b1312f3
Merge pull request #10 from GeoffApples/Q4_HIFI_v3
geoffmunn Jan 1, 2026
c29a18f
Add quantization type string for Hugging Face model card display
GeoffApples Jan 1, 2026
c4afeb9
Merge pull request #12 from GeoffApples/Q4_HIFI_v3
geoffmunn Jan 3, 2026
3ccfcd3
Q4_HIFI renamed to Q4_K_HIFI
GeoffApples Jan 4, 2026
48e01fb
Add Q5_K_HIFI_RES8 quantization format and associated functions
GeoffApples Jan 4, 2026
9127308
Update Q5_K_HIFI_RES8 structure size and padding initialization
GeoffApples Jan 4, 2026
1782b40
Enhance Q5_K_HIFI_RES8 dequantization and dot product functions
GeoffApples Jan 4, 2026
a6d58d7
Refactor Q5_K_HIFI_RES8 quantization function names for consistency
GeoffApples Jan 4, 2026
339080d
Enhance Q5_K_HIFI_RES8 quantization support in CPU operations
GeoffApples Jan 4, 2026
4c9a074
Add maximum outliers definition for Q5_K_HIFI_RES8 format
GeoffApples Jan 4, 2026
ac65290
Refactor Q5_K_HIFI_RES8 quantization function for improved clarity
GeoffApples Jan 5, 2026
909fa27
Build warnings fixed
GeoffApples Jan 5, 2026
8b2338d
2 extra strategies implemented
GeoffApples Jan 5, 2026
e5c0c28
Merge pull request #13 from GeoffApples/Q4_K_HIFI
geoffmunn Jan 5, 2026
15 changes: 15 additions & 0 deletions .gitignore
@@ -137,3 +137,18 @@ poetry.toml
 /.windsurf/
 # emscripten
 a.out.*
+wikitext-2-raw/wikitext-2-raw/wiki.test.raw
+wikitext-2-raw/wikitext-2-raw/wiki.train.raw
+wikitext-2-raw/wikitext-2-raw/wiki.valid.raw
+Qwen3-1.7B/.gitattributes
+Qwen3-1.7B/config.json
+Qwen3-1.7B/generation_config.json
+Qwen3-1.7B/LICENSE
+Qwen3-1.7B/merges.txt
+Qwen3-1.7B/model-00001-of-00002.safetensors
+Qwen3-1.7B/model-00002-of-00002.safetensors
+Qwen3-1.7B/model.safetensors.index.json
+Qwen3-1.7B/README.md
+Qwen3-1.7B/tokenizer_config.json
+Qwen3-1.7B/tokenizer.json
+Qwen3-1.7B/vocab.json