Releases: huggingface/peft
0.18.0: RoAd, ALoRA, Arrow, WaveFT, DeLoRA, OSF, and more
Highlights
New Methods
RoAd
@ppetrushkov added RoAd: 2D Rotary Adaptation to PEFT in #2678. RoAd learns 2D rotation matrices that are applied using only element-wise multiplication, thus promising very fast inference with adapters in the unmerged state.
Remarkably, besides LoRA, RoAd is the only PEFT method that supports mixed adapter batches. This means that when you have loaded a model with multiple RoAd adapters, you can use all of them for different samples in the same batch, which is much more efficient than switching adapters between batches:
model = PeftModel.from_pretrained(base_model, <path-to-road-adapter-A>, adapter_name="adapter-A")
model.load_adapter(<path-to-road-adapter-B>, adapter_name="adapter-B")
inputs = ... # input with 3 samples
# apply adapter A to sample 0, adapter B to sample 1, and use the base model for sample 2:
adapter_names = ["adapter-A", "adapter-B", "__base__"]
output_mixed = model(**inputs, adapter_names=adapter_names)
gen_mixed = model.generate(**inputs, adapter_names=adapter_names)
ALoRA
Activated LoRA (ALoRA) is a technique added by @kgreenewald in #2609 for causal language models that selectively enables LoRA adapters depending on a specific invocation token sequence in the input. This has the major benefit of being able to re-use most of the KV cache during inference when the adapter is only used to generate part of the response, after which the base model takes over again.
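As a rough sketch of the idea (the alora_invocation_tokens argument name shown below is an assumption, so please check the ALoRA documentation for the exact API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "<causal-lm-id>"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

# token ids of the invocation sequence that activates the adapter
# (the argument name below is an assumption, see the ALoRA docs)
invocation_tokens = tokenizer.encode("<invocation-string>", add_special_tokens=False)
config = LoraConfig(r=16, alora_invocation_tokens=invocation_tokens)
model = get_peft_model(base_model, config)
```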
Arrow & GenKnowSub
@TheTahaaa contributed not only support for Arrow, a dynamic routing algorithm between multiple loaded LoRAs in #2644, but also GenKnowSub, a technique built upon Arrow where the 'library' of LoRAs available to Arrow is first modified by subtracting general knowledge adapters (e.g., trained on subsets of Wikipedia) to enhance task-specific performance.
WaveFT
Thanks to @Bilican, Wavelet Fine-Tuning (WaveFT) was added to PEFT in #2560. This method trains sparse updates in the wavelet domain of residual matrices, which is especially parameter efficient. It is very interesting for image generation, as it promises to generate diverse outputs while preserving subject fidelity.
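A minimal, hedged usage sketch; WaveFTConfig is assumed from PEFT's usual <Method>Config naming and the shown arguments are placeholders, so check the WaveFT docs for the actual options:

```python
from peft import WaveFTConfig, get_peft_model  # config class name assumed from PEFT conventions

base_model = ...  # e.g. the UNet/transformer of a diffusion pipeline or a language model
config = WaveFTConfig(target_modules=["to_q", "to_v"])  # module names are placeholders
model = get_peft_model(base_model, config)
```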
DeLoRA
Decoupled Low-rank Adaptation (DeLoRA) was added by @mwbini in #2780. This new PEFT method is similar to DoRA insofar as it decouples the angle and magnitude of the learned adapter weights. However, DeLoRA implements this in a way that promises to better prevent divergence. Moreover, it constrains the deviation of the learned weights by imposing an upper limit on their norm, which can be adjusted via the delora_lambda parameter.
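A hedged sketch; delora_lambda is the parameter mentioned above, while the DeloraConfig class name and the remaining arguments are assumptions based on PEFT's naming conventions:

```python
from peft import DeloraConfig, get_peft_model  # config class name assumed from PEFT conventions

base_model = ...  # a transformers model
config = DeloraConfig(
    r=16,
    delora_lambda=15,  # upper bound on the norm of the weight update (value is a placeholder)
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base_model, config)
```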
OSF
Orthogonal Subspace Fine-tuning (OSF) was added by @NikhilNayak-debug in #2685. By freezing the high-rank subspace of the targeted weight matrices and projecting gradient updates onto a low-rank subspace, OSF achieves good performance on continual learning tasks. While it is a bit memory intensive for standard fine-tuning, it is definitely worth checking out on tasks where performance degradation of previously learned tasks is a concern.
Enhancements
Text generation benchmark
In #2525, @ved1beta added the text generation benchmark to PEFT. This is a framework for measuring and comparing text generation metrics, such as runtime and memory usage, across different PEFT methods. Right now, this benchmark still lacks experimental settings and a visualization analogous to what we have in the MetaMathQA benchmark. If this is something that interests you, we encourage you to let us know or, even better, contribute to this benchmark.
Reliable interface for integrations
PEFT has integrations with other libraries like Transformers and Diffusers. To facilitate these integrations, PEFT now provides a stable interface of functions that should be used where applicable. For example, the set_adapter function can be used to switch between PEFT adapters on a model, even if the model is not a PeftModel instance. We commit to keeping these functions backwards compatible, so it's safe for other libraries to build on top of them.
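As a rough illustration (the import path and signature shown here are assumptions, so consult the PEFT docs on the integrations interface for the authoritative list of functions):

```python
# assumed import location of the stable, integration-facing functions
from peft.functional import set_adapter

# `model` can be, e.g., a transformers model that has PEFT adapters injected into it,
# without being a PeftModel instance
set_adapter(model, "adapter-B")
```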
Handling of weight tying
Some Transformers models can have tied weights. This is especially prevalent when it comes to the embedding and the LM head. Currently, the way that this is handled in PEFT is not obvious. We thus drafted an issue to illustrate the intended behavior in #2864. This shows what our goal is, although not everything is implemented yet.
In #2803, @romitjain added the ensure_weight_tying argument to LoraConfig. This argument, if set to True, enforces weight tying of the modules targeted with modules_to_save. Thus, if the embedding and LM head are tied, they will share weights, which is important to allow, for instance, weight merging. Therefore, for most users, we recommend enabling this setting if they want to fully fine-tune the embedding and LM head. For backwards compatibility, the setting is off by default, though.
Note that in accordance with #2864, the functionality of ensure_weight_tying=True will be expanded to also include trainable tokens (#2870) and LoRA (tbd.) in the future.
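For illustration, a minimal sketch; the module names passed to modules_to_save are placeholders that depend on the model architecture:

```python
from peft import LoraConfig

config = LoraConfig(
    target_modules="all-linear",
    # fully fine-tune the (tied) embedding and LM head; names depend on the architecture
    modules_to_save=["embed_tokens", "lm_head"],
    # keep the two modules tied so that, e.g., merging works as expected
    ensure_weight_tying=True,
)
```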
Support Conv1d and 1x1 Conv2d layers in LoHa and LoKr
@grewalsk extended LoHa and LoKr to support nn.Conv1d layers, as well as nn.Conv2d with 1x1 kernels, in #2515.
New prompt tuning initialization
Thanks to @macmacmacmac, we now have a new initialization option for prompt tuning, random discrete initialization (#2815). This option should generally work better than the default random initialization, as corroborated on our PEFT method comparison suite. Give it a try if you use prompt tuning.
Combining LoRA adapters with negative weights
If you use multiple LoRA adapters, you can merge them into a single adapter using model.add_weighted_adapter. However, so far, this only worked with positive weights per adapter. Thanks to @sambhavnoobcoder and @valteu, it is now possible to pass negative weights too.
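For example (a sketch with placeholder paths and weights; note that support for negative weights may depend on the chosen combination_type, so check the docs):

```python
from peft import PeftModel

model = PeftModel.from_pretrained(base_model, "<path-to-adapter-A>", adapter_name="adapter-A")
model.load_adapter("<path-to-adapter-B>", adapter_name="adapter-B")

# negative weights are now allowed, e.g. to "subtract" adapter-B from adapter-A
model.add_weighted_adapter(
    adapters=["adapter-A", "adapter-B"],
    weights=[1.0, -0.5],
    adapter_name="merged",
    combination_type="svd",
)
model.set_adapter("merged")
```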
Changes
Transformers compatibility
At the time of writing, the Transformers v5 release is imminent. This Transformers version will be incompatible with PEFT < 0.18.0. If you plan to use Transformers v5 with PEFT, please upgrade PEFT to 0.18.0+.
Python version
This PEFT version no longer supports Python 3.9, which has reached its end of life. Please use Python 3.10+.
Updates to OFT
The OFT method has been updated to make it slightly faster and to stabilize its numerics in #2805. This means, however, that existing checkpoints may give slightly different results after upgrading to PEFT 0.18.0. Therefore, if you use OFT, we recommend retraining the adapter.
All Changes
- add xpu support for boft/controlnet example by @kaixuanliu in #2674
- enabe boft_dreambooth on XPU by @yao-matrix in #2679
- Add XPU support for dna_language_model example by @kaixuanliu in #2689
- validated lora dreambooth on xpu, pass by @yao-matrix in #2696
- validated lorafa on xpu, passed by @yao-matrix in #2697
- enable corda finetuning on xpu by @yao-matrix in #2687
- validated cpt, ephemeral_gpu_offloading and eva finetuning on XPU by @yao-matrix in #2694
- validated PISSA on xpu, pass by @yao-matrix in #2703
- validated MISS on xpu, pass by @yao-matrix in #2704
- fix bug for feature_extraction example by @kaixuanliu in #2706
- Use hub_online_once in trainable token tests by @githubnemo in #2701
- Bump version to 0.17.1.dev0 after release by @BenjaminBossan in #2707
- validated multi_adapter on xpu, pass by @yao-matrix in #2711
- verified mlp on xpu, pass by @yao-matrix in #2712
- use CPU instead of XPU for face_alignment by @kaixuanliu in #2713
- Add conditional_generation example xpu support by @kaixuanliu in #2684
- validated POLY on XPU, pass by @yao-matrix in #2702
- add XPU support for hra_dreambooth example by @kaixuanliu in #2717
- enable xpu device for causal_language_modeling example by @kaixuanliu in #2680
- add xpu support for fp4_finetuing example by @kaixuanliu in #2714
...
0.17.1
This patch release contains a few fixes (via #2710) for the newly introduced target_parameters feature, which allows LoRA to target nn.Parameters directly (useful for mixture of expert layers). Most notably:
- PEFT no longer removes possibly existing parametrizations from the parameter.
- Adding multiple adapters (via model.add_adapter or model.load_adapter) did not work correctly. Since a solution is not trivial, PEFT now raises an error to prevent this situation.
0.17.0: SHiRA, MiSS, LoRA for MoE, and more
Highlights
New Methods
SHiRA
@kkb-code contributed Sparse High Rank Adapters (SHiRA, paper), which promise a potential performance gain over LoRA; in particular, the concept loss observed when using multiple adapters is reduced. Since the adapters only train 1-2% of the weights and are inherently sparse, switching between adapters may be cheaper than with LoRAs. (#2584)
MiSS
@JL-er added a new PEFT method, MiSS (Matrix Shard Sharing) in #2604. This method is an evolution of Bone, which, according to our PEFT method comparison benchmark, gives excellent results when it comes to performance and memory efficiency. If you haven't tried it, you should do so now.
At the same time, Bone will be deprecated in favor of MiSS and will be removed in PEFT v0.19.0. If you already have a Bone checkpoint, you can use scripts/convert-bone-to-miss.py to convert it into a MiSS checkpoint and proceed with training using MiSS.
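A hedged usage sketch; MissConfig is assumed from PEFT's usual <Method>Config naming convention, so check the MiSS docs for the actual class and arguments:

```python
from peft import MissConfig, get_peft_model  # config class name assumed from PEFT conventions

base_model = ...  # a transformers model
config = MissConfig(target_modules=["q_proj", "v_proj"])  # module names are placeholders
model = get_peft_model(base_model, config)
```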
Enhancements
LoRA for nn.Parameter
LoRA is now able to target nn.Parameter directly (#2638, #2665)! Ever had this complicated nn.Module with promising parameters inside but it was too custom to be supported by your favorite fine-tuning library? No worries, now you can target nn.Parameters directly using the target_parameters config attribute which works similarly to target_modules.
This option can be especially useful for models with Mixture of Expert (MoE) layers, as those often use nn.Parameters directly and cannot be targeted with target_modules. For example, for the Llama4 family of models, use the following config to target the MoE weights:
config = LoraConfig(
...,
target_modules=[], # <= prevent targeting any modules
target_parameters=["feed_forward.experts.down_proj", "feed_forward.experts.gate_up_proj"],
)
Note that this feature is still experimental, as it comes with a few caveats, and therefore might change in the future. Also, MoE weights with many experts can be quite large, so expect higher memory usage compared to targeting normal nn.Linear layers.
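For completeness, here is a sketch of applying such a config to a model (the model id is a placeholder):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("<llama4-model-id>")  # placeholder
config = LoraConfig(
    target_modules=[],  # <= prevent targeting any modules
    target_parameters=["feed_forward.experts.down_proj", "feed_forward.experts.gate_up_proj"],
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```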
Injecting adapters based on a state_dict
Sometimes, it is possible that there is a PEFT adapter checkpoint but the corresponding PEFT config is not known for whatever reason. To inject the PEFT layers for this checkpoint, you would usually have to reverse-engineer the corresponding PEFT config, most notably the target_modules argument, based on the state_dict from the checkpoint. This can be cumbersome and error prone. To avoid this, it is also possible to call inject_adapter_in_model and pass the loaded state_dict as an argument:
from safetensors.torch import load_file
from peft import LoraConfig, inject_adapter_in_model
model = ...
state_dict = load_file(<path-to-safetensors-file>)
lora_config = LoraConfig() # <= no need to specify further
model = inject_adapter_in_model(lora_config, model, state_dict=state_dict)
Find more on state_dict based injection in the docs.
Changes
Compatibility
A bug in prompt learning methods caused modules_to_save to be ignored. Classification tasks are especially affected, since they usually add the classification/score layer to modules_to_save. As a consequence, these layers were neither trained nor stored after training. This has been corrected now. (#2646)
All Changes
- Bump version to 0.16.1.dev0 after release by @BenjaminBossan in #2632
- FEAT: Add GH action to deploy method comparison app by @BenjaminBossan in #2625
- enable FSDP example for model `hugging-quants/Meta-Llama-3.1-8B-Instr… by @kaixuanliu in #2626
- FIX: Create mask function signature change in transformers 4.53.1 by @BenjaminBossan in #2633
- FIX: Correctly skip AWQ test based on torch version by @BenjaminBossan in #2631
- FIX: Faulty OFT parameter device test by @BenjaminBossan in #2630
- Fix #2634: Allow peft_type to be a string by @githubnemo in #2635
- SHiRA Adapters by @kkb-code in #2584
- FIX: Prompt learning methods modules_to_save issue by @BenjaminBossan in #2646
- FIX: Error in workflow file to deploy method comparison app by @BenjaminBossan in #2645
- FEAT Allow LoRA to target nn.Parameter by @BenjaminBossan in #2638
- Update BibTeX entry by @cx-alberto-simoes in #2659
- FIX Prefix tuning after transformers PR 38635 by @BenjaminBossan in #2662
- make method comparison device agnostic, so it can expand to more accelerators like XPU by @yao-matrix in #2610
- Update tokenizer parameter in sfttrainer across multiple examples by @gapsong in #2664
- Update lora.md by @qgallouedec in #2666
- GPT2 compatible version of LLama-Adapters by @efraimdahl in #2643
- Method Comparison: Improve formatting/layout of table by @githubnemo in #2670
- ENH: Targeting multiple parameters on the same module by @BenjaminBossan in #2665
- Update extending vocab docs by @githubnemo in #2669
- FIX Failing target_parameters param usage count by @BenjaminBossan in #2676
- Fix trainable tokens with fsdp by @BenjaminBossan in #2681
- FIX: Small fixes to target_parameters by @BenjaminBossan in #2677
- TST: Add more HF Hub model caching by @BenjaminBossan in #2682
- FIX: Missing device map for facebook/opt-125m by @BenjaminBossan in #2675
- Fix not detecting regex-targeted embedding layer by @githubnemo in #2649
- Add MiSS as a replacement for Bone. by @JL-er in #2604
- [WIP] ENH: Adapter injection based on state_dict by @BenjaminBossan in #2637
- Release 0.17.0 by @BenjaminBossan in #2691
New Contributors
- @kaixuanliu made their first contribution in #2626
- @kkb-code made their first contribution in #2584
- @cx-alberto-simoes made their first contribution in #2659
- @efraimdahl made their first contribution in #2643
Full Changelog: v0.16.0...v0.17.0
0.16.0: LoRA-FA, RandLoRA, C³A, and much more
Highlights
New Methods
LoRA-FA
In #2468, @AaronZLT added the LoRA-FA optimizer to PEFT. This optimizer is based on AdamW and increases the memory efficiency of LoRA training. This means that you can train LoRA with less memory or, with the same memory budget, use higher LoRA ranks, potentially getting better results.
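A hedged sketch of how the optimizer can be used; the create_lorafa_optimizer helper and its arguments may differ slightly from what is shown here, so check the LoRA-FA docs:

```python
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_lorafa_optimizer  # helper name assumed, see the LoRA-FA docs

base_model = ...  # a transformers model
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, config)

# LoRA-FA keeps LoRA A frozen and only updates LoRA B, shrinking the optimizer state
optimizer = create_lorafa_optimizer(model=model, r=16, lora_alpha=32, lr=2e-4)
```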
RandLoRA
Thanks to @PaulAlbert31, a new PEFT method called RandLoRA was added to PEFT (#2464). Similarly to VeRA, it uses non-learnable random low rank matrices that are combined through learnable matrices. This way, RandLoRA can approximate full rank updates of the weights. Training models quantized with bitsandbytes is supported.
C³A
@Phoveran added Circular Convolution Adaptation, C3A, in #2577. This new PEFT method can overcome the low-rank limitation of methods such as LoRA while still promising to be fast and memory efficient.
Enhancements
Thanks to @gslama12 and @SP1029, LoRA now supports Conv2d layers with groups != 1. This requires the rank r to be divisible by groups. See #2403 and #2567 for context.
@dsocek added support for Intel Neural Compressor (INC) quantization to LoRA in #2499.
DoRA now supports Conv1d layers thanks to @EskildAndersen (#2531).
Passing init_lora_weights="orthogonal" now enables orthogonal weight initialization for LoRA (#2498).
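For example (a minimal sketch; rank and module names are placeholders):

```python
from peft import LoraConfig

# initialize the LoRA weights orthogonally instead of with the default scheme
config = LoraConfig(r=16, init_lora_weights="orthogonal", target_modules=["q_proj", "v_proj"])
```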
@gapsong brought us Quantization-Aware LoRA training in #2571. This can make QLoRA training more efficient, please check the included example. Right now, only GPTQ is supported.
There has been a big refactor of Orthogonal Finetuning (OFT), thanks to @zqiu24 (#2575). This makes the PEFT method run more quickly and require less memory. It is, however, incompatible with old OFT checkpoints. If you have old OFT checkpoints, either pin the PEFT version to <0.16.0 or retrain them with the new PEFT version.
Thanks to @keepdying, LoRA hotswapping with compiled models no longer leads to CUDA graph re-records (#2611).
Changes
Compatibility
- #2481: The value of required_grads_ of modules_to_save is now set to True when used directly with inject_adapter. This is relevant for PEFT integrations, e.g. Transformers or Diffusers.
- Due to a big refactor of vision language models (VLMs) in Transformers, the model architecture has been slightly adjusted. One consequence of this is that if you use a PEFT prompt learning method that is applied to vlm.language_model, it will no longer work; please apply it to vlm directly (see #2554 for context). Moreover, the refactor results in different checkpoints. We managed to ensure backwards compatibility in PEFT, i.e. old checkpoints can be loaded successfully. There is, however, no forward compatibility, i.e. loading checkpoints trained after the refactor is not possible with package versions from before the refactor. In this case, you need to upgrade PEFT and transformers. More context in #2574.
- #2579: There have been bigger refactors in Transformers concerning attention masks. This required some changes on the PEFT side which can affect prompt learning methods. For prefix tuning specifically, this can result in numerical differences but overall performance should be the same. For other prompt learning methods, numerical values should be the same, except if the base model uses 4d attention masks, like Gemma. If you load old prompt learning checkpoints, please double-check that they still perform as expected, especially if they're trained on Gemma or similar models. If not, please re-train them or pin PEFT and transformers to previous versions (<0.16.0 and <4.52.0, respectively).
All Changes
- Bump version and minor instruction fix by @githubnemo in #2439
- FIX for ConvNd layers using the groups argument. by @gslama12 in #2403
- DOC: Tip on how to merge with DeepSpeed by @BenjaminBossan in #2446
- Fix incorrect link in docs by @kenning in #2444
- Fix typos by @omahs in #2447
- Refactor to better support LoRA variants by @BenjaminBossan in #2443
- enable 5 test cases on XPU by @yao-matrix in #2442
- FIX: Faulty test that results in nan weights by @BenjaminBossan in #2448
- Fix sft example script trl and env var by @BenjaminBossan in #2454
- LoRA variant init now also receives kwargs by @BenjaminBossan in #2455
- Fix #2450: Revamp adapter_state_dict_* methods by @githubnemo in #2456
- Method comparison evaluation suite by @githubnemo in #2395
- Bump version to reflect patch release by @githubnemo in #2461
- The paper on the Bone structure has been updated by @JL-er in #2312
- CI: More caching in tests by @BenjaminBossan in #2472
- fix gpu tests by @jiqing-feng in #2471
- Fix compare results by @jiqing-feng in #2473
- fix error_factor for xpu by @jiqing-feng in #2475
- Fix: Multiple PEFT methods have issues with models loaded in float16 or bfloat16 by @BenjaminBossan in #2433
- TST Refactor tests to make them simpler by @BenjaminBossan in #2462
- Use Python 3.9 as RUFF target version and apply fixes by @cyyever in #2483
- FIX Deleting adapters on auxiliary modules by @BenjaminBossan in #2466
- fix args by @real-zhangzhe in #2474
- ENH Add default target_modules for Llama4 by @BenjaminBossan in #2480
- [Feature Request] Add LoRA-FA to PEFT by @AaronZLT in #2468
- TST Refactor (continued) of encoder tests by @BenjaminBossan in #2478
- FIX: Error when merging LoRA bias with scale != 1 by @BenjaminBossan in #2489
- FIX: X-LoRA error when targeting different modules by @BenjaminBossan in #2488
- Fix: the evaluation_strategy is deprecated by @yuanwu2017 in #2487
- Testing common uses situational HF_HUB_OFFLINE by @githubnemo in #2490
- MNT: Update HF Hub download kwargs by @BenjaminBossan in #2492
- FIX Multi GPU tests: explicit device map by @BenjaminBossan in #2484
- Fix #2477: Regression accessing modules_to_save by @githubnemo in #2481
- make test_lora_use_dora_linear pass on XPU by @yao-matrix in #2493
- TST: AQLM test no longer x-fails by @BenjaminBossan in #2506
- TST make 3 flaky test cases always pass on XPU by @yao-matrix in #2503
- FIX: CPT should not be tested with sequence classification by @BenjaminBossan in #2507
- Update Docker image builds for torch 2.7+cu126 by @matthewdouglas in #2514
- Feature: RandLora integration into peft by @PaulAlbert31 in #2464
- LORA/MODEL: Use max rank of pattern for add_weighted_adapter by @Beinsezii in #2512
- fix typo for skipping test by @jiqing-feng in #2519
- docs typo: fix links by @imba-tjd in #2517
- Add INC dispatcher by @dsocek in #2499
- ENH: Add default Qwen3 target modules by @BenjaminBossan in #2522
- MNT: Pin GitHub action hashes for security by @BenjaminBossan in #2521
- TST: Refactor remaining common tests to use pytest by @BenjaminBossan in #2491
- ENH: Add tests, docs, types for scaling methods by @BenjaminBossan in #2526
- TST Mark AutoAWQ as xfail for now by @BenjaminBossan in #2529
- FIX Prompt learning issue with 4d attention mask by @BenjaminBossan in #2458
- FIX: Use correct argument name in MultiheadAttention forward by @BenjaminBossan in #2510
- Method comparison: Support more options for the optimizer by @BenjaminBossan in #2479
- Randlora documentation and some example usage by @PaulAlbert31 in #2524
- added support for Conv1d for DoRA by @EskildAndersen in #2531
- Fix #2535: Prev...
v0.15.2
v0.15.1
This patch includes a fix for #2450. In this bug, modules_to_save was not handled correctly when used in conjunction with DeepSpeed ZeRO stage 3, which resulted in those modules being saved as placeholder values in the checkpoints.
Full Changelog: v0.15.0...v0.15.1
v0.15.0
Highlights
New Methods
CorDA: Context-Oriented Decomposition Adaptation
@iboing and @5eqn contributed CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning. This task-driven initialization method has two modes, knowledge-preservation and instruction-preservation, both using external data to select ranks intelligently. The former can be used to select those ranks that correspond to weights not affiliated with knowledge from, say, a QA dataset. The latter can be used to select those ranks that correspond most to the task at hand (e.g., a classification task). (#2231)
Trainable Tokens: Selective token update
The new Trainable Tokens tuner allows for selective training of tokens without re-training the full embedding matrix, e.g. when adding support for reasoning / thinking tokens. This is a lot more memory efficient and the saved checkpoint is much smaller. It can be used standalone or in conjunction with LoRA adapters by passing trainable_token_indices to LoraConfig. (#2376)
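A hedged sketch of combining Trainable Tokens with LoRA; the token indices are placeholders for the newly added tokens:

```python
from peft import LoraConfig, get_peft_model

base_model = ...  # a transformers model whose tokenizer/embedding was extended with new tokens
config = LoraConfig(
    target_modules="all-linear",
    # only these embedding rows are trained, not the full embedding matrix (indices are placeholders)
    trainable_token_indices=[128002, 128003],
)
model = get_peft_model(base_model, config)
```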
Enhancements
LoRA now supports targeting multihead attention modules (but for now only those with _qkv_same_embed_dim=True). These modules were tricky as they may expose linear submodules but won't use their forward methods, therefore needing explicit support. (#1324)
Hotswapping now allows different alpha scalings and ranks without recompilation of the model when the model is prepared using a call to prepare_model_for_compiled_hotswap() before compiling the model. (#2177)
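Roughly, the flow looks like this (a sketch; the helpers live in PEFT's hotswap utilities, but the exact import path and the target_rank argument should be double-checked against the docs):

```python
import torch
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter, prepare_model_for_compiled_hotswap

model = PeftModel.from_pretrained(base_model, "<path-to-lora-A>")
# pad ranks/scalings up front so later hotswaps with different ranks/alphas
# do not trigger recompilation (the argument shown is an assumption)
prepare_model_for_compiled_hotswap(model, target_rank=64)
model = torch.compile(model)

# later: replace the LoRA weights in place without recompiling
hotswap_adapter(model, "<path-to-lora-B>", adapter_name="default")
```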
GPTQModel support was added in #2247 as a replacement for AutoGPTQ which is not maintained anymore.
Changes
- It's now possible to use all-linear as target_modules for custom (non-transformers) models (#2267). With this change comes a bugfix where it was possible that non-linear layers were selected when they shared the same name with a linear layer (e.g., bar.foo and baz.foo).
- The internal tuner API was refactored to make method registration easier. With this change, the number of changes to numerous files is reduced to a single register_peft_method() call. (#2282)
- PEFT_TYPE_TO_MODEL_MAPPING is now deprecated and should not be relied upon. Use PEFT_TYPE_TO_TUNER_MAPPING instead. (#2282)
- Mixed adapter batches can now be used in conjunction with beam search. (#2287)
- It was possible that modules_to_save keys wrongly matched parts of the state dict if the key was a substring of another key (e.g., classifier and classifier2). (#2334)
- Auto-casting of the input dtype to the LoRA adapter dtype can now be disabled via disable_input_dtype_casting=True. (#2353)
- The config parameters rank_pattern and alpha_pattern used by many adapters now support matching full paths as well by specifying the pattern with a caret in front, for example: ^foo to target model.foo but not model.bar.foo. (#2419)
- AutoPeftModels no longer reduce the embedding size if the tokenizer size differs from the embedding size. The embedding matrix will only be resized if there are more tokens in the tokenizer than rows in the embedding matrix. This is to prevent resizing of embedding matrices in models that have 'spare' tokens built in. (#2427)
What's Changed
- FIX: Ensure Device Compatibility for BOFT Forward/Merging by @d-kleine in #2242
- MNT: Bump version to 0.14.1.dev0 by @BenjaminBossan in #2263
- ENH: fix library interface by @bluenote10 in #2265
- FIX: Add warning for adapter_name conflict with tuner by @pzdkn in #2254
- ENH: FIX: Allow "all-linear" to target custom models by @BenjaminBossan in #2267
- MNT: apply sorting of exported symbols in __all__ by @bluenote10 in #2280
- MNT: apply sorting of imports by @bluenote10 in #2279
- FIX: Adoption prompt: New way to obtain position embeddings by @BenjaminBossan in #2276
- FIX: Int8 check for torchao v0.7.0 by @BenjaminBossan in #2284
- FEAT: Adding CorDA as an optional initialization method of LoRA by @iboing in #2231
- FIX: typo in lora config.py by @innerlee in #2297
- DOC: Added information regarding freezing the base model in prepare_model_for_kbit_training docstring by @NilBiescas in #2305
- DOC: add resize_token_embeddings to docs by @bingwork in #2290
- FIX: Make CorDA example work by @5eqn in #2300
- FIX: #2295: Warn when user reloads modified model by @githubnemo in #2306
- ENH: Extend usage for OLoRA finetune script by @jiqing-feng in #2308
- CI: Add zizmor for CI (security) linting by @githubnemo in #2288
- FEAT: Add LoRA multihead attention module by @BenjaminBossan in #1324
- DOC: Updated documentation for get_peft_model() for in-place base model modification by @d-kleine in #2313
- FIX: Prefix tuning test w/ rotary embedding on multi GPU by @BenjaminBossan in #2311
- FIX: Adaption prompt errors after changes from transformers #35235 by @BenjaminBossan in #2314
- FIX: Package checks for torchao, EETQ by @BenjaminBossan in #2320
- Refactor: PEFT method registration function by @BenjaminBossan in #2282
- FIX: low_cpu_mem_usage=True with 8bit bitsandbytes by @BenjaminBossan in #2325
- FIX: Reinstate PEFT_TYPE_TO_MODEL_MAPPING variable with deprecation by @BenjaminBossan in #2328
- FIX: reduce CorDA memory consumption + docs by @5eqn in #2324
- MNT: React on new zizmor version findings by @githubnemo in #2331
- TST: make cuda-only tests device-agnostic by @faaany in #2323
- FIX: Generating with mixed adapter batches and with beam search enabled by @BenjaminBossan in #2287
- FIX: Bug with modules_to_save loading if substring by @BenjaminBossan in #2334
- FIX: Add missing attributes to MultiheadAttention by @BenjaminBossan in #2335
- FIX: for zizmor permission warnings by @githubnemo in #2338
- CI: Attempt at adding a cache for models by @githubnemo in #2327
- FIX: Avoid needless copy from modules_to_save by @BenjaminBossan in #2220
- DOC: Add entry to solve unknown config argument by @BenjaminBossan in #2340
- FEAT: add gptqmodel support by @jiqing-feng in #2247
- MNT: Update ruff to v0.9.2 by @BenjaminBossan in #2343
- TST: Update torch.compile tests and docs by @BenjaminBossan in #2332
- FIX: Documentation & error checking for AdaLoRA timing by @githubnemo in #2341
- DOC: Better document init_lora_weights=False option by @BenjaminBossan in #2347
- ENH: Adding Lora implementation for nn.Conv1d by @CCLDArjun in #2333
- FIX: Failing AdaLoRA GPU test by @BenjaminBossan in #2349
- ENH: Improve invalid peft config error message by @thedebugger in #2346
- TST: Use different diffusion model for testing by @BenjaminBossan in #2345
- CI: Use locked install for zizmor by @githubnemo in #2350
- DOC: fix links to PEFT guides by @makelinux in #2357
- DOC: rename link to PEFT Quicktour by @makelinux in #2358
- ENH: Allow disabling input dtype casting for LoRA by @BenjaminBossan in #2353
- ENH: Hotswap allow different alpha scalings and ranks by @BenjaminBossan in #2177
- DOC: Fix links to boft by @makelinux in #2365
- DOC: Explain uninitialized weights warning by @BenjaminBossan in #2369
- ENH: Optimization for ConvNd if dropout=0. by @gslama12 in #2371
- FIX: Small fixes to hotswapping by @BenjaminBossan in #2366
- ENH: prepare_model_for_compiled_hotswap raises when no adapter was found by @BenjaminBossan in https://github.com/hugging...
Version 0.14.0: EVA, Context-aware Prompt Tuning, Bone, and more
Highlights
New Methods
Context-aware Prompt Tuning
@tsachiblau added a new soft prompt method called Context-aware Prompt Tuning (CPT), which is a combination of In-Context Learning and Prompt Tuning in the sense that, for each training sample, it builds a learnable context from training examples in addition to the single training sample. It allows for sample- and parameter-efficient few-shot classification and addresses recency bias.
Explained Variance Adaptation
@sirluk contributed a new LoRA initialization method called Explained Variance Adaptation (EVA). Instead of randomly initializing LoRA weights, this method uses SVD on minibatches of finetuning data to initialize the LoRA weights and is also able to re-allocate the ranks of the adapter based on the explained variance ratio (derived from SVD). Thus, this initialization method can yield better initial values and better rank distribution.
Bone
@JL-er added an implementation for Block Affine (Bone) Adaptation which utilizes presumed sparsity in the base layer weights to divide them into multiple sub-spaces that share a single low-rank matrix for updates. Compared to LoRA, Bone has the potential to significantly reduce memory usage and achieve faster computation.
Enhancements
PEFT now supports LoRAs for int8 torchao quantized models (check this and this notebook). In addition, VeRA can now be used with 4 and 8 bit bitsandbytes quantization thanks to @ZiadHelal.
Hot-swapping of LoRA adapters is now possible using the hotswap_adapter function. Now you are able to load one LoRA and replace its weights in-place with the LoRA weights of another adapter which, in general, should be faster than deleting one adapter and loading the other adapter in its place. The feature is built so that no re-compilation of the model is necessary if torch.compile was called on the model (right now, this requires ranks and alphas to be the same for the adapters).
LoRA and IA³ now support Conv3d layers thanks to @jsilter, and @JINO-ROHIT added a notebook showcasing PEFT model evaluation using lm-eval-harness toolkit.
With the target_modules argument, you can specify which layers to target with the adapter (e.g. LoRA). Now you can also specify which modules not to target by using the exclude_modules parameter (thanks @JINO-ROHIT).
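For example (a minimal sketch; module names are placeholders):

```python
from peft import LoraConfig

# apply LoRA to all linear layers except the LM head
config = LoraConfig(
    target_modules="all-linear",
    exclude_modules=["lm_head"],
)
```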
Changes
- Several fixes have been made to the OFT implementation, among other things to fix merging, which makes adapter weights trained with PEFT versions prior to this release incompatible (see #1996 for details).
- Adapter configs are now forward-compatible by accepting unknown keys.
- Prefix tuning was fitted to the DynamicCache caching infrastructure of transformers (see #2096). If you are using this PEFT version and a recent version of transformers with an old prefix tuning checkpoint, you should double-check that it still works correctly and retrain it if it doesn't.
- Added the lora_bias parameter to LoRA layers to enable bias on the LoRA B matrix. This is useful when extracting LoRA weights from fully fine-tuned parameters with bias vectors so that these can be taken into account.
- #2180 provided a couple of bug fixes to LoKr (thanks @yaswanth19). If you're using LoKr, your old checkpoints should still work but it's recommended to retrain your adapter.
- from_pretrained now warns the user if PEFT keys are missing.
- Attribute access to modules in modules_to_save is now properly and transparently handled.
- PEFT supports the changes to bitsandbytes 8bit quantization from the recent v0.45.0 release. To benefit from these improvements, we recommend upgrading bitsandbytes if you're using QLoRA. Expect slight numerical differences in model outputs if you're using QLoRA with 8bit bitsandbytes quantization.
What's Changed
- Bump version to 0.13.1.dev0 by @BenjaminBossan in #2094
- Support Conv3d layer in LoRA and IA3 by @jsilter in #2082
- Fix Inconsistent Missing Keys Warning for Adapter Weights in PEFT by @yaswanth19 in #2084
- FIX: Change check if past_key_values is empty by @BenjaminBossan in #2106
- Update install.md by @Salehbigdeli in #2110
- Update OFT to fix merge bugs by @Zeju1997 in #1996
- ENH: Improved attribute access for modules_to_save by @BenjaminBossan in #2117
- FIX low_cpu_mem_usage consolidates devices by @BenjaminBossan in #2113
- TST Mark flaky X-LoRA test as xfail by @BenjaminBossan in #2114
- ENH: Warn when from_pretrained misses PEFT keys by @BenjaminBossan in #2118
- FEAT: Adding exclude modules param(#2044) by @JINO-ROHIT in #2102
- fix merging bug / update boft conv2d scaling variable by @Zeju1997 in #2127
- FEAT: Support quantization for VeRA using bitsandbytes (#2070) by @ZiadHelal in #2076
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2137
- FEAT: Support torchao by @BenjaminBossan in #2062
- FIX: Transpose weight matrix based on fan_in_fan_out condition in PiSSA initialization (#2103) by @suyang160 in #2104
- FIX Type annoations in vera/bnb.py by @BenjaminBossan in #2139
- ENH Make PEFT configs forward compatible by @BenjaminBossan in #2038
- FIX Raise an error when performing mixed adapter inference and passing non-existing adapter names by @BenjaminBossan in #2090
- FIX Prompt learning with latest transformers error by @BenjaminBossan in #2140
- adding peft lora example notebook for ner by @JINO-ROHIT in #2126
- FIX TST: NaN issue with HQQ GPU test by @BenjaminBossan in #2143
- FIX: Bug in target module optimization if child module name is suffix of parent module name by @BenjaminBossan in #2144
- Bump version to 0.13.2.dev0 by @BenjaminBossan in #2145
- FIX Don't assume past_key_valus for encoder models by @BenjaminBossan in #2149
- Use SFTConfig instead of SFTTrainer keyword args by @qgallouedec in #2150
- FIX: Sft train script FSDP QLoRA embedding mean resizing error by @BenjaminBossan in #2151
- Optimize DoRA in eval and no dropout by @ariG23498 in #2122
- FIX Missing low_cpu_mem_usage argument by @BenjaminBossan in #2156
- MNT: Remove version pin of diffusers by @BenjaminBossan in #2162
- DOC: Improve docs for layers_pattern argument by @BenjaminBossan in #2157
- Update HRA by @DaShenZi721 in #2160
- fix fsdp_auto_wrap_policy by @eljandoubi in #2167
- MNT Remove Python 3.8 since it's end of life by @BenjaminBossan in #2135
- Improving error message when users pass layers_to_transform and layers_pattern by @JINO-ROHIT in #2169
- FEAT Add hotswapping functionality by @BenjaminBossan in #2120
- Fix to prefix tuning to fit transformers by @BenjaminBossan in #2096
- MNT: Enable Python 3.12 on CI by @BenjaminBossan in #2173
- MNT: Update docker nvidia base image to 12.4.1 by @BenjaminBossan in #2176
- DOC: Extend modules_to_save doc with pooler example by @BenjaminBossan in #2175
- FIX VeRA failure on multiple GPUs by @BenjaminBossan in #2163
- FIX: Import location of HF hub errors by @BenjaminBossan in #2178
- DOC: fix broken link in the README of loftq by @dennis2030 in #2183
- added checks for layers to transforms and layer pattern in lora by @JINO-ROHIT in #2159
- ENH: Warn when loading PiSSA/OLoRA together with other adapters by @BenjaminBossan in #2186
- TST: Skip AQLM test that is incompatible with torch 2.5 by @BenjaminBossan in #2187
- FIX: Prefix...
v0.13.2: Small patch release
This patch release contains a small bug fix for an issue that prevented some LoRA checkpoints from being loaded correctly (mostly concerning stable diffusion checkpoints not trained with PEFT when loaded in diffusers, #2144).
Full Changelog: v0.13.1...v0.13.2
v0.13.1: Small patch release
This patch release contains a small bug fix for the low_cpu_mem_usage=True option (#2113).
Full Changelog: v0.13.0...v0.13.1


