Add `quantization_config` trainer argument (streamline QLoRA) by qgallouedec · Pull Request #6157 · huggingface/trl

qgallouedec · 2026-06-24T00:13:19Z

Adds a quantization_config argument to SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA no longer requires reaching into model_init_kwargs (or worse, manual model loading).

After:

SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    peft_config=LoraConfig(),
    train_dataset=dataset,
)

Compare with before (many resources are written like this!):

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=LoraConfig(),
)

Before (the "right" way, but not very popular):

SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",
    args=SFTConfig(model_init_kwargs={"quantization_config": BitsAndBytesConfig(load_in_4bit=True)}),
    peft_config=LoraConfig(),
    train_dataset=dataset,
)

It sits next to peft_config (the other non-serializable QLoRA ingredient), flows into from_pretrained, and raises if also set in args.model_init_kwargs.
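
The conflict rule described above can be sketched in plain Python. This is only an illustration and not TRL's actual implementation: `merge_quantization_config` is a hypothetical helper, the error text is invented, and a plain dict stands in for `BitsAndBytesConfig` so the snippet runs without any dependency.

```python
from __future__ import annotations

from typing import Any


def merge_quantization_config(
    quantization_config: Any | None,
    model_init_kwargs: dict[str, Any] | None,
) -> dict[str, Any]:
    """Build the kwargs that would be forwarded to `from_pretrained` (hypothetical helper)."""
    # Copy so the caller's dict is never mutated.
    kwargs = dict(model_init_kwargs or {})
    if quantization_config is None:
        return kwargs
    if "quantization_config" in kwargs:
        # Two sources of truth: refuse to guess which one the user meant.
        raise ValueError(
            "`quantization_config` was passed both as a trainer argument and in "
            "`args.model_init_kwargs`; pass it as the trainer argument only."
        )
    kwargs["quantization_config"] = quantization_config
    return kwargs


# A dict stands in for BitsAndBytesConfig(load_in_4bit=True).
bnb = {"load_in_4bit": True}
merged = merge_quantization_config(bnb, {"attn_implementation": "sdpa"})
```

Raising instead of silently picking one value keeps a stale `model_init_kwargs` entry from overriding, or being overridden by, the new argument without the user noticing.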

Changes

New quantization_config arg on the five trainers above (+ docstrings).
The trl/scripts/{sft,dpo,grpo,rloo,reward}.py CLIs now pass it directly instead of injecting into model_init_kwargs.
This drops the redundant model_init_kwargs["device_map"] = get_kbit_device_map() line: verified on 8×H100 that QLoRA trains identically with and without it, across transformers 4.56.2 (min supported) and 5.13; distributed runs override device_map to None anyway, and single-process runs auto-place quantized weights on the current CUDA device. See Remove redundant get_kbit_device_map() #6158
Updated the QLoRA example in docs/source/peft_integration.md.

Note

Medium Risk
Touches model loading for all major trainers and reference-model paths; behavior change is mostly API surface, but incorrect quantization or duplicate config could break QLoRA runs.

Overview
Adds a quantization_config argument to SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA can pass BitsAndBytesConfig next to peft_config instead of pre-loading the model or stuffing model_init_kwargs.

When model is a hub id, trainers merge quantization_config into from_pretrained kwargs (and the same for reference models where applicable). Setting it in both the trainer arg and args.model_init_kwargs now raises; passing it with an already-instantiated model warns and ignores it.
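
A minimal sketch of the hub-id versus instantiated-model branching described above, assuming nothing about the real trainer code: `plan_model_loading` is a hypothetical helper, the warning text is invented, and a dict again stands in for `BitsAndBytesConfig`.

```python
from __future__ import annotations

import warnings
from typing import Any


def plan_model_loading(model: Any, quantization_config: Any | None) -> dict[str, Any]:
    """Decide what happens to `quantization_config` (hypothetical helper, not TRL code)."""
    if isinstance(model, str):
        # Hub id or local path: the config can still influence how weights are loaded.
        return {"load_from": model, "quantization_config": quantization_config}
    if quantization_config is not None:
        # The weights are already in memory, so the config cannot be applied.
        warnings.warn(
            "`quantization_config` is ignored because `model` is already instantiated.",
            UserWarning,
            stacklevel=2,
        )
    return {"load_from": None, "quantization_config": None}


# String model: the config is kept for loading.
plan_from_id = plan_model_loading("meta-llama/Llama-2-7b-hf", {"load_in_4bit": True})
```

Warning rather than raising in the second branch matches the behavior stated above: the run continues with the model exactly as the caller loaded it.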

trl/scripts/{sft,dpo,grpo,rloo,reward}.py and example scripts pass get_quantization_config(model_args) directly to the trainer rather than injecting into training_args.model_init_kwargs. Docs and the SFT/GRPO QLoRA notebooks are updated to use model=model_id plus quantization_config and model_init_kwargs for attention/dtype only.
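
The call-site change in the scripts can be illustrated with stand-ins. Everything below is hypothetical scaffolding: `ModelArgs`, `TrainingArgs`, and `get_quantization_config_stub` only mimic the shape of the objects named above, and the two `wire_*` functions return the keyword arguments a trainer would receive rather than constructing a real trainer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ModelArgs:
    """Stand-in for the script's parsed model arguments."""
    load_in_4bit: bool = False


@dataclass
class TrainingArgs:
    """Stand-in for the trainer config carrying `model_init_kwargs`."""
    model_init_kwargs: dict[str, Any] = field(default_factory=dict)


def get_quantization_config_stub(model_args: ModelArgs) -> dict[str, Any] | None:
    """Stand-in for `get_quantization_config(model_args)`; returns a dict or None."""
    return {"load_in_4bit": True} if model_args.load_in_4bit else None


def wire_before(model_args: ModelArgs, training_args: TrainingArgs) -> dict[str, Any]:
    """Old pattern: inject the config into `training_args.model_init_kwargs`."""
    quantization_config = get_quantization_config_stub(model_args)
    if quantization_config is not None:
        training_args.model_init_kwargs["quantization_config"] = quantization_config
    return {"args": training_args}


def wire_after(model_args: ModelArgs, training_args: TrainingArgs) -> dict[str, Any]:
    """New pattern: pass the config to the trainer as its own keyword argument."""
    return {
        "args": training_args,
        "quantization_config": get_quantization_config_stub(model_args),
    }
```

In the new pattern `model_init_kwargs` is left untouched, so it stays free for loading options such as attention implementation or dtype.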

Reviewed by Cursor Bugbot for commit 77f287d. Bugbot is set up for automated code reviews on this repo.

bot-ci-comment · 2026-06-24T00:16:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45d6a2decd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-24T00:17:49Z

        optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
        optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
        preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
+        quantization_config: "BitsAndBytesConfig | None" = None,


Preserve positional peft_config compatibility

Adding quantization_config before the existing peft_config parameter shifts any current positional peft_config argument into quantization_config because this public constructor is not keyword-only. In existing calls that pass peft_config positionally, a model id will forward a PeftConfig object to from_pretrained(..., quantization_config=...) and fail, while an already-instantiated model will ignore it and train without the adapter; the same signature insertion appears in the other updated trainers. Put the new argument after peft_config or otherwise preserve the old positional layout.

Although not specifically disallowed, it would be very surprising for peft_config to be passed as a positional argument.
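
The positional-shift concern raised in this thread can be reproduced with toy functions. The signatures below are deliberately reduced and hypothetical (the real trainer constructors take many more parameters); they only show how inserting a parameter before an existing one rebinds a positional argument, and how appending it does not.

```python
def trainer_old(model, args=None, train_dataset=None, peft_config=None):
    """Toy signature before the change."""
    return {"peft_config": peft_config}


def trainer_inserted(model, args=None, train_dataset=None, quantization_config=None, peft_config=None):
    """Toy signature with the new parameter placed before `peft_config`."""
    return {"quantization_config": quantization_config, "peft_config": peft_config}


def trainer_appended(model, args=None, train_dataset=None, peft_config=None, quantization_config=None):
    """Toy signature with the new parameter placed after `peft_config`."""
    return {"quantization_config": quantization_config, "peft_config": peft_config}


lora = object()  # stands in for a LoraConfig instance

# The same positional call against each signature.
old = trainer_old("model-id", None, None, lora)
shifted = trainer_inserted("model-id", None, None, lora)
safe = trainer_appended("model-id", None, None, lora)

# In `shifted`, the adapter config has landed in `quantization_config`
# and `peft_config` is left at its default of None.
```

Callers that pass `peft_config=...` by keyword are unaffected by either ordering, which is why the reply above considers the risk small in practice.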

sergiopaniego

the example scripts/notebooks in the examples/ folder should also be reviewed and updated

qgallouedec · 2026-06-26T15:51:18Z

right @sergiopaniego , updated!

…onfig
# Conflicts:
#   examples/scripts/grpo_vlm.py
#   examples/scripts/gspo.py
#   examples/scripts/gspo_vlm.py
#   examples/scripts/rloo_vlm.py
#   trl/scripts/dpo.py
#   trl/scripts/grpo.py
#   trl/scripts/reward.py
#   trl/scripts/rloo.py
#   trl/scripts/sft.py

cursor

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 77f287d.

qgallouedec · 2026-07-03T15:40:00Z

I'm going to merge this one with no review; it's been open for too long.

Add quantization_config trainer argument (streamline QLoRA)

45d6a2d

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/dpo_trainer.py

chatgpt-codex-connector Bot reviewed Jun 24, 2026

style

38626f3

qgallouedec requested review from AmineDiro, albertvillanova and kashif June 24, 2026 03:39

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/grpo_trainer.py

Clarify error message for quantization_config to prefer trainer argument

0bb426c

cursor Bot reviewed Jun 24, 2026

Comment thread trl/trainer/sft_trainer.py

qgallouedec added 2 commits June 24, 2026 13:01

Merge branch 'main' into native-quantization-config

261f974

Merge branch 'main' into native-quantization-config

65f79a9

sergiopaniego reviewed Jun 26, 2026

qgallouedec and others added 3 commits June 26, 2026 10:37

Merge branch 'main' into native-quantization-config

2e37125

fix quantization configuration handling in trainers and scripts

f6a660b

update notebooks

7be97c4

qgallouedec and others added 8 commits June 28, 2026 21:53

Merge branch 'main' into native-quantization-config

508e00e

Merge branch 'main' into native-quantization-config

eb111ce

Merge branch 'main' into native-quantization-config

516d977

Merge branch 'main' into native-quantization-config

223fb54

Merge branch 'main' into native-quantization-config

f9a4b87

Merge branch 'main' into native-quantization-config

2d11fca

Merge branch 'main' into native-quantization-config

77f287d

cursor Bot reviewed Jul 3, 2026

Comment thread trl/trainer/dpo_trainer.py

qgallouedec merged commit 7f8fbf0 into main Jul 3, 2026
13 checks passed

qgallouedec deleted the native-quantization-config branch July 3, 2026 15:40

qgallouedec mentioned this pull request Jul 3, 2026

Align KTO with DPO: quantization_config trainer argument #6276

Open

qgallouedec merged 16 commits into main from native-quantization-config

2 participants
