Add quantization_config trainer argument (streamline QLoRA) #6157
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45d6a2decd
optimizers: tuple[torch.optim.Optimizer | None, torch.optim.lr_scheduler.LambdaLR | None] = (None, None),
optimizer_cls_and_kwargs: tuple[type[torch.optim.Optimizer], dict[str, Any]] | None = None,
preprocess_logits_for_metrics: Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None,
quantization_config: "BitsAndBytesConfig | None" = None,
Preserve positional peft_config compatibility
Adding quantization_config before the existing peft_config parameter shifts any current positional peft_config argument into quantization_config because this public constructor is not keyword-only. In existing calls that pass peft_config positionally, a model id will forward a PeftConfig object to from_pretrained(..., quantization_config=...) and fail, while an already-instantiated model will ignore it and train without the adapter; the same signature insertion appears in the other updated trainers. Put the new argument after peft_config or otherwise preserve the old positional layout.
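The positional shift described above can be reproduced with plain functions. This is a minimal sketch: `old_init`, `new_init_inserted`, and `new_init_keyword_only` are hypothetical stand-ins with shortened parameter lists, not the real trainer constructors.

```python
# Hypothetical stand-ins for a trainer constructor before and after the change.
def old_init(model, args=None, preprocess=None, peft_config=None):
    return {"peft_config": peft_config}


# New parameter inserted BEFORE peft_config: existing positional calls shift.
def new_init_inserted(model, args=None, preprocess=None,
                      quantization_config=None, peft_config=None):
    return {"quantization_config": quantization_config, "peft_config": peft_config}


# New parameter added as keyword-only AFTER peft_config: old layout preserved.
def new_init_keyword_only(model, args=None, preprocess=None,
                          peft_config=None, *, quantization_config=None):
    return {"quantization_config": quantization_config, "peft_config": peft_config}


lora = object()  # placeholder for a PeftConfig instance

before = old_init("org/model", None, None, lora)
shifted = new_init_inserted("org/model", None, None, lora)
preserved = new_init_keyword_only("org/model", None, None, lora)

# With the inserted layout, the adapter config silently lands in the wrong slot.
print(shifted["quantization_config"] is lora, shifted["peft_config"] is None)
print(preserved["peft_config"] is lora)
```

Appending the argument, or making it keyword-only as in the last variant, keeps every existing positional call working.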
Although not specifically disallowed, it would be very surprising for peft_config to be passed as a positional arg.
sergiopaniego left a comment:
the example scripts/notebooks in the examples/ folder should also be reviewed and updated
right @sergiopaniego, updated!
…onfig # Conflicts: # examples/scripts/grpo_vlm.py # examples/scripts/gspo.py # examples/scripts/gspo_vlm.py # examples/scripts/rloo_vlm.py # trl/scripts/dpo.py # trl/scripts/grpo.py # trl/scripts/reward.py # trl/scripts/rloo.py # trl/scripts/sft.py
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 77f287d.
I'm going to merge this one with no review; it's been open for too long.

Adds a quantization_config argument to SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA no longer requires reaching into model_init_kwargs (or worse, manual model loading).

After:
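A minimal sketch of what a call with the new argument might look like, using SFTTrainer. The function name, LoRA and bitsandbytes settings, and output directory are illustrative assumptions; only the `quantization_config=` trainer argument itself comes from this PR.

```python
def build_qlora_trainer(model_id, train_dataset):
    # Heavy imports are deferred so this module imports without a GPU stack.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    # Illustrative 4-bit settings (assumed values, not taken from the PR).
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    return SFTTrainer(
        model=model_id,  # a hub id, so the trainer loads the model itself
        args=SFTConfig(output_dir="qlora-out"),
        train_dataset=train_dataset,
        quantization_config=bnb_config,  # argument added by this PR
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```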
Compare with before (many resources are written like this!):
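A sketch of the manual-loading pattern this refers to, assuming a causal LM and the same illustrative settings; the function name and values are hypothetical.

```python
def build_trainer_manual_loading(model_id, train_dataset):
    # Heavy imports are deferred so this module imports without a GPU stack.
    import torch
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # The caller loads the quantized model by hand and passes the instance in.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config
    )

    return SFTTrainer(
        model=model,
        args=SFTConfig(output_dir="qlora-out"),
        train_dataset=train_dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```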
Before (the "right" way, but not very popular):
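A sketch of the model_init_kwargs route, again with a hypothetical function name and illustrative settings. The quantization config is injected into the training arguments after construction, which is the pattern the CLI scripts used.

```python
def build_trainer_model_init_kwargs(model_id, train_dataset):
    # Heavy imports are deferred so this module imports without a GPU stack.
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig
    from trl import SFTConfig, SFTTrainer

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    args = SFTConfig(output_dir="qlora-out")
    # The quantization config rides along inside the from_pretrained kwargs.
    args.model_init_kwargs = {"quantization_config": bnb_config}

    return SFTTrainer(
        model=model_id,
        args=args,
        train_dataset=train_dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
```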
It sits next to peft_config (the other non-serializable QLoRA ingredient), flows into from_pretrained, and raises if also set in args.model_init_kwargs.

Changes
- quantization_config arg on the five trainers above (+ docstrings).
- trl/scripts/{sft,dpo,grpo,rloo,reward}.py CLIs now pass it directly instead of injecting into model_init_kwargs.
- model_init_kwargs["device_map"] = get_kbit_device_map() line: verified on 8×H100 that QLoRA trains identically with and without it, across transformers 4.56.2 (min supported) and 5.13; distributed runs override device_map to None anyway, and single-process runs auto-place quantized weights on the current CUDA device. See Remove redundant get_kbit_device_map() #6158
- docs/source/peft_integration.md.

Note
Medium Risk
Touches model loading for all major trainers and reference-model paths; behavior change is mostly API surface, but incorrect quantization or duplicate config could break QLoRA runs.
Overview
Adds a quantization_config argument to SFTTrainer, DPOTrainer, GRPOTrainer, RLOOTrainer, and RewardTrainer, so QLoRA can pass BitsAndBytesConfig next to peft_config instead of pre-loading the model or stuffing model_init_kwargs.

When model is a hub id, trainers merge quantization_config into from_pretrained kwargs (and the same for reference models where applicable). Setting it in both the trainer arg and args.model_init_kwargs now raises; passing it with an already-instantiated model warns and ignores it.

trl/scripts/{sft,dpo,grpo,rloo,reward}.py and example scripts pass get_quantization_config(model_args) directly to the trainer rather than injecting into training_args.model_init_kwargs. Docs and the SFT/GRPO QLoRA notebooks are updated to use model=model_id plus quantization_config and model_init_kwargs for attention/dtype only.
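The three behaviors described in the overview (merge for a hub id, raise on duplicates, warn and ignore for an instantiated model) can be sketched in plain Python. `resolve_model_init_kwargs` is a hypothetical helper, not TRL's actual code, and the order of the checks is an assumption.

```python
import warnings
from typing import Any


def resolve_model_init_kwargs(model, quantization_config=None,
                              model_init_kwargs=None) -> dict[str, Any]:
    """Hypothetical helper mirroring the described behavior; not TRL's code."""
    kwargs = dict(model_init_kwargs or {})  # copy so the caller's dict is untouched
    if quantization_config is None:
        return kwargs
    if not isinstance(model, str):
        # Already-instantiated model: nothing is loaded, so the config is unused.
        warnings.warn("quantization_config is ignored for an instantiated model.")
        return kwargs
    if "quantization_config" in kwargs:
        raise ValueError(
            "quantization_config was set both as a trainer argument and in "
            "model_init_kwargs; set it in only one place."
        )
    kwargs["quantization_config"] = quantization_config
    return kwargs


# Hub id: the config is merged into the from_pretrained kwargs.
merged = resolve_model_init_kwargs("org/model", "bnb-config", {"attn": "sdpa"})
print(merged)
```

Copying `model_init_kwargs` before merging keeps the training arguments free of the non-serializable config object.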