Commit 031ce0f

Andrew Gu authored and pytorchmergebot committed
[FSDP][7/N] Add warning about frozen params (pytorch#104967)
Pull Request resolved: pytorch#104967
Approved by: https://github.com/rohan-varma
ghstack dependencies: pytorch#104427
1 parent bdcc454 commit 031ce0f

File tree

1 file changed: +12, -1 lines changed

torch/distributed/fsdp/fully_sharded_data_parallel.py

Lines changed: 12 additions & 1 deletion
@@ -176,7 +176,18 @@ class FullyShardedDataParallel(nn.Module, _FSDPState):
         same FSDP unit. If enhanced shared parameter support is needed for your
         use case, please ping https://github.com/pytorch/pytorch/issues/77724
 
-    .. note:
+    .. warning::
+        FSDP has some constraints on freezing parameters (i.e. setting
+        ``param.requires_grad=False``). For ``use_orig_params=False``, each
+        FSDP instance must manage parameters that are all frozen or all
+        non-frozen. For ``use_orig_params=True``, FSDP supports mixing frozen
+        and non-frozen, but we recommend not doing so since then the gradient
+        memory usage will be higher than expected (namely, equivalent to not
+        freezing those parameters). This means that ideally, frozen parameters
+        should be isolated into their own ``nn.Module`` s and wrapped
+        separately with FSDP.
+
+    .. note::
     .. note::
         Attempting to run the forward pass of a submodule that is contained in an
         FSDP instance is not supported and will result in errors. This is because the
         submodule's parameters will be sharded, but it itself is not an FSDP instance,
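For context, here is a minimal sketch (not part of the commit) of the wrapping pattern the new warning recommends: freeze a submodule's parameters, keep them isolated in their own ``nn.Module``, and give that module its own FSDP instance so that every FSDP instance manages either all-frozen or all-non-frozen parameters. The module names (FrozenEncoder, TrainableHead, Model), the layer sizes, and the wrap_with_fsdp helper are hypothetical; the sketch assumes a multi-process launch via torchrun with one CUDA device per rank.

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class FrozenEncoder(nn.Module):
    # Hypothetical submodule whose parameters will be frozen.
    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(1024, 1024)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class TrainableHead(nn.Module):
    # Hypothetical submodule that stays trainable.
    def __init__(self) -> None:
        super().__init__()
        self.out = nn.Linear(1024, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(x)


class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.encoder = FrozenEncoder()
        self.head = TrainableHead()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))


def wrap_with_fsdp(model: Model, device_id: int) -> FSDP:
    # Freeze the encoder's parameters before wrapping.
    for p in model.encoder.parameters():
        p.requires_grad = False

    # Give the frozen submodule its own FSDP instance so that this instance
    # manages only frozen parameters, as the warning recommends.
    model.encoder = FSDP(model.encoder, device_id=device_id, use_orig_params=False)

    # The root FSDP instance then manages only the remaining (non-frozen)
    # parameters, i.e. those of TrainableHead.
    return FSDP(model, device_id=device_id, use_orig_params=False)


if __name__ == "__main__":
    # Assumes launch via torchrun, which sets LOCAL_RANK and the rendezvous
    # environment variables that init_process_group reads.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    fsdp_model = wrap_with_fsdp(Model(), device_id=local_rank)
    out = fsdp_model(torch.randn(8, 1024, device="cuda"))

With this layout, each FSDP instance manages either all-frozen or all-non-frozen parameters, satisfying the ``use_orig_params=False`` constraint, and the frozen parameters do not inflate gradient memory as they would if mixed with non-frozen ones under ``use_orig_params=True``.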
