I am wondering why the code for training only the last layer works, because the corresponding branch contains only a `pass` statement:

```python
for param in model.parameters():
    param.requires_grad = False
...
if args.trainable_layers == "last_layer":
    pass
...
```

I have also set it more explicitly, with the same results:

```python
for param in model.parameters():
    param.requires_grad = False
...
if args.trainable_layers == "last_layer":
    # Get the last layer
    last_layer = list(model.children())[-1]
    # Make the last layer trainable
    for param in last_layer.parameters():
        param.requires_grad = True
...
```

What do you also think about adding a test for the last two layers + the last two transformer blocks, like in your article about finetuning LLMs? A sketch of that variant follows below.
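For concreteness, here is a minimal sketch of what the "last two transformer blocks" variant could look like. It assumes the same `model` and `args` as in the snippets above, and that the model exposes its transformer blocks as `model.trf_blocks` and its output layer as `model.out_head`; both attribute names are assumptions and may need adjusting to the actual model:

```python
# Sketch only: `trf_blocks` and `out_head` are assumed attribute names
for param in model.parameters():
    param.requires_grad = False

if args.trainable_layers == "last_two_blocks":
    # Unfreeze the last two transformer blocks
    for block in model.trf_blocks[-2:]:
        for param in block.parameters():
            param.requires_grad = True
    # Keep the output layer trainable as well
    for param in model.out_head.parameters():
        param.requires_grad = True
```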
Oh that's simply because we first make all layers untrainable, and then we replace the last layer with `nn.Linear`, and `nn.Linear` is trainable by default. And that's because `nn.Linear` uses `nn.Parameter`, which has `requires_grad=True` by default.
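A quick, self-contained way to verify this behavior (the toy model below is purely illustrative):

```python
import torch.nn as nn

# Toy model, purely for illustration
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

# Freeze everything
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh output layer; new nn.Parameter objects default
# to requires_grad=True, so this layer is trainable again
model[-1] = nn.Linear(16, 2)

print(all(not p.requires_grad for p in model[:-1].parameters()))  # True
print(all(p.requires_grad for p in model[-1].parameters()))       # True
```

Both prints output `True`: the frozen layers keep `requires_grad=False`, while the freshly constructed `nn.Linear` is trainable.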
That was a good suggestion; the performance of the last two blocks is quite good!
