
Conversation

felipemello1 (Contributor) commented Dec 9, 2025:

We currently have 3 places where compile is configured.

This PR creates a single flag at the top of the config and sets it to true by default, which helps with memory and tok/s.

At the current batch size / sequence length the difference in speed and memory is not huge, but as models and sequence lengths grow, it becomes more relevant.

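A minimal sketch of how a single top-level flag might gate compilation of the trainer and reference model (hypothetical config schema and wiring; the actual torchforge code may differ):

```python
import torch

def maybe_compile(model: torch.nn.Module, cfg: dict) -> torch.nn.Module:
    # One top-level flag (default true) gates torch.compile everywhere.
    if cfg.get("compile", True):
        return torch.compile(model)
    return model

# e.g. trainer_model = maybe_compile(trainer_model, cfg)
#      ref_model = maybe_compile(ref_model, cfg)
```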

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Dec 9, 2025
felipemello1 changed the title from "add compile flag to configs" to "easy - add compile flag to configs" on Dec 9, 2025
```yaml
model: "Qwen/Qwen3-4B"
off_by_n: 1 # Off by one by default
launcher: mast
compile: true # Enable torch.compile for trainer/ref_model, and CUDA graphs for vLLM
```
Contributor:

This file should have been removed?

felipemello1 (Author):

It's there: https://github.com/meta-pytorch/torchforge/blob/main/.meta/mast/qwen3_4b_mast.yaml

But I can delete it in this PR if you want.

Contributor:

Oof, why are there so many configs?
Yes, I missed it in https://github.com/meta-pytorch/torchforge/pull/632/files. Please just remove it.

```yaml
max_res_tokens: 2048
model: "Qwen/Qwen3-8B"
off_by_n: 1 # Off by one by default
compile: true # Enable torch.compile for trainer/ref_model, and CUDA graphs for vLLM
```
Contributor:

Why not enable it by default if you're updating all of the configs?

felipemello1 (Author):

What do you mean by "enabling it by default"? We still need to expose the flag because compile can be tricky in some setups. It also adds a bit of warmup time, so if someone is just quickly testing something, they may want to set it to false.

JenniferWang (Contributor) commented Dec 10, 2025:

I see. I was suggesting that to reduce the number of hyperparameters in the yaml config, because:

  1. We seem to want it enabled for production runs.
  2. This is a niche config (many things can slow down warmup time) that I don't expect people to remember to toggle in practice. We could default it to false when launching the job in local mode, or ONLY set it to true for large models.

Not a big deal.
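For context, the "CUDA graphs for vLLM" part of the flag's comment corresponds to vLLM's `enforce_eager` option, which disables CUDA graph capture when true. A hedged sketch of how the same config flag could be plumbed through (assuming vLLM's `LLM` entrypoint; torchforge's actual policy setup may differ):

```python
from vllm import LLM

def build_policy(cfg: dict) -> LLM:
    # compile=False -> enforce_eager=True, i.e. skip CUDA graph capture.
    return LLM(model=cfg["model"], enforce_eager=not cfg.get("compile", True))
```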

JenniferWang (Contributor):

Will enabling compile add to the job startup time? Is there usually instrumentation around that?

felipemello1 (Author) commented Dec 10, 2025:

> Will enabling compile add to the job startup time? Is there usually instrumentation around that?

Yes, the larger the model, the longer it takes: anywhere from a couple of seconds to about 60s. But in my experience it decreases peak memory by ~40% and increases tok/s by more than 20%.
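No instrumentation is shown in this PR, but the warmup cost is easy to measure yourself, since the first call through a `torch.compile`d module pays the tracing/compilation time. A minimal sketch (toy model and shapes for illustration):

```python
import time
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for the real trainer model
x = torch.randn(8, 1024)

compiled = torch.compile(model)
t0 = time.perf_counter()
compiled(x)  # first call triggers tracing + compilation
print(f"compile warmup took {time.perf_counter() - t0:.2f}s")
```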

felipemello1 merged commit 7b8580a into meta-pytorch:main on Dec 10, 2025
10 checks passed