@timmy-feng timmy-feng commented Nov 13, 2025

Motivation

PR #293 introduces new behavior: when generating offline hidden states, only the last-layer hidden states from the target model are saved. This avoids having to load full vocab-size logits into CPU RAM.

This breaks the old OfflineEagle3Model class, which passes target to OnlineEagle3Model without projecting it from hidden_size to vocab_size.

Modifications

Project the offline hidden state into the correct shape using the target LM head in OfflineEagle3Model.forward().
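
As a rough illustration of the shape change described above (not the actual code in this PR), the sketch below uses plain Python lists in place of tensors. The function name `project_to_vocab` and the toy sizes are hypothetical. The weight is assumed to be laid out as `[vocab_size][hidden_size]`, matching the layout of a `torch.nn.Linear` weight, and the head is assumed to have no bias.

```python
from typing import List

Matrix = List[List[float]]


def project_to_vocab(hidden_states: Matrix, lm_head_weight: Matrix) -> Matrix:
    """Map [seq_len][hidden_size] hidden states to [seq_len][vocab_size] logits.

    lm_head_weight is assumed to be [vocab_size][hidden_size] (hypothetical
    stand-in for the target model's LM head, no bias term).
    """
    hidden_size = len(lm_head_weight[0])
    logits: Matrix = []
    for h in hidden_states:
        if len(h) != hidden_size:
            raise ValueError(f"expected hidden_size={hidden_size}, got {len(h)}")
        # One dot product per vocabulary entry.
        logits.append([sum(w * x for w, x in zip(row, h)) for row in lm_head_weight])
    return logits


# Toy sizes: seq_len=2, hidden_size=2, vocab_size=3.
hidden = [[1.0, 2.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = project_to_vocab(hidden, weight)
```

Here each position's last dimension grows from hidden_size (2) to vocab_size (3), which is the shape the online model expects for target. Only the compact hidden states need to be stored offline, and the projection is applied when they are consumed.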

@zhyncs zhyncs merged commit a4453bf into sgl-project:main Nov 15, 2025
2 checks passed