[Feature]. Eagle3 support #364

SiqiLi-Fighting · 2025-11-12T13:36:51Z

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist



gemini-code-assist · 2025-11-12T13:38:43Z

Summary of Changes

Hello @SiqiLi-Fighting, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for a significant performance improvement in language model inference by integrating Multi-Token Prediction (MTP) into the EAGLE speculative decoding framework. The changes span various system components, from low-level attention kernels to high-level scheduling and model management, all aimed at letting models generate text faster while maintaining output quality. The introduction of MTP-specific model configurations and optimized data structures is central to achieving higher throughput and lower latency in sequence generation.
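Why speculative decoding can speed things up without hurting quality is easiest to see in the greedy case: a draft-then-verify step commits exactly the tokens the target model would have produced on its own. The following is a minimal pure-Python sketch, assuming a linear draft of k tokens and that the target model's argmax at all k + 1 positions comes from one verification pass. `greedy_accept` is a hypothetical helper for illustration, not code from this PR (which verifies a token tree rather than a single chain).

```python
from typing import List, Sequence


# Hypothetical helper for illustration; not part of this PR.
def greedy_accept(draft_tokens: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Return the tokens committed after one greedy speculative step.

    draft_tokens:  k tokens proposed by the draft model.
    target_argmax: k + 1 greedy predictions from the target model's single
                   verification pass; target_argmax[i] is the target's choice for
                   the position draft_tokens[i] occupies, and the last entry is its
                   prediction after the whole draft.
    """
    if len(target_argmax) != len(draft_tokens) + 1:
        raise ValueError("target_argmax must have exactly one more entry than draft_tokens")
    accepted: List[int] = []
    for draft, target in zip(draft_tokens, target_argmax):
        if draft != target:
            break  # first mismatch: stop accepting draft tokens
        accepted.append(draft)
    # The target's prediction at the first rejected (or final) position is always
    # committed, so every step emits at least one token.
    accepted.append(target_argmax[len(accepted)])
    return accepted
```

For example, `greedy_accept([5, 7, 9], [5, 7, 2, 4])` returns `[5, 7, 2]`: two draft tokens are accepted and the target's own prediction replaces the rejected third one.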

Highlights

Multi-Token Prediction (MTP) for EAGLE Speculative Decoding: This pull request introduces the design and initial implementation for Multi-Token Prediction (MTP) as an enhancement to the existing EAGLE speculative decoding algorithm. MTP aims to significantly improve inference throughput by enabling models to predict multiple tokens simultaneously.
Core Architectural Changes: The implementation includes modifications to the attention kernel to support custom masking for parallel verification, updates to model configurations to recognize MTP-capable architectures (e.g., LlamaForCausalLMEagle3), and the integration of new MTP logic within the scheduling and model worker components.
New EAGLE-Specific Components: New files have been added to define EAGLE-specific model layers and structures, utility functions for managing speculative decoding inputs and outputs, and Pallas kernels for efficient tree building and verification processes.
Precompilation and Memory Management: The system now includes precompilation steps for speculative decoding and refined memory-pool allocation strategies that handle multi-token sequences efficiently, including backing up and restoring allocator state.
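The custom attention masking and tree verification mentioned in these highlights can be illustrated with a minimal NumPy sketch. It assumes the draft tree is stored as a topologically ordered parent-index array (`parents[i] < i`) with node 0 as the root, i.e. the last verified token, and it ignores the shared prefix KV cache that every node also attends to. The mask lets all draft nodes be scored in one forward pass while each node sees only its own ancestors, never sibling branches. `build_tree_mask` and `verify_tree_greedy` are hypothetical names; this is not the PR's `tree_verify_greedy` Pallas kernel, only an illustration of the kind of computation a greedy tree-verification step performs.

```python
import numpy as np


# Hypothetical helpers for illustration; not the PR's Pallas kernels.
def build_tree_mask(parents) -> np.ndarray:
    """Boolean [n, n] mask: mask[i, j] is True iff node j is node i or one of its ancestors.

    Assumes topological order (parents[i] < i) and parents[0] == -1 for the root.
    """
    n = len(parents)
    mask = np.eye(n, dtype=bool)
    for i in range(1, n):
        # A node sees everything its parent sees, plus itself.
        mask[i] |= mask[parents[i]]
    return mask


def verify_tree_greedy(parents, tokens, target_argmax):
    """Walk the draft tree greedily; return (accepted node indices, bonus token).

    tokens[i] is the draft token at node i (tokens[0] is the already-committed root);
    target_argmax[i] is the target model's greedy next-token prediction at node i,
    assumed to be computed for all nodes in one pass using the mask above.
    """
    children = {i: [] for i in range(len(parents))}
    for i in range(1, len(parents)):
        children[int(parents[i])].append(i)
    node, accepted = 0, []
    while True:
        want = target_argmax[node]
        match = next((c for c in children[node] if tokens[c] == want), None)
        if match is None:
            # No child agrees with the target: its prediction becomes the bonus token.
            return accepted, int(want)
        accepted.append(match)
        node = match
```

With `parents = [-1, 0, 0, 1, 1]`, `tokens = [10, 11, 12, 13, 14]` and `target_argmax = [11, 14, 99, 0, 20]`, `verify_tree_greedy` accepts nodes 1 and 4 (tokens 11 and 14) and returns 20 as the bonus token. Requiring topological order is what lets the mask be built in a single pass, since each parent's row is complete before its children are visited.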


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

SiqiLi-Fighting and others added 20 commits October 23, 2025 12:25

feat/eagle-support

e4474e2

fix cache loc bug

29b9069

fix: cu_q_len and cu_kv_len error (#30)

743505a

fix: multi batch run panic (#31)

4c92553

fix: seq len bug (#32)

7d9ca03

two prompt ok

fac0b3e

fix: memory leak (#34)

b6bf41f

fix: non-greedy sample bug (#35)

2a0af12

Feat/eagle support eagle3 (#36)

46fb96c

* add llama eagle3 model file * fix padding bug * fix some padding problem * rm some debug log

Feat/eagle support eagle3 (#37)

36d236f

* qwen eagle3 * rm log

feat: implement tree_verify_greedy pallas kernel (#38)

585e1a2

feat: support eagle pallas kernel (#44)

e7de910

* feat: implement tree_verify_greedy pallas kernel * feat: implement build tree kernel for eagle

fix eagle worker refactor and padding

dbe2acd

fix build tree kernel cache miss

acea510

fix mem leak

120a919

add metrics and rm debug log, pagesize 1 ok, 64 crash

e6b05d3

fix page size 64

aec187c

fix some batch size bug

52113ae

precompile

3410a9e

complete precompile

2c76d29

SiqiLi-Fighting requested a review from jimoosciuc November 12, 2025 13:36

SiqiLi-Fighting marked this pull request as draft November 12, 2025 13:37

SiqiLi-Fighting self-assigned this Nov 12, 2025

SiqiLi-Fighting added the enhancement New feature or request label Nov 12, 2025

SiqiLi-Fighting added 3 commits November 13, 2025 15:58

fix pagesize 64 bug

9ddc750

make jax.Array to np.ndarray

4efe6d8

refactor code for compality

aca1dd9

SiqiLi-Fighting closed this Nov 13, 2025
