
EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).


<img src="figs/logo.png" alt="EAGLE" width="220" align="left"><div align="center"><h1> EAGLE</h1></div>

<p align="center"> | <a href="https://arxiv.org/pdf/2401.15077.pdf"><b>Paper (EAGLE)</b></a> | <a href="https://arxiv.org/pdf/2406.16858"><b>Paper (EAGLE-2)</b></a> | <a href="https://arxiv.org/pdf/2503.01840"><b>Paper (EAGLE-3)</b></a> | <a href="https://sites.google.com/view/ eagle-llm"><b>Blog</b></a> | </p> <p align="center"> <a href=""> <img src="https://img.shields.io/badge/Version-v3.0.0-orange.svg" alt="Version"> </a> <a href="https://opensource.org/licenses/Apache-2.0"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License"> </a> <a href="https://github.com/SafeAILab/EAGLE/issues"> <img src="https://img.shields.io/badge/Maintained%3F-yes-green.svg" alt="Maintenance"> </a> <a href="https://github.com/SafeAILab/EAGLE/pulls"> <img src="https://img.shields.io/badge/Contributions-welcome-brightgreen.svg?style=flat" alt="Contributions welcome"> </a> </p>

<p align="center"> <img src="./figs/eagle3r.jpg" alt="benchmark" width="790"> </p>

EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) is a new baseline for fast decoding of Large Language Models (LLMs) that provably preserves the output distribution. The approach extrapolates the second-top-layer contextual feature vectors of the LLM, enabling a significant boost in generation efficiency.

  • EAGLE is:
    • certified by a <a href="https://github.com/hemingkx/Spec-Bench/blob/main/Leaderboard.md"><b>third-party</b></a> evaluation as the fastest speculative method to date.
    • able to achieve a 2x speedup on <a href="https://github.com/pytorch-labs/gpt-fast"><b>gpt-fast</b></a>.
    • 3x faster than vanilla decoding (13B).
    • 2x faster than <a href="https://lmsys.org/blog/2023-11-21-lookahead-decoding/"><b>Lookahead</b></a> (13B).
    • 1.6x faster than <a href="https://sites.google.com/view/medusa-llm"><b>Medusa</b></a> (13B).
    • provably consistent with vanilla decoding in the distribution of generated texts.
    • trainable (within 1-2 days) and testable on 8x RTX 3090 GPUs, so even the GPU-poor can afford it.
    • combinable with other parallel techniques such as vLLM, DeepSpeed, Mamba, FlashAttention, quantization, and hardware optimization.

EAGLE-2 uses the draft model's confidence scores to approximate acceptance rates and dynamically adjusts the draft tree structure accordingly, further enhancing performance.

  • EAGLE-2 is:
    • 4x faster than vanilla decoding (13B).
    • 1.4x faster than EAGLE-1 (13B).
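The idea behind the dynamic tree can be sketched as a best-first expansion: grow the draft tree at the nodes whose path confidence (the product of per-token confidences, a proxy for acceptance probability) is highest, rather than using a fixed tree shape. A minimal illustration, with a hypothetical `expand` callable standing in for the draft model:

```python
import heapq

# Toy EAGLE-2-style tree drafting: expand the draft tree greedily at the
# nodes with the highest path confidence (product of per-token confidences).
# `expand`, `budget`, and `top_k` are illustrative names, not the repo's API.

def build_draft_tree(expand, root, budget=5, top_k=2):
    """expand(path) -> list of (token, confidence); returns (path, conf) nodes."""
    # Max-heap via negated confidences (heapq is a min-heap).
    heap = [(-1.0, root)]
    nodes = []
    while heap and len(nodes) < budget:
        neg_conf, path = heapq.heappop(heap)
        nodes.append((path, -neg_conf))
        # Child path confidence = parent confidence * child token confidence.
        for tok, conf in expand(path)[:top_k]:
            heapq.heappush(heap, (neg_conf * conf, path + (tok,)))
    return nodes

# Example: every node has a confident child "a" (0.9) and a doubtful "b" (0.1).
nodes = build_draft_tree(lambda p: [("a", 0.9), ("b", 0.1)], (), budget=4)
print([p for p, _ in nodes])  # [(), ('a',), ('a', 'a'), ('a', 'a', 'a')]
```

With a confident draft the budget is spent deepening one branch; with an uncertain draft the same budget spreads across siblings, which is exactly the adaptivity a fixed-shape tree lacks.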

EAGLE-3 removes EAGLE's feature-prediction constraint and instead simulates multi-step drafting during training via training-time testing. Because top-layer features are specialized for next-token prediction, EAGLE-3 replaces them with a fusion of low-, mid-, and high-level semantic features. This further improves generation speed while remaining lossless.

  • EAGLE-3 is:
    • 5.6x faster than vanilla decoding (13B).
    • 1.8x faster than EAGLE-1 (13B).
<p align="center"> <img src="./figs/e3.gif" alt="demogif" width="600"> </p>

Inference is conducted on 2x RTX 3090 GPUs at fp16 precision using the Vicuna 13B model.
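The feature fusion in EAGLE-3 can be pictured as concatenating hidden states from a low, mid, and high layer and projecting them back to the model width before they feed the draft head. A pure-Python toy of that shape, with illustrative dimensions and a made-up projection (the real weights are learned):

```python
# Toy sketch of EAGLE-3-style feature fusion: concatenate hidden states from
# three layers and linearly project them back to the model width. The sizes
# and the projection matrix here are illustrative, not trained parameters.

def fuse_features(low, mid, high, proj):
    """low/mid/high: hidden states of size d; proj: (d x 3d) matrix as rows."""
    concat = low + mid + high                       # size 3d
    return [sum(w * x for w, x in zip(row, concat)) for row in proj]

# d = 2; a projection that averages each dimension across the three layers.
low, mid, high = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
proj = [
    [1/3, 0, 1/3, 0, 1/3, 0],   # output dim 0 = mean of dim 0 across layers
    [0, 1/3, 0, 1/3, 0, 1/3],   # output dim 1 = mean of dim 1 across layers
]
print(fuse_features(low, mid, high, proj))  # both dims come out near 2/3
```

The point of the fusion is that the draft head sees lower-level features that the top layer has already discarded for next-token prediction, which helps it predict several steps ahead.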

Support

EAGLE has been merged into the following mainstream LLM serving frameworks (listed in alphabetical order).

  • <a href="https://rocm.blogs.amd.com/software-tools-optimization/mtp/README.html">AMD ROCm</a>
  • <a href="https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/eagle.html">AngelSlim</a>
  • <a href="https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html#eagle-speculative-decoding">AWS NeuronX Distributed Core</a>
  • <a href="https://github.com/OpenBMB/CPM.cu">CPM.cu</a>
  • <a href="https://github.com/intel/intel-extension-for-transformers/pull/1504">Intel® Extension for Transformers</a>
  • <a href="https://github.com/intel-analytics/ipex-llm/pull/11104">Intel® LLM Library for PyTorch</a>
  • <a href="https://llm.mlc.ai/docs/deploy/rest.html">MLC-LLM</a>
  • <a href="https://docs.nvidia.com/nemo-framework/user-guide/latest/model-optimization/speculative/speculative.html">NVIDIA NeMo Framework</a>
  • <a href="https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/eagle">NVIDIA TensorRT-LLM</a>
  • <a href="https://nvidia.github.io/TensorRT-Model-Optimizer/guides/7_speculative_decoding.html">NVIDIA TensorRT Model Optimizer</a>
  • <a href="https://paddlenlp.readthedocs.io/en/latest/llm/docs/predict/speculative_decoding.html">PaddleNLP</a>
  • <a href="https://docs.sglang.ai/advanced_features/speculative_decoding.html">SGLang</a>
  • <a href="https://github.com/sgl-project/SpecForge">SpecForge</a>
  • <a href="https://github.com/vllm-project/speculators">speculators</a>
  • <a href="https://github.com/vllm-project/vllm/pull/16937">vLLM</a>

Update

2025.9.18: EAGLE-3 is accepted to NeurIPS'25.

2025.7.23: We strongly recommend using SpecForge for out-of-the-box training of EAGLE-3 with SGLang.

2025.3.19: EAGLE-3 is released.

2024.8.8: We now support Qwen-2.

2024.6.27: EAGLE-2 is released.

2024.2.25: EAGLE is certified by the <a href="https://github.com/hemingkx/Spec-Bench/blob/main/Leaderboard.md">third-party</a> evaluation as the fastest speculative method.

2024.1.17: We now support Mixtral-8x7B-Instruct.

2023.12.8: EAGLE v1.0 is released.

Todo

  • [x] Support non-greedy inference (provably maintaining text distribution).
  • [x] Support more LLMs such as Mixtral 8x7B.
  • [x] Support LLaMA-3.
  • [x] Support Qwen-2.
  • [x] Support vLLM (please check <a href="https://github.com/vllm-project/vllm/pull/6830">vLLM</a>'s implementation).
  • [x] EAGLE-3.
  • [x] Training code of EAGLE-3.
  • [x] Support LLaMA-4.
  • [ ] Support official EAGLE-3 for Qwen-3.
  • [ ] EAGLE-4.

The default main branch is the implementation of EAGLE-3 and EAGLE-2. For using EAGLE-1, please switch to the v1 branch.

Setup & Installation

git clone https://github.com/SafeAILab/EAGLE.git
cd EAGLE
python -m venv ~/venvs/ea_env
source ~/venvs/ea_env/bin/activate
pip install -r requirements.txt

EAGLE-3 Weights

Note: This repository recognizes only official EAGLE-3 checkpoints. Performance of unofficial checkpoints may vary. If you want to compare with EAGLE-3, please compare with official checkpoints and official draft tree setups.

EAGLE-3 Models on Hugging Face

| Base Model | EAGLE-3 Model(s) | Official |
|-----------|-----------------|----------|
| Vicuna-13B v1.3<br>lmsys/vicuna-13b-v1.3 | yuhuili/EAGLE3-Vicuna1.3-13B | Yes |
| LLaMA-3.1-8B-Instruct<br>meta-llama/Llama-3.1-8B-Instruct | yuhuili/EAGLE3-LLaMA3.1-Instruct-8B | Yes |
| LLaMA-3.3-70B-Instruct<br>meta-llama/Llama-3.3-70B-Instruct | yuhuili/EAGLE3-LLaMA3.3-Instruct-70B | Yes |
| DeepSeek-R1-Distill-LLaMA-8B<br>deepseek-ai/DeepSeek-R1-Distill-Llama-8B | yuhuili/EAGLE3-DeepSeek-R1-Distill-LLaMA-8B | Yes |
| LLaMA-4-Scout-17B-16E-Instruct<br>meta-llama/Llama-4-Scout-17B-16E-Instruct | lmsys/sglang-EAGLE3-Llama-4-Scout-17B-16E-Instruct-v1 | No |
| LLaMA-4-Maverick-17B-128E-Instruct<br>meta-llama/Llama-4-Maverick-17B-128E-Instruct | lmsys/sglang-EAGLE3-Llama-4-Maverick-17B-128E-Instruct-v1<br>nvidia/Llama-4-Maverick-17B-128E-Eagle3 | No |
| Qwen3-1.7B<br>Qwen/Qwen3-1.7B | AngelSlim/Qwen3-1.7B_eagle3 | No |
| Qwen3-4B<br>Qwen/Qwen3-4B | AngelSlim/Qwen3-4B_eagle3 | No |
| Qwen3-8B<br>Qwen/Qwen3-8B | Tengyunw/qwen3_8b_eagle3<br>AngelSlim/Qwen3-8B_eagle3<br>Zjcxy-SmartAI/Eagle3-Qwen3-8B-zh | No |
| Qwen3-14B<br>Qwen/Qwen3-14B | AngelSlim/Qwen3-14B_eagle3 | No |
| Qwen3-30B-A3B<br>Qwen/Qwen3-30B-A3B | Tengyunw/qwen3_30b_moe_eagle3<br>[AngelSlim/Qwen3-a3B_eagle3](https://huggingface.co/AngelSlim/Qwen
