LoPA
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding
https://github.com/user-attachments/assets/6fb2c8e9-23f9-4025-bda3-14ee7b839c9b
Lookahead Parallel Decoding (LoPA) is a training-free, plug-and-play algorithm designed to break the parallelism bottleneck in Diffusion Large Language Models (dLLMs). Building on the observation that parallelism is highly sensitive to the Token Filling Order (TFO), LoPA actively searches for TFOs that maximize future confidence.
Key features of LoPA include:
- Massive Speedup: Increases the Tokens Per Forward pass (TPF) of D2F-Dream to 10.1 on GSM8K and D2F-DiffuCoder to 8.3 on HumanEval+.
- High Throughput: Achieves a single-sample throughput of 1073.9 tokens/s under multi-GPU deployment using a specialized Branch Parallel (BP) inference system.
- Training-Free: Works out-of-the-box with existing confidence-driven dLLMs (like D2F and Dream) without requiring weight updates.
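To make the TPF figures above concrete, here is a minimal arithmetic sketch. It assumes TPF is simply the number of generated tokens divided by the number of forward passes, as the name suggests; the helper names and the 256-token response length are hypothetical and not taken from the LoPA code.

```python
import math


def tokens_per_forward(num_tokens: int, num_forward_passes: int) -> float:
    """TPF under the assumed definition: generated tokens / forward passes."""
    if num_forward_passes <= 0:
        raise ValueError("num_forward_passes must be positive")
    return num_tokens / num_forward_passes


def forward_passes_needed(num_tokens: int, tpf: float) -> int:
    """Forward passes required to emit num_tokens at a given average TPF."""
    if tpf <= 0:
        raise ValueError("tpf must be positive")
    return math.ceil(num_tokens / tpf)


if __name__ == "__main__":
    response_len = 256  # hypothetical response length, for illustration only
    for name, tpf in [("one token per pass", 1.0), ("TPF 10.1 (GSM8K figure above)", 10.1)]:
        print(name, "->", forward_passes_needed(response_len, tpf), "forward passes")
```

Under these assumptions, a higher TPF translates directly into fewer decoding steps for the same output length.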
<p align="center"><small style="color: gray;">Figure 1. Throughput performance of LoPA under guaranteed inference speed. LoPA raises the single-sample throughput of D2F-Dream to as much as 1073.9 and 856.5 tokens/s on MBPP and GSM8K, respectively, significantly outperforming baselines.</small>
</p>
🔥 News
- Dec 22, 2025: We released the code and paper for LoPA-Dist-NV!
- Dec 18, 2025: We released the code and paper for LoPA!
- Dec 2025: LoPA achieves >1000 tokens/s on Ascend 910C hardware.
🔮 Future Work
- Diffulex: We are working on a new inference framework for dLLMs that is flexible and easy to extend. Diffulex supports multiple decoding strategies including D2F, BlockDiffusion, and Fast-dLLM-v2, which is soon to be released. You can find the code here.
- LoPA-SDAR: We will explore adapting LoPA to SDAR and other confidence-driven diffusion language models to further demonstrate its generalizability and effectiveness across diverse model architectures.
🤔 How It Works
Standard dLLM decoding greedily fills tokens with the highest current confidence, which often leads to suboptimal paths that restrict future parallelism. LoPA solves this by "looking ahead":
- Anchor Branch: Maintains the standard confidence-driven path.
- Lookahead Branches: Spawns parallel branches exploring alternative high-confidence Token Filling Orders (TFOs).
- Parallel Verification: Verifies all branches in a single forward pass and selects the one with the highest Branch Confidence (potential for future parallelism).
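The three steps above can be sketched on a toy problem. This is only an illustrative sketch, not the repository's implementation: `toy_confidence` is a hypothetical stand-in for a dLLM forward pass, the branch confidence is assumed to be the mean confidence over still-masked positions, and each lookahead branch is assumed to be the anchor plus one extra position taken from the highest-confidence leftovers. In a real system the branches would be verified together in one batched forward pass; here they are scored in a plain loop.

```python
from typing import FrozenSet, List, Tuple


def toy_confidence(filled: FrozenSet[int], length: int) -> List[float]:
    """Hypothetical stand-in for one dLLM forward pass (not a real model).

    Each position scores 0.25, plus 0.25 for every neighbour that is already
    filled; sequence borders count as filled neighbours.
    """
    conf = []
    for i in range(length):
        left = i == 0 or (i - 1) in filled
        right = i == length - 1 or (i + 1) in filled
        conf.append(0.25 + 0.25 * left + 0.25 * right)
    return conf


def branch_confidence(conf: List[float], filled: FrozenSet[int]) -> float:
    """Assumed branch score: mean confidence over positions still masked."""
    masked = [i for i in range(len(conf)) if i not in filled]
    if not masked:
        return 1.0
    return sum(conf[i] for i in masked) / len(masked)


def lopa_decode(length: int, threshold: float = 0.5,
                num_lookahead: int = 3) -> Tuple[int, List[List[int]]]:
    """Return (iterations, positions filled per iteration) on the toy model."""
    filled: FrozenSet[int] = frozenset()
    conf = toy_confidence(filled, length)
    steps = 0
    history: List[List[int]] = []
    while len(filled) < length:
        # Masked positions, highest confidence first (ties keep index order).
        masked = sorted((i for i in range(length) if i not in filled),
                        key=lambda i: -conf[i])
        # Anchor branch: the standard confidence-driven fill.
        anchor = [i for i in masked if conf[i] >= threshold] or masked[:1]
        anchor_set = filled | frozenset(anchor)
        # Lookahead branches: anchor plus one of the next-best positions.
        rest = [i for i in masked if i not in anchor]
        branches = [anchor_set] + [anchor_set | {extra}
                                   for extra in rest[:num_lookahead]]
        # Verification: score every branch, keep the most promising one.
        best_score, best_branch, best_conf = -1.0, anchor_set, conf
        for branch in branches:
            new_conf = toy_confidence(branch, length)
            score = branch_confidence(new_conf, branch)
            if score > best_score:  # strict '>' prefers the anchor on ties
                best_score, best_branch, best_conf = score, branch, new_conf
        history.append(sorted(best_branch - filled))
        filled, conf = best_branch, best_conf
        steps += 1
    return steps, history


if __name__ == "__main__":
    for k in (0, 3):
        steps, history = lopa_decode(8, num_lookahead=k)
        print(f"lookahead={k}: {steps} iterations, fill order {history}")
```

On this toy model, an 8-position sequence takes 4 iterations with no lookahead branches and 3 iterations with three, because the selected branch leaves the remaining positions with higher confidence. The numbers say nothing about real models; they only illustrate why the choice of filling order can change how many tokens are decodable in later steps.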
<p align="center"><small style="color: gray;">Figure 2. Overview of Lookahead Parallel Decoding (LoPA). In each iteration, LoPA generates an anchor branch alongside multiple lookahead branches by independently sampling high-confidence positions. A branch confidence verification mechanism then evaluates all branches in parallel to select the optimal path.</small>
</p>
📊 Performance Highlights
LoPA demonstrates significant improvements in Tokens Per Forward pass (TPF) and overall throughput across mathematical reasoning and code generation tasks. It establishes a clear, controllable speed-accuracy trade-off.
<p align="center"> <img src="docs/assets/img/figure4.png" width="100%" alt="Scaling Curves"><small style="color: gray;">Figure 3. Scaling curves of LoPA. LoPA scales the TPF of D2F-Dream and D2F-DiffuCoder up to 10.1 and 8.3 on GSM8K and HumanEval+, respectively, with comparable performance.</small>
</p> <p align="center"> <img src="docs/assets/img/figure2.png" width="100%" alt="Scaling Analysis"><small style="color: gray;">Figure 4. Scaling analysis of LoPA on D2F-Dream with varying branch counts. The results show that LoPA scales the TPF of D2F to a peak exceeding 10, significantly reducing the total number of decoding steps.</small>
</p>
Accuracy-Preserving Parallelism
<div align="center">
<strong>Table 1. Accuracy-preserving parallelism scaling of Dream on multiple benchmarks.</strong>
<table style="width:100%; text-align: center; border-collapse: collapse;">
<thead>
<tr style="background-color: #f2f2f2;">
<th rowspan="2" style="border: 1px solid #ddd; padding: 8px;">Model</th>
<th rowspan="2" style="border: 1px solid #ddd; padding: 8px;">Decoding algo</th>
<th colspan="2" style="border: 1px solid #ddd; padding: 8px;">MBPP 3-shot</th>
<th colspan="2" style="border: 1px solid #ddd; padding: 8px;">Math 4-shot</th>
<th colspan="2" style="border: 1px solid #ddd; padding: 8px;">HumanEval 0-shot</th>
<th colspan="2" style="border: 1px solid #ddd; padding: 8px;">GSM8K 4-shot</th>
</tr>
<tr style="background-color: #f2f2f2;">
<th style="border: 1px solid #ddd; padding: 8px;">TPF</th> <th style="border: 1px solid #ddd; padding: 8px;">Score</th>
<th style="border: 1px solid #ddd; padding: 8px;">TPF</th> <th style="border: 1px solid #ddd; padding: 8px;">Score</th>
<th style="border: 1px solid #ddd; padding: 8px;">TPF</th> <th style="border: 1px solid #ddd; padding: 8px;">Score</th>
<th style="border: 1px solid #ddd; padding: 8px;">TPF</th> <th style="border: 1px solid #ddd; padding: 8px;">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #ddd; padding: 8px;">Dream</td> <td style="border: 1px solid #ddd; padding: 8px;">Vanilla</td>
<td style="border: 1px solid #ddd; padding: 8px;">1.0</td> <td style="border: 1px solid #ddd; padding: 8px;"><b>56.2</b></td>
<td style="border: 1px solid #ddd; padding: 8px;">1.0</td> <td style="border: 1px solid #ddd; padding: 8px;">33.7</td>
<td style="border: 1px solid #ddd; padding: 8px;">1.0</td> <td style="border: 1px solid #ddd; padding: 8px;">55.5</td>
<td style="border: 1px solid #ddd; padding: 8px;">1.0</td> <td style="border: 1px solid #ddd; padding: 8px;">72.6</td>
</tr>
<tr>
<td style="border: 1px solid #ddd; padding: 8px;">Dream</td> <td style="border: 1px solid #ddd; padding: 8px;">Fast-dLLM</td>
<td style="border: 1px solid #ddd; padding: 8px;">1.9</td> <td style="border: 1px solid #ddd; padding: 8px;">55.6</td>
<td style="border: 1px solid #ddd; padding: 8px;">1.9</td> <td style="border: 1px solid #ddd; padding: 8px;"><b>37.6</b></td>
<td style="border: 1px solid #ddd; padding: 8px;">1.8</td> <td style="border: 1px solid #ddd; padding: 8px;">55.5</td>
<td style="border: 1px solid #ddd; padding: 8px;">2.1</td> <td style="border: 1px solid #ddd; padding: 8px;">72.6</td>
</tr>
<tr>
<td style="border: 1px solid #ddd; padding: 8px;">Dream</td> <td style="border: 1px solid #ddd; padding: 8px;">LoPA</td>
<td style="border: 1px solid #ddd; padding: 8px;">3.3</td> <td style="border: 1px solid #ddd; padding: 8px;">54.8</td>
<td style="border: 1px solid #ddd; padding: 8px;">3.4</td> <td style="border: 1px solid #ddd; padding: 8px;">37.0</td>
<td style="border: 1px solid #ddd; padding: 8px;">2.9</td> <td style="border: 1px solid #ddd; padding: 8px;">53.0</td>
<td style="border: 1px solid #ddd; padding: 8px;">3.1</td> <td style="border: 1px solid #ddd; padding: 8px;">73.3</td>
</tr>
<tr style="background-color: #fafafa;">
<td style="border: 1px solid #ddd; padding: 8px;">D2F-Dream</td> <td style="border: 1px solid #ddd; padding: 8px;">Vanilla</td>
<td style="border: 1px solid #ddd; padding: 8px;">2.3</td> <td style="border: 1px solid #ddd; padding: 8px;">53.8</td>
<td style="border: 1px solid #ddd; padding: 8px;">2.6</td> <td style="border: 1px solid #ddd; padding: 8px;">36.8</td>
<td style="border: 1px solid #ddd; padding: 8px;">2.5</td> <td style="border: 1px solid #ddd; padding: 8px;"><b>56.1</b></td>
<td style="border: 1px solid #ddd; padding: 8px;">3.1</td> <td style="border: 1px solid #ddd; padding: 8px;"><b>78.5</b></td>
</tr>
<tr style="background-color: #e6f7ff;">
<td style="border: 1px solid #ddd; padding: 8px;">D2F-Dream</td> <td style="border: 1px solid #ddd; padding: 8px;">LoPA (Ours)</td>
<td style="border: 1px solid #ddd; padding: 8px;"><b>5.4</b></td> <td style="border: 1px solid #ddd; padding: 8px;">56.0</td>
<td style="border: 1px solid #ddd; padding: 8px;"><b>8.0</b></td> <td style="border: 1px solid #ddd; padding: 8px;">35.2</td>
<td style="border: 1px solid #ddd; padding: 8px;"><b>6.3</b></td> <td style="border: 1px solid #ddd; padding: 8px;"><b>56.1</b></td>
<td style="border: 1px solid #ddd; padding: 8px;"><b>10.1</b></td> <td style="border: 1px solid #ddd; padding: 8px;">73.8</td>
</tr>
</tbody>
</table>
</div>
<div align="center"> <strong>Table 2. Accuracy-preserving parallelism scaling of DiffuCoder.</strong> </div>
