# LPD: Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

[ICLR 2026 Oral]
Paper | Model
## Demo
⚡️Check out the lightning speed of LPD!
https://github.com/user-attachments/assets/0117b50b-0f25-49b1-8530-9221916ce7bc
## News

- [2026/03] 🔥 Updated the paper with additional 1024×1024 text-to-image generation results, expanded ablation studies, and a more comprehensive efficiency analysis.
- [2026/02] 🔥 LPD has been accepted to ICLR 2026 and selected as an Oral!
- [2025/07] We release the code and models for LPD!
## Abstract

We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction, but only achieved limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation orderings and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) Locality-aware Generation Ordering, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the number of generation steps from 256 to 20 (256×256 res.) and from 1024 to 48 (512×512 res.) without compromising quality on ImageNet class-conditional generation, while achieving at least 3.4× lower latency than previous parallelized autoregressive models.
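The intuition behind the locality-aware ordering can be illustrated with a toy Python sketch (this is an illustration of the idea only, not the paper's actual schedule; the grid size, group size, and distance threshold are arbitrary example values): positions decoded in the same parallel step are kept spatially far apart, so their mutual dependencies are weak, while earlier groups leave nearby context for later ones.

```python
def locality_aware_order(grid=16, group_size=13, min_dist=4):
    """Toy sketch: partition grid positions into parallel decoding groups
    so that positions generated together are spatially far apart (weak
    intra-group dependency). Illustrative only, not the paper's schedule."""
    remaining = [(r, c) for r in range(grid) for c in range(grid)]
    groups = []
    while remaining:
        group = []
        for pos in list(remaining):
            # Accept a position only if it is far (in Manhattan distance)
            # from everything already chosen for this group.
            if all(abs(pos[0] - p[0]) + abs(pos[1] - p[1]) >= min_dist
                   for p in group):
                group.append(pos)
                remaining.remove(pos)
            if len(group) == group_size:
                break
        groups.append(group)
    return groups

groups = locality_aware_order()
print(f"{sum(len(g) for g in groups)} positions in {len(groups)} groups")
```

Each group can then be decoded in one forward pass; the far-apart constraint is what keeps concurrently generated tokens consistent.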
<p align="left"> <img src="assets/speedup.png" width="450"> </p>

## Preparation

### Environment Setup
```bash
git clone https://github.com/mit-han-lab/lpd
cd lpd
bash environment_setup.sh lpd
```
### Models

Download the LlamaGen tokenizer and place it in the `tokenizers` directory. Download the LPD models from Hugging Face.
| Model | #Params | #Steps | FID-50K | IS | Latency (s) | Throughput (img/s) |
|-------|---------|--------|---------|------|-------------|--------------------|
| LPD-L-256 | 337M | 20 | 2.40 | 284.5 | 0.28 | 139.11 |
| LPD-XL-256 | 752M | 20 | 2.10 | 326.7 | 0.41 | 75.20 |
| LPD-XXL-256 | 1.4B | 20 | 2.00 | 337.6 | 0.55 | 45.07 |
| LPD-L-256 | 337M | 32 | 2.29 | 282.7 | 0.46 | 110.34 |
| LPD-XL-256 | 752M | 32 | 1.92 | 319.4 | 0.66 | 61.24 |
| LPD-L-512 | 337M | 48 | 2.54 | 292.2 | 0.69 | 35.16 |
| LPD-XL-512 | 752M | 48 | 2.10 | 326.0 | 1.01 | 18.18 |
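As a quick sanity check on the step counts above, the average number of tokens decoded per parallel step can be worked out from the token counts stated in the abstract (256 tokens at 256×256 resolution, 1024 tokens at 512×512):

```python
# Average tokens decoded per parallel step, inferred from the step counts
# in the table and the token counts stated in the abstract.
configs = [("256px, 20 steps", 256, 20),
           ("256px, 32 steps", 256, 32),
           ("512px, 48 steps", 1024, 48)]
for name, tokens, steps in configs:
    print(f"{name}: {tokens} tokens / {steps} steps "
          f"= {tokens / steps:.1f} tokens per step")
```

So the 20-step configuration decodes on average 12.8 tokens per step, versus one token per step for plain next-patch prediction.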
### Dataset

If you plan to train, please download the ImageNet dataset and place it at `${IMAGENET_PATH}`. To accelerate training, we recommend precomputing the tokenizer latents and saving them to `${CACHED_PATH}`. Set `--img_size` to either 256 or 512.
```bash
torchrun --nproc_per_node=8 --nnodes=1 \
    main_cache.py \
    --img_size 256 --vqgan_path tokenizers/vq_ds16_c2i.pt \
    --data_path ${IMAGENET_PATH} --cached_path ${CACHED_PATH}
```
<!-- [Download](https://huggingface.co/datasets/Efficient-Large-Model/imagenet-llamagen-cache) the pre-cached llamagen discrete tokens for ImageNet. Then unzip:
```
tar -xvf imagenet_llamagen_cache.tar -C /your-local-path/imagenet_llamagen_cache
``` -->
## Usage

### Evaluation

First, generate the LPD orders. Alternatively, you may download the pre-generated orders and place them in `orders/lpd_orders_generated`.

```bash
bash orders/run_lpd_order.sh
```
Then, run the evaluation scripts located in `scripts/eval`. For example, to evaluate LPD-L-256 using 20 steps:

```bash
bash scripts/eval/lpd_l_res256_steps20.sh
```

Note: set `--pretrained_ckpt` to the path of the downloaded LPD model, and specify `--output_dir`.
### Training

Run the training scripts located in `scripts/train`. For example, to train LPD-L-256:

```bash
python scripts/cli/run.py -J lpd_l_256 -p your_slurm_partition -A your_slurm_account -N 4 bash scripts/train/lpd_l_256.sh
```
## Acknowledgements
Thanks to MAR for the wonderful open-source codebase.
We thank MIT-IBM Watson AI Lab, National Science Foundation, Hyundai, and Amazon for supporting this research.
## Citation

If you find LPD useful or relevant to your project or research, please cite our paper:
```bibtex
@article{zhang2025locality,
  title={Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation},
  author={Zhang, Zhuoyang and Huang, Luke J and Wu, Chengyue and Yang, Shang and Peng, Kelly and Lu, Yao and Han, Song},
  journal={arXiv preprint arXiv:2507.01957},
  year={2025}
}
```