PAR

[CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project

Generate Convert Improve

Install / Use

/learn @YuqingWang1029/PAR

About this skill

Quality Score

0/100

README

Parallelized Autoregressive Visual Generation

</div> <img width="1195" alt="image" src="https://github.com/user-attachments/assets/54db2bdc-40f8-47e2-81a3-aa26cbbcd611" />

BibTeX

@article{wang2024parallelized,
  title={Parallelized Autoregressive Visual Generation},
  author={Wang, Yuqing and Ren, Shuhuai and Lin, Zhijie and Han, Yujin and Guo, Haoyuan and Yang, Zhenheng and Zou, Difan and Feng, Jiashi and Liu, Xihui},
  journal={arXiv preprint arXiv:2412.15119},
  year={2024}
}

Getting Started

Requirements

Linux with Python ≥ 3.7
PyTorch ≥ 2.1
A100 GPUs

We use the same environment as LLamaGen. For more details, please refer to here.

VQ-VAE models

Method | params | tokens | rFID (256x256) | weight --- |:---:|:---:|:---:|:---: vq_ds16_c2i | 72M | 16x16 | 2.19 | vq_ds16_c2i.pt

AR models

Method | params | tokens | FID (256x256) | weight --- |:---:|:---:|:---:|:---:| PAR-XL-4x | 775M | 24x24 | 2.61 | PAR-XL-4x.pt PAR-XXL-4x | 1.4B | 24x24 | 2.35 | PAR-XXL-4x.pt PAR-3B-4x | 3.1B | 24x24 | 2.29 | PAR-3B-4x.pt PAR-3B-16x | 3.1B | 24x24 | 2.88 | PAR-3B-16x.pt

Please download the above models, put them in the folder ./pretrained_models

Pre-extract discrete codes of training images

bash scripts/autoregressive/extract_codes_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --data-path /path/to/imagenet/train --code-path /path/to/imagenet_code_c2i_flip_ten_crop --ten-crop --crop-range 1.1 --image-size 384

Train AR models with DDP

Before running, please change nnodes, nproc_per_node, node_rank, master_addr, master_port in .sh. The spe-token-num and ar-token-num represent the number of learnable tokens (n-1) and the number of tokens for parallel generation (n), respectively.

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XXL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-3B

Sampling


bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.345

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-1B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XXL --image-size 384 --image-size-eval 256 --cfg-scale 1.435

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XL --image-size 384 --image-size-eval 256 --cfg-scale 1.5

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-16x.pt --spe-token-num 15 --ar-token-num 16 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.5

Evaluation

Before evaluation, please refer evaluation readme to install required packages.

python3 evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz samples/GPT-XXL-PAR-XXL-4x-size-384-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-1.435-seed-0.npz

Acknowledgments

The development of PAR is based on LlamaGen. We deeply appreciate this contribution to the community.

Related Skills

node-connect

352.0k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

111.1k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

352.0k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

352.0k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。