SkillAgentSearch skills...

PAR

[CVPR2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project

Install / Use

/learn @YuqingWang1029/PAR
About this skill

Quality Score

0/100

Supported Platforms

Zed

README

Parallelized Autoregressive Visual Generation

<div align="center">

arXiv  project page 

</div> <img width="1195" alt="image" src="https://github.com/user-attachments/assets/54db2bdc-40f8-47e2-81a3-aa26cbbcd611" />

BibTeX

@article{wang2024parallelized,
  title={Parallelized Autoregressive Visual Generation},
  author={Wang, Yuqing and Ren, Shuhuai and Lin, Zhijie and Han, Yujin and Guo, Haoyuan and Yang, Zhenheng and Zou, Difan and Feng, Jiashi and Liu, Xihui},
  journal={arXiv preprint arXiv:2412.15119},
  year={2024}
}

Getting Started

Requirements

  • Linux with Python ≥ 3.7
  • PyTorch ≥ 2.1
  • A100 GPUs

We use the same environment as LLamaGen. For more details, please refer to here.

VQ-VAE models

Method | params | tokens | rFID (256x256) | weight --- |:---:|:---:|:---:|:---: vq_ds16_c2i | 72M | 16x16 | 2.19 | vq_ds16_c2i.pt

AR models

Method | params | tokens | FID (256x256) | weight --- |:---:|:---:|:---:|:---:| PAR-XL-4x | 775M | 24x24 | 2.61 | PAR-XL-4x.pt PAR-XXL-4x | 1.4B | 24x24 | 2.35 | PAR-XXL-4x.pt PAR-3B-4x | 3.1B | 24x24 | 2.29 | PAR-3B-4x.pt PAR-3B-16x | 3.1B | 24x24 | 2.88 | PAR-3B-16x.pt

Please download the above models, put them in the folder ./pretrained_models

Pre-extract discrete codes of training images

bash scripts/autoregressive/extract_codes_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --data-path /path/to/imagenet/train --code-path /path/to/imagenet_code_c2i_flip_ten_crop --ten-crop --crop-range 1.1 --image-size 384

Train AR models with DDP

Before running, please change nnodes, nproc_per_node, node_rank, master_addr, master_port in .sh. The spe-token-num and ar-token-num represent the number of learnable tokens (n-1) and the number of tokens for parallel generation (n), respectively.

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-XXL

bash scripts/autoregressive/train_c2i.sh --cloud-save-path /path/to/cloud_disk --code-path /path/to/imagenet_code_c2i_flip_ten_crop --spe-token-num 3 --ar-token-num 4 --image-size 384 --gpt-model GPT-3B

Sampling


bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.345

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-1B-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XXL --image-size 384 --image-size-eval 256 --cfg-scale 1.435

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-XL-4x.pt --spe-token-num 3 --ar-token-num 4 --gpt-model GPT-XL --image-size 384 --image-size-eval 256 --cfg-scale 1.5

bash scripts/autoregressive/sample_c2i.sh --vq-ckpt ./pretrained_models/vq_ds16_c2i.pt --gpt-ckpt ./pretrained_models/PAR-3B-16x.pt --spe-token-num 15 --ar-token-num 16 --gpt-model GPT-3B --image-size 384 --image-size-eval 256 --cfg-scale 1.5

Evaluation

Before evaluation, please refer evaluation readme to install required packages.

python3 evaluations/c2i/evaluator.py VIRTUAL_imagenet256_labeled.npz samples/GPT-XXL-PAR-XXL-4x-size-384-size-256-VQ-16-topk-0-topp-1.0-temperature-1.0-cfg-1.435-seed-0.npz

Acknowledgments

The development of PAR is based on LlamaGen. We deeply appreciate this contribution to the community.

Related Skills

View on GitHub
GitHub Stars183
CategoryDevelopment
Updated19d ago
Forks3

Languages

Python

Security Score

80/100

Audited on Mar 20, 2026

No findings