# AiM: Scalable Autoregressive Image Generation with Mamba 🐍
<p align="center">
  <img src="figure/title.png" width="95%">
</p>
<p align="center" style="font-size: larger;">
  <a href="https://arxiv.org/abs/2408.12245">Scalable Autoregressive Image Generation with Mamba</a>
</p>

## 💡 What is AiM
To the best of our knowledge, AiM is the first Mamba 🐍 based autoregressive image generation model. It offers generation quality competitive 💪 with diffusion models at faster inference speed ⚡️.
We also propose a more general form of adaLN, called adaLN-group, which balances parameter count and performance ⚖️. Notably, by adjusting the number of groups, adaLN-group can be flexibly and equivalently converted to adaLN or adaLN-single.
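Conceptually, adaLN-group gives each *group* of layers its own conditioning projection: with `num_groups = num_layers` every layer gets its own modulation (per-layer adaLN), and with `num_groups = 1` all layers share one (adaLN-single). A minimal NumPy sketch of this idea (illustrative only, not the paper's implementation; all names are hypothetical):

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Plain LayerNorm over the last dimension, no learned affine.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLNGroup:
    """One (shift, scale) projection shared by every layer in a group.

    num_groups = num_layers recovers per-layer adaLN;
    num_groups = 1 recovers adaLN-single.
    """
    def __init__(self, num_layers, num_groups, dim, rng=None):
        assert num_layers % num_groups == 0
        if rng is None:
            rng = np.random.default_rng(0)
        self.layers_per_group = num_layers // num_groups
        # Per-group projection: condition vector (dim,) -> shift & scale (2*dim,)
        self.W = rng.standard_normal((num_groups, dim, 2 * dim)) * 0.02

    def modulate(self, x, cond, layer_idx):
        # Layers in the same group index into the same projection.
        g = layer_idx // self.layers_per_group
        shift, scale = np.split(cond @ self.W[g], 2, axis=-1)
        return layernorm(x) * (1 + scale) + shift
```

Because groups share parameters, the parameter count scales with `num_groups` rather than `num_layers`, which is the knob the paper trades off against quality.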
## 🔔 Update

- [2024-08-27] Improved HF integration; `from_pretrained` is now supported for direct model loading.
- [2024-08-23] A minor bug in `train_stage2.py` has been fixed.
- [2024-08-23] Code and model release.
## 🚀 Getting Started

### Train
Training AiM-B on 16 A800 GPUs takes approximately 16 hours. We also provide wandb logs for reference. The example command below launches AiM-XL training with 32 processes; adjust `--aim-model` and the process count to your setup:
```shell
accelerate launch --num_processes=32 --num_machines=... \
    --main_process_ip=... --main_process_port=... --machine_rank=... \
    train_stage2.py --aim-model AiM-XL --dataset /your/data/path/ \
    --vq-ckpt /your/ckpt/path/vq_f16.pt --batch-size 64 --lr 8e-4 --epochs 350
```
### Inference
You can play with AiM directly in Python:
```python
from aim import AiM

model = AiM.from_pretrained("hp-l33/aim-xlarge").cuda()
model.eval()

imgs = model.generate(batch=8, temperature=1, top_p=0.98, top_k=600, cfg_scale=5)
```
To reproduce the gFID of AiM, you can use the evaluation script of LlamaGen with the following settings: `temperature=1, top_p=1.0, top_k=0`, and `cfg_scale=2.0` for AiM-B, or `cfg_scale=1.75` for AiM-L and AiM-XL.
Note: the first time Mamba runs, it invokes the Triton compiler and autotunes, so it may be slow; from the second run onwards, inference is very fast. See: https://github.com/state-spaces/mamba/issues/389#issuecomment-2171755306
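The `top_k` / `top_p` arguments above perform standard truncated (top-k) and nucleus (top-p) filtering of the token logits before sampling. For reference, a minimal NumPy sketch of that filtering step (illustrative only; not AiM's actual sampling code):

```python
import numpy as np

def filter_logits(logits, top_k=0, top_p=1.0):
    """Standard top-k / nucleus (top-p) truncation of a 1-D logit vector.

    top_k=0 and top_p=1.0 disable the respective filter (as in the
    gFID-reproduction settings above).
    """
    logits = logits.copy()
    if top_k > 0:
        kth = np.sort(logits)[-top_k]        # k-th largest logit
        logits[logits < kth] = -np.inf       # drop everything below it
    if top_p < 1.0:
        order = np.argsort(logits)[::-1]     # descending by logit
        probs = np.exp(logits[order] - np.max(logits))
        probs /= probs.sum()
        cum = np.cumsum(probs)
        # Keep the smallest prefix whose mass exceeds top_p (always >= 1 token).
        cutoff = np.searchsorted(cum, top_p) + 1
        logits[order[cutoff:]] = -np.inf
    return logits
```

Tokens set to `-inf` receive zero probability after the softmax, so sampling is restricted to the surviving set.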
## 🤗 Model Zoo
The model weights can be downloaded from Hugging Face.
| Model  | Params | FID  | Weight     |
| ------ | :----: | :--: | :--------: |
| AiM-B  | 148M   | 3.52 | aim-base   |
| AiM-L  | 350M   | 2.83 | aim-large  |
| AiM-XL | 763M   | 2.56 | aim-xlarge |
## 🌹 Acknowledgments
This project would not have been possible without the computational resources provided by Professor Guoqi Li and his team. We would also like to thank the following repositories and papers for their inspiration: VQGAN, Mamba, LlamaGen, VAR, and DiT.
## 📖 BibTeX

```bibtex
@misc{li2024scalableautoregressiveimagegeneration,
      title={Scalable Autoregressive Image Generation with Mamba},
      author={Haopeng Li and Jinyue Yang and Kexin Wang and Xuerui Qiu and Yuhong Chou and Xin Li and Guoqi Li},
      year={2024},
      eprint={2408.12245},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12245},
}
```