# iFSQ: Improving FSQ for Image Generation with 1 Line of Code<br><sub>Official PyTorch Implementation</sub>
This repository contains the official implementation of iFSQ and LlamaGen-REPA.
<img src="assets/ifsq_bench.png" width="800" />

## 🚀 Method: The "1 Line of Code"
The key insight is to replace the $y = \tanh(x)$ bounding function in the original FSQ with a distribution-matching activation, $y = 2.0 \cdot \sigma(1.6x) - 1$, which maps unbounded Gaussian latents to a bounded uniform distribution.
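As a minimal sketch of this change (our own re-implementation for illustration, not the repo's exact code; the `quantize` helper and the `levels=17` default are assumptions on our part), the swapped activation and a standard straight-through FSQ rounding step could look like:

```python
import torch

def fsq_bound(x: torch.Tensor) -> torch.Tensor:
    """Original FSQ bounding: tanh squashes latents into (-1, 1)."""
    return torch.tanh(x)

def ifsq_bound(x: torch.Tensor) -> torch.Tensor:
    """iFSQ bounding ("the 1 line of code"): a scaled sigmoid,
    2*sigmoid(1.6x) - 1, which maps a roughly Gaussian input to a
    more uniform output on (-1, 1)."""
    return 2.0 * torch.sigmoid(1.6 * x) - 1.0

def quantize(x: torch.Tensor, levels: int = 17) -> torch.Tensor:
    """Round the bounded latent onto `levels` uniform grid points in [-1, 1],
    with a straight-through estimator so gradients flow through the rounding
    (standard FSQ practice)."""
    y = ifsq_bound(x)
    half = (levels - 1) / 2.0
    q = torch.round(y * half) / half
    return y + (q - y).detach()  # forward: q; backward: gradient of y
```

Swapping `ifsq_bound` back to `fsq_bound` recovers the original FSQ behavior, which is what makes this a one-line change.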
## 🌟 Key Contributions
- 🪐 Methodology: We propose iFSQ, a distribution-aware improvement to FSQ. We resolve the conflict between information efficiency and reconstruction fidelity.
- ⚡️ Benchmarking: We use iFSQ as a unified tokenizer to benchmark AR against diffusion models.
- 💥 Insights:
  - The optimal equilibrium between discrete and continuous representations lies at approximately 4 bits per dimension.
  - AR models exhibit rapid initial convergence, whereas diffusion models achieve a superior performance ceiling.
- 🛸 Extension: We introduce LlamaGen-REPA, adapting Representation Alignment to AR models to enhance semantic alignment.
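The ≈4 bits-per-dimension sweet spot can be read directly off an FSQ configuration: with a uniform grid of $L$ levels per dimension, each dimension carries $\log_2 L$ bits. Pairing `levels=17` with `dims=4` below is our reading of config names like `fsq17x4`; check the repo's configs for the exact values.

```python
import math

# Information capacity of one FSQ dimension with L uniform levels is log2(L).
levels, dims = 17, 4
bits_per_dim = math.log2(levels)        # ~4.09 bits per dimension
bits_per_token = dims * bits_per_dim    # ~16.35 bits per token
codebook_size = levels ** dims          # 83521 effective codes per token

print(f"{bits_per_dim:.2f} bits/dim, {bits_per_token:.2f} bits/token, "
      f"codebook size {codebook_size}")
```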
## 🛠️ Setup

First, download and set up the repo:

```bash
git clone https://github.com/Tencent-Hunyuan/iFSQ.git
cd iFSQ
```
We provide a `requirements.txt` file that can be used to create the environment:

```bash
conda create -n ifsq python=3.10 -y
conda activate ifsq
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```
## 🛠️ Usage
### 1. iFSQ

#### Train

```bash
cd ifsq
bash configs/ifsq_f16_d4_4bit/run.sh
```
#### Eval

We record validation metrics during training, and also provide scripts for standalone validation:

```bash
cd ifsq
torchrun --nproc_per_node=8 \
    eval_ddp.py \
    --imgnet_eval_path ${IMAGENET_VAL} \
    --coco_eval_path ${COCO2017_VAL} \
    --model_name ImageFSQVAE \
    --ckpt_path results/ifsq_f16_d4_4bit/checkpoint-10000.ckpt \
    --model_config configs/ifsq_f16_d4_4bit/run.json \
    --resolution 256 \
    --dataset_num_worker 8 \
    --eval_batch_size 64 \
    --eval_lpips \
    --eval_psnr \
    --eval_ssim \
    --eval_fid \
    --ema
```
### 2. LlamaGen-REPA

#### Train

Train using the iFSQ tokenizer from the previous stage:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/fsq17x4_large_repa8_0p5/config.yaml
```
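For readers unfamiliar with Representation Alignment (REPA), the core idea is an auxiliary loss that pulls intermediate hidden states of the generator toward patch features from a frozen pretrained encoder (e.g., DINOv2). Below is a minimal sketch with an assumed two-layer projector and negative-cosine-similarity objective; the class name, projector architecture, and loss weighting are illustrative, not this repo's exact modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class REPAHead(nn.Module):
    """Illustrative REPA-style alignment head: project hidden states from an
    intermediate generator layer to the frozen encoder's feature dimension and
    penalize negative cosine similarity per token."""
    def __init__(self, hidden_dim: int, target_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.SiLU(),
            nn.Linear(hidden_dim, target_dim),
        )

    def forward(self, h: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # h:      (B, N, hidden_dim) hidden states from an intermediate layer
        # target: (B, N, target_dim) frozen encoder features (e.g., DINOv2)
        z = F.normalize(self.proj(h), dim=-1)
        t = F.normalize(target, dim=-1)
        return -(z * t).sum(dim=-1).mean()  # add (scaled) to the main loss
```

The alignment loss is typically added to the generator's primary objective (cross-entropy for AR, the denoising loss for diffusion) with a small weight.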
iFSQ also supports a multi-codebook setup, in which each token is represented by multiple indices. For example, to use 2 indices per token:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/fsq17x4_ds16_large_repa_d8_2p0_f2x2/config.yaml
```
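To make the multi-codebook idea concrete: a token's per-dimension quantization digits can be packed into several smaller indices instead of one large one, e.g., 4 dimensions with 17 levels each as two indices in $[0, 17^2)$ rather than one index in $[0, 17^4)$. The packing below is one plausible factorization we wrote for illustration; the repo's exact index layout may differ.

```python
def dims_to_indices(digits, levels=17, groups=2):
    """Pack per-dimension quantization digits (each in [0, levels)) into
    `groups` codebook indices, treating each group of dimensions as a
    base-`levels` number. Illustrative sketch only."""
    assert len(digits) % groups == 0
    per_group = len(digits) // groups
    indices = []
    for g in range(groups):
        idx = 0
        for d in digits[g * per_group:(g + 1) * per_group]:
            idx = idx * levels + d
        indices.append(idx)
    return indices

print(dims_to_indices([16, 16, 16, 16]))  # two indices, each in [0, 289)
```

Smaller sub-codebooks keep the AR model's output vocabulary manageable while preserving the tokenizer's total capacity.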
To use the VQ-VAE from the original LlamaGen instead:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    train.py --config configs/large_repa_d8_2p0/config.yaml
```
#### Eval

Generate 50k images and compute FID with `torch_fid`, producing `.npz` files along the way:

```bash
cd llamagen
torchrun --nproc_per_node=8 \
    inference.py --config configs/large_repa_d8_2p0/config.yaml
```
Alternatively, we provide tools for validation following the ADM evaluation suite:

```bash
# we recommend CUDA 12.2
conda create -n adm_eval python=3.10 -y
conda activate adm_eval
pip install tensorflow==2.15.0 scipy requests tqdm numpy==1.23.5
pip install nvidia-pyindex
pip install nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
python tools/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    /path/to/.npz
```
### 3. DiT-REPA

#### Train

```bash
cd dit
accelerate launch --num_processes 8 \
    train.py --config configs/fsq17x4_large_repa8_0p5/run.yaml
```
#### Eval

Generate 50k images and compute FID with `torch_fid`, producing `.npz` files along the way:

```bash
cd dit
accelerate launch --num_processes 8 \
    inference.py --config configs/fsq17x4_large_repa8_0p5/run.yaml
```
Alternatively, we provide tools for validation following the ADM evaluation suite:

```bash
# we recommend CUDA 12.2
conda create -n adm_eval python=3.10 -y
conda activate adm_eval
pip install tensorflow==2.15.0 scipy requests tqdm numpy==1.23.5
pip install nvidia-pyindex
pip install nvidia-cublas-cu12 nvidia-cuda-cupti-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12
wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
python tools/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    /path/to/.npz
```
## 👍 Acknowledgement
This project builds upon the excellent work of the following repositories:
- WF-VAE: used as the main template for building our training codebase.
- LightningDiT: referenced for its well-organized configuration file structure.
- LlamaGen: referenced for the original model design and evaluation pipeline.
- DiT: referenced for the original model implementation and evaluation setup.
## 📝 Citation
If you find this work useful for your research, please consider citing our paper:
```bibtex
@misc{lin2026ifsqimprovingfsqimage,
  title={iFSQ: Improving FSQ for Image Generation with 1 Line of Code},
  author={Bin Lin and Zongjian Li and Yuwei Niu and Kaixiong Gong and Yunyang Ge and Yunlong Lin and Mingzhe Zheng and JianWei Zhang and Miles Yang and Zhao Zhong and Liefeng Bo and Li Yuan},
  year={2026},
  eprint={2601.17124},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.17124},
}
```
## 🔒 License

The majority of this project is licensed under the Apache 2.0 License, as detailed in LICENSE.txt.