[ICCV 2025] D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
<div align='center' style='margin-bottom:20px'> <a href='http://arxiv.org/abs/2510.05891'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a> <a href='https://ivg-yanranzhang.github.io/D3QE/'><img src='https://img.shields.io/badge/Visualization-green?logo=github'></a> <a href="https://github.com/Zhangyr2022/D3QE"><img src="https://img.shields.io/badge/Code-9E95B7?logo=github"></a> </div> <div align='center'> Dataset <b>ARForensics</b> is available at: <a href='https://huggingface.co/datasets/Yanran21/ARForensics'>[🤗 HuggingFace]</a> | <a href='https://www.modelscope.cn/datasets/YanranZhang/ARForensics'>[🤖 ModelScope]</a> </div> <div align=center> <img src='assets/image.png' width=320 height=320> </div>

Created by Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
🚨 Urgent Update!
The original test set on <a href='https://huggingface.co/datasets/Yanran21/ARForensics'>[🤗 HuggingFace]</a>, when unpacked, contained the following subfolders:
"Infinity", "Janus_Pro", "RAR", "MAR", "VAR", "LlamaGen", "Open_MAGVIT2",
which does not match the dataset used in our paper.
We have now replaced the "MAR" samples with "Switti" samples.
Please re-download the dataset and unpack it. The corrected test set now contains:
"Infinity", "Janus_Pro", "RAR", "Switti", "VAR", "LlamaGen", "Open_MAGVIT2".
Table of Contents
- Introduction
- News 🔥
- Quick Start
- Setup 🔧
- Dataset
- Training
- Evaluation
- Pretrained Models
- Acknowledgments
- Citation
- Contact
Introduction
D³QE is a detection method designed to identify images generated by visual autoregressive (AR) models. The core idea is to exploit discrete distribution discrepancies and quantization error patterns produced by tokenized autoregressive generation. Key highlights:
- Integrates dynamic codebook frequency statistics into a transformer attention module.
- Fuses semantic image features with latent representations of quantization/quantizer error.
- Demonstrates strong detection accuracy, cross-model generalization, and robustness to common real-world perturbations.
This repo contains the code, dataset, and scripts used in the paper to facilitate reproducible experiments.
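To make the core idea concrete, here is a minimal NumPy sketch (not the repo's actual code) of the two signals the method exploits: the residual left by nearest-codebook quantization, and the frequency with which each codebook entry is used. The toy features and codebook below are purely illustrative.

```python
import numpy as np

def quantization_error(features, codebook):
    """Nearest-codebook quantization and per-vector residual error.

    features: (N, D) array of latent vectors
    codebook: (K, D) array of codebook entries
    Returns (indices, errors): nearest entry per vector and its L2 residual.
    """
    # Pairwise squared distances between every feature and codebook entry
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                                  # token index per vector
    err = np.linalg.norm(features - codebook[idx], axis=1)   # residual magnitude
    return idx, err

# Toy example: 4 latent vectors against a 3-entry codebook
feats = np.array([[0.0, 0.0], [1.1, 0.0], [0.0, 2.2], [0.9, 1.0]])
cb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
idx, err = quantization_error(feats, cb)
freq = np.bincount(idx, minlength=len(cb))  # codebook usage statistics
```

In the paper, statistics of this kind (codebook usage and quantization residuals) are what the transformer attention module consumes; generated images leave a measurably different footprint in both than real images do.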
<div align=center> <img src='assets/pipeline.png' width=600 height=300> </div>

News 🔥
- 🆕 2025-10-09 — Our code is released.
- 🆕 2025-10-08 — arXiv preprint released.
- 🆕 2025-07-23 — Accepted to ICCV 2025!🔥
Quick Start
- Clone the repository:
git clone https://github.com/Zhangyr2022/D3QE
cd D3QE
- Create the environment and install dependencies:
conda create -n D3QE python=3.11 -y
conda activate D3QE
pip install -r requirements.txt
# If you have GPU(s), ensure CUDA and PyTorch are installed correctly for your environment.
- Download the dataset (see Dataset below) and place it under ./data/ARForensics (or a path you prefer).
- Download the pretrained LlamaGen VQ-VAE model vq_ds16_c2i.pt from LlamaGen and place it under ./pretrained.
- Train a model:
bash train.sh
- Evaluate:
bash eval.sh
Dataset
<div align=center> <img src='assets/dataset.png' width=600 height=350> </div>

We provide the ARForensics benchmark, the first large-scale dataset specifically built for detecting images from visual autoregressive models. Seven autoregressive models with diverse token/scale architectures are included: LlamaGen, VAR, Infinity, Janus-Pro, RAR, Switti, and Open-MAGVIT2.
Splits:
- Training: 100k LlamaGen images + 100k ImageNet images
- Validation: 10k LlamaGen images + 10k ImageNet images
- Test: balanced test set with 6k samples per model
Download: The dataset ARForensics is uploaded and available at: 🤗 HuggingFace | 🤖 ModelScope.
Folder structure (expected):
ARForensics/
├─ train/
│ ├─ 0_real/
│ └─ 1_fake/
├─ val/
│ ├─ 0_real/
│ └─ 1_fake/
└─ test/
├─ Infinity/
│ ├─ 0_real/
│ └─ 1_fake/
├─ Janus_Pro/
│ ├─ ..
├─ RAR/
├─ Switti/
├─ VAR/
├─ LlamaGen/
└─ Open_MAGVIT2/
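After unpacking, it is easy to accidentally end up with a stale or partial layout (see the urgent update above). The following is a hypothetical helper, not part of the repo, that checks a root directory against the expected structure:

```python
from pathlib import Path

# Test-set subfolders expected after the dataset fix (MAR replaced by Switti)
EXPECTED_TEST_MODELS = ["Infinity", "Janus_Pro", "RAR", "Switti",
                        "VAR", "LlamaGen", "Open_MAGVIT2"]

def check_layout(root):
    """Return a list of missing subfolders relative to `root` (empty if OK)."""
    root = Path(root)
    missing = []
    for split in ("train", "val"):
        for cls in ("0_real", "1_fake"):
            if not (root / split / cls).is_dir():
                missing.append(f"{split}/{cls}")
    for model in EXPECTED_TEST_MODELS:
        for cls in ("0_real", "1_fake"):
            if not (root / "test" / model / cls).is_dir():
                missing.append(f"test/{model}/{cls}")
    return missing
```

Running `check_layout("./data/ARForensics")` before training should return an empty list; any entries it returns name folders still missing from the expected tree.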
Training
A provided training script, train.sh, wraps the typical training pipeline. You can tweak the hyper-parameters directly in the script or by editing the training config file used by the codebase. By default, training runs on a single GPU (24 GB of GPU memory is recommended).
Example:
bash train.sh
# or run the training entrypoint directly, e.g.
python train.py \
--name D3QE_rerun \
--dataroot /path/to/your/dataset \
--detect_method D3QE \
--blur_prob 0.1 \
--blur_sig 0.0,3.0 \
--jpg_prob 0.1 \
--jpg_method cv2,pil \
    --jpg_qual 30,100
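The blur/JPEG flags above control train-time robustness augmentation. As an illustrative sketch (not the repo's exact code), this is how such flags are commonly interpreted in CNNDetection-style pipelines, which this codebase follows: each image is blurred with probability blur_prob (sigma drawn from blur_sig) and JPEG-recompressed with probability jpg_prob (quality drawn from jpg_qual). The function name is hypothetical.

```python
import random

def sample_augmentation(blur_prob=0.1, blur_sig=(0.0, 3.0),
                        jpg_prob=0.1, jpg_qual=(30, 100), rng=random):
    """Sample the augmentation ops to apply to one training image."""
    ops = []
    if rng.random() < blur_prob:
        # Gaussian blur with sigma drawn uniformly from blur_sig
        ops.append(("gaussian_blur", rng.uniform(*blur_sig)))
    if rng.random() < jpg_prob:
        # JPEG recompression with quality drawn from jpg_qual (inclusive)
        ops.append(("jpeg", rng.randint(*jpg_qual)))
    return ops
```

With the defaults above, roughly 10% of images get blurred and 10% get recompressed each epoch, which is what makes the detector robust to those perturbations at test time.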
Evaluation
eval.py exposes many options to evaluate detection performance and robustness.
usage: eval.py [-h] [--rz_interp RZ_INTERP] [--batch_size BATCH_SIZE]
[--loadSize LOADSIZE] [--CropSize CROPSIZE] [--no_crop]
[--no_resize] [--no_flip] [--robust_all]
[--detect_method DETECT_METHOD] [--dataroot DATAROOT]
[--sub_dir SUB_DIR] [--model_path MODEL_PATH]
Key flags:
- --batch_size (default: 64)
- --loadSize / --CropSize for image preprocessing (defaults: 256 / 224)
- --robust_all to evaluate model robustness across different noises/attacks
- --sub_dir list of subfolders in the test set (defaults to the 7 AR models)
- --model_path path to your trained model checkpoint
Example (evaluate D³QE):
There's an eval.sh with default settings you can adapt.
bash eval.sh
# or run evaluation directly
python eval.py \
--model_path /your/model/path \
--detect_method D3QE \
--batch_size 1 \
--dataroot /path/to/your/testset \
--sub_dir '["Infinity","Janus_Pro","RAR","Switti","VAR","LlamaGen","Open_MAGVIT2"]'
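Evaluation runs once per test subfolder, so results typically arrive as per-model numbers that need aggregating. A small hypothetical helper (the dict layout and accuracy values below are illustrative, not outputs of eval.py) for the usual summary, per-model accuracy as the mean of real and fake accuracy, plus the overall mean:

```python
def summarize(results):
    """results: {model_name: (acc_real, acc_fake)}, accuracies in [0, 1].

    Returns per-model balanced accuracy and the mean across models.
    """
    per_model = {m: (acc_r + acc_f) / 2 for m, (acc_r, acc_f) in results.items()}
    mean_acc = sum(per_model.values()) / len(per_model)
    return per_model, mean_acc

# Illustrative numbers only
demo = {"VAR": (0.98, 0.95), "LlamaGen": (0.99, 0.97)}
per_model, mean_acc = summarize(demo)
```

Balancing real and fake accuracy this way avoids rewarding a detector that trivially predicts one class, which matters because each test subfolder is class-balanced (6k samples per model).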
Pretrained Models
Pretrained model checkpoints are uploaded at: 🤗 Hugging Face
Acknowledgments
This codebase builds on and borrows design patterns from prior open-source forgery-detection projects. Thanks to the authors of those projects for making their code and models available.
Citation
If you use this repository or dataset in your research, please cite our paper:
@inproceedings{zhang2025d3qe,
title={D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection},
author={Zhang, Yanran and Yu, Bingyao and Zheng, Yu and Zheng, Wenzhao and Duan, Yueqi and Chen, Lei and Zhou, Jie and Lu, Jiwen},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={16292--16301},
year={2025}
}
Contact
For questions, issues, or reproducibility requests, please open an issue on this repository (PRs are welcome) or reach out to zhangyr21@mails.tsinghua.edu.cn.
