[ICCV 2025] D³QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
<div align='center' style='margin-bottom:20px'> <a href='http://arxiv.org/abs/2510.05891'><img src='https://img.shields.io/badge/ArXiv-red?logo=arxiv'></a> <a href='https://ivg-yanranzhang.github.io/D3QE/'><img src='https://img.shields.io/badge/Visualization-green?logo=github'></a> <a href="https://github.com/Zhangyr2022/D3QE"><img src="https://img.shields.io/badge/Code-9E95B7?logo=github"></a> </div> <div align='center'> Dataset <b>ARForensics</b> is available at: <a href='https://huggingface.co/datasets/Yanran21/ARForensics'>[🤗 HuggingFace]</a> | <a href='https://www.modelscope.cn/datasets/YanranZhang/ARForensics'>[🤖 ModelScope]</a> </div> <div align=center> <img src='assets/image.png' width=320 height=320> </div>

Created by Yanran Zhang, Bingyao Yu, Yu Zheng, Wenzhao Zheng, Yueqi Duan, Lei Chen, Jie Zhou, Jiwen Lu
🚨 Urgent Update!
The original test set on <a href='https://huggingface.co/datasets/Yanran21/ARForensics'>[🤗 HuggingFace]</a>, when unpacked, contained the following subfolders:
"Infinity", "Janus_Pro", "RAR", "MAR", "VAR", "LlamaGen", "Open_MAGVIT2",
which does not match the dataset used in our paper.
We have now replaced the "MAR" samples with "Switti" samples.
Please re-download the dataset and unpack it. The corrected test set now contains:
"Infinity", "Janus_Pro", "RAR", "Switti", "VAR", "LlamaGen", "Open_MAGVIT2".
Table of Contents
- Introduction
- News 🔥
- Quick Start
- Setup 🔧
- Dataset
- Training
- Evaluation
- Pretrained Models
- Acknowledgments
- Citation
- Contact
Introduction
D³QE is a detection method designed to identify images generated by visual autoregressive (AR) models. The core idea is to exploit discrete distribution discrepancies and quantization error patterns produced by tokenized autoregressive generation. Key highlights:
- Integrates dynamic codebook frequency statistics into a transformer attention module.
- Fuses semantic image features with latent representations of quantization/quantizer error.
- Demonstrates strong detection accuracy, cross-model generalization, and robustness to common real-world perturbations.
This repo contains the code, dataset, and scripts used in the paper to facilitate reproducible experiments.
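To make the core idea concrete, here is a minimal NumPy sketch (not the repo's actual code) of the two signals the method exploits: the residual left by nearest-codebook quantization, and the frequency with which each codebook entry is used. The toy features and codebook below are purely illustrative.

```python
import numpy as np

def quantization_error(features, codebook):
    """Nearest-codebook quantization and per-vector residual error.

    features: (N, D) array of latent vectors
    codebook: (K, D) array of codebook entries
    Returns (indices, errors): nearest entry per vector and its L2 residual.
    """
    # Pairwise squared distances between every feature and codebook entry
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                                  # token index per vector
    err = np.linalg.norm(features - codebook[idx], axis=1)   # residual magnitude
    return idx, err

# Toy example: 4 latent vectors against a 3-entry codebook
feats = np.array([[0.0, 0.0], [1.1, 0.0], [0.0, 2.2], [0.9, 1.0]])
cb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
idx, err = quantization_error(feats, cb)
freq = np.bincount(idx, minlength=len(cb))  # codebook usage statistics
```

In the paper, statistics of this kind (codebook usage and quantization residuals) are what the transformer attention module consumes; generated images leave a measurably different footprint in both than real images do.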
<div align=center> <img src='assets/pipeline.png' width=600 height=300> </div>

News 🔥
- 🆕 2025-10-09 — Our code is released.
- 🆕 2025-10-08 — arXiv preprint released.
- 🆕 2025-07-23 — Accepted to ICCV 2025!🔥
Quick Start
- Clone the repository:
git clone https://github.com/Zhangyr2022/D3QE
cd D3QE
- Create the environment and install dependencies:
conda create -n D3QE python=3.11 -y
conda activate D3QE
pip install -r requirements.txt
# If you have GPU(s), ensure CUDA and PyTorch are installed correctly for your environment.
- Download the dataset (see Dataset below) and place it under ./data/ARForensics (or a path you prefer).
- Download the pretrained LlamaGen VQ-VAE model vq_ds16_c2i.pt from LlamaGen and place it under ./pretrained.
- Train a model:
bash train.sh
- Evaluate:
bash eval.sh
Dataset
<div align=center> <img src='assets/dataset.png' width=600 height=350> </div>

We provide the ARForensics benchmark, the first large-scale dataset specifically built for detecting images from visual autoregressive models. Seven autoregressive models with diverse token/scale architectures are included: LlamaGen, VAR, Infinity, Janus-Pro, RAR, Switti, and Open-MAGVIT2.
Splits:
- Training: 100k LlamaGen images + 100k ImageNet images
- Validation: 10k LlamaGen images + 10k ImageNet images
- Test: balanced test set with 6k samples per model
Download: The dataset ARForensics is uploaded and available at: 🤗 HuggingFace | 🤖 ModelScope.
Folder structure (expected):
ARForensics/
├─ train/
│ ├─ 0_real/
│ └─ 1_fake/
├─ val/
│ ├─ 0_real/
│ └─ 1_fake/
└─ test/
├─ Infinity/
│ ├─ 0_real/
│ └─ 1_fake/
├─ Janus_Pro/
│ ├─ ..
├─ RAR/
├─ Switti/
├─ VAR/
├─ LlamaGen/
└─ Open_MAGVIT2/
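After unpacking, it is easy to accidentally end up with a stale or partial layout (see the urgent update above). The following is a hypothetical helper, not part of the repo, that checks a root directory against the expected structure:

```python
from pathlib import Path

# Test-set subfolders expected after the dataset fix (MAR replaced by Switti)
EXPECTED_TEST_MODELS = ["Infinity", "Janus_Pro", "RAR", "Switti",
                        "VAR", "LlamaGen", "Open_MAGVIT2"]

def check_layout(root):
    """Return a list of missing subfolders relative to `root` (empty if OK)."""
    root = Path(root)
    missing = []
    for split in ("train", "val"):
        for cls in ("0_real", "1_fake"):
            if not (root / split / cls).is_dir():
                missing.append(f"{split}/{cls}")
    for model in EXPECTED_TEST_MODELS:
        for cls in ("0_real", "1_fake"):
            if not (root / "test" / model / cls).is_dir():
                missing.append(f"test/{model}/{cls}")
    return missing
```

Running `check_layout("./data/ARForensics")` before training should return an empty list; any entries it returns name folders still missing from the expected tree.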
Training
A provided training script, train.sh, wraps the typical training pipeline. You can tweak the hyper-parameters directly in the script or by editing the training config file used by the codebase. By default, training runs on a single GPU (24 GB of GPU memory is recommended).
Example:
bash train.sh
# or run the training entrypoint directly, e.g.
python train.py \
--name D3QE_rerun \
--dataroot /path/to/your/dataset \
--detect_method D3QE \
--blur_prob 0.1 \
--blur_sig 0.0,3.0 \
--jpg_prob 0.1 \
--jpg_method cv2,pil \
    --jpg_qual 30,100
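The blur/JPEG flags above control train-time robustness augmentation. As an illustrative sketch (not the repo's exact code), this is how such flags are commonly interpreted in CNNDetection-style pipelines, which this codebase follows: each image is blurred with probability blur_prob (sigma drawn from blur_sig) and JPEG-recompressed with probability jpg_prob (quality drawn from jpg_qual). The function name is hypothetical.

```python
import random

def sample_augmentation(blur_prob=0.1, blur_sig=(0.0, 3.0),
                        jpg_prob=0.1, jpg_qual=(30, 100), rng=random):
    """Sample the augmentation ops to apply to one training image."""
    ops = []
    if rng.random() < blur_prob:
        # Gaussian blur with sigma drawn uniformly from blur_sig
        ops.append(("gaussian_blur", rng.uniform(*blur_sig)))
    if rng.random() < jpg_prob:
        # JPEG recompression with quality drawn from jpg_qual (inclusive)
        ops.append(("jpeg", rng.randint(*jpg_qual)))
    return ops
```

With the defaults above, roughly 10% of images get blurred and 10% get recompressed each epoch, which is what makes the detector robust to those perturbations at test time.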
Evaluation
eval.py exposes many options to evaluate detection performance and robustness.
usage: eval.py [-h] [--rz_interp RZ_INTERP] [--batch_size BATCH_SIZE]
[--loadSize LOADSIZE] [--CropSize CROPSIZE] [--no_crop]
[--no_resize] [--no_flip] [--robust_all]
[--detect_method DETECT_METHOD] [--dataroot DATAROOT]
[--sub_dir SUB_DIR] [--model_path MODEL_PATH]
Key flags:
- --batch_size (default: 64)
- --loadSize / --CropSize for image preprocessing (defaults: 256 / 224)
- --robust_all to evaluate model robustness across different noises/attacks
- --sub_dir list of subfolders in the test set (defaults to the 7 AR models)
- --model_path path to your trained model checkpoint
Example (evaluate D³QE):
There's an eval.sh with default settings you can adapt.
bash eval.sh
# or run evaluation directly
python eval.py \
--model_path /your/model/path \
--detect_method D3QE \
--batch_size 1 \
--dataroot /path/to/your/testset \
--sub_dir '["Infinity","Janus_Pro","RAR","Switti","VAR","LlamaGen","Open_MAGVIT2"]'
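Evaluation runs once per test subfolder, so results typically arrive as per-model numbers that need aggregating. A small hypothetical helper (the dict layout and accuracy values below are illustrative, not outputs of eval.py) for the usual summary, per-model accuracy as the mean of real and fake accuracy, plus the overall mean:

```python
def summarize(results):
    """results: {model_name: (acc_real, acc_fake)}, accuracies in [0, 1].

    Returns per-model balanced accuracy and the mean across models.
    """
    per_model = {m: (acc_r + acc_f) / 2 for m, (acc_r, acc_f) in results.items()}
    mean_acc = sum(per_model.values()) / len(per_model)
    return per_model, mean_acc

# Illustrative numbers only
demo = {"VAR": (0.98, 0.95), "LlamaGen": (0.99, 0.97)}
per_model, mean_acc = summarize(demo)
```

Balancing real and fake accuracy this way avoids rewarding a detector that trivially predicts one class, which matters because each test subfolder is class-balanced (6k samples per model).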
Pretrained Models
Pretrained model checkpoints are uploaded at: 🤗 Hugging Face
Acknowledgments
This codebase builds on and borrows design patterns from prior open-source forgery-detection projects. Thanks to the authors of those projects for making their code and models available.
Citation
If you use this repository or dataset in your research, please cite our paper:
@inproceedings{zhang2025d3qe,
title={D3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection},
author={Zhang, Yanran and Yu, Bingyao and Zheng, Yu and Zheng, Wenzhao and Duan, Yueqi and Chen, Lei and Zhou, Jie and Lu, Jiwen},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={16292--16301},
year={2025}
}
Contact
For questions, issues, or reproducibility requests, please open an issue on this repository (PRs are welcome) or reach out to zhangyr21@mails.tsinghua.edu.cn.
