# FakeShield

🔥 [ICLR 2025] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
School of Electronic and Computer Engineering, Peking University
</div><details open><summary>💡 We also have other Copyright Protection projects that may interest you ✨.</summary><p>

AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection <br> Zhipei Xu, Xuanyu Zhang, Xing Zhou, Jian Zhang <br>

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024] <br> Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang <br>

OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking [CVPR 2025] <br> Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang <br>

</p></details>
## 📰 News
- [2026.02.21] 🔥🔥🔥 We have updated the SD_Inpaint dataset on Hugging Face, and you can access it from here.
- [2025.04.23] 🤗 We have open-sourced the MMTD-Set-34k dataset on Hugging Face, and you can access it from here.
- [2025.02.14] 🤗 We ~~are progressively open-sourcing~~ have open-sourced all code & pre-trained model weights. Welcome to watch 👀 this repository for the latest updates.
- [2025.01.23] 🎉🎉🎉 Our FakeShield has been accepted at ICLR 2025!
- [2024.10.03] 🔥 We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We present explainable IFDL tasks, constructing the MMTD-Set dataset and the FakeShield framework. Check out the paper. The code and dataset are coming soon.
## <img id="painting_icon" width="3%" src="https://cdn-icons-png.flaticon.com/128/1022/1022330.png"> FakeShield Overview
FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.

## 🌟 Contributions
- **FakeShield Introduction.** We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, which is the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.
- **Novel Explainable-IFDL Task.** We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.
- **MMTD-Set Dataset Construction.** We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality “image-mask-description” triplets for enhanced multimodal learning.
## 🛠️ Requirements and Installation

Note: If you want to reproduce the results from our paper, please prioritize using the Docker image to set up the environment. For more details, see this issue.

### Installation via Pip

- Ensure your environment meets the following requirements:

  - Python == 3.9
  - PyTorch == 1.13.0
  - CUDA Version == 11.6

- Clone the repository:

  ```bash
  git clone https://github.com/zhipeixu/FakeShield.git
  cd FakeShield
  ```

- Install dependencies:

  ```bash
  apt update && apt install git
  pip install -r requirements.txt

  ## Install MMCV
  git clone https://github.com/open-mmlab/mmcv
  cd mmcv
  git checkout v1.4.7
  MMCV_WITH_OPS=1 pip install -e .
  ```

- Install DTE-FDM:

  ```bash
  cd ../DTE-FDM
  pip install -e .
  pip install -e ".[train]"
  pip install flash-attn --no-build-isolation
  ```
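Before building MMCV, it can save time to confirm that the active interpreter really matches the pinned Python 3.9. A minimal, hypothetical sanity-check script (not part of the repository; it only compares version strings):

```shell
#!/bin/sh
# Hypothetical sanity check: compare the active interpreter against the
# Python version pinned in the requirements above (3.9).
REQUIRED="3.9"
FOUND="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || echo unknown)"
if [ "$FOUND" = "$REQUIRED" ]; then
    echo "Python $FOUND matches the pinned version"
else
    echo "WARNING: expected Python $REQUIRED, found $FOUND"
fi
```

The same pattern extends to checking `nvcc --version` output against the pinned CUDA 11.6.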
### Installation via Docker

- Pull the pre-built Docker images:

  ```bash
  docker pull zhipeixu/mflm:v1.0
  docker pull zhipeixu/dte-fdm:v1.0
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/zhipeixu/FakeShield.git
  cd FakeShield
  ```

- Run the containers:

  ```bash
  docker run --gpus all -it --rm \
      -v $(pwd):/workspace/FakeShield \
      zhipeixu/dte-fdm:v1.0 /bin/bash

  docker run --gpus all -it --rm \
      -v $(pwd):/workspace/FakeShield \
      zhipeixu/mflm:v1.0 /bin/bash
  ```

- Inside the container, navigate to the repository:

  ```bash
  cd /workspace/FakeShield
  ```

- Install MMCV:

  ```bash
  git clone https://github.com/open-mmlab/mmcv
  cd mmcv
  git checkout v1.4.7
  MMCV_WITH_OPS=1 pip install -e .
  ```
## 🤗 Prepare Model

- Download FakeShield weights from Hugging Face

  The model weights consist of three parts: `DTE-FDM`, `MFLM`, and `DTG`. For convenience, we have packaged them together and uploaded them to the Hugging Face repository. We recommend using `huggingface_hub` to download the weights:

  ```bash
  pip install huggingface_hub
  huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/
  ```

- Download the pretrained SAM weight

  MFLM uses the SAM pre-trained weights. You can use `wget` to download the `sam_vit_h_4b8939.pth` model:

  ```bash
  wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  ```

- Ensure the weights are placed correctly

  Organize your `weight/` folder as follows:

  ```
  FakeShield/
  ├── weight/
  │   ├── fakeshield-v1-22b/
  │   │   ├── DTE-FDM/
  │   │   ├── MFLM/
  │   │   └── DTG.pth
  │   └── sam_vit_h_4b8939.pth
  ```
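A misplaced file is the most common setup error, so a short check script can be handy. This is a hypothetical helper (not part of the repository); only the four paths are taken from the layout above:

```shell
#!/bin/sh
# Hypothetical layout check: run from the FakeShield repository root and
# report any weight files/directories missing from the expected layout.
MISSING=0
for f in \
    weight/fakeshield-v1-22b/DTE-FDM \
    weight/fakeshield-v1-22b/MFLM \
    weight/fakeshield-v1-22b/DTG.pth \
    weight/sam_vit_h_4b8939.pth
do
    if [ ! -e "$f" ]; then
        echo "missing: $f"
        MISSING=$((MISSING + 1))
    fi
done
if [ "$MISSING" -eq 0 ]; then
    echo "weight/ layout looks complete"
fi
```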
## 🚀 Quick Start

### CLI Demo

You can quickly run the demo script by executing:

```bash
bash scripts/cli_demo.sh
```

The `cli_demo.sh` script allows customization through the following environment variables:

- `WEIGHT_PATH`: Path to the FakeShield weight directory (default: `./weight/fakeshield-v1-22b`)
- `IMAGE_PATH`: Path to the input image (default: `./play
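These overrides are ordinary environment variables, so a run with a custom weight directory looks like `WEIGHT_PATH=/mnt/weights bash scripts/cli_demo.sh`. Scripts of this kind typically fall back to their defaults via shell parameter expansion; a minimal sketch of that pattern (assumed, not copied from `cli_demo.sh`):

```shell
#!/bin/sh
# Sketch of the "${VAR:-default}" fallback pattern: use the caller's
# exported WEIGHT_PATH if present, otherwise the documented default.
WEIGHT_PATH="${WEIGHT_PATH:-./weight/fakeshield-v1-22b}"
echo "Using weights at: $WEIGHT_PATH"
```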
