DIJA
(ICLR 2026 ๐ฅ) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"
Install / Use
/learn @ZichenWen1/DIJAREADME
Zichen Wen<sup>1,2</sup>, Jiashu Qu<sup>2</sup>, Dongrui Liu<sup>2*</sup>, Zhiyuan Liu<sup>1,2</sup>, Ruixi Wu<sup>1,2</sup>, Yicun Yang<sup>1</sup>, Xiangqi Jin<sup>1</sup>, <br> Haoyun Xu<sup>1</sup>, Xuyang Liu<sup>1</sup>, Weijia Li<sup>3,2</sup>, Chaochao Lu<sup>2</sup>, Jing Shao<sup>2</sup>, Conghui He<sup>2โ</sup>, Linfeng Zhang<sup>1โ</sup>,
<sup>1</sup>EPIC Lab, Shanghai Jiao Tong University, <sup>2</sup>Shanghai AI Laboratory, <br> <sup>3</sup>Sun Yat-sen University
โCorresponding authors, *Project lead
</h4> <div align="center"> </div>๐ฐ News
2026.02.10๐ค๐ค DIJA has been accepted by ICLR 2026!2025.09.30๐ค๐ค DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!2025.07.21๐ค๐ค Our paper is honored to be the #1 Paper of the day!2025.07.16๐ค๐ค We release our latest work DIJA, the first investigation into the safety issues of dLLMs. Code is available!
๐ Overview
- ๐ฅ This is the first investigation into the safety issues of dLLMs. We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
- ๐ฅ We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions on dLLMs.
- ๐ฅ We conduct comprehensive experiments demonstrating the effectiveness of DIJA across multiple dLLMs compared with existing attack methods, highlighting critical gaps in current alignment strategies and exposing urgent security vulnerabilities in existing dLLM architectures that require immediate addressing.
๐ Performance
- ๐ฏ DIJA achieves the highest ASR-k across all benchmarks, indicating that dLLMs are highly unlikely to refuse answering dangerous or sensitive topics under the DIJA attack.
- ๐ฏ For the more secure Dream-Instruct, DIJA achieves an improvement of up to 78.5% in ASR-e on JailbreakBench over the best baseline, ReNeLLM, and a 37.7% improvement in StrongREJECT score.
๐ Preparation
- Clone this repository.
git clone https://github.com/ZichenWen1/DIJA
cd DIJA
- Install models
cd hf_models && bash model_download.sh
- Environment setup
conda create -n DIJA python=3.10 -y
conda activate DIJA
pip install -r requirements.txt
๐งช Usage and Evaluation
Parameters
- [Version]: You can set the version number for this run
- [Defense_method]: Choose whether to apply defense during the attack. Options: None, Self-reminder, RPO
- [Victim_model]: Select the targeted diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot
HarmBench evaluation
# Interleaved mask-text prompt construction
cd run_harmbench
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version]
JailbreakBench evaluation
# Interleaved mask-text prompt construction
cd run_jailbreakbench
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version]
StrongREJECT evaluation
# Interleaved mask-text prompt construction
cd run_strongreject
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]
๐ TODO
- [x] Release Inference and Evaluation Code
- [x] Support DiffuCoder, Dream-Coder
- [x] Release the interleaved mask-text prompt
- [ ] Support AdvBench evaluation
๐ License
This project is released under the Apache 2.0 license.
๐ Citation
Please consider citing our paper in your publications if our works help your research.
@article{wen2025devil,
title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
journal={arXiv preprint arXiv:2507.11097},
year={2025}
}
๐ Acknowledgments
Diffusion LLMs
We would like to express our sincere gratitude to the open-source contributions from the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA.
Jailbreak Benchmarks
We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.
๐ฉ Contact
For any questions about our paper or code, please email zichen.wen@outlook.com.
Related Skills
node-connect
347.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
107.8kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
347.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
347.0kQQBot ๅฏๅชไฝๆถๅ่ฝๅใไฝฟ็จ <qqmedia> ๆ ็ญพ๏ผ็ณป็ปๆ นๆฎๆไปถๆฉๅฑๅ่ชๅจ่ฏๅซ็ฑปๅ๏ผๅพ็/่ฏญ้ณ/่ง้ข/ๆไปถ๏ผใ
