DIJA

(ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"

<div align="center"> <h1 style="display: inline-block; margin: 0;">🎭 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs</h1> </div> <h4 align="center">

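Zichen Wen<sup>1,2</sup>, Jiashu Qu<sup>2</sup>, Dongrui Liu<sup>2*</sup>, Zhiyuan Liu<sup>1,2</sup>, Ruixi Wu<sup>1,2</sup>, Yicun Yang<sup>1</sup>, Xiangqi Jin<sup>1</sup>, Haoyun Xu<sup>1</sup>, Xuyang Liu<sup>1</sup>, Weijia Li<sup>3,2</sup>, Chaochao Lu<sup>2</sup>, Jing Shao<sup>2</sup>, Conghui He<sup>2✉</sup>, Linfeng Zhang<sup>1✉</sup>

<sup>1</sup>EPIC Lab, Shanghai Jiao Tong University, <sup>2</sup>Shanghai AI Laboratory, <sup>3</sup>Sun Yat-sen University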

✉Corresponding authors, *Project lead

</h4>

📰 News

2026.02.10 🤗🤗 DIJA has been accepted to ICLR 2026!
2025.09.30 🤗🤗 DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!
2025.07.21 🤗🤗 Our paper was named #1 Paper of the Day!
2025.07.16 🤗🤗 We release our latest work DIJA, the first investigation into the safety issues of dLLMs. Code is available!

👀 Overview

💥 This is the first investigation into the safety issues of diffusion LLMs (dLLMs). We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
💥 We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions from dLLMs.
💥 We conduct comprehensive experiments showing that DIJA outperforms existing attack methods across multiple dLLMs. The results highlight critical gaps in current alignment strategies and expose security vulnerabilities in existing dLLM architectures that need urgent attention.

📊 Performance

🎯 DIJA achieves the highest keyword-based attack success rate (ASR-k) across all benchmarks, indicating that dLLMs rarely refuse to answer dangerous or sensitive queries under the DIJA attack.
🎯 For the more secure Dream-Instruct, DIJA improves the evaluator-based attack success rate (ASR-e) on JailbreakBench by up to 78.5% over the best baseline, ReNeLLM, and improves the StrongREJECT score by 37.7%.

🛠 Preparation

Clone this repository.

  git clone https://github.com/ZichenWen1/DIJA
  cd DIJA

Download models

  # Run in a subshell so the working directory stays at the repository root
  (cd hf_models && bash model_download.sh)

Environment setup

  conda create -n DIJA python=3.10 -y
  conda activate DIJA
  pip install -r requirements.txt

🧪 Usage and Evaluation

Parameters

[Version]: A version label of your choice that identifies this run
[Defense_method]: The defense applied during the attack, if any. Options: None, Self-reminder, RPO
[Victim_model]: Select the targeted diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot
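
The parameter values above can be checked before a long run starts. The following is a minimal sketch, not part of the DIJA repository: `validate_args` and `contains` are hypothetical helper names, and the allowed values are copied from the list above, so they may need extending if the scripts accept additional model identifiers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the DIJA repository): validate the
# documented parameters before calling an eval script.

# Allowed values copied from the parameter list in this README (assumption:
# the eval scripts accept exactly these spellings).
DEFENSE_METHODS="None Self-reminder RPO"
VICTIM_MODELS="llada_instruct llada_1.5 dream_instruct mmada_mixcot"

# contains LIST ITEM: succeed if ITEM is one of the space-separated words in LIST.
contains() {
  local item
  for item in $1; do
    [ "$item" = "$2" ] && return 0
  done
  return 1
}

# validate_args DEFENSE_METHOD VICTIM_MODEL VERSION
# Returns 0 if all three values look valid, 1 on a bad value, 2 on bad usage.
validate_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: validate_args [Defense_method] [Victim_model] [Version]" >&2
    return 2
  fi
  if ! contains "$DEFENSE_METHODS" "$1"; then
    echo "unknown defense method: $1 (expected one of: $DEFENSE_METHODS)" >&2
    return 1
  fi
  if ! contains "$VICTIM_MODELS" "$2"; then
    echo "unknown victim model: $2 (expected one of: $VICTIM_MODELS)" >&2
    return 1
  fi
  if [ -z "$3" ]; then
    echo "version must not be empty" >&2
    return 1
  fi
}
```

For example, `validate_args Self-reminder dream_instruct v1` succeeds silently, while a misspelled model name prints the accepted values and returns a non-zero status.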

HarmBench evaluation

  # Interleaved mask-text prompt construction (run from the repository root)
  cd run_harmbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version]

JailbreakBench evaluation

  # Interleaved mask-text prompt construction (run from the repository root)
  cd run_jailbreakbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version]

StrongREJECT evaluation

  # Interleaved mask-text prompt construction (run from the repository root)
  cd run_strongreject
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]
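
The three evaluations follow the same pattern: enter `run_<benchmark>`, run `refine_prompt/run_refine.sh`, then run `eval_<benchmark>.sh`. As a rough sketch of that pattern, the hypothetical `run_all` function below (not part of the repository) prints the documented command sequence for each benchmark. The `DRY_RUN` variable is an assumption of this sketch; it defaults to printing only, and the function is assumed to be called from the repository root with a version label that contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the DIJA repository).
# Prints the documented commands for each benchmark; set DRY_RUN=0 to run them.

BENCHMARKS="harmbench jailbreakbench strongreject"

# run_all DEFENSE_METHOD VICTIM_MODEL VERSION
run_all() {
  local defense="$1" victim="$2" version="$3"
  local bench refine evaluate
  for bench in $BENCHMARKS; do
    # Each benchmark lives in run_<name> and is driven by eval_<name>.sh.
    refine="bash refine_prompt/run_refine.sh $version"
    evaluate="bash eval_${bench}.sh DIJA $defense $victim $version"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "cd run_${bench} && $refine && $evaluate"
    else
      # Subshell so the working directory resets before the next benchmark.
      ( cd "run_${bench}" && $refine && $evaluate ) || return 1
    fi
  done
}
```

Calling `run_all None llada_instruct v1` prints one line per benchmark, which makes it easy to review the exact commands before setting `DRY_RUN=0`.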

📌 TODO

[x] Release Inference and Evaluation Code
[x] Support DiffuCoder, Dream-Coder
[x] Release the interleaved mask-text prompt
[ ] Support AdvBench evaluation

🔑 License

This project is released under the Apache 2.0 license.

📍 Citation

Please consider citing our paper in your publications if our work helps your research.

@article{wen2025devil,
  title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
  author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
  journal={arXiv preprint arXiv:2507.11097},
  year={2025}
}

👍 Acknowledgments

Diffusion LLMs

We would like to express our sincere gratitude to the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA for their open-source contributions.

Jailbreak Benchmarks

We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.

📩 Contact

For any questions about our paper or code, please email zichen.wen@outlook.com.
