DIJA

(ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"

<div align="center"> <h1 style="display: inline-block; margin: 0;">🎭 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs</h1> </div> <h4 align="center">

Zichen Wen<sup>1,2</sup>, Jiashu Qu<sup>2</sup>, Dongrui Liu<sup>2*</sup>, Zhiyuan Liu<sup>1,2</sup>, Ruixi Wu<sup>1,2</sup>, Yicun Yang<sup>1</sup>, Xiangqi Jin<sup>1</sup>, <br> Haoyun Xu<sup>1</sup>, Xuyang Liu<sup>1</sup>, Weijia Li<sup>3,2</sup>, Chaochao Lu<sup>2</sup>, Jing Shao<sup>2</sup>, Conghui He<sup>2✉</sup>, Linfeng Zhang<sup>1✉</sup>

<sup>1</sup>EPIC Lab, Shanghai Jiao Tong University, <sup>2</sup>Shanghai AI Laboratory, <br> <sup>3</sup>Sun Yat-sen University

✉ Corresponding authors, * Project lead

</h4> <div align="center">


</div>

📰 News

  • 2026.02.10 🤗🤗 DIJA has been accepted by ICLR 2026!
  • 2025.09.30 🤗🤗 DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!
  • 2025.07.21 🤗🤗 Our paper was honored as the #1 Paper of the Day!
  • 2025.07.16 🤗🤗 We release our latest work, DIJA, the first investigation into the safety issues of dLLMs. Code is available!

👀 Overview

  • 💥 This is the first investigation into the safety issues of diffusion-based LLMs (dLLMs). We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
  • 💥 We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions on dLLMs.
  • 💥 We conduct comprehensive experiments showing that DIJA is more effective than existing attack methods across multiple dLLMs. The results highlight critical gaps in current alignment strategies and expose security vulnerabilities in existing dLLM architectures that need urgent attention.
<p align='center'> <img src='./assets/attack_cases.jpg' alt='mask' width='850px'> </p>

📊 Performance

  • 🎯 DIJA achieves the highest ASR-k across all benchmarks, indicating that dLLMs rarely refuse to answer dangerous or sensitive queries under the DIJA attack.
  • 🎯 On the more secure Dream-Instruct, DIJA improves ASR-e on JailbreakBench by up to 78.5% over the best baseline, ReNeLLM, and improves the StrongREJECT score by 37.7%.
<p align='center'> <img src='./assets/harmbench_exp.jpg' alt='mask' width='850px'> </p> <p align='center'> <img src='./assets/jailbreakbench_exp.jpg' alt='mask' width='850px'> </p> <p align='center'> <img src='./assets/strongreject_exp.jpg' alt='mask' width='850px'> </p>
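
The ASR figures above are attack success rates: the percentage of test prompts for which the model did not refuse. The sketch below shows how a keyword-style rate could be computed, assuming ASR-k denotes a keyword-based check. The function names and the refusal-marker list are illustrative assumptions, not the lists or code used by this repository or by the benchmarks.

```python
from typing import Iterable, Sequence

# Illustrative refusal markers (an assumption for this sketch).
# Real benchmarks ship their own, much longer lists.
REFUSAL_MARKERS: Sequence[str] = ("i'm sorry", "i cannot", "i can't", "as an ai")


def is_refusal(response: str, markers: Sequence[str] = REFUSAL_MARKERS) -> bool:
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in markers)


def keyword_asr(responses: Iterable[str]) -> float:
    """Percentage of responses that contain no refusal marker."""
    items = list(responses)
    if not items:
        raise ValueError("no responses to score")
    non_refusals = sum(not is_refusal(r) for r in items)
    return 100.0 * non_refusals / len(items)
```

A keyword check only detects the absence of a refusal phrase. It cannot tell whether a response is actually harmful, which is why evaluator-based scores are usually reported alongside it.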

🛠 Preparation

  1. Clone this repository.
  git clone https://github.com/ZichenWen1/DIJA
  cd DIJA
  2. Download the models.
  cd hf_models && bash model_download.sh
  3. Set up the environment.
  conda create -n DIJA python=3.10 -y
  conda activate DIJA
  pip install -r requirements.txt

🧪 Usage and Evaluation

Parameters

  • [Version]: A version label that identifies this run
  • [Defense_method]: Choose whether to apply defense during the attack. Options: None, Self-reminder, RPO
  • [Victim_model]: Select the targeted diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot

HarmBench evaluation

  # Interleaved mask-text prompt construction
  cd run_harmbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version]

JailbreakBench evaluation

  # Interleaved mask-text prompt construction
  cd run_jailbreakbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version]

StrongREJECT evaluation

  # Interleaved mask-text prompt construction
  cd run_strongreject
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]
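
The three evaluation commands above share one pattern: `bash eval_<benchmark>.sh DIJA [Defense_method] [Victim_model] [Version]`. As a convenience, the arguments can be validated against the documented options before a long run starts. The helper below is a hypothetical sketch and is not part of this repository. Its option lists are copied from the Parameters section, and `v1` is a made-up version label.

```python
# Values copied from the parameter list and commands in this README.
DEFENSE_METHODS = ("None", "Self-reminder", "RPO")
VICTIM_MODELS = ("llada_instruct", "llada_1.5", "dream_instruct", "mmada_mixcot")
BENCHMARKS = ("harmbench", "jailbreakbench", "strongreject")


def build_eval_command(benchmark: str, defense_method: str,
                       victim_model: str, version: str) -> list[str]:
    """Hypothetical helper: validate the arguments and return the command as argv."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark {benchmark!r}; expected one of {BENCHMARKS}")
    if defense_method not in DEFENSE_METHODS:
        raise ValueError(f"unknown defense {defense_method!r}; expected one of {DEFENSE_METHODS}")
    if victim_model not in VICTIM_MODELS:
        raise ValueError(f"unknown model {victim_model!r}; expected one of {VICTIM_MODELS}")
    if not version.strip():
        raise ValueError("version must be a non-empty string")
    # Mirrors: bash eval_<benchmark>.sh DIJA [Defense_method] [Victim_model] [Version]
    return ["bash", f"eval_{benchmark}.sh", "DIJA", defense_method, victim_model, version]


if __name__ == "__main__":
    # "v1" is a made-up version label used only for illustration.
    print(" ".join(build_eval_command("harmbench", "None", "llada_instruct", "v1")))
```

The returned list can be passed to `subprocess.run` with `cwd` set to the matching `run_<benchmark>` directory, which avoids shell quoting issues.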

📌 TODO

  • [x] Release Inference and Evaluation Code
  • [x] Support DiffuCoder, Dream-Coder
  • [x] Release the interleaved mask-text prompt
  • [ ] Support AdvBench evaluation

🔑 License

This project is released under the Apache 2.0 license.

📝 Citation

Please consider citing our paper in your publications if our work helps your research.

@article{wen2025devil,
  title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
  author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
  journal={arXiv preprint arXiv:2507.11097},
  year={2025}
}

👍 Acknowledgments

Diffusion LLMs

We would like to express our sincere gratitude for the open-source contributions of the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA.

Jailbreak Benchmarks

We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.

📩 Contact

For any questions about our paper or code, please email zichen.wen@outlook.com.
