UMK

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

The implementation of our multimodal jailbreak code is based on the work of Visual-Adversarial-Examples-Jailbreak-Large-Language-Models . Gratitude is extended to the original authors for their valuable contributions and commitment to open source.

Basic Setup

The fundamental setup tasks (e.g., environment setup and pretrained weights preparation) can be easily accomplished by referring to the guidelines provided in the aforementioned project: Visual-Adversarial-Examples-Jailbreak-Large-Language-Models .

Attack on MiniGPT-4

After injecting toxic semantics into the adversarial image using the VAJM method, use the following multimodal attack strategy to maximize the probability of the model following the malicious instructions:

python minigpt_vlm_attack.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --n_iters 5000  --alpha 1 --save_dir vlm_unconstrained

Evaluation

We provide the test code for using off-the-shelf adversarial examples on two different datasets:

Evaluation on VAJM test set

python minigpt_test_manual_prompts_vlm.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

Evaluation on Advbench

python minigpt_test_advbench.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

UMK

Install / Use

README

UMK

Basic Setup

Attack on MiniGPT-4

Evaluation

Evaluation on VAJM test set

Evaluation on Advbench