SkillAgentSearch skills...

UMK

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

Install / Use

/learn @roywang021/UMK
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

UMK

Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models

image The implementation of our multimodal jailbreak code is based on the work of Visual-Adversarial-Examples-Jailbreak-Large-Language-Models . Gratitude is extended to the original authors for their valuable contributions and commitment to open source.

Basic Setup

The fundamental setup tasks (e.g., environment setup and pretrained weights preparation) can be easily accomplished by referring to the guidelines provided in the aforementioned project: Visual-Adversarial-Examples-Jailbreak-Large-Language-Models .

Attack on MiniGPT-4

After injecting toxic semantics into the adversarial image using the VAJM method, use the following multimodal attack strategy to maximize the probability of the model following the malicious instructions:

python minigpt_vlm_attack.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --n_iters 5000  --alpha 1 --save_dir vlm_unconstrained

Evaluation

We provide the test code for using off-the-shelf adversarial examples on two different datasets:

Evaluation on VAJM test set

python minigpt_test_manual_prompts_vlm.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp

Evaluation on Advbench

python minigpt_test_advbench.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0 --image_path  adversarial_images/bad_vlm_prompt.bmp
View on GitHub
GitHub Stars32
CategoryDevelopment
Updated21d ago
Forks1

Languages

Python

Security Score

75/100

Audited on Mar 7, 2026

No findings