OpenAttack

An Open-Source Package for Textual Adversarial Attack.

<p align="center">
  <img src="docs/source/images/logo.svg" width="400" alt="OpenAttack Logo" align=center />
</p>
<p align="center">
  <a target="_blank"><img src="https://github.com/thunlp/OpenAttack/workflows/Test/badge.svg?branch=master" alt="GitHub Runner Coverage Status"></a>
  <a href="https://openattack.readthedocs.io/" target="_blank"><img src="https://readthedocs.org/projects/openattack/badge/?version=latest" alt="ReadTheDocs Status"></a>
  <a href="https://pypi.org/project/OpenAttack/" target="_blank"><img src="https://img.shields.io/pypi/v/OpenAttack?label=pypi" alt="PyPI version"></a>
  <a href="https://github.com/thunlp/OpenAttack/releases" target="_blank"><img src="https://img.shields.io/github/v/release/thunlp/OpenAttack" alt="GitHub release (latest by date)"></a>
  <a target="_blank"><img alt="GitHub" src="https://img.shields.io/github/license/thunlp/OpenAttack"></a>
  <a target="_blank"><img src="https://img.shields.io/badge/PRs-Welcome-red" alt="PRs are Welcome"></a>
  <br><br>
  <a href="https://openattack.readthedocs.io/" target="_blank">Documentation</a> •
  <a href="#features--uses">Features & Uses</a> •
  <a href="#usage-examples">Usage Examples</a> •
  <a href="#attack-models">Attack Models</a> •
  <a href="#toolkit-design">Toolkit Design</a>
  <br>
</p>

OpenAttack is an open-source, Python-based textual adversarial attack toolkit that handles the whole process of textual adversarial attacking, including preprocessing text, accessing the victim model, generating adversarial examples, and evaluating the attack.

Features & Uses

OpenAttack has the following features:

⭐️ Support for all attack types. OpenAttack supports all types of attacks including sentence-/word-/character-level perturbations and gradient-/score-/decision-based/blind attack models;

⭐️ Multilinguality. OpenAttack currently supports English and Chinese. Its extensible design enables quick support for more languages;

⭐️ Parallel processing. OpenAttack provides support for multi-process running of attack models to improve attack efficiency;

⭐️ Compatibility with 🤗 Hugging Face. OpenAttack is fully integrated with 🤗 Transformers and Datasets libraries;

⭐️ Great extensibility. You can easily attack a customized <u>victim model</u> on any customized <u>dataset</u> or develop and evaluate a customized <u>attack model</u>.

OpenAttack has a wide range of uses, including:

✅ Providing various handy baselines for attack models;

✅ Comprehensively evaluating attack models using its thorough evaluation metrics;

✅ Assisting in quick development of new attack models with the help of its common attack components;

✅ Evaluating the robustness of a machine learning model against various adversarial attacks;

✅ Conducting adversarial training to improve robustness of a machine learning model by enriching the training data with generated adversarial examples.
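
For the last use case, here is a minimal sketch of harvesting adversarial examples for data augmentation. It assumes that AttackEval.ieval is the generator behind eval and that it yields one result dict per attacked sample whose "result" field holds the adversarial text (or None on failure); these names are assumptions to verify against the documentation.

import OpenAttack as oa
import datasets

victim = oa.DataManager.loadVictim("BERT.SST")
attacker = oa.attackers.PWWSAttacker()
attack_eval = oa.AttackEval(attacker, victim)
dataset = datasets.load_dataset("sst", split="train[:20]").map(
    lambda x: {"x": x["sentence"], "y": 1 if x["label"] > 0.5 else 0})

# ASSUMPTION: ieval yields a dict per sample; "result" is the adversarial
# sentence or None if the attack failed (field names may differ by version)
augmented = []
for sample, res in zip(dataset, attack_eval.ieval(dataset)):
    if res.get("result") is not None:
        # keep the original label: a successful attack flips the victim's
        # prediction, not the ground-truth sentiment
        augmented.append({"x": res["result"], "y": sample["y"]})
# fine-tune the victim on the original data plus `augmented`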

Installation

1. Using pip (recommended)

pip install OpenAttack

2. Cloning this repo

git clone https://github.com/thunlp/OpenAttack.git
cd OpenAttack
python setup.py install

After installation, you can try running demo.py to check if OpenAttack works well:

python demo.py


Usage Examples

Attack Built-in Victim Models

OpenAttack builds in some commonly used NLP models like BERT (Devlin et al. 2018) and RoBERTa (Liu et al. 2019) that have been fine-tuned on some commonly used datasets (such as SST-2). You can effortlessly conduct adversarial attacks against these built-in victim models.

The following code snippet shows how to use PWWS, a greedy algorithm-based attack model (Ren et al., 2019), to attack BERT on the SST-2 dataset (the complete executable code is here).

import OpenAttack as oa
import datasets  # use Hugging Face's datasets library
# change the SST dataset into 2-class
def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }
# choose a trained victim classification model
victim = oa.DataManager.loadVictim("BERT.SST")
# choose 20 examples from SST-2 as the evaluation data 
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()
# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim)
# launch attacks and print attack results 
attack_eval.eval(dataset, visualize=True)
<details> <summary><strong>Customized Victim Model</strong></summary>

The following code snippet shows how to use PWWS to attack a customized sentiment analysis model (the rule-based VADER analyzer from NLTK) on SST-2 (the complete executable code is here).

import OpenAttack as oa
import numpy as np
import datasets
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


# configure the access interface of the customized victim model by extending oa.Classifier
class MyClassifier(oa.Classifier):
    def __init__(self):
        # nltk.sentiment.vader.SentimentIntensityAnalyzer is a traditional sentiment classification model.
        nltk.download('vader_lexicon')
        self.model = SentimentIntensityAnalyzer()
    
    def get_pred(self, input_):
        return self.get_prob(input_).argmax(axis=1)

    # access the classification probability scores for the input sentences
    def get_prob(self, input_):
        ret = []
        for sent in input_:
            # SentimentIntensityAnalyzer calculates scores of “neg” and “pos” for each instance
            res = self.model.polarity_scores(sent)

            # we use score_pos / (score_neg + score_pos) to represent the probability of positive sentiment
            # adding a small epsilon (1e-6) avoids division by zero
            prob = (res["pos"] + 1e-6) / (res["neg"] + res["pos"] + 2e-6)

            ret.append(np.array([1 - prob, prob]))
        
        # The get_prob method finally returns a np.ndarray of shape (len(input_), 2). See Classifier for detail.
        return np.array(ret)

def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }
    
# load some examples of SST-2 for evaluation
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
# choose the customized classifier as the victim model
victim = MyClassifier()
# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()
# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim)
# launch attacks and print attack results 
attack_eval.eval(dataset, visualize=True)
</details> <details> <summary><strong>Customized Dataset</strong></summary>

The following code snippet shows how to use PWWS to attack an existing fine-tuned sentiment analysis model on a customized dataset (the complete executable code is here).

import OpenAttack as oa
import transformers
import datasets

# load a fine-tuned sentiment analysis model from Transformers (you can also use our fine-tuned Victim.BERT.SST)
tokenizer = transformers.AutoTokenizer.from_pretrained("echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid")
model = transformers.AutoModelForSequenceClassification.from_pretrained("echarlaix/bert-base-uncased-sst2-acc91.1-d37-hybrid", num_labels=2, output_hidden_states=False)
victim = oa.classifiers.TransformersClassifier(model, tokenizer, model.bert.embeddings.word_embeddings)

# choose PWWS as the attacker and initialize it with default parameters
attacker = oa.attackers.PWWSAttacker()

# create your customized dataset
dataset = datasets.Dataset.from_dict({
    "x": [
        "I hate this movie.",
        "I like this apple."
    ],
    "y": [
        0, # 0 for negative
        1, # 1 for positive
    ]
})

# prepare for attacking
attack_eval = oa.AttackEval(attacker, victim, metrics = [oa.metric.EditDistance(), oa.metric.ModificationRate()])
# launch attacks and print attack results
attack_eval.eval(dataset, visualize=True)
</details> <details> <summary><strong>Multiprocessing</strong></summary>

OpenAttack supports convenient multiprocessing to accelerate the process of adversarial attacks. The following code snippet shows how to use multiprocessing in adversarial attacks with Genetic (Alzantot et al. 2018), a genetic algorithm-based attack model (the complete executable code is here).

import OpenAttack as oa
import datasets

def dataset_mapping(x):
    return {
        "x": x["sentence"],
        "y": 1 if x["label"] > 0.5 else 0,
    }

victim = oa.loadVictim("BERT.SST")
dataset = datasets.load_dataset("sst", split="train[:20]").map(function=dataset_mapping)
attacker = oa.attackers.GeneticAttacker()
attack_eval = oa.AttackEval(attacker, victim)
# enable multiprocessing simply by specifying num_workers
attack_eval.eval(dataset, visualize=True, num_workers=4)
</details> <details> <summary><strong>Chinese Attack</strong></summary>

OpenAttack supports adversarial attacks against both English and Chinese victim models. Below is a minimal sketch of attacking a Chinese review classification model with PWWS; the built-in victim name BERT.AMAZON_ZH, the amazon_reviews_multi dataset fields, and the lang="chinese" option are assumptions to verify against the documentation.
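
import OpenAttack as oa
import datasets

# map Amazon review stars to a binary sentiment label
def dataset_mapping(x):
    return {
        "x": x["review_body"],
        "y": 1 if x["stars"] > 3 else 0,
    }

# ASSUMPTION: "BERT.AMAZON_ZH" is the name of the built-in Chinese victim model
victim = oa.DataManager.loadVictim("BERT.AMAZON_ZH")
# ASSUMPTION: the "zh" config of amazon_reviews_multi exposes review_body/stars fields
dataset = datasets.load_dataset("amazon_reviews_multi", "zh", split="train[:20]").map(function=dataset_mapping)
# ASSUMPTION: PWWSAttacker switches its language resources via lang="chinese"
attacker = oa.attackers.PWWSAttacker(lang="chinese")
attack_eval = oa.AttackEval(attacker, victim)
attack_eval.eval(dataset, visualize=True)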

</details> <details> <summary><strong>Customized Attack Model</strong></summary>

OpenAttack incorporates many handy components that can be easily assembled into new attack models.
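
The following is a minimal sketch of the attacker interface, assuming the ClassificationAttacker base class with an attack(self, victim, input_, goal) hook, goal.check for testing success, and the tag helpers in OpenAttack.tags (as in OpenAttack 2.x); the word-shuffling strategy itself is purely illustrative.

import random
import OpenAttack as oa
from OpenAttack.tags import Tag, TAG_English  # ASSUMPTION: tag helpers live in this module

class MyAttacker(oa.attackers.ClassificationAttacker):
    @property
    def TAGS(self):
        # declare the supported language, and that this attacker only needs
        # the victim's predicted labels (i.e. a decision-based attack)
        return { TAG_English, Tag("get_pred", "victim") }

    def attack(self, victim, input_, goal):
        # purely illustrative search: shuffle the word order a few times
        # and return the first permutation that fools the victim
        words = input_.split()
        for _ in range(50):
            random.shuffle(words)
            x_new = " ".join(words)
            y_new = victim.get_pred([x_new])[0]
            if goal.check(x_new, y_new):  # attack goal met (e.g. label flipped)?
                return x_new
        return None  # no adversarial example found

Such a customized attacker can then be evaluated exactly like the built-in ones, e.g. oa.AttackEval(MyAttacker(), victim).eval(dataset, visualize=True).

</details>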
