TAADpapers
Must-read Papers on Textual Adversarial Attack and Defense (TAAD)
This list is currently maintained by Chenghao Yang at UChicago.
Previous main contributors include Fanchao Qi and Yuan Zang, who maintained the list while at THUNLP.
We thank all the great contributors very much.
Contents
- 0. Toolkits
- 1. Survey Papers
- 2. Attack Papers (classified according to perturbation level)
- 3. Defense Papers
- 4. Certified Robustness
- 5. Benchmark and Evaluation
- 6. Other Papers
- Contributors
0. Toolkits
- RobustQA: A Framework for Adversarial Text Generation Analysis on Question Answering Systems. Yasaman Boreshban, Seyed Morteza Mirbostani, Seyedeh Fatemeh Ahmadi, Gita Shojaee, Fatemeh Kamani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel. EMNLP 2022 Demo. [codebase] [pdf]
- SeqAttack: On Adversarial Attacks for Named Entity Recognition. Walter Simoncini, Gerasimos Spanakis. EMNLP 2021 Demo. [website] [pdf]
- OpenAttack: An Open-source Textual Adversarial Attack Toolkit. Guoyang Zeng, Fanchao Qi, Qianrui Zhou, Tingji Zhang, Bairu Hou, Yuan Zang, Zhiyuan Liu, Maosong Sun. ACL-IJCNLP 2021 Demo. [website] [doc] [pdf]
- TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi. EMNLP 2020 Demo. [website] [doc] [pdf]
1. Survey Papers
- Measure and Improve Robustness in NLP Models: A Survey. Xuezhi Wang, Haohan Wang, Diyi Yang. NAACL 2022. [pdf]
- Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. TKDE 2021. [pdf]
- Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. ACM TIST 2020. [pdf]
- Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Hao-chen Liu, Debayan Deb, Hui Liu, Ji-liang Tang, Anil K. Jain. International Journal of Automation and Computing 2020. [pdf]
- Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [pdf]
2. Attack Papers
Each paper is tagged with one or more of the following labels indicating how much information the attack model has about the victim model: gradient (white-box: full access, including gradients), score (output decision and confidence scores), decision (only the output decision), and blind (no information).
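The four access levels can be made concrete with a minimal, self-contained sketch (the victim model, lexicon, and all names below are invented for illustration and come from none of the papers in this list): a score-level attacker queries confidences to guide greedy synonym substitution, while a decision-level attacker would only observe whether the final label flips.

```python
# Toy sketch (all names invented for illustration): a lexicon-based
# sentiment "victim" exposing two black-box access levels, plus a greedy
# synonym-substitution attack that needs only score access.

POSITIVE = {"good": 1.0, "great": 2.0, "fine": 0.5}       # toy sentiment lexicon
SYNONYMS = {"great": ["good", "fine"], "good": ["fine"]}  # toy thesaurus

def victim_score(text):
    """score access: the attacker sees a positive-class confidence."""
    raw = sum(POSITIVE.get(w, 0.0) for w in text.lower().split())
    return raw / (1.0 + raw)  # squash into [0, 1)

def victim_decision(text):
    """decision access: the attacker sees only the predicted label."""
    return "positive" if victim_score(text) >= 0.5 else "negative"

def score_attack(text):
    """Greedily swap each word for any synonym that lowers the score."""
    words = text.split()
    for i in range(len(words)):
        for cand in SYNONYMS.get(words[i].lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            if victim_score(" ".join(trial)) < victim_score(" ".join(words)):
                words = trial
    return " ".join(words)

print(score_attack("a great movie"))  # prints "a fine movie" -> flips the label
```

A gradient-level attacker could skip the query loop entirely and follow the loss gradient in embedding space, while a blind attacker would apply perturbations (e.g., paraphrases) without querying the victim at all.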
2.1 Sentence-level Attack
- Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models. Jieyu Lin, Jiajie Zou, Nai Ding. ACL-IJCNLP 2021. blind [pdf]
- Grey-box Adversarial Attack And Defence For Sentiment Classification. Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau. NAACL-HLT 2021. gradient [pdf] [code]
- Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs. Kuan-Hao Huang, Kai-Wei Chang. EACL 2021. [pdf] [code]
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation. Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Lee, Jilin Chen, Alex Beutel, Ed Chi. EMNLP 2020. score [pdf]
- T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack. Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li. EMNLP 2020. gradient [pdf] [code]
- Adversarial Attack and Defense of Structured Prediction Models. Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu. EMNLP 2020. blind [pdf] [code]
- MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models. Thai Le, Suhang Wang, Dongwon Lee. ICDM 2020. gradient [pdf] [code]
- Improving the Robustness of Question Answering Systems to Question Paraphrasing. Wee Chung Gan, Hwee Tou Ng. ACL 2019. blind [pdf] [data]
- Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. score [pdf]
- PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019. blind [pdf] [dataset]
- Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. Minhao Cheng, Wei Wei, Cho-Jui Hsieh. NAACL-HLT 2019. gradient score [pdf] [code]
- Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
- Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge. Pasquale Minervini, Sebastian Riedel. CoNLL 2018. score [pdf] [code&data]
- Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018. decision [pdf] [dataset]
- Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
- Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]
- Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, Percy Liang. EMNLP 2017. score decision blind [pdf] [code]
- Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. UAI 2017. score [pdf] [code]
2.2 Word-level Attack
- Confidence Elicitation: A New Attack Vector for Large Language Models. Brian Formento, Chuan-Sheng Foo, See-Kiong Ng. ICLR 2025. score [pdf] [code]
- HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization. Zhaorong Liu, Xi Xiong, Yuanyuan Li, Yan Yu, Jiazhong Lu, Shuai Zhang, Fei Xiong. Neural Networks 2024. decision [pdf] [code]
- Expanding Scope: Adapting English Adversarial Attacks to Chinese. Hanyu Liu, Chengyuan Cai, Yanjun Qi. Findings of ACL 2023. decision [pdf] [code]
- Adversarial Text Generation by Search and Learning. Guoyi Li, Bingkang Shi, Zongzhen Liu, Dehan Kong, Yulei Wu, Xiaodan Zhang, Longtao Huang, Honglei Lyu. Findings of ACL 2023. score [pdf] [code]
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework. Lifan Yuan, Yichi Zhang, Yangyi Chen, Wei Wei. Findings of ACL 2023. decision [pdf] [code](https://github.com/jhl-hust/texthacker)
