TAADpapers
Must-read Papers on Textual Adversarial Attack and Defense (TAAD)
This list is currently maintained by Chenghao Yang at UChicago.
Previous main contributors include Fanchao Qi and Yuan Zang, who maintained the list while at THUNLP.
We thank all the great contributors very much.
Contents
- 0. Toolkits
- 1. Survey Papers
- 2. Attack Papers (classified according to perturbation level)
- 3. Defense Papers
- 4. Certified Robustness
- 5. Benchmark and Evaluation
- 6. Other Papers
- Contributors
0. Toolkits
- RobustQA: A Framework for Adversarial Text Generation Analysis on Question Answering Systems. Yasaman Boreshban, Seyed Morteza Mirbostani, Seyedeh Fatemeh Ahmadi, Gita Shojaee, Fatemeh Kamani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel. EMNLP 2022 Demo. [codebase] [pdf]
- SeqAttack: On Adversarial Attacks for Named Entity Recognition. Walter Simoncini, Gerasimos Spanakis. EMNLP 2021 Demo. [website] [pdf]
- OpenAttack: An Open-source Textual Adversarial Attack Toolkit. Guoyang Zeng, Fanchao Qi, Qianrui Zhou, Tingji Zhang, Bairu Hou, Yuan Zang, Zhiyuan Liu, Maosong Sun. ACL-IJCNLP 2021 Demo. [website] [doc] [pdf]
- TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi. EMNLP 2020 Demo. [website] [doc] [pdf]
1. Survey Papers
- Measure and Improve Robustness in NLP Models: A Survey. Xuezhi Wang, Haohan Wang, Diyi Yang. NAACL 2022. [pdf]
- Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. TKDE 2021. [pdf]
- Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. ACM TIST 2020. [pdf]
- Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Hao-chen Liu, Debayan Deb, Hui Liu, Ji-liang Tang, Anil K. Jain. International Journal of Automation and Computing 2020. [pdf]
- Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [pdf]
2. Attack Papers
Each paper is tagged with one or more of the following labels indicating how much information the attack model has about the victim model: gradient (white-box: full access, including gradients), score (output decision and confidence scores), decision (only the output decision), and blind (no information).
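The four access levels can be made concrete with a minimal, self-contained sketch (the victim model, lexicon, and all names below are invented for illustration and come from none of the papers in this list): a score-level attacker queries confidences to guide greedy synonym substitution, while a decision-level attacker would only observe whether the final label flips.

```python
# Toy sketch (all names invented for illustration): a lexicon-based
# sentiment "victim" exposing two black-box access levels, plus a greedy
# synonym-substitution attack that needs only score access.

POSITIVE = {"good": 1.0, "great": 2.0, "fine": 0.5}       # toy sentiment lexicon
SYNONYMS = {"great": ["good", "fine"], "good": ["fine"]}  # toy thesaurus

def victim_score(text):
    """score access: the attacker sees a positive-class confidence."""
    raw = sum(POSITIVE.get(w, 0.0) for w in text.lower().split())
    return raw / (1.0 + raw)  # squash into [0, 1)

def victim_decision(text):
    """decision access: the attacker sees only the predicted label."""
    return "positive" if victim_score(text) >= 0.5 else "negative"

def score_attack(text):
    """Greedily swap each word for any synonym that lowers the score."""
    words = text.split()
    for i in range(len(words)):
        for cand in SYNONYMS.get(words[i].lower(), []):
            trial = words[:i] + [cand] + words[i + 1:]
            if victim_score(" ".join(trial)) < victim_score(" ".join(words)):
                words = trial
    return " ".join(words)

print(score_attack("a great movie"))  # prints "a fine movie" -> flips the label
```

A gradient-level attacker could skip the query loop entirely and follow the loss gradient in embedding space, while a blind attacker would apply perturbations (e.g., paraphrases) without querying the victim at all.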
2.1 Sentence-level Attack
- Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models. Jieyu Lin, Jiajie Zou, Nai Ding. ACL-IJCNLP 2021. blind [pdf]
- Grey-box Adversarial Attack And Defence For Sentiment Classification. Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau. NAACL-HLT 2021. gradient [pdf] [code]
- Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs. Kuan-Hao Huang, Kai-Wei Chang. EACL 2021. [pdf] [code]
- CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation. Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Lee, Jilin Chen, Alex Beutel, Ed Chi. EMNLP 2020. score [pdf]
- T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack. Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li. EMNLP 2020. gradient [pdf] [code]
- Adversarial Attack and Defense of Structured Prediction Models. Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu. EMNLP 2020. blind [pdf] [code]
- MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models. Thai Le, Suhang Wang, Dongwon Lee. ICDM 2020. gradient [pdf] [code]
- Improving the Robustness of Question Answering Systems to Question Paraphrasing. Wee Chung Gan, Hwee Tou Ng. ACL 2019. blind [pdf] [data]
- Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. score [pdf]
- PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019. blind [pdf] [dataset]
- Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. Minhao Cheng, Wei Wei, Cho-Jui Hsieh. NAACL-HLT 2019. gradient score [pdf] [code]
- Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
- Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge. Pasquale Minervini, Sebastian Riedel. CoNLL 2018. score [pdf] [code&data]
- Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018. decision [pdf] [dataset]
- Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
- Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]
- Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, Percy Liang. EMNLP 2017. score decision blind [pdf] [code]
- Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. UAI 2017. score [pdf] [code]
2.2 Word-level Attack
- Confidence Elicitation: A New Attack Vector for Large Language Models. Brian Formento, Chuan-Sheng Foo, See-Kiong Ng. ICLR 2025. score [pdf] [code]
- HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization. Zhaorong Liu, Xi Xiong, Yuanyuan Li, Yan Yu, Jiazhong Lu, Shuai Zhang, Fei Xiong. Neural Networks 2024. decision [pdf] [code]
- Expanding Scope: Adapting English Adversarial Attacks to Chinese. Hanyu Liu, Chengyuan Cai, Yanjun Qi. Findings of ACL 2023. decision [pdf] [code]
- Adversarial Text Generation by Search and Learning. Guoyi Li, Bingkang Shi, Zongzhen Liu, Dehan Kong, Yulei Wu, Xiaodan Zhang, Longtao Huang, Honglei Lyu. Findings of ACL 2023. score [pdf] [code]
- Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework. Lifan Yuan, Yichi Zhang, Yangyi Chen, Wei Wei. Findings of ACL 2023. decision [pdf] [code](https://github.com/jhl-hust/texthacker)
