
Must-read Papers on Textual Adversarial Attack and Defense (TAAD)

This list is currently maintained by Chenghao Yang at UChicago.

Previous main contributors include Fanchao Qi and Yuan Zang, from their time at THUNLP.

We thank all the great contributors very much.

Contents

0. Toolkits

  1. RobustQA: A Framework for Adversarial Text Generation Analysis on Question Answering Systems. Yasaman Boreshban, Seyed Morteza Mirbostani, Seyedeh Fatemeh Ahmadi, Gita Shojaee, Fatemeh Kamani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel. EMNLP 2022 Demo. [codebase] [pdf]
  2. SeqAttack: On Adversarial Attacks for Named Entity Recognition. Walter Simoncini, Gerasimos Spanakis. EMNLP 2021 Demo. [website] [pdf]
  3. OpenAttack: An Open-source Textual Adversarial Attack Toolkit. Guoyang Zeng, Fanchao Qi, Qianrui Zhou, Tingji Zhang, Bairu Hou, Yuan Zang, Zhiyuan Liu, Maosong Sun. ACL-IJCNLP 2021 Demo. [website] [doc] [pdf]
  4. TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi. EMNLP 2020 Demo. [website] [doc] [pdf]
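Toolkits such as TextAttack and OpenAttack decompose an attack into swappable components: a goal function, a set of transformations, constraints on the perturbed text, and a search method. A self-contained toy sketch of that decomposition (all function names here are illustrative, not any toolkit's real API; the victim is a stand-in classifier):

```python
# Toy modular attack: goal function + transformation + constraint + search.
# Names are illustrative only, not TextAttack's or OpenAttack's actual API.

def goal_reached(victim, text, original_label):
    """Goal function: untargeted misclassification."""
    return victim(text) != original_label

def transformations(text):
    """Transformation: delete one word at a time (a deliberately crude example)."""
    words = text.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + words[i + 1:])

def constraint_ok(original, candidate):
    """Constraint: allow at most two deletions from the original."""
    return len(candidate.split()) >= len(original.split()) - 2

def greedy_search(victim, text, label, max_steps=5):
    """Search method: greedily apply transformations until the goal is met."""
    current = text
    for _ in range(max_steps):
        if goal_reached(victim, current, label):
            return current
        candidates = [c for c in transformations(current)
                      if constraint_ok(text, c)]
        # Prefer a candidate that already flips the prediction.
        flipped = [c for c in candidates if goal_reached(victim, c, label)]
        if flipped:
            return flipped[0]
        if not candidates:
            return None  # constraints exhausted: attack failed
        current = candidates[0]
    return None
```

Real toolkits plug in far richer transformations (synonym swaps, paraphrasing) and constraints (embedding similarity, grammaticality), but the four-part structure is the same.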

1. Survey Papers

  1. Measure and Improve Robustness in NLP Models: A Survey. Xuezhi Wang, Haohan Wang, Diyi Yang. NAACL 2022. [pdf]
  2. Towards a Robust Deep Neural Network in Texts: A Survey. Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye. TKDE 2021. [pdf]
  3. Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey. Wei Emma Zhang, Quan Z. Sheng, Ahoud Alhazmi, Chenliang Li. ACM TIST 2020. [pdf]
  4. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. Han Xu, Yao Ma, Hao-chen Liu, Debayan Deb, Hui Liu, Ji-liang Tang, Anil K. Jain. International Journal of Automation and Computing 2020. [pdf]
  5. Analysis Methods in Neural Language Processing: A Survey. Yonatan Belinkov, James Glass. TACL 2019. [pdf]

2. Attack Papers

Each paper is tagged with one or more of the following labels indicating how much information the attack model has about the victim model: gradient (white-box, full access including gradients), score (output decision and confidence scores), decision (only the output decision), and blind (nothing).
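To make the score setting above concrete, here is a minimal self-contained sketch of a greedy word-substitution attack that only queries the victim's output probabilities (the victim model, word list, and synonym table are all toy placeholders invented for illustration):

```python
# Toy victim: returns P(positive) for a tokenized sentence. In the "score"
# setting the attacker sees these probabilities but never the gradients.
POSITIVE_WORDS = {"good", "great", "fine", "excellent"}

def victim_score(words):
    hits = sum(w in POSITIVE_WORDS for w in words)
    return hits / max(len(words), 1)

# Illustrative synonym table; a real attack would use embeddings or WordNet.
SYNONYMS = {"good": ["decent", "acceptable"], "great": ["notable"]}

def greedy_score_attack(words):
    """Greedily substitute words to lower the victim's positive score."""
    words = list(words)
    for i, w in enumerate(words):
        best, best_score = w, victim_score(words)
        for cand in SYNONYMS.get(w, []):
            trial = words[:i] + [cand] + words[i + 1:]
            s = victim_score(trial)  # only output scores are queried
            if s < best_score:
                best, best_score = cand, s
        words[i] = best
    return words
```

A decision-only attack would see just `victim_score(words) > 0.5`, and a gradient attack could differentiate through the model instead of querying it; this loop sits in between.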

2.1 Sentence-level Attack

  1. Using Adversarial Attacks to Reveal the Statistical Bias in Machine Reading Comprehension Models. Jieyu Lin, Jiajie Zou, Nai Ding. ACL-IJCNLP 2021. blind [pdf]
  2. Grey-box Adversarial Attack And Defence For Sentiment Classification. Ying Xu, Xu Zhong, Antonio Jimeno Yepes, Jey Han Lau. NAACL-HLT 2021. gradient [pdf] [code]
  3. Generating Syntactically Controlled Paraphrases without Using Annotated Parallel Pairs. Kuan-Hao Huang and Kai-Wei Chang. EACL 2021. [pdf] [code]
  4. CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation. Tianlu Wang, Xuezhi Wang, Yao Qin, Ben Packer, Kang Lee, Jilin Chen, Alex Beutel, Ed Chi. EMNLP 2020. score [pdf]
  5. T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack. Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li. EMNLP 2020. gradient [pdf] [code]
  6. Adversarial Attack and Defense of Structured Prediction Models. Wenjuan Han, Liwen Zhang, Yong Jiang, Kewei Tu. EMNLP 2020. blind [pdf] [code]
  7. MALCOM: Generating Malicious Comments to Attack Neural Fake News Detection Models. Thai Le, Suhang Wang, Dongwon Lee. ICDM 2020. gradient [pdf] [code]
  8. Improving the Robustness of Question Answering Systems to Question Paraphrasing. Wee Chung Gan, Hwee Tou Ng. ACL 2019. blind [pdf] [data]
  9. Trick Me If You Can: Human-in-the-Loop Generation of Adversarial Examples for Question Answering. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, Jordan Boyd-Graber. TACL 2019. score [pdf]
  10. PAWS: Paraphrase Adversaries from Word Scrambling. Yuan Zhang, Jason Baldridge, Luheng He. NAACL-HLT 2019. blind [pdf] [dataset]
  11. Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent. Minhao Cheng, Wei Wei, Cho-Jui Hsieh. NAACL-HLT 2019. gradient score [pdf] [code]
  12. Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
  13. Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge. Pasquale Minervini, Sebastian Riedel. CoNLL 2018. score [pdf] [code&data]
  14. Robust Machine Comprehension Models via Adversarial Training. Yicheng Wang, Mohit Bansal. NAACL-HLT 2018. decision [pdf] [dataset]
  15. Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
  16. Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]
  17. Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia, Percy Liang. EMNLP 2017. score decision blind [pdf] [code]
  18. Adversarial Sets for Regularising Neural Link Predictors. Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, Sebastian Riedel. UAI 2017. score [pdf] [code]

2.2 Word-level Attack

  1. Confidence Elicitation: A New Attack Vector for Large Language Models. Brian Formento, Chuan-Sheng Foo, See-Kiong Ng. ICLR 2025. score [pdf] [code]
  2. HyGloadAttack: Hard-label black-box textual adversarial attacks via hybrid optimization. Zhaorong Liu, Xi Xiong, Yuanyuan Li, Yan Yu, Jiazhong Lu, Shuai Zhang, Fei Xiong. Neural Networks 2024. decision [pdf] [code]
  3. Expanding Scope: Adapting English Adversarial Attacks to Chinese. Hanyu Liu, Chengyuan Cai, Yanjun Qi. Findings of ACL 2023. decision [pdf] [code]
  4. Adversarial Text Generation by Search and Learning. Guoyi Li, Bingkang Shi, Zongzhen Liu, Dehan Kong, Yulei Wu, Xiaodan Zhang, Longtao Huang, Honglei Lyu. Findings of ACL 2023. score [pdf] [code]
  5. Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework. Lifan Yuan, Yichi Zhang, Yangyi Chen, Wei Wei. Findings of ACL 2023. decision [pdf] [code]
