SocialBotDetectionPapers

Important papers on SocialBots detection

Introduction

This is a list of papers and other useful resources on social bot detection.

Overview and Statistics

Keywords Convention

which mainly focus on user info features.

which mainly focus on the text features.

which mainly focus on social graph and use the graph-based methods.

which mainly focus on temporal patterns.

Conference Rank (A, B, C) from China Computer Federation.

Toolkits

Datasets

BotRepository

cresci-2015

Description: A dataset of (i) genuine and (ii) fake Twitter accounts, manually annotated. Released in CSV format.

Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2015). Fame for sale: efficient detection of fake Twitter followers. Decision Support Systems, 80, 56-71. [pdf]
cresci-2017

Description: A dataset of (i) genuine, (ii) traditional, and (iii) social spambot Twitter accounts, annotated by CrowdFlower contributors. Released in CSV format.

Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2017, April). The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. In Proceedings of the 26th International Conference on World Wide Web Companion (pp. 963-972). ACM. [pdf]

Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2017). Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Transactions on Dependable and Secure Computing. [pdf]
caverlee-2011

Description: This social honeypot dataset was collected on Twitter from December 30, 2009 to August 2, 2010. It contains 22,223 content polluters, their number of followings over time, and 2,353,473 tweets, along with 19,276 legitimate users, their number of followings over time, and 3,259,693 tweets.

Lee, Kyumin, Brian David Eoff, and James Caverlee. "Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter." ICWSM. 2011. [pdf]
varol-2017

Description: This dataset contains annotations of 2,573 Twitter accounts. Annotation and data crawling were completed in April 2016.

Varol, Onur, Emilio Ferrara, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. "Online Human-Bot Interactions: Detection, Estimation, and Characterization." ICWSM (2017). [pdf]
gilani-2017

Description: Manually annotated human and bot accounts. Labels and user objects.

Gilani, Zafar, Reza Farahbakhsh, Gareth Tyson, Liang Wang, and Jon Crowcroft. "Of bots and humans (on twitter)." In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 349-354. ACM, 2017. [pdf]
cresci-stock-2018

Description: Automated accounts that act in a coordinated fashion. Labels and user objects.

Cresci, Stefano, Fabrizio Lillo, Daniele Regoli, Serena Tardelli, and Maurizio Tesconi. "$ FAKE: Evidence of Spam and Bot Activity in Stock Microblogs on Twitter." In Twelfth International AAAI Conference on Web and Social Media. 2018. [pdf]

Cresci, Stefano, Fabrizio Lillo, Daniele Regoli, Serena Tardelli, and Maurizio Tesconi. "Cashtag piggybacking: Uncovering spam and bot activity in stock microblogs on Twitter." ACM Transactions on the Web (TWEB) 13, no. 2 (2019): 11. [pdf]
midterm-2018

Description: Manually labeled human and bot accounts from 2018 US midterm elections. Labels and processed user objects.

Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. "Scalable and generalizable social bot detection through data selection." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 1096-1103. 2020. [pdf]
pronbots-2019

Description: Pronbots shared by Andy Patel (github.com/r0zetta/pronbot2). Labels and user objects.

Yang, Kai‐Cheng, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. "Arming the public with artificial intelligence to counter social bots." Human Behavior and Emerging Technologies 1, no. 1 (2019): 48-61. [pdf]
celebrity-2019

Description: Celebrity accounts collected as authentic users. Labels and user objects.

Yang, Kai‐Cheng, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. "Arming the public with artificial intelligence to counter social bots." Human Behavior and Emerging Technologies 1, no. 1 (2019): 48-61. [pdf]
vendor-purchased-2019

Description: Fake follower accounts purchased from several companies. Labels and user objects.

Yang, Kai‐Cheng, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. "Arming the public with artificial intelligence to counter social bots." Human Behavior and Emerging Technologies 1, no. 1 (2019): 48-61. [pdf]
botometer-feedback-2019

Description: Botometer feedback accounts manually labeled by K.C. Yang. Labels and user objects.

Yang, Kai‐Cheng, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. "Arming the public with artificial intelligence to counter social bots." Human Behavior and Emerging Technologies 1, no. 1 (2019): 48-61. [pdf]
political-bots-2019

Description: Automated political accounts run by @rzazula (now suspended), shared by @josh_emerson on Twitter. Labels and user objects.

Yang, Kai‐Cheng, Onur Varol, Clayton A. Davis, Emilio Ferrara, Alessandro Flammini, and Filippo Menczer. "Arming the public with artificial intelligence to counter social bots." Human Behavior and Emerging Technologies 1, no. 1 (2019): 48-61. [pdf]
cresci-rtbust-2019

Description: Manually annotated bot and human accounts. Labels and user objects.

Mazza, Michele, Stefano Cresci, Marco Avvenuti, Walter Quattrociocchi, and Maurizio Tesconi. "Rtbust: Exploiting temporal patterns for botnet detection on twitter." In Proceedings of the 10th ACM Conference on Web Science, pp. 183-192. 2019. [pdf]
botwiki-2019

Description: Self-identified bots from https://botwiki.org. Labels and user objects.

Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. "Scalable and generalizable social bot detection through data selection." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 1096-1103. 2020. [pdf]
verified-2019

Description: Verified human accounts. Labels and user objects.

Yang, Kai-Cheng, Onur Varol, Pik-Mai Hui, and Filippo Menczer. "Scalable and generalizable social bot detection through data selection." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 1096-1103. 2020. [pdf]
Kaiser

Description: 27 manually annotated German bots, 532 official accounts of German members of parliament, and 516 accounts of members of the 115th U.S. Congress.

Rauchfleisch, Adrian; Kaiser, Jonas, 2020, "The False positive problem of automatic bot detection in social science research", https://doi.org/10.7910/DVN/XVCKRS, Harvard Dataverse, V2. [pdf]
Astroturf

Description: Hyper-active political bots participating in follow trains and/or systematically deleting high volumes of content.

Mohsen Sayyadiharikandeh, Onur Varol, Kai-Cheng Yang, Alessandro Flammini, and Filippo Menczer. "Detection of Novel Social Bots by Ensembles of Specialized Classifiers." CIKM. 2020. [pdf]

The above datasets can be downloaded from [Bot Repository].

TwiBot

TwiBot-20

Description: TwiBot-20 is a comprehensive sample of the Twittersphere and it is representative of the current generation of Twitter bots and genuine users. To download the full dataset, please contact the creator directly. [dataset]

Shangbin Feng, Herun Wan, Ningnan Wang, Jundong Li, and Minnan Luo. "TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark." CIKM. 2021. [pdf]
TwiBot-22

Description:TwiB
