Tweetbotornot2
🔍🐦🤖 Detect Twitter Bots!
<!-- README.md is generated from README.Rmd. Please edit that file -->
tweetbotornot2 <img src="man/figures/logo.png" width="160px" align="right" />
<!-- badges: start --> <!-- badges: end -->
{tweetbotornot2} provides an out-of-the-box classifier for detecting
Twitter bots that is easy to use, interpretable, scalable, and
performant. It also provides a convenient interface for accessing the
Botometer API.
Installation
<!-- Install the released version of tweetbotornot2 from [CRAN](https://CRAN.R-project.org) with: -->
<!-- ``` r -->
<!-- ## install from CRAN -->
<!-- install.packages("tweetbotornot2") -->
<!-- ``` -->
Install the development version of {tweetbotornot2} from GitHub with:
## install {remotes} if not already installed
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
## install from github
remotes::install_github("mkearney/tweetbotornot2")
Predict
Use predict_bot() to run the built-in bot classifier
Provide a vector or data frame of Twitter handles and predict_bot()
will return the estimated probability of each account being a bot.
## vector of screen names
screen_names <- c(
  "American__Voter", ## (these ones should be bots)
  "MagicRealismBot",
  "netflix_bot",
  "mitchhedbot",
  "rstats4ds",
  "thinkpiecebot",
  "tidyversetweets",
  "newstarsbot",
  "CRANberriesFeed",
  "AOC", ## (these ones should NOT be bots)
  "realDonaldTrump",
  "NateSilver538",
  "ChadPergram",
  "kumailn",
  "mindykaling",
  "hspter",
  "rdpeng",
  "kearneymw",
  "dfreelon",
  "AmeliaMN",
  "winston_chang"
)
## data frame with screen names **must be named 'screen_name'**
screen_names_df <- data.frame(screen_name = screen_names)
## vector -> bot estimates
predict_bot(screen_names)
#> user_id screen_name prob_bot
#> 1: 829792389925597184 American__Voter 0.99923730
#> 2: 3701125272 MagicRealismBot 0.99886143
#> 3: 1203840834 netflix_bot 0.85550964
#> 4: 214244836 mitchhedbot 0.99847370
#> 5: 1075011651366199297 rstats4ds 0.99878043
#> 6: 3325527710 thinkpiecebot 0.99953938
#> 7: 935569091678691328 tidyversetweets 0.99963319
#> 8: 780707721209188352 newstarsbot 0.99973100
#> 9: 233585808 CRANberriesFeed 0.99852484
#> 10: 138203134 AOC 0.00082178
#> 11: 25073877 realDonaldTrump 0.00126745
#> 12: 16017475 NateSilver538 0.00203745
#> 13: 16187637 ChadPergram 0.00385066
#> 14: 28406270 kumailn 0.00056573
#> 15: 23544596 mindykaling 0.00087570
#> 16: 24228154 hspter 0.00045269
#> 17: 9308212 rdpeng 0.00398646
#> 18: 2973406683 kearneymw 0.01408189
#> 19: 93476253 dfreelon 0.00055131
#> 20: 19520842 AmeliaMN 0.00769005
#> 21: 1098742782 winston_chang 0.00111468
#> user_id screen_name prob_bot
## data.frame -> bot estimates
#predict_bot(screen_names_df)
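The printed result has one row per account with a `prob_bot` column, so it can be post-processed like any table. The following is a minimal sketch that does not call the package: it rebuilds a few rows by hand from the output above and labels them with a cutoff. The `cutoff` value of 0.5 and the `label` column are assumptions made for illustration, not package defaults.

```r
## a few rows copied by hand from the printed predict_bot() output
preds <- data.frame(
  screen_name = c("MagicRealismBot", "netflix_bot", "AOC", "kearneymw"),
  prob_bot    = c(0.99886143, 0.85550964, 0.00082178, 0.01408189),
  stringsAsFactors = FALSE
)

## hypothetical cutoff; 0.5 is an assumption, not a package default
cutoff <- 0.5

## label each account, then sort from most to least bot-like
preds$label <- ifelse(preds$prob_bot >= cutoff, "bot", "not bot")
preds <- preds[order(-preds$prob_bot), ]
preds
```

A stricter or looser cutoff trades false positives against false negatives, so the right value depends on the application.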
This also works on Twitter user IDs.
## vector of user IDs (strings of numbers, ranging from 2-19 digits)
user_ids <- rtweet::lookup_users(screen_names)[["user_id"]]
## data frame with user IDs **must be named 'user_id'**
user_ids_df <- data.frame(user_id = user_ids)
## vector -> bot estimates
predict_bot(user_ids)
## data.frame -> bot estimates
predict_bot(user_ids_df)
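User IDs are described above as strings of 2 to 19 digits, and keeping them as character strings matters in R. The sketch below is an illustration in base R, not part of the package: it checks that shape with a regular expression and shows that a 19-digit ID cannot be stored exactly as a number. The names `ids`, `is_valid_id`, and `ids_df` are hypothetical.

```r
## IDs copied from the printed output earlier, kept as character strings
ids <- c("829792389925597184", "1075011651366199297", "9308212")

## check the documented shape: digits only, 2 to 19 characters long
is_valid_id <- grepl("^[0-9]{2,19}$", ids)
is_valid_id

## why strings: 19-digit IDs exceed the exact-integer range of doubles (2^53),
## so two different IDs can collapse to the same numeric value
as.numeric("1075011651366199297") == as.numeric("1075011651366199296")

## a data frame that keeps IDs as character, using the required column name
ids_df <- data.frame(user_id = ids, stringsAsFactors = FALSE)
```

If IDs arrive as numbers from another source, they may already have lost precision, so it is safer to read them as text in the first place.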
The input given to predict_bot() can also be Twitter data returned by
{rtweet}, e.g., by rtweet::get_timelines()<sup>1</sup>.
## timeline data returned by {rtweet}
twtdat <- rtweet::get_timelines(screen_names, n = 200, check = FALSE)
## generate predictions from twitter data frame
predict_bot(twtdat)
#> user_id screen_name prob_bot
#> 1: 829792389925597184 American__Voter 0.99923730
#> 2: 3701125272 MagicRealismBot 0.99886143
#> 3: 1203840834 netflix_bot 0.85550964
#> 4: 214244836 mitchhedbot 0.99847370
#> 5: 1075011651366199297 rstats4ds 0.99878043
#> 6: 3325527710 thinkpiecebot 0.99953938
#> 7: 935569091678691328 tidyversetweets 0.99963319
#> 8: 780707721209188352 newstarsbot 0.99973100
#> 9: 233585808 CRANberriesFeed 0.99852484
#> 10: 138203134 AOC 0.00082178
#> 11: 25073877 realDonaldTrump 0.00126745
#> 12: 16017475 NateSilver538 0.00203745
#> 13: 16187637 ChadPergram 0.00385066
#> 14: 28406270 kumailn 0.00056573
#> 15: 23544596 mindykaling 0.00087570
#> 16: 24228154 hspter 0.00045269
#> 17: 9308212 rdpeng 0.00398646
#> 18: 2973406683 kearneymw 0.01408189
#> 19: 93476253 dfreelon 0.00055131
#> 20: 19520842 AmeliaMN 0.00769005
#> 21: 1098742782 winston_chang 0.00111468
#> user_id screen_name prob_bot
Explain
Use explain_bot() to see the contributions made by each feature
View the prediction contributions of the top five features (for each user) in the model.
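The idea of ranking contributions by absolute size can be tried offline on a handful of rows. This is a minimal base-R sketch that does not call explain_bot() itself: the rows are copied by hand from the explain_bot() output shown in this section, and the names `contrib` and `top2`, as well as the choice of two features per account, are assumptions made for illustration.

```r
## rows copied by hand from the printed explain_bot() output
contrib <- data.frame(
  screen_name = c(rep("AOC", 5), rep("ChadPergram", 5)),
  feature = rep(
    c("twt_srctw", "twt_srcna", "usr_fllws", "twt_rtwts", "twt_quots"), 2
  ),
  value = c(-4.074586, -0.788900, -0.539794, -0.453744, -0.276252,
            -4.741660, -0.573186, -0.271190,  0.470594,  0.016482),
  stringsAsFactors = FALSE
)

## order by account, then by absolute contribution (largest first)
contrib <- contrib[order(contrib$screen_name, -abs(contrib$value)), ]

## keep the two largest contributions per account (2 is an arbitrary choice)
top2 <- do.call(rbind, lapply(split(contrib, contrib$screen_name), head, n = 2))
rownames(top2) <- NULL
top2
```

Sorting on the absolute value keeps large negative contributions next to large positive ones, which is useful when both directions are of interest.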
## view top feature contributions in prediction for each user
explain_bot(twtdat)[
  order(screen_name, -abs(value)),
][
  feature %in% feature[1:5],
  .SD,
  on = "feature"
][1:50, -1]
#> screen_name prob_bot feature value feature_description
#> 1: AOC 0.00082178 twt_srctw -4.074586 Tweet source of Twitter (official)
#> 2: AOC 0.00082178 twt_srcna -0.788900 Tweet source of unknown
#> 3: AOC 0.00082178 usr_fllws -0.539794 User followers
#> 4: AOC 0.00082178 twt_rtwts -0.453744 Tweet via retweets
#> 5: AOC 0.00082178 twt_quots -0.276252 Tweet via quotes
#> 6: AmeliaMN 0.00769005 twt_srctw -2.392487 Tweet source of Twitter (official)
#> 7: AmeliaMN 0.00769005 twt_srcna -0.716127 Tweet source of unknown
#> 8: AmeliaMN 0.00769005 twt_rtwts -0.461190 Tweet via retweets
#> 9: AmeliaMN 0.00769005 twt_quots -0.308175 Tweet via quotes
#> 10: AmeliaMN 0.00769005 usr_fllws 0.050839 User followers
#> 11: American__Voter 0.99923730 twt_srctw 2.053514 Tweet source of Twitter (official)
#> 12: American__Voter 0.99923730 twt_srcna 1.149764 Tweet source of unknown
#> 13: American__Voter 0.99923730 twt_rtwts 0.357076 Tweet via retweets
#> 14: American__Voter 0.99923730 usr_fllws 0.113606 User followers
#> 15: American__Voter 0.99923730 twt_quots 0.020683 Tweet via quotes
#> 16: CRANberriesFeed 0.99852484 twt_srctw 2.343053 Tweet source of Twitter (official)
#> 17: CRANberriesFeed 0.99852484 twt_srcna 1.026885 Tweet source of unknown
#> 18: CRANberriesFeed 0.99852484 twt_rtwts 0.340709 Tweet via retweets
#> 19: CRANberriesFeed 0.99852484 usr_fllws 0.081496 User followers
#> 20: CRANberriesFeed 0.99852484 twt_quots 0.009263 Tweet via quotes
#> 21: ChadPergram 0.00385066 twt_srctw -4.741660 Tweet source of Twitter (official)
#> 22: ChadPergram 0.00385066 twt_srcna -0.573186 Tweet source of unknown
#> 23: ChadPergram 0.00385066 twt_rtwts 0.470594 Tweet via retweets
#> 24: ChadPergram 0.00385066 usr_fllws -0.271190 User followers
#> 25: ChadPergram 0.00385066 twt_quots 0.016482 Tweet via quotes
#> 26: MagicRealismBot 0.99886143 twt_srctw 2.114994 Tweet source of Twitter (official)
#> 27: MagicRealismBot 0.99886143 twt_srcna 1.112244 Tweet source of unknown
#> 28: MagicRealismBot 0.99886143 usr_fllws -0.596811 User followers
#> 29: MagicRealismBot 0.99886143 twt_rtwts 0.321603 Tweet via retweets
#> 30: MagicRealismBot 0.99886143 twt_quots
