SkillAgentSearch skills...

MGTAB

A Multi-relational Graph-Based Twitter Account Detection Benchmark

Install / Use

/learn @GraphDetec/MGTAB
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

MGTAB

MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Introduction

MGTAB is the first standardized graph-based benchmark for stance and bot detection. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. For more details, please refer to the MGTAB paper.

Distribution of labels in annotations.

<table> <thead> <tr> <td colspan=3 align="center"><b>Stance</b></td> <td colspan=3 align="center"><b>Bot</b></td> </tr> </thead> <tbody> <tr> <td colspan=1 align="center">Lable</td> <td colspan=1 align="center">Count</td> <td colspan=1 align="center">Percentage</td> <td colspan=1 align="center">Lable</td> <td colspan=1 align="center">Count</td> <td colspan=1 align="center">Percentage</td> </tr> <tr> <td colspan=1 align="center">neutral</td> <td colspan=1 align="center">3,776</td> <td colspan=1 align="center">37.02</td> <td colspan=1 align="center">human</td> <td colspan=1 align="center">7,451</td> <td colspan=1 align="center">73.06</td> </tr> <tr> <td colspan=1 align="center">against</td> <td colspan=1 align="center">3,637</td> <td colspan=1 align="center">35.66</td> <td colspan=1 align="center">bot</td> <td colspan=1 align="center">2,748</td> <td colspan=1 align="center">26.94</td> </tr> <tr> <td colspan=1 align="center">support</td> <td colspan=1 align="center">2,786</td> <td colspan=1 align="center">27.32</td> <td colspan=3 align="center"> </td> </tr> </tbody> </table> MGTAB contains 10,199 expert-annotated users, and 400,000 additional unlabelled users in MGTAB-large compared to MGTAB.

Multiple relations in the MGTAB.

Our proposed dataset has seven types of user relationships.

<table> <thead> <tr> <td colspan=8 align="center"><b>MGTAB</b></td> </tr> </thead> <tbody> <tr> <td colspan=1 align="center">Edge type</td> <td colspan=1 align="center">followers</td> <td colspan=1 align="center">friends</td> <td colspan=1 align="center">mention</td> <td colspan=1 align="center">reply</td> <td colspan=1 align="center">quoted</td> <td colspan=1 align="center">URL</td> <td colspan=1 align="center">hashtag</td> </tr> <tr> <td colspan=1 align="center">Numbers</td> <td colspan=1 align="center">308,120</td> <td colspan=1 align="center">412,575</td> <td colspan=1 align="center">114,516</td> <td colspan=1 align="center">223,466</td> <td colspan=1 align="center">77,631</td> <td colspan=1 align="center">263,800</td> <td colspan=1 align="center">300,000</td> </tr> <thead> <tr> <td colspan=8 align="center"><b>MGTAB-large</b></td> </tr> </thead> <tr> <td colspan=1 align="center">Edge type</td> <td colspan=1 align="center">followers</td> <td colspan=1 align="center">friends</td> <td colspan=1 align="center">mention</td> <td colspan=1 align="center">reply</td> <td colspan=1 align="center">quoted</td> <td colspan=1 align="center">URL</td> <td colspan=1 align="center">hashtag</td> </tr> <tr> <td colspan=1 align="center">Numbers</td> <td colspan=1 align="center">31,990,488</td> <td colspan=1 align="center">49,668,723</td> <td colspan=1 align="center">7,135,192</td> <td colspan=1 align="center">1,018,834</td> <td colspan=1 align="center">182,296</td> <td colspan=1 align="center">51,281</td> <td colspan=1 align="center">7,950,896</td> </tr> </tbody> </table>

Enviromment

python 3.7
scikit-learn 1.0.2
torch 1.8.1+cu111
torch_cluster-1.5.9
torch_scatter-2.0.6
torch_sparse-0.6.9
torch_spline_conv-1.2.1
torch-geometric 2.0.4
pytorch-lightning 1.5.0

Train Model

To start training process:

Train GNN models

python MGTAB-GNN.py  --task stance --model GCN --relation_select 0 1 --random_seed 0 1 2 3 4
python MGTAB-GNN.py  --task bot --model RGCN --relation_select 0 1 --random_seed 0 1 2 3 4

Train Machine Learning models

python MGTAB-ML.py  --task stance --models_list 1 2 3  --random_seed 0 1 2 3 4
python MGTAB-ML.py  --task bot --models_list 4 5 6 7  --random_seed 0 1 2 3 4

Train GNN models parallel using multi-gpu

python GNN_sample_large.py  --task bot --relation_select 0 1 2 3 4 4 6 --model RGT --GPU_num 4
python GNN_sample_large.py  --task bot --relation_select 0 1 2 3 4 --model SHGN --GPU_num 4
python GNN_sample_large.py  --task stance --relation_select 0 1 --model GCN --GPU_num 4
python GNN_sample_large.py  --task stance --relation_select 0 --model GAT --GPU_num 4

Baseline performance

Stance detection performance on MGTAB

| methods | type | accuracy | precision | recall | f1-score | | ------------------- | ---- | --------------------- | -------------------- | -------------------- | -------------------- | | AdaBoost | F | 74.59</br> ${1.41}$ | 74.60</br> ${1.35}$ | 74.02</br> ${1.61}$ | 73.88</br> ${1.47}$ | | Random Forest | F | 79.62</br> ${0.68}$ | 80.04</br> ${0.43}$ | 78.83</br> ${0.98}$ | 79.04</br> ${0.82}$ | | Decision Tree | F | 66.92</br> ${0.93}$ | 66.34</br> ${1.02}$ | 66.23</br> ${1.06}$ | 66.03</br> ${0.84}$ | | SVM | F | 81.23</br> ${0.66}$ | 81.40</br> ${0.71}$ | 80.86</br> ${1.09}$ | 80.71</br> ${0.78}$ | | KNN | F | 76.25</br> ${1.32}$ | 75.54</br> ${1.41}$ | 75.70</br> ${1.37}$ | 75.48</br> ${1.37}$ | | Logistic Regression | F | 79.51</br> ${1.00}$ | 79.33</br> ${0.98}$ | 78.83</br> ${1.17}$ | 78.98</br> ${1.11}$ | | GCN | G | 81.35</br> ${0.58}$ | 81.08</br> ${0.30}$ | 80.19</br> ${0.56}$ | 80.08</br> ${0.56}$ | | GrapgSAGE | G | 83.33</br> ${1.22}$ | 82.52</br> ${1.63}$ | 83.45</br> ${0.63}$ | 82.72</br> ${1.34}$ | | GAT | G | 82.19</br> ${1.23}$ | 81.72</br> ${1.19}$ | 81.68</br> ${1.16}$ | 81.04</br> ${1.24}$ | | HGT | G | 83.29</br> ${0.44}$ | 81.63</br> ${0.58}$ | 81.51</br> ${0.76}$ | 81.82</br> ${0.34}$ | | S-HGN | G | 85.32</br> ${0.53}$ | 83.93</br> ${0.67}$ | 83.65</br> ${0.65}$ | 84.42</br> ${0.43}$ | | BotRGCN | G | 84.71</br> ${1.43}$ | 83.43</br> ${1.23}$ | 84.08</br> ${0.94}$ | 84.30</br> ${1.44}$ | | RGT | G | 87.78</br> ${0.43}$ | 85.22</br> ${0.89}$ | 84.40</br> ${0.74}$ | 86.86</br> ${0.43}$ |

Bot detection performance on MGTAB

| methods | type | accuracy | precision | recall | f1-score | | ------------------- | ---- | -------------------- | -------------------- | -------------------- | -------------------- | | AdaBoost | F | 90.12</br> ${0.92}$ | 88.51</br> ${1.33}$ | 89.10</br> ${0.92}$ | 87.71</br> ${1.10}$ | | Random Forest | F | 89.52</br> ${0.44}$ | 88.92</br> ${0.49}$ | 86.72</br> ${1.15}$ | 86.83</br> ${0.53}$ | | Decision Tree | F | 87.13</br> ${0.51}$ | 83.81</br> ${0.72}$ | 83.39</br> ${1.06}$ | 83.70</br> ${0.74}$ | | SVM | F | 88.68</br> ${1.40}$ | 85.73</br> ${1.84}$ | 85.73</br> ${1.84}$ | 85.31</br> ${1.73}$ | | KNN | F | 85.78</br> ${0.84}$ | 82.28</br> ${1.22}$ | 80.49</br> ${0.64}$ | 81.28</br> ${0.66}$ | | Logistic Regression | F | 88.49</br> ${1.31}$ | 85.69</br> ${1.69}$ | 84.41</br> ${1.96}$ | 84.97</br> ${1.67}$ | | GCN | G | 85.81</br> ${1.32}$ | 77.40</br> ${2.12}$ | 84.37</br> ${1.73}$ | 78.33</br> ${1.67}$ | | GrapgSAGE | G | 88.71</br> ${1.24}$ | 85.33</br> ${1.83}$ | 86.15</br> ${2.55}$ | 85.44</br> ${1.08}$ | | GAT | G | 86.96</br> ${1.28}$ | 79.71</br> ${2.96}$ | 84.88</br> ${1.52}$ | 82.33</br> ${2.12}$ | | HGT | G | 90.28</br> ${0.29}$ | 85.35</br> ${0.33}$ | 85.97</br> ${0.41}$ | 87.52</br> ${0.37}$ | | S-HGN | G | 91.42</br> ${0.43}$ | 87.40</br> ${0.67}$ | 86.73</br> ${0.64}$ | 88.72</br> ${0.58}$ | | BotRGCN | G | 89.60</br> ${0.82}$ | 85.21</br> ${1.81}$ | 87.07</br> ${1.38}$ | 87.16</br> ${0.74}$ | | RGT | G | 92.12</br> ${0.37}$ | 88.08</br> ${0.43}$ | 86.64</br> ${0.25}$ | 90.41</br> ${0.47}$ |

Licensing

The MGTAB dataset uses the CC BY-NC-ND 4.0 license. Implemented code in the MGTAB evaluation framework uses the MIT license.

Datasets download

For SemEval-2016 T6, visit the SemEval2016 repository. For SemEval-2019 T7, visit the SemEval2019 github repository. For TwiBot-20, visit the TwiBot-20 github repository. For TwiBot-22, visit the TwiBot-22 github repository. For other bot detection datasets, please visit the Bot Repository.

MGTAB is available at Google Drive. MGTAB-large (contains 400,000 unlabeled us

View on GitHub
GitHub Stars67
CategoryDevelopment
Updated3d ago
Forks11

Languages

Python

Security Score

85/100

Audited on Mar 28, 2026

No findings