Tito B Rezende, José Nascimento, Levy G Chaves, Darlinne H P Soto, Soroor Salavati
Institute of Computing, University of Campinas (Unicamp), Campinas, Brazil
{t025327, j170862, l264958, d264955, s264967}@dac.unicamp.br
https://github.com/stefanomozart/twitter_bot_detection
Motivation
● Bots can be used to spread fake news, manipulate public
opinion, and fake trending hashtags;
Related work
● Related work can be split into three categories:
○ Account/profile-based;
○ Topological.
Pipeline
Dataset
Consists of 3 publicly available datasets:
Dataset        Bots   Humans
botwiki-2019    698        0
EDA: t-SNE
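A t-SNE projection like the one used for this EDA can be sketched as below. This is a minimal sketch assuming scikit-learn's `TSNE` and a numeric account-feature matrix; the helper name `tsne_embedding`, the synthetic data, and the perplexity value are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_embedding(X: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: project standardized features to 2-D with t-SNE."""
    # t-SNE works on pairwise distances, so put all features on a common scale first
    X_std = StandardScaler().fit_transform(X)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(X_std)


# Synthetic stand-in for the account features (assumption: 200 accounts, 10 features)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)
emb = tsne_embedding(X_demo)
print(emb.shape)  # (200, 2)
```

Coloring the two output columns by `y_demo` in a scatter plot gives the usual t-SNE picture; perplexity must stay below the number of samples.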
Features
Selected features:
Engineered features:
Feature scaling
● No feature scaling
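● Standard score: $z = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the feature's mean and standard deviation
● Min-max: $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$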
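The two scaling formulas can be written out per feature (column) as below. This is a minimal NumPy sketch; the function names are hypothetical, and it uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
import numpy as np


def standard_score(X: np.ndarray) -> np.ndarray:
    """z = (x - mean) / std, computed independently for each column."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0)
    std[std == 0] = 1.0  # leave constant features at 0 instead of dividing by zero
    return (X - mean) / std


def min_max(X: np.ndarray) -> np.ndarray:
    """x' = (x - min) / (max - min), mapping each column to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features map to 0
    return (X - lo) / span


X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
print(min_max(X))  # [[0. 0.] [0.5 0.5] [1. 1.]]
```

In an actual experiment the mean, standard deviation, minimum and maximum should be estimated on the training split only and then reused on the test split, otherwise test information leaks into training.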
Parameter Tuning
Search for the best hyper-parameters for each model.
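The slides do not say which search strategy was used. The sketch below assumes an exhaustive grid search with 5-fold cross-validation and F1 scoring via scikit-learn's `GridSearchCV`, shown for KNN only; the grid values, the helper `tune_knn`, and the synthetic data are hypothetical. It also treats the three scaling options as one more hyperparameter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def tune_knn(X, y, cv: int = 5) -> GridSearchCV:
    """Hypothetical helper: grid-search KNN settings and the scaling choice together."""
    pipe = Pipeline([("scale", StandardScaler()), ("clf", KNeighborsClassifier())])
    grid = {
        # "passthrough" skips the step, i.e. the "no feature scaling" option
        "scale": ["passthrough", StandardScaler(), MinMaxScaler()],
        "clf__n_neighbors": [3, 5, 11],
        "clf__weights": ["uniform", "distance"],
    }
    search = GridSearchCV(pipe, grid, scoring="f1", cv=cv)
    return search.fit(X, y)


X, y = make_classification(n_samples=300, n_features=8, random_state=0)
search = tune_knn(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Repeating this with a different `clf` step and grid gives one tuned configuration per model; because scaling sits inside the pipeline, it is refit on each training fold.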
Model Selection
● KNN
● Logistic Regression
● SVM
● Decision Tree
● Random Forest
● Bagging: Bootstrap Aggregating (tree-based estimator)
● XGBoost: Gradient Boosting Decision Tree
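The candidate list above maps naturally onto scikit-learn estimators, compared here by cross-validated F1. This is a sketch under assumptions: default hyperparameters, synthetic data, and the helper `compare_models` are illustrative only. XGBoost comes from the third-party `xgboost` package; if it is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different gradient-boosted-tree implementation and is labeled as such.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

try:
    from xgboost import XGBClassifier  # optional third-party dependency
    BOOSTING = ("XGBoost", XGBClassifier())
except ImportError:
    from sklearn.ensemble import GradientBoostingClassifier  # stand-in, not XGBoost
    BOOSTING = ("Gradient Boosting (scikit-learn stand-in)", GradientBoostingClassifier(random_state=0))

MODELS = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),  # default base estimator is a decision tree
    BOOSTING[0]: BOOSTING[1],
}


def compare_models(X, y, cv: int = 5) -> dict:
    """Mean cross-validated F1 per model; scaling is refit inside every fold."""
    return {
        name: cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=cv, scoring="f1").mean()
        for name, model in MODELS.items()
    }


X, y = make_classification(n_samples=400, n_features=10, random_state=0)
scores = compare_models(X, y)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Swapping each default estimator for its tuned configuration would turn this into a best-tuned-model-per-classifier comparison; the winner on synthetic data says nothing about the bot datasets.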
Results
● We present the best tuned model for each
classifier
● XGBoost was the best amongst all classifiers
Conclusions
● Ensemble methods gave better results, probably because they
handle non-linearities in the data better, which agrees with the
t-SNE exploratory analysis;
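The non-linearity argument can be illustrated on a toy problem. This sketch uses scikit-learn's synthetic two-moons data, an assumption chosen only because its classes are not linearly separable; it compares a linear model with a tree ensemble and is not a reproduction of the study's experiment.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Two interleaving half-moons: no straight line separates the classes
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

linear_acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(f"logistic regression: {linear_acc:.3f}, random forest: {forest_acc:.3f}")
```

The linear decision boundary cannot follow the curved class regions, while the forest's axis-aligned splits can approximate them.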
Future work
Thank you!