
Tuning machine learning models to detect bots on Twitter


Stefano M P C Souza,
Department of Electrical Engineering, University of Brasília (UnB), Brasília, Brazil
stefanomozart@ieee.org

Tito B Rezende, José Nascimento, Levy G Chaves, Darlinne H P Soto, Soroor Salavati
Institute of Computing, University of Campinas (Unicamp), Campinas, Brazil
{t025327, j170862, l264958, d264955, s264967}@dac.unicamp.br

https://github.com/stefanomozart/twitter_bot_detection

1
Motivation
● Bots can be used to spread fake news, manipulate public
opinion, and fabricate hashtag trends;

● However, not all bots are malicious.

2
Related work
● Related work can be split into three categories:

○ Account/Profile based;

○ Tweet text based;

○ Topological.

● State-of-the-art models are built with feature engineering
techniques specific to a single platform.

3
Pipeline
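The original slide shows the pipeline as a diagram. As a minimal sketch of how one branch of such a pipeline could be wired in scikit-learn (the structure and the parameter grid are assumptions for illustration, not the authors' code):

```python
# Minimal sketch of one pipeline branch: optional scaling followed by a
# candidate model, tuned with cross-validated grid search.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("scaler", StandardScaler()),                 # or MinMaxScaler() / "passthrough"
    ("model", LogisticRegression(max_iter=1000)),
])

search = GridSearchCV(
    pipe,
    param_grid={"model__C": [0.01, 0.1, 1, 10]},  # hypothetical grid
    cv=5,
    scoring="accuracy",
)
# search.fit(X_train, y_train) yields the best-tuned branch for this model.
```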

4
Dataset
Combines three publicly available datasets:

Dataset              #bots    #humans
botwiki-2019           698          0
cresci-rtbust-2019     353        340
cresci-stock-2018    7,102      6,174
Total                8,153      6,514
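A minimal sketch of assembling the combined dataset, assuming each source ships as a CSV of account features with a binary label column (file names and the `label` column are placeholders):

```python
# Sketch: concatenate the three public datasets into one labelled frame.
import pandas as pd

parts = ["botwiki-2019.csv", "cresci-rtbust-2019.csv", "cresci-stock-2018.csv"]
df = pd.concat([pd.read_csv(p) for p in parts], ignore_index=True)

# Expect roughly 8,153 bots and 6,514 humans in total.
print(df["label"].value_counts())
```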

5
EDA: t-SNE

- There is no clear linear boundary that could separate the two groups;

- Hence, we can expect poor results from classifiers with linear
assumptions, such as logistic regression.
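A sketch of the t-SNE projection behind this observation, reusing `df` from the loading sketch above (perplexity and plotting choices are assumptions):

```python
# Sketch: 2-D t-SNE embedding of the account features, coloured by class.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X = df.drop(columns=["label"]).values  # feature matrix
y = df["label"].values                 # 0/1 labels

emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], c=y, s=3, cmap="coolwarm")
plt.title("t-SNE projection: bots vs. humans")
plt.show()
```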

6
Features
Selected features:
● Statuses count
● Followers count
● Friends count
● Favourites count
● Listed count
● Default profile
● Profile uses background image
● Verified

Engineered features:
● Screen name length
● Screen name number of digits
● Name length
● Name number of digits
● Description length
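A sketch of how the engineered features could be derived from the raw profile strings (column names follow the Twitter API v1.1 user object; the exact implementation is an assumption):

```python
# Sketch: derive the engineered features from raw profile fields.
import pandas as pd

def engineer_features(users: pd.DataFrame) -> pd.DataFrame:
    out = users.copy()
    out["screen_name_length"] = users["screen_name"].str.len()
    out["screen_name_digits"] = users["screen_name"].str.count(r"\d")
    out["name_length"] = users["name"].str.len()
    out["name_digits"] = users["name"].str.count(r"\d")
    out["description_length"] = users["description"].fillna("").str.len()
    return out
```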

7
Feature scaling
● No feature scaling
● Standard score: z = (x − μ) / σ
● Min-max: x′ = (x − min(x)) / (max(x) − min(x))
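The three options map onto scikit-learn as follows (a sketch; `"passthrough"` stands in for no scaling inside a `Pipeline`):

```python
# Sketch: the three scaling variants compared in the experiments.
from sklearn.preprocessing import StandardScaler, MinMaxScaler

scalers = {
    "none": "passthrough",         # raw features, no scaling
    "standard": StandardScaler(),  # z = (x - mean) / std
    "minmax": MinMaxScaler(),      # x' = (x - min) / (max - min)
}
```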

8
Parameter Tuning
Search for the best hyper-parameters for each model.
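As an illustration, a cross-validated random search over a hypothetical XGBoost grid (the search strategy and value ranges are assumptions, not the authors' exact setup):

```python
# Sketch: randomized hyperparameter search for one candidate model.
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

param_dist = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
}
search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_dist,
    n_iter=10, cv=5, scoring="accuracy", random_state=42,
)
# search.fit(X_train, y_train); search.best_params_ holds the tuned values.
```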

9
Model Selection

● KNN
● Logistic Regression
● SVM
● Decision Tree
● Random Forest
● Bagging: bootstrap aggregating with tree-based estimators
● XGBoost: gradient boosting with decision trees
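The candidate set above could be instantiated as a simple model zoo (a sketch with default constructors; the tuned hyperparameters come from the search step):

```python
# Sketch: the seven candidate classifiers evaluated in the pipeline.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from xgboost import XGBClassifier

models = {
    "knn": KNeighborsClassifier(),
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree": DecisionTreeClassifier(),
    "rf": RandomForestClassifier(),
    "bagging": BaggingClassifier(),  # decision trees by default
    "xgb": XGBClassifier(eval_metric="logloss"),
}
```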

10
Results
● We present the best-tuned model for each classifier;
● XGBoost was the best among all classifiers.
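A sketch of how the tuned candidates could be compared, reusing `X`, `y` and `models` from the sketches above (the metric and split are assumptions; the slides report only the ranking):

```python
# Sketch: score every candidate on a stratified held-out test set.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.4f}")
```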

11
Conclusions
- Ensemble methods gave better results, probably because they
handle the non-linearities in the data better, in agreement with
the findings of the t-SNE EDA;

- No significant difference between XGBoost, RF and Bagging;

- Feature normalization does not have a major effect on the
accuracy of tree-based models;

- The proposed pipeline proved to be a simple way to compare
many ML models and tuning strategies. Pipeline steps can easily
be replaced or enhanced to evaluate and compare other techniques.

12
Future work

- Include different datasets;

- Modify the pipeline to include NLP (tweet-text-based features);

- Work with different classes of bots, as other modes of
operation, such as real accounts with a few automated posts,
are becoming more popular;

- Include datasets from different platforms.

13
Thank YOU!

Image credits: Unsplash and Morguefile

14
