Evaluating The Performance of Heterogeneous and Homogeneous Ensemble-Based Models For Twitter Spam Classification

Innovative Computing Review (ICR)
Volume 2 Issue 2, Fall 2022

ISSN(P): 2791-0024 ISSN(E): 2791-0032
Homepage: https://journals.umt.edu.pk/index.php/ICR
Title: Evaluating the Performance of Heterogeneous and Homogeneous Ensemble-based Models for Twitter Spam Classification
Author (s): A. O. Ameen1, A. M. Oyelakin2, I. K. Ajiboye3, I. S. Olatinwo4, K. Y. Obiwusi5, T. S. Ogundele2
Affiliation (s): 1 University of Ilorin, Ilorin, Nigeria
2 Al-Hikmah University, Ilorin, Nigeria
3 Abdulraheem College of Advanced Studies, Nigeria
4 Federal Polytechnic, Offa, Nigeria
5 Summit University, Offa, Nigeria
DOI: https://doi.org/10.32350.icr.22.01
History: Received: September 20, 2022, Revised: October 28, 2022, Accepted: December 5, 2022
Citation: A. O. Ameen, A. M. Oyelakin, I. K. Ajiboye, I. S. Olatinwo, K. Y. Obiwusi, and T. S. Ogundele, “Evaluating the performance of heterogeneous and homogeneous ensemble-based models for twitter spam classification,” UMT Artif. Intell. Rev., vol. 2, no. 2, pp. 01-16, 2022, doi: https://doi.org/10.32350.icr.22.01
Copyright: © The Authors

Licensing: This article is open access and is distributed under the terms of
Creative Commons Attribution 4.0 International License
Conflict of Interest: Author(s) declared no conflict of interest
A publication of
School of Systems and Technology
University of Management and Technology, Lahore, Pakistan
Evaluating the Performance of Heterogeneous and
Homogeneous Ensemble-based Models for Twitter Spam
Classification
A. O. Ameen1, A. M. Oyelakin2*, I. K. Ajiboye3, I. S. Olatinwo4, K. Y. Obiwusi5, and T. S. Ogundele2
1 Department of Computer Science, University of Ilorin, Ilorin, Nigeria
2 Department of Computer Science, Al-Hikmah University, Ilorin, Nigeria
3 Computer Science Unit, Abdulraheem College of Advanced Studies, An Affiliate of Al-Hikmah University, Ilorin, Nigeria
4 Department of Computer Science, Federal Polytechnic, Offa, Nigeria
5 Department of Mathematics and Computer Science, Summit University, Offa, Nigeria
Abstract – Spam-based attacks are growing in various social networks. Social network spam is a type of unwanted content that appears on social networking sites, such as Facebook, Twitter, Instagram, and others. This study used two categories of ensemble algorithms to build Twitter spam classification models. These algorithms work by combining the strengths of individual learning algorithms and then reporting their combined performance. In ensemble learning, models are formed from data based on the assumption that combining the output of multiple models is better than using a single classifier. Hence, this study used a labeled public dataset for machine learning-based Twitter spam detection. Several studies have investigated the classification of Twitter spam from the available datasets. However, there is a paucity of works that investigated how machine learning-based models, built with homogeneous and heterogeneous algorithms, behave in Twitter spam classification. ANOVA-F test was used for selecting the most promising features in the dataset. Then, a homogeneous tree-based Random Forest (RF) ensemble and a heterogeneous ensemble vote classifier were employed for the classification of Twitter spam. Tree-based algorithms were used to build a homogeneous Twitter spam detection model, while a combination of Support Vector Machine (SVM) and Decision Tree (DT) algorithms was used for building the heterogeneous model (using a maximum voting classifier). The current study found that the performance of the Twitter spam detection models was promising. In all, the heterogeneous model recorded better performance with regard to accuracy, precision, recall, and F1-score than the model built with homogeneous base classifiers.
*Corresponding Author: moyelakin80@gmail.com
Index Terms- ensemble classification, predictive accuracy, social network, Twitter spam detection

I. Introduction

Twitter is a very popular social networking platform. It has suffered from several social spam attacks in recent years. Spam-based attacks on social sites have been reported in the literature in many forms [1]. Several studies reported that these attacks are carried out through bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information [1]. For the detection of some intrusions in networks, Machine Learning (ML) techniques were found to be very powerful compared to signature-based approaches [2]. Social spam is a type of spam content that appears on social media sites, such as Facebook, Twitter, and others, and may affect any website with user-generated content [3]. Generally, ML algorithms learn from a large set of existing data and make predictions about new data based on their learning [4].

Several studies investigated the classification of Twitter spam from the available datasets. However, there is a paucity of works that investigated how machine learning-based models, built with homogeneous and heterogeneous algorithms, behave in Twitter spam classification. Homogeneous ensembles are ensembles of the same classifiers, while heterogeneous ensembles are built from different base learners [5–7].

This study used the homogeneous Random Forest (RF) ensemble as well as a heterogeneous ensemble of two algorithms (Decision Trees and Support Vector Machine) based on maximum voting for the classification of Twitter spam. ANOVA-F test was used for feature selection, and the ensemble approaches were employed for the automatic classification of the evidence. This work seeks to extend the current authors’ previous study in the area of Twitter spam classification. Generally, ensemble algorithms are of different types. This work focuses on investigating how well two different categories of ensembles can classify Twitter spam. In the heterogeneous ensemble-based model, two single learners are used for building the ensemble. The algorithms used in the current heterogeneous ensemble were Support Vector Machine
(SVM) and Decision Tree (DT) algorithms.

Ensemble algorithms work by combining the strengths of individual learning algorithms and then reporting their combined performance. The current study used a dataset that is publicly available. This work seeks to extend a study by the authors in [8], which advocated the use of two separate homogeneous ensembles for Twitter spam classification. In the current study, the first ensemble was built from DTs as base learners, while the second one was built from SVM and DTs using the maximum voting approach. A Voting Classifier (VC) is an ensemble technique that combines the predictions of various models, which together predict an output class based on the highest probability. DTs and SVM are both supervised learning algorithms used widely in various classification tasks.

II. Related Work

The authors in [9] proposed the use of a directed social graph model for the detection of Twitter spam. The methodology involved exploring the “follower” and “friend” relationships among users using a graph technique. Then, based on Twitter’s spam policy, novel content-based features and graph-based features were also proposed. The authors built a prototype to analyse the data set and evaluate the performance of the detection system. Classic evaluation metrics were used to compare the performance of various traditional classification methods. Experimental results showed that the Bayesian classifier had the best overall performance in terms of F-measure. The results also showed that the spam detection system can achieve 89% precision.

Furthermore, another study [10] used four Machine Learning (ML) techniques, including Support Vector Machine (SVM), Neural Network (NN), Random Forest (RF), and Gradient Boosting (GB), to build four different Twitter spam detection models. The system works by using a structure which takes the user-based and tweet-based features together with the tweet content to classify the tweets. The study reported that Neural Network (NN) had a precision of 91.65% and outperformed the existing solution by about 18%. Another system that focused on detecting spam more speedily through the creation of a large-scale annotated dataset for spam account detection on Twitter was proposed by [11]. The authors argued that the system is more
effective as compared to the existing approaches.

The researchers in [12] built a large dataset of over 600 million public tweets. Then, they labelled up to 6.5 million spam tweets and extracted 12 lightweight features. Moreover, they applied a ground truth mechanism through the use of Trend Micro’s Web Reputation Service as proposed by [13]. Experiments were conducted using six ML algorithms under various conditions. It was argued that the approach is effective for Twitter spam detection.

A similar study [14] carried out a review of spam attacks on social media platforms. It focused on reporting the issues related to social spam detection, as well as the directions that future research can take. The study reported that social media spam can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information [14].

III. Methodology

A. Twitter Spam Dataset Source

This study used a Twitter spam dataset developed by [11]. The dataset is publicly available at http://nsclab.org/nsclab/resources/. The files in the larger dataset were originally available in ARFF format. The first step in the methodology involved changing the files into CSV format. The feature set in the dataset is shown in Table I. Each line represents a tweet from the collection. The dataset was grouped into four, and twelve lightweight statistical features were generated, as shown in Table I. The last column in the dataset is the tweet class (spammer or non-spammer). Exploratory data analysis revealed that the dataset is binary in nature and that it allows a machine learning-based model to classify tweets as spam or non-spam. As argued further by [11], two datasets were sampled for a continuous period of time, while the other two were randomly sampled. Despite the fact that the datasets contain a small feature space, selecting the most promising features for model building is a good step. Feature subset selection is a process in which the features that contribute most to the target variable are automatically selected from the data. Thus, feature selection involves selecting a subset of relevant features for use in machine learning-based model building.

Table I
Dataset Features and their Description
S/N   Attribute Name      Description of Attributes
1     account_age         The age (days) of an account since its creation until the time of sending the most recent tweet
2     no_follower         The number of followers of this twitter user
3     no_following        The number of followings/friends of this twitter user
4     no_userfavourites   The number of favourites this twitter user received
5     no_lists            The number of lists this twitter user added
6     no_tweets           The number of tweets this twitter user sent
7     no_retweets         The number of retweets
8     no_hashtag          The number of hashtags included in this tweet
9     no_usermention      The number of user mentions included in this tweet
10    no_urls             The number of URLs included in this tweet
11    no_char             The number of characters in this tweet
12    no_digits           The number of digits in this tweet
Table II
Sample Size and Feature Set in the Datasets
S/N   Derived Name for the Dataset   No. of Instances in the Dataset   Input Features in the Dataset   Binary Class (Spam or Non-spam)
1     TweetContinous1 (Dataset1)     10,000                            12                              YES
2     TweetRandom1 (Dataset2)        10,000                            12                              YES
3     TweetContinous2 (Dataset3)     100,000                           12                              YES
4     TweetRandom2 (Dataset4)        100,000                           12                              YES
The number of features and the number of instances in each of the datasets are shown in Table II. The values were obtained from the exploratory data analysis carried out. The Twitter spam datasets were grouped into four based on the number and types of tweet patterns contained therein. The datasets were labeled as Dataset 1, Dataset 2, Dataset 3, and Dataset 4. The dataset files were converted from the ARFF format to the CSV format. Firstly, exploratory data analysis was carried out. The essence was to understand the dataset patterns better and to gain further insights regarding how to use the available samples and features. The characteristics of the four groups of the tweet datasets are summarised in Table II.

Exploratory data analysis also revealed that each dataset contains numeric values as input features, with the target output as categorical. No missing values were found in the datasets, and all existing values were used for making decisions about how to build the spam detection models. Minimal pre-processing was carried out, namely the encoding of the target class. This is because the target class is text in categorical format.
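The ARFF-to-CSV conversion and the encoding of the target class can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the parser only handles simple dense ARFF files, the attribute name `class` and the sample rows are invented for illustration, and the helper names (`arff_to_rows`, `encode_target`) are hypothetical rather than the authors' code.

```python
import csv
import io


def arff_to_rows(arff_text):
    """Parse a simple dense ARFF document into (header, rows)."""
    header, rows, in_data = [], [], False
    for line in arff_text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append([cell.strip() for cell in line.split(",")])
        elif lower.startswith("@attribute"):
            header.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True
    return header, rows


def encode_target(rows, mapping):
    """Replace the categorical class in the last column with an integer code."""
    return [row[:-1] + [mapping[row[-1]]] for row in rows]


# Invented sample in ARFF layout; only two of the twelve features are shown.
sample = """@relation tweets
@attribute account_age numeric
@attribute no_urls numeric
@attribute class {spammer,non-spammer}
@data
120,2,spammer
900,0,non-spammer
"""
header, rows = arff_to_rows(sample)
encoded = encode_target(rows, {"non-spammer": 0, "spammer": 1})

# Write the result as CSV text (a file object could be used instead).
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(encoded)
print(buffer.getvalue())
```

Mapping the spam class to 1 is a choice made here so that it can serve as the positive class when precision and recall are computed later.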
B. Visualisation of the Patterns in the Dataset
Fig. 1. Distributions chart in dataset 1
Fig. 2. Distributions chart in dataset 2
Fig. 3. Distributions chart in dataset 3
Fig. 4. Distributions chart in dataset 4

Figures 1-4 depict the distribution of data in each dataset. It is evident that data patterns differ from the first dataset to the fourth one. Furthermore, the sample of data types in each dataset is shown below in Figure 5.
Fig. 5. Data distribution in dataset 1

It is evident from Figure 5 that the input attributes are in a numeric form, while the target class is categorical. Thus, the target class has to be encoded as a pre-processing step prior to building the Twitter spam model from each dataset.

C. Feature Selection Technique Used

The feature selection technique used in this study is the ANOVA-F test. The choice is based on the suitability of this technique for numerical input variables and a categorical target variable. The approach identified nine features as most relevant for Twitter spam classification. These features were selected based on their ranking. The authors in [15] emphasised the essence of feature selection and feature extraction in machine learning-based studies. Despite the fact that the feature set in the selected dataset is not too large, this study considered it important to select the most promising features for building the Twitter spam detection models, so as to guard against the problem of using all features in machine learning-based model building.

D. Homogeneous and Heterogeneous Twitter Spam Detection Models

As argued by Markines et al. [3], machine learning-based social site spam detection can be binary class based or multiclass based. The Twitter spam detection models developed in this study are binary, since the datasets are binary (spammer, non-spammer) in nature. The first model (homogeneous model) was built from the default base learners of the Random Forest (RF) algorithm, called Decision Trees (DTs), while the second model (heterogeneous model) was built using a combination of Support Vector Machine (SVM) and Logistic Regression (LR). The result in the second model is a consequence of majority voting. All the base algorithms used are supervised learning algorithms. RF is an ensemble of DTs that makes use of the bagging technique [16–18]. The algorithm builds DTs on bootstrap samples of the data, obtains a prediction from each tree, and selects the final result through voting. Given a set of tweets, the goal was to identify Twitter spam based on the patterns captured by the ML algorithms from the datasets.
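The majority (maximum) voting step that turns the outputs of several base learners into one ensemble prediction can be sketched as follows. The function name `majority_vote`, the tie-breaking rule, and the per-model predictions are assumptions made for this illustration; the study does not state how ties between two base learners were resolved.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by hard (maximum) voting.

    Ties are broken in favour of the label that sorts first. This rule is
    an assumption of the sketch, not something specified in the paper.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == best))
    return combined


# Hypothetical outputs of two different base learners on four tweets (1 = spam).
svm_pred = [1, 0, 1, 0]
tree_pred = [1, 1, 0, 0]
print(majority_vote([svm_pred, tree_pred]))
```

With only two voters every disagreement is a tie, so the tie-breaking rule (or a switch to averaging predicted probabilities) largely determines how such an ensemble behaves. The same function also applies to a homogeneous ensemble, where every label list comes from a tree trained on a different bootstrap sample.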

Fig. 6. Methodological flow of activities in the two Twitter spam detection models
Figure 6 illustrates the different stages in the two machine learning-based Twitter spam detection models. Python was used for the implementation of the various stages in the models. The basic stages in machine learning-based model building, as argued by [19], were followed in model implementation. Since the problem at hand is of a binary type, the target was to accurately classify the Twitter spam evidence. The study used learning algorithms for automatically classifying the labeled datasets into spam and non-spam categories. The hyperparameters of the model were tuned each time until a better result was achieved.

E. Evaluation Metrics

The metrics used for evaluating the models in this study are accuracy, precision, recall, and F1-score. A brief explanation of each of the metrics is as follows:

i. Accuracy: The ratio of the number of correctly classified cases to the total number of cases under evaluation.
ii. Precision: The ability of a classification model to return only relevant instances.
iii. Recall: The ability of the classifier to capture all the relevant instances.
iv. F1-score: The harmonic mean of the recall and precision of the respective class.

The values of the metrics can be obtained by using equations (1) to (4), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]

IV. Results

The results of the RF-based Twitter spam detection model were recorded to four decimal places, as shown in Table III. Similarly, the results of the heterogeneous ensemble based on the voting method are shown in Table IV. The study used a repeatable train-test split approach in all scenarios for evaluating the Twitter spam classification models.
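A repeatable train-test split and the four metrics of equations (1) to (4) can be sketched in plain Python as follows. The function names, the seed value, and the example labels are assumptions made for illustration; the 75:25 ratio is the one reported in the Discussion.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle with a fixed seed so the same split is reproduced on every run."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# 100 placeholder row indices split 75:25.
train, test = train_test_split(list(range(100)))
print(len(train), len(test))

# Invented labels: 3 spam and 5 non-spam tweets, with one miss and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Fixing the seed is what makes the split repeatable: rerunning the script yields the same training and test rows, so differences between models are not caused by a different partition of the data.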
Table III
Classification Results of the Homogeneous RF-based Model
S/N   Learning Algorithm                     Metric            Model Performance
CASE 1 (5k continuous-Dataset 1)
1     Homogeneous Random Forest Algorithm    Accuracy          0.9736
2     Homogeneous Random Forest Algorithm    Precision Score   0.9644
3     Homogeneous Random Forest Algorithm    Recall            0.9731
4     Homogeneous Random Forest Algorithm    F1-score          0.9675
CASE 2 (95k continuous-Dataset 2)
5     Homogeneous Random Forest Algorithm    Accuracy          0.9737
                                             Precision Score   0.9696
CASE 3 (5k random-Dataset 3)
                                             Precision Score   0.9426
CASE 4 (95k random-Dataset 4)
                                             Precision Score   0.9676
Table IV
Classification Results of the Heterogeneous Model Based on Maximum Voting
S/N   Learning Algorithm                  Metric            Model Performance
CASE 1 (5k continuous-Dataset 1)
1     Heterogeneous Ensemble Algorithm    Accuracy          0.9922
2     Heterogeneous Ensemble Algorithm    Precision Score   0.9918
3     Heterogeneous Ensemble Algorithm    Recall            0.9922
4     Heterogeneous Ensemble Algorithm    F1-score          0.9917
CASE 2 (95k continuous-Dataset 2)
                                          Precision Score
CASE 3 (5k random-Dataset 3)
                                          Precision Score
CASE 4 (95k random-Dataset 4)
                                          Precision Score
V. Discussion

The study carried out exploratory data analysis, which provided useful information regarding how to use the datasets to build Twitter spam detection models. Having gained better insights into the four different datasets released by [11] through experimental analysis, two ensemble learning algorithms were used for building Twitter spam detection models. Then, the ANOVA-F technique was used for selecting the most promising features. The selected attributes were used to build the two models. The algorithms used to build these models were based on homogeneous and heterogeneous approaches. The results obtained from the two Twitter spam classification models are described in Table III and Table IV, respectively. During experimentation, the current study used varying training and test split ratios to validate the models. Good results were recorded at the split ratio of 75:25 for training and testing sets, respectively. The performance of the models was checked based on the four selected metrics of accuracy, precision, recall, and F1-score. It was observed that the feature selection and ensemble classification methods used contributed largely to the good performance of the two models. The results obtained for the different cases remain promising for both homogeneous and heterogeneous ensembles. However, it was observed that the model built with heterogeneous machine learning algorithms (Support Vector Machine and Decision Trees) outperformed the ones built with homogeneous algorithms (tree-based).

A. Conclusion

A general introduction to the Twitter spam classification problem, as well as the promises of using machine learning-based techniques for the identification of Twitter spam attacks, was made. Data pre-processing and feature selection approaches were used to feed the two ensemble algorithms with the data available in a good form. This study focused on building two different ensemble-based models in four different cases using four groups of Twitter
spam datasets. The datasets used in this study were binary in nature. Several experiments were carried out which involved using the same type of base classifiers (Decision Trees) to build an RF-based model. Furthermore, SVM-based and DT-based classifiers were used to build a heterogeneous model. During the experiments, varying random split ratios were used to achieve model validation. Good results were recorded at the split ratio of 75:25 for training and testing sets, respectively. Good performance of the models was judged based on the four selected metrics of accuracy, precision, recall, and F1-score. It was observed that the model built with heterogeneous machine learning-based algorithms outperformed the ones built with homogeneous (tree-based) algorithms.

Acknowledgement

Authors wish to acknowledge the constructive comments of anonymous reviewers who made it possible for them to achieve the improved manuscript.

References

[1] S. Rao, A. K. Verma, and T. Bhatia, “A review on social spam detection: Challenges, open issues, and future directions,” Expert Syst. Appl., vol. 186, Art. no. 115742, Dec. 2021, doi: https://doi.org/10.1016/j.eswa.2021.115742

[2] A. Pektaş and T. Acarman, “Botnet detection based on network flow summary and deep learning,” Int. J. Netw. Manag., vol. 28, no. 6, pp. 1–15, July 2018, doi: https://doi.org/10.1002/nem.2039

[3] B. Markines, C. Cattuto, and F. Menczer, “Social spam detection,” Proc. 5th Int. Workshop Advers. Inform. Retriev. Web – AIRWeb, 2009.

[4] S. Penchikala, “Big data processing with apache spark-part 4,” Spark Mach. Learn., 2016.

[5] D. Opitz and R. Maclin, “Popular ensemble methods: An empirical study,” J. Artif. Intell. Res., vol. 11, pp. 169–198, 1999.

[6] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits Syst. Mag., vol. 6, no. 3, pp. 21–45, 2006, doi: https://doi.org/10.1109/MCAS.2006.1688199

[7] L. Rokach, “Ensemble-based classifiers,” Artif. Intell. Rev., vol. 33, no. 1–2, pp. 1–39, 2010, doi: https://doi.org/10.1007/s10462-009-9124-7

[8] R. G. Jimoh et al., “Experimental evaluation of ensemble learning-based models for twitter spam classification,” 5th Information Technology for Education and Development (ITED), 2022.

[9] A. H. Wang, “Machine learning for the detection of spam in twitter networks,” paper presented at 7th International Joint Conference, ICETE, Athens, Greece, July 26–28, 2010.

[10] D. Thilagavathy, A. Muthumanickam, S. Naveenkumar, and A. U. Kumar, “Spam detection in twitter using light weight detectors,” Int. J. Sci. Res. Comput. Sci. Appl. Manag. Stud., vol. 8, no. 2, 2019.

[11] F. Concone, G. Lo Re, M. Morana, and C. Ruocco, “Twitter spam account detection by effective labeling,” in Proc. 3rd Italian Conf. Cyber. Secur., Pisa, Italy, Feb. 13–15, 2019.

[12] C. Chen, J. Zhang, X. Chen, Y. Xiang, and W. Zhou, “6 million spam tweets: A large ground truth for timely twitter spam detection,” IEEE Int. Conf. Commun. Info. Syst. Secur. Symp., London, UK, 2015, pp. 7065–7070, doi: https://doi.org/10.1109/ICC.2015.7249453

[13] J. Oliver, P. Pajares, C. Ke, C. Chen, and Y. Xiang, “An indepth analysis of abuse on twitter,” Trend Micro Res. Paper, 2014.

[14] S. Rao, A. K. Verma, and T. Bhatia, “A review on social spam detection: Challenges, open issues, and future directions,” Expert Syst. Appl., vol. 186, Art. no. 115742, 2021, doi: https://doi.org/10.1016/j.eswa.2021.115742

[15] A. M. Oyelakin and R. G. Jimoh, “A survey of feature extraction and feature selection techniques used in machine learning-based botnet detection schemes,” VAWKUM Transac. Comput. Sci., vol. 9, no. 1, pp. 1–7, 2021.

[16] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996, doi: https://doi.org/10.1007/BF00058655

[17] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: https://doi.org/10.1023/A:1010933404324

[18] G. Brown, “Ensemble learning,” in Encyclopedia of Machine Learning. Boston, MA, USA: Springer, 2010.

[19] M. Swamynathan, Mastering Machine Learning with Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using Python. Berkeley, CA: Apress.
