
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY
https://e-journal.uum.edu.my/index.php/jict

How to cite this article:


Okwonu, F. Z., Ahad, N. A., Hamid, H., Muda, N., & Sharipov, O. S. (2023).
Enhanced robust univariate classification methods for solving outliers and overfitting
problems. Journal of Information and Communication Technology, 22(1), 1-30.
https://doi.org/10.32890/jict2023.22.1.1

Enhanced Robust Univariate Classification Methods for
Solving Outliers and Overfitting Problems

1Friday Zinzendoff Okwonu*, 2Nor Aishah Ahad, 2Hashibah Hamid,
3Nora Muda & 4Olimjon Shukurovich Sharipov
1Department of Mathematics, Faculty of Science, Delta State University, Nigeria
2School of Quantitative Sciences, College of Arts and Sciences, Universiti Utara Malaysia, Malaysia
3School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
4Department of Statistics, School of Mathematics, National University of Uzbekistan named after Mirzo Ulugbek, Uzbekistan

*okwonufz@delsu.edu.ng; aishah@uum.edu.my; hashibah@uum.edu.my; noramuda@ukm.edu.my; osharipov@mail.ru
*Corresponding author

Received: 16/2/2022 Revised: 22/9/2022 Accepted: 30/10/2022 Published: 19/1/2023

ABSTRACT

The robustness of some classical univariate classifiers is hampered
if the data are contaminated. Overfitting is another difficulty when the
data sets are uncontaminated and the sample size is considerable. The
performance of classification models is easily biased by outlier
problems, in which case the constructed model tends to be
overfitted. Previous studies often used the Bayes Classifier (BC) and
the Predictive Classifier (PC) to address two groups of univariate
classification problems. Unfortunately, for substantially large sample
sizes and uncontaminated data, the BC method overfits when the
Optimal Probability of Exact Classification (OPEC) is used as an
evaluation benchmark. Meanwhile, for small sample sizes, the BC
and PC methods are extremely susceptible to outliers. To overcome
these two problems, we proposed two methods: the Smart Univariate
Classifier (SUC) and the hybrid classifier. The latter is a combination
of the SUC and the BC methods, known as the Smart Univariate
Bayes Classifier (SUBC). The performance of the new classification
methods was evaluated and compared with the conventional BC and
PC methods using the OPEC as a benchmark value. To validate the
performance of these classification methods, the Probability of Exact
Classification (PEC) was compared with the OPEC value. The results
showed that the proposed methods outperformed the conventional
BC and PC methods based on the real data sets applied. Numerical
results also revealed that the SUC method could solve the overfitting
problem. The results further indicated that the two proposed methods
were robust against outliers. Therefore, these new methods are
helpful when practitioners are confronted with overfitting and data
contamination problems.

Keywords: Bayes, Predictive, Outliers, Overfitting, Classification.

INTRODUCTION

In practice, we often have difficulty allocating or classifying an object
into one of two groups based on the object score. In many instances,
a comparison between the object score and the benchmark value is
made to assign an object to the correct group (Wald, 1944; Song et
al., 2020). Classification methods can be applied to many studies to
identify unique membership (Pang et al., 2019; Hamid et al., 2018;
Okwonu et al., 2012). For instance, El Abbassi et al. (2021) applied
a univariate classifier to nanoelectronics and spectroscopy to classify
relevant information from the data set. Classification can be applied
to determine ICT knowledge awareness (Dávideková et al., 2019).
Jimoh et al. (2022) applied classification methods to classify malaria
infection. The classification method can also be used to classify
students as first-class or second-class upper based on their final CGPA.

However, the classification benchmark value must be well defined
and compared before assigning the object to the respective groups
(Hamid et al., 2016). Thus, this benchmark formulation criterion is
crucial in classification tasks.

Classification methods often require groups and variables of interest
to be defined prior to model construction. In some cases, two or
more variables could be of interest to the researcher to obtain group
information. In contrast, other researchers may be interested in a single
variable in order to ascertain actual or predicted group membership.

The classical classification methods often show a minimum
misclassification rate if the data set follows a normal distribution
pattern. On the other hand, if this pattern is violated, the inference
breaks down as we might draw erroneous inferences and wrong
conclusions (Das & Imon, 2016). Therefore, normality should be taken
seriously, because if this assumption does not hold, it is impossible to
draw accurate and reliable conclusions (Field, 2009; Oztuna et al.,
2006). Departure from normality in any of the samples has been shown
to affect the Type I error rate (Blanca et al., 2017; Cain, Zhang
& Yuan, 2017). The group variables of the study determine the
importance of the research inference. In this case, the dimension of
the variables may be categorised into univariate or multivariate. The
other aspect is the homoscedasticity of the variance when discussing
the robustness of these techniques. Besides multivariate classification,
univariate classification is essential and widely applied in different
fields of study.

In recent times, the paradigm of univariate classification has focused on
different techniques that have been proposed using time series data
(Thet Zan & Yamana, 2016; Sun et al., 2014). For example, the Bayes
rule is a unique classifier based on group membership probabilities
(Taheri & Mammadov, 2013). The predictive classifier (Huberty &
Holmes, 1983) relies on the average between group means. These
techniques have been applied to classification problems in different
fields of study (Chattopadhyay et al., 2012; Wei, 2018; Banik, 2019;
Barbini et al., 2013; Aborisade & Anwar, 2018; Bharadwaj & Shao,
2019; Trovato, 2016; Berry, 2004; Ma et al., 2011; Theodoridis &
Koutroumbas, 2009).

Conventionally, the univariate classification for two groups is
performed using the Student's t-test and power analysis. The
procedures of this test and analysis often rely on the probability value,
power values, and sample size to make an inference. Donoho and Jin
(2008) devised other procedures to classify k-dimensional data sets
without utilising the covariance matrix. Relying on this complicated
classification method, Huberty and Holmes (1983) proposed a
univariate variable classifier for two groups by comparing the
univariate variable's value with the average of the two group means.
This technique was proposed as an alternative to power analysis.

The Bayes Classifier is assumed to be robust because it uses the
posterior probability to assign an object to the actual group. The
robustness of this method can be attributed to the non-application of
any measure of central tendency, such as the mean, which is easily
influenced by the outliers. Intuitively, outliers in one group may
increase the probability of that group over the other and may influence
the correct classification of the object into its actual group. The
evaluation of this technique is conventionally optimal if the Probability
of Exact Classification (PEC) is measured by $\sum_{i=1}^{2} P_i = 1$, where $P_i$ is
the group membership probability. The pitfall with evaluating the
Bayes procedure using the Optimal Probability of Exact Classification
(OPEC) is overfitting, especially when the data set satisfies the
normality assumption. Outliers often influence the univariate
predictive and the Bayes classifiers. These methods are sensitive to
overfitting when the data sets are normally distributed.

In this paper, we proposed two methods, namely the Smart Univariate
Classifier (SUC) and the hybrid classifier, which was a combination of
the Smart Univariate Classifier (SUC) and the Bayes Classifier (BC).
This newly constructed hybrid method was called the Smart Univariate
Bayes Classifier (SUBC). The two proposed methods were designed
to address the weaknesses of the conventional univariate methods
concerning the sensitivity to outliers and overfitting problems. The
SUC method mimicked Fisher's linear classifier, while the SUBC
method depended on data transformation and plug-in using the Bayes
classifier. First, these two methods extracted outliers by applying the
F-weight to determine the inlier of each data set. Then, the second
phase of the proposed methods transformed the outlier data points into
inlier data points before the classifier was applied to determine group
membership. Finally, the hybrid method applied the Bayes rule to
perform group classification on the transformed F-weighted data set.
The main contribution of this paper is the robustness of the two
proposed methods to solve the overfitting problem and the
transformation of outliers to inliers. Furthermore, we investigated the
classification performance of the proposed methods and the
conventional classification methods using continuous and discrete
variables.
variables.
The next part of this paper discusses the Predictive Classification
The next followed
method, part of this
by paper
the Bayesdiscusses the Predictive
Classifier. The proposedClassification
Smart
method, followed by the Bayes Classifier. The proposed Smart
Univariate Classifier and Smart Univariate Bayes Classifier are
Univariate Classifier and Smart Univariate Bayes Classifier are
outlined in the next section, followed by data collection and
outlined in the next section, followed by data collection and
comparative performance analysis. The performance comparison
comparative performance analysis. The performance comparison
through the probability of exact classification (hit ratio) and analysis
through the probability of exact classification (hit ratio) and analysis is
is then presented, and finally the conclusion is conferred in the last
then presented, and finally the conclusion is conferred in the last
section.
section.

METHODOLOGY

In this section, we discuss the conventional methods, the Predictive
Classifier (PC) and the Bayes Classifier (BC), together with the
proposed methods, the Smart Univariate Classifier (SUC) and the
Smart Univariate Bayes Classifier (SUBC). The different classifiers are
based on the step-by-step computational procedures defined in the
following sections.

Predictive Classifier

We describe the predictive classifier according to Huberty and Holmes
(1983). This technique classifies an object $X_i$ to Group 1 ($\Delta_1$)
according to Equation 1.

$$X_i < \frac{\bar{X}_1 + \bar{X}_2}{2} \tag{1}$$

where $\bar{X}_i$ is the mean of Group $\Delta_i$ as shown in Equation 2.

$$\bar{X}_i = \frac{\sum_{k=1}^{n_i} x_k}{n_i}, \quad i = 1, 2 \tag{2}$$

Otherwise, classify $X_i$ to Group 2 ($\Delta_2$).
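To make the allocation rule concrete, the following is a minimal sketch of the predictive classifier of Equation 1, following the inequality as written (so Group 1 is taken as the group with the smaller mean in this toy example); the function name and the data are illustrative assumptions, not part of the original study.

```python
import numpy as np

def predictive_classifier(x, group1, group2):
    """Equation 1: assign x to Group 1 if x < (mean1 + mean2) / 2, otherwise to Group 2."""
    benchmark = (np.mean(group1) + np.mean(group2)) / 2
    return 1 if x < benchmark else 2

# Hypothetical univariate training groups.
g1 = np.array([2.1, 2.4, 1.9, 2.2, 2.0])
g2 = np.array([3.8, 4.1, 3.9, 4.3, 4.0])
print(predictive_classifier(2.3, g1, g2))  # -> 1
print(predictive_classifier(3.9, g1, g2))  # -> 2
```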

Equation 1 is very sensitive to outliers as the mean is easily perturbed
by a slight change in the data set. The basic idea of Equation 1 is
variable swap, in which the independent variable is swapped with the
dependent variable and vice versa. This procedure mimics the point
biserial correlation coefficient, in which a variable swap is applicable
as shown in Equation 3.

$$r_{p,b} = \frac{(\bar{x}_2 - \bar{x}_1)\sqrt{p \times q}}{\sqrt{s_z^2}} \tag{3}$$

where $n = n_1 + n_2$, $p = \frac{n_1}{n}$, and $q = \frac{n_2}{n}$ are the group probabilities of
Group 1 and Group 2, and $s_z^2$ is the pooled covariance of Groups 1 and
2. To avoid a negative value of the correlation coefficient, the mean of
the first group is always greater than the mean of the second group for
efficiency and better classification purposes. In this case, $n_1$ refers to
the smaller group and $n_2$ refers to the larger group. The concept of
variable swap can be reversed by changing the classification
"inequality". Based on this statement, Equation 1 can be written as the
following Equation 4, which implies that the conventional variable
position remains.

$$X_i > \frac{\bar{X}_1 + \bar{X}_2}{2} \tag{4}$$

Equation 4 implies that an object $X_i$ is assigned to $\Delta_1$; otherwise, assign
$X_i$ to $\Delta_2$. Unfortunately, Equation 1 and Equation 4 give the same
classification result. The point biserial correlation that corresponds to
Equation 4 is given in Equation 5.

$$r_{p,b} = \frac{(\bar{x}_1 - \bar{x}_2)\sqrt{p \times q}}{\sqrt{s_z^2}} \tag{5}$$

Taking the absolute value of Equation 5 yields the corresponding value
of Equation 3. At this point, we have addressed what we may consider
the weakness of variable interchange in Equation 1. The remaining part
of this paper considers other univariate classifiers, such as the Bayes
classifier, and the proposed univariate classifier methods, SUC and
SUBC.
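A quick numerical check of the variable-swap argument around Equations 3 and 5 can be sketched as follows; the helper name and the data are illustrative assumptions, and $s_z^2$ is taken here as the usual pooled sample variance, which is our reading of the text.

```python
import numpy as np

def point_biserial(g_a, g_b):
    """Equation 3 pattern: (mean_b - mean_a) * sqrt(p * q) / sqrt(pooled variance)."""
    n_a, n_b = len(g_a), len(g_b)
    n = n_a + n_b
    p, q = n_a / n, n_b / n
    s_z2 = ((n_a - 1) * np.var(g_a, ddof=1) + (n_b - 1) * np.var(g_b, ddof=1)) / (n - 2)
    return (np.mean(g_b) - np.mean(g_a)) * np.sqrt(p * q) / np.sqrt(s_z2)

g1 = np.array([2.1, 2.4, 1.9, 2.2, 2.0])   # hypothetical Group 1
g2 = np.array([3.8, 4.1, 3.9, 4.3, 4.0])   # hypothetical Group 2

r_eq3 = point_biserial(g1, g2)             # Equation 3 orientation
r_eq5 = point_biserial(g2, g1)             # Equation 5 orientation (means swapped)
print(round(r_eq3, 4), round(r_eq5, 4))    # same magnitude, opposite signs
```

The equal magnitudes illustrate that taking the absolute value of Equation 5 recovers the value of Equation 3.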

Bayes Classifier

Given that $g_1(x)$ and $g_2(x)$ are the probability density functions and
$X_{d\times 1}$ is a random variable for the two groups denoted as $\Delta_1$ and $\Delta_2$,
any value of $X_{d\times 1}$ can be assigned to any of the groups. In the
following, we assume that the priori probabilities and the cost of
inaccurate classification for the two groups are equal. Let
$C_i = \{x_{i,d\times 1}, d = 1\}$ be the sample spaces of the univariate random
variables. Suppose $\nabla_1$ is the set of values of $x_{1,d\times 1}$ to be classified into
Group $\Delta_1$, and $\nabla_2$ denotes the set of values of $x_{2,d\times 1}$ to be classified
into Group $\Delta_2$. The aim is that each random value or object must only
be classified into one group (Johnson & Wichern, 1992). This
univariate approach is based on the conditional probability concept.
Let us describe the process of classifying the object as Equation 6 and
Equation 7.

$$P(1|2) = P(x_{1,d\times 1} \in \nabla_1 \mid \Delta_2) = \int_{\nabla_1} g_2(x)\,dx \tag{6}$$

$$P(2|1) = P(x_{2,d\times 1} \in \nabla_2 \mid \Delta_1) = \int_{\nabla_2} g_1(x)\,dx \tag{7}$$

Equation 6 implies the conditional probability $P(1|2)$ of assigning an
object to Group $\Delta_1$ when it is actually from Group $\Delta_2$, while Equation
7 implies the conditional probability $P(2|1)$ of assigning an object to
Group $\Delta_2$ when it is from Group $\Delta_1$. Let us denote the priori
probability of $\Delta_1$ as $w_1$ and of $\Delta_2$ as $w_2$, implying that $P(\Delta_1) = w_1$
and $P(\Delta_2) = w_2$, such that adding these two probabilities will produce
a total of 1 as shown in Equation 8.

$$P(\Delta_1) + P(\Delta_2) = \sum_{i=1}^{2} w_i = 1 \tag{8}$$

Let $A_C$ denote the correct allocation and $A_M$ the misallocation.
Therefore, the probability of correct allocation or the probability of
misallocation for the two groups can be expressed as Equations 9 to 12.

$$P(A_C \text{ to } \Delta_1) = P(x_{i,d\times 1} \in \nabla_1 \mid \Delta_1)P(\Delta_1) = P(1|1)w_1 \tag{9}$$

$$P(A_M \text{ to } \Delta_1) = P(x_{i,d\times 1} \in \nabla_1 \mid \Delta_2)P(\Delta_2) = P(1|2)w_2 \tag{10}$$

$$P(A_C \text{ to } \Delta_2) = P(x_{i,d\times 1} \in \nabla_2 \mid \Delta_2)P(\Delta_2) = P(2|2)w_2 \tag{11}$$

$$P(A_M \text{ to } \Delta_2) = P(x_{i,d\times 1} \in \nabla_2 \mid \Delta_1)P(\Delta_1) = P(2|1)w_1 \tag{12}$$

Equation 9 and Equation 11 give the probabilities of correct allocation,
while Equation 10 and Equation 12 give the probabilities of
misallocation, respectively. In other words, Equations 9 to 12
summarize the confusion matrix at a glance, such that Equation 9 and
Equation 11 form the diagonal of the confusion matrix whereas
Equation 10 and Equation 12 form the off-diagonal of the confusion
matrix. Due to the equal probability and equal cost of misallocation
assumptions (Johnson & Wichern, 1992; Johnson, 1987), it is
sometimes easy to implement this procedure. Nevertheless, it is
advisable to consider alternative allocation methods when this
assumption fails. We shall consider the Bayes posterior probability rule
for allocating an object to the desired group. The Bayes procedure
(Sainin et al., 2021; Ma et al., 2011; Theodoridis & Koutroumbas,
2009) can be stated as Equation 13.

$$P(\Delta_i \mid x_{i,d\times 1}) = \frac{P(\Delta_i)P(x_{i,d\times 1} \mid i)}{P(x_{i,d\times 1})} = \frac{w_i g_i(x_{i,d\times 1})}{\sum_{i=1}^{2} w_i g_i(x_{i,d\times 1})} = \left(\sum_{i=1}^{2} w_i g_i(x_{i,d\times 1})\right)^{-1} w_i g_i(x_{i,d\times 1}) \tag{13}$$
From Equation 13 and for the two groups' univariate allocator, we can
separate to Equation 14 and Equation 15.

$$P(\Delta_1 \mid x_{i,d\times 1}) = \frac{w_1 g_1(x_{i,d\times 1})}{\sum_{i=1}^{2} w_i g_i(x_{i,d\times 1})} = \left(\sum_{i=1}^{2} w_i g_i(x_{i,d\times 1})\right)^{-1} w_1 g_1(x_{i,d\times 1}) \tag{14}$$

$$P(\Delta_2 \mid x_{i,d\times 1}) = \frac{w_2 g_2(x_{i,d\times 1})}{\sum_{k=1}^{2} w_k g_k(x_{i,d\times 1})} = 1 - \left(\left(\sum_{i=1}^{2} w_i g_i(x_{i,d\times 1})\right)^{-1} w_1 g_1(x_{i,d\times 1})\right) \tag{15}$$

Therefore, allocate $x_i$ to $\Delta_1$ if $P(\Delta_1 \mid x_{i,d\times 1}) > P(\Delta_2 \mid x_{i,d\times 1})$;
otherwise, allocate $x_i$ to $\Delta_2$ if $P(\Delta_1 \mid x_{i,d\times 1}) < P(\Delta_2 \mid x_{i,d\times 1})$. This
decision criteria can be compactly written as shown in Equation 16
below.

$$\pi_i = \begin{cases} x_i \text{ to } \Delta_1 & \text{if } P(\Delta_1 \mid x_{i,d\times 1}) > P(\Delta_2 \mid x_{i,d\times 1}) \\ x_i \text{ to } \Delta_2 & \text{if } P(\Delta_1 \mid x_{i,d\times 1}) < P(\Delta_2 \mid x_{i,d\times 1}) \end{cases} \tag{16}$$
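To illustrate how Equations 13 to 16 operate on data, the following is a minimal sketch of a two-group univariate Bayes allocator with Gaussian density estimates and equal priors; the function name and the sample data are illustrative assumptions rather than part of the original study.

```python
import numpy as np
from scipy.stats import norm

def bayes_allocate(x, group1, group2, w1=0.5, w2=0.5):
    """Posterior allocation of Equation 16 using Gaussian density estimates g1 and g2."""
    g1 = norm.pdf(x, loc=np.mean(group1), scale=np.std(group1, ddof=1))
    g2 = norm.pdf(x, loc=np.mean(group2), scale=np.std(group2, ddof=1))
    total = w1 * g1 + w2 * g2          # normalising constant in Equation 13
    post1 = w1 * g1 / total            # Equation 14
    post2 = 1.0 - post1                # Equation 15
    return 1 if post1 > post2 else 2

# Hypothetical univariate groups with equal priors.
g1 = np.array([2.1, 2.4, 1.9, 2.2, 2.0])
g2 = np.array([3.8, 4.1, 3.9, 4.3, 4.0])
print(bayes_allocate(2.5, g1, g2))  # -> 1
print(bayes_allocate(3.7, g1, g2))  # -> 2
```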
Smart Univariate Smart Classifier
Univariate Classifier
Smart SmartUnivariate Univariate Classifier Classifier
The SUC ThemethodThe SUC
SUC applied
method method the F-weight
applied applied to extract
the
the F-weight F-weight thetoinlier
to extract the from
extract inlierthethe inlierthe
from from the
The
data SUCmatrix methodand applied
detected
matrix the
and the F-weight
outliers.
data matrix and detected the outliers. The outliers detected were were
data detected to
The
the extract
outliers
outliers. the inlier
detected
The from
outliers werethedetected
data matrix
subjected subjected and
tosubjected
F-weight todetected
F-weight treatment,
to thetreatment,
F-weight outliers.
which The which
transformed
treatment,
which outliers
transformed the detected
outliers
transformed were
into
the outliers the outliers
into into
inliers.inliers.
subjected Subsequently,
toinliers.
F-weight
Subsequently, the
Subsequently, classifier
treatment, which
the
the classifier wasclassifier
applied
transformed
was applied to
was theapplied
transform
the
to outliers to the
the transform data
into
transform
data data
set to build
inliers. tosetthe
Subsequently,
set build toclassification
build
thethe the classifier
classification modelwas
classification andapplied
model the
model
and benchmark
theand
to the thevalue.
transform
benchmark benchmark This
data value.
value. This This
set to build the method classification
is called smart modelbecause and the of benchmark
its sensitivity value. This
in identifying inliers
method
method
methodmethod is
is called
called
is calledfrom smart
the
smart
is smart data
called because because
because sets
smart because andof
of its
its sensitivity
subsequent
sensitivity
of its sensitivity
of its sensitivity in
conversion
in identifying
identifying
in identifying of
in identifyinginliers
outliers
inliers
inliers to inliers.
8 inliers The
8
from the data sets and subsequent conversion of outliers to inliers. The 8
from the the
from data
data formulation
the sets
setsdata and and
sets of the
subsequent
and subsequent
subsequent SUC method
conversion
conversion conversion(Okwonu,
of outliers
of outliers Ahad,
of outliers Okoloko
to inliers.
to inliers. to inliers.
The Theet al., 2022)
formulation
The formulation
formulation ofisthe
formulation
of the SUC
asoffollows.
of
SUCthe theSUC method
SUCmethod
method (Okwonu,
method
(Okwonu, Ahad,Ahad,
(Okwonu,
(Okwonu, Ahad, Okoloko
Ahad,Okoloko
Okoloko et al.,
Okoloko
et al., 2022)
2022)
etetal.,
al., 2022)
is as
2022)
is follows.
is as follows.
as follows. Let 𝑥𝑥1 and 𝑥𝑥2 be univariate random observations from ∆1 and ∆2 such
Let 𝑥𝑥 and 𝑥𝑥
Let 𝑥𝑥1 and 𝑥𝑥12that
Let be univariate
and univariate
𝑥𝑥𝑛𝑛2 −be2univariate
≥ 𝑝𝑝, random 𝑝𝑝 =random
1. observations
We observations
assume from that∆ 1𝑥𝑥and
∆from 1 and
∆2and such
𝑥𝑥2 ∆are normally
1 𝑥𝑥 2 be random observations from 1 and ∆∆12 such 2 such
that 𝑛𝑛 −
that 𝑛𝑛 that
−2≥ 2 ≥
𝑛𝑛 − 𝑝𝑝,
distributed 𝑝𝑝 =
𝑝𝑝, 2𝑝𝑝≥=𝑝𝑝,1. 𝑝𝑝We 1. We
with
= 1.assume assume
mean
We assume that
and 𝑥𝑥
variance,
that 𝑥𝑥1that 1 and 𝑥𝑥
which
and𝑥𝑥1𝑥𝑥2and 2 are is normally
𝑥𝑥 2 ).
𝑖𝑖are normallyLet 𝑊𝑊
~𝑁𝑁(𝜇𝜇,
are2𝑥𝑥2normally 𝜎𝜎
distributed with
distributed
distributed denote
with mean mean the
with and and
F-weight variance,
meanvariance, as
and variance, which
shown in
which which is 𝑥𝑥
Equation ~𝑁𝑁(𝜇𝜇, 17.𝜎𝜎
is 𝑥𝑥𝑖𝑖 ~𝑁𝑁(𝜇𝜇,
is 𝑥𝑥𝑖𝑖 ~𝑁𝑁(𝜇𝜇,
𝑖𝑖 2 ).
𝜎𝜎 ). Let Let 𝑊𝑊Let 𝑊𝑊
𝜎𝜎 2 ).
𝑊𝑊
denote the
denote denote F-weight
the F-weightthe F-weight as
as shown shown as shown in Equation
𝑥𝑥 in Equation
in Equation 17.
17. 17.
𝑥𝑥𝑖𝑖
𝑤𝑤𝑖𝑖 = 𝑍𝑍𝑖𝑖 , 𝑤𝑤𝑖𝑖 ∈ 𝑊𝑊, 𝑖𝑖 = 1, 2
𝑤𝑤𝑖𝑖 = = 𝑥𝑥𝑍𝑍𝑤𝑤𝑖𝑖 , 𝑤𝑤𝑖𝑖𝑥𝑥∈ 𝑖𝑖 𝑊𝑊, 𝑖𝑖 = 1, 2
(17)
𝑤𝑤 𝑖𝑖 𝑍𝑍
,𝑖𝑖 =𝑤𝑤𝑖𝑖`𝑍𝑍∈, 𝑊𝑊, 𝑤𝑤𝑖𝑖 𝑖𝑖∈=𝑊𝑊, 1, 2𝑖𝑖 = 1, 2
where 𝑍𝑍 = 𝑋𝑋1 𝑋𝑋2 , 𝑥𝑥1 ∈ 𝑋𝑋1 , 𝑥𝑥2 ∈ 𝑋𝑋2 , and 𝑤𝑤𝑖𝑖 are the weight associated
`
where 𝑍𝑍 𝑍𝑍 = 𝑋𝑋
where 𝑋𝑋1with
` 𝑋𝑋2 , 𝑋𝑋 𝑥𝑥1` ∈group.
each 𝑋𝑋1,, 𝑥𝑥 ∈ 𝑋𝑋2,, and
𝑥𝑥2Applying andEquation are the
𝑤𝑤,𝑖𝑖 and 𝑤𝑤𝑖𝑖17, weight
are the associated
theF-weight
weight mean vector is
associated
where = 1𝑍𝑍𝑋𝑋= 2 , 𝑥𝑥11 𝑋𝑋∈2 ,𝑋𝑋𝑥𝑥 1 1 ∈2𝑋𝑋∈1 ,𝑋𝑋𝑥𝑥 2 2 ∈ 𝑋𝑋𝑤𝑤 2 𝑖𝑖 are the weight associated
with each
with each group.
withgroup. given
each Applying
group.as Equation
Applying Applying Equation
Equation 18.Equation 17, the
17, the17, F-weight
the F-weight
F-weight mean
mean vectorvector
mean vector is
is is
given as
given as Equation
given as Equation
Equation 18. 18.
18. 𝑛𝑛𝑖𝑖
∑𝑥𝑥 =1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖
∑𝑛𝑛𝑖𝑖=1 𝑤𝑤𝑖𝑖∑𝑥𝑥𝑥𝑥̅
𝑛𝑛
𝑛𝑛𝑖𝑖𝑖𝑖 =
𝑖𝑖
𝑖𝑖 (18)
∑𝑥𝑥𝑥𝑥𝑖𝑖𝑖𝑖=1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑥𝑥𝑖𝑖 𝑖𝑖 =1 𝑖𝑖 𝑥𝑥𝑖𝑖 𝑤𝑤𝑖𝑖
𝑤𝑤
𝑥𝑥̅ 𝑖𝑖 =
𝑥𝑥̅𝑖𝑖 = 𝑥𝑥̅𝑖𝑖𝑤𝑤 𝑖𝑖
𝑤𝑤=𝑖𝑖 𝑤𝑤𝑖𝑖
8 We determine 𝑖𝑖 the inliers from Equation 17 and Equation 18 as shown
We determine
We determine
We determine in the
Equation
the inliersinliersthe inliersfrom
19.
from Equation Equation
from Equation 17 and
17 and17 Equation
and Equation
Equation 18 as
18 as shown
shown
18 as shown
in Equation 19.
in Equation
in Equation 19. 19. 1 𝑖𝑖𝑖𝑖 𝑤𝑤𝑖𝑖 < 𝑥𝑥̅𝑖𝑖 , 𝑥𝑥̅𝑖𝑖 = 𝑥𝑥̅1 , 𝑥𝑥̅2
with each group. Applying Equation 17, the F-weight mean vector is
givengiven
given as Equation 18.
givenas
asEquation 18.
as Equation
Equation 18. 18.
𝑛𝑛𝑖𝑖 𝑛𝑛𝑖𝑖
∑𝑥𝑥𝑛𝑛=1 ∑
𝑥𝑥𝑖𝑖𝑤𝑤 𝑖𝑖 𝑥𝑥 𝑤𝑤𝑛𝑛𝑖𝑖𝑥𝑥
Journal of ICT,
∑𝑥𝑥𝑖𝑖 𝑖𝑖22, No.
=1 ∑𝑖𝑖𝑥𝑥𝑖𝑖1=1
=1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖𝑖𝑖
𝑖𝑖 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖
(January) 2023, pp: 1–30
𝑥𝑥̅𝑥𝑥̅𝑖𝑖 =𝑥𝑥̅𝑖𝑖 =𝑥𝑥̅𝑖𝑖𝑖𝑖 =𝑤𝑤
𝑖𝑖 = 𝑤𝑤𝑖𝑖 𝑖𝑖
𝑤𝑤𝑖𝑖 𝑤𝑤𝑖𝑖

We determine the inliers from Equation 17 and Equation 18 as18shown


We We Wedetermine
Wethe
determine
determine the
theinliers
determine
inliers the from
inliers
from inliers
from
Equation Equation
from
Equation 17Equationand1717and andEquation
Equation 17 18 as
and18Equation
Equation as asshown
18shownshown as shown
in Equation
ininEquation
in 19. 19. 19.
Equation
Equation
in Equation 19. 19.
1 𝑖𝑖𝑖𝑖1 𝑖𝑖𝑖𝑖𝑤𝑤𝑖𝑖1𝑤𝑤 <𝑖𝑖𝑖𝑖 <𝑖𝑖𝑤𝑤
𝑖𝑖 𝑥𝑥̅ , 𝑥𝑥̅𝑖𝑖< ,= 𝑥𝑥̅𝑖𝑖𝑥𝑥̅𝑥𝑥̅=,1𝑥𝑥̅,𝑥𝑥̅𝑥𝑥̅12=
, 𝑥𝑥̅𝑥𝑥̅2 , 𝑥𝑥̅
𝐼𝐼𝐼𝐼== 𝐼𝐼 {=1{𝐼𝐼𝑖𝑖𝑖𝑖= 𝑤𝑤 { 𝑖𝑖 < 𝑥𝑥̅𝑖𝑖 ,𝑖𝑖𝑥𝑥̅𝑖𝑖 = 𝑖𝑖𝑥𝑥̅1 ,𝑖𝑖 𝑥𝑥̅2 1 2
(19)
{ 00𝑖𝑖𝑖𝑖 0 𝑖𝑖𝑖𝑖 𝑤𝑤𝑖𝑖𝑖𝑖 >𝑖𝑖𝑤𝑤𝑥𝑥̅𝑖𝑖 𝑖𝑖> 𝑥𝑥̅𝑖𝑖
𝑖𝑖𝑖𝑖 𝑤𝑤𝑤𝑤𝑖𝑖0𝑖𝑖 >> 𝑖𝑖 𝑥𝑥̅
𝑥𝑥̅𝑖𝑖
If no
IfIfoutlier
no Ifoutlier exists, we proceed as follows by computing the sample
If no nooutlier no exists,
outlier
exists,
outlier exists, wewe proceed
exists,
proceed
we proceed weasproceed asasfollows
follows follows asbyfollows by
by computing
computing
computing the
by computing
the sample
sample
the the sample
sample
variance
variance s s given
given
variance in
s in in Equation
Equation
given in20. 20.
Equation 20.
variance s given
variance in Equation
s given Equation 20. 20.
∑𝑛𝑛𝑛𝑛𝑖𝑖∑(𝑤𝑤 𝑛𝑛𝑖𝑖
(𝑤𝑤 𝑛𝑛
)22𝑖𝑖𝑖𝑖𝑥𝑥)2𝑖𝑖 −𝑥𝑥̅𝑖𝑖 )2
𝑖𝑖 𝑥𝑥
𝑥𝑥𝑖𝑖 −𝑥𝑥̅ 𝑖𝑖 𝑖𝑖 −𝑥𝑥̅
𝑆𝑆𝑆𝑆𝑖𝑖22𝑆𝑆==2 ∑𝑖𝑖=12
= (𝑤𝑤𝑖𝑖 𝑖𝑖∑
𝑖𝑖 𝑖𝑖=1 −𝑥𝑥̅𝑖𝑖(𝑤𝑤
𝑥𝑥𝑖𝑖𝑖𝑖=1 ) (20)
𝑖𝑖 𝑆𝑆𝑖𝑖 𝑛𝑛=−𝑛𝑛1 − 1𝑖𝑖
𝑖𝑖=1
𝑖𝑖 𝑖𝑖
𝑛𝑛𝑖𝑖 𝑖𝑖 − 1 𝑛𝑛 − 1 𝑖𝑖

From
From the the sample
sample variance
variance inin in Equation
Equation 20,20,
we we compute
compute the pooled F- F-
From the
From sample
From
the variance
the
sample sample
variance in Equation
variance 20,
20,we
in Equation
Equation wecompute
20, thepooled
the
we compute
compute pooled F-
the pooled
pooled F-
weight
weight
F-weight variance
variance
variance
weight asas as
shown
shown
variance shown
asinin in Equation
Equation
Equation
shown 21.
21. 21.
in Equation
weight variance as shown in Equation 21. 21.
2 2 (𝑛𝑛 2 2
2𝑆𝑆 2 (𝑛𝑛1(𝑛𝑛 1 −1)𝑆𝑆
−1)𝑆𝑆 1 + 2(𝑛𝑛
12 +
(𝑛𝑛 −1)𝑆𝑆 +2 −1)𝑆𝑆
2−1)𝑆𝑆(𝑛𝑛
22 −1)𝑆𝑆
2 2
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆=2 =(𝑛𝑛1=−1)𝑆𝑆11 + (𝑛𝑛12−1)𝑆𝑆22
2 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 =
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2
(𝑛𝑛1(𝑛𝑛
(21)
+𝑛𝑛 21)2
)(𝑛𝑛 −2
(𝑛𝑛1+𝑛𝑛
+𝑛𝑛12 −
2) − 2 2) − 2
+𝑛𝑛
Based on Equations
Based
Based on18
on Equationsto1821,
18weto18obtain
21,
towe the
wecoefficient
obtain the of the
the coefficient model ofas
of the model
BasedononEquations
Based 18toto21,
Equations
Equations 21,we
we obtain
21,
obtainthe
thecoefficient
obtain of
ofthe
themodel
coefficient
coefficient the model
model
can as
be referred
as
cancan
be
as be to Equations
referred
referred
can be to to 22
Equations
referred to and
Equations 23.
22 22
and
Equations and
23.
22 23.
and 23.
as can be referred to Equations 22 and 23.

(𝑥𝑥̅ −𝑥𝑥̅ )` )`
(𝑥𝑥̅ −𝑥𝑥̅ `
𝑤𝑤𝑤𝑤𝑥𝑥1𝑤𝑤=
𝑥𝑥= =
(𝑥𝑥̅1 −𝑥𝑥̅
1 𝑤𝑤𝑆𝑆𝑥𝑥1
12 (𝑥𝑥̅
21 𝑆𝑆=22 𝑤𝑤12𝑤𝑤 𝑥𝑥𝑥𝑥112 )𝑥𝑥==
)`2 1 −𝑥𝑥̅ = 𝑥𝑥1(𝑥𝑥̅
1𝑤𝑤(𝑥𝑥̅
(𝑥𝑥̅ −= 1 𝑥𝑥̅−
(𝑥𝑥̅ ` )−2
2 )𝑥𝑥̅1𝑆𝑆
`−
2
` −2` (𝑤𝑤
𝑆𝑆𝑥𝑥̅𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
−2 (𝑤𝑤𝑆𝑆1𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2 )(𝑤𝑤
−2 =) 𝜔𝜔𝑤𝑤
𝑥𝑥11)𝑥𝑥(𝑤𝑤
1 =𝑥𝑥1𝜔𝜔𝑤𝑤
)1 𝑥𝑥 𝑥𝑥1 1 (22)
=11𝜔𝜔𝑤𝑤 𝑥𝑥1
𝑥𝑥1 𝑤𝑤
2 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 1 1 1 1 − 𝑥𝑥̅2 )
1 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 1 𝑥𝑥1 ) =1 𝜔𝜔𝑤𝑤 1 𝑥𝑥1
` 1 −𝑥𝑥̅ 2 )`
(𝑥𝑥̅1 −𝑥𝑥̅2 )(𝑥𝑥̅
2 𝑥𝑥2 (23)
` −2 (𝑤𝑤
𝑤𝑤𝑥𝑥2 =𝑤𝑤𝑥𝑥22 = 𝑤𝑤 22 𝑥𝑥2 = 𝑤𝑤2(𝑥𝑥̅
𝑥𝑥21 = 𝑥𝑥̅(𝑥𝑥̅
− 2)
` −−2
1 𝑆𝑆 𝑥𝑥̅2 )(𝑤𝑤
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆2𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑥𝑥2 )2=
𝑥𝑥2 ) =2𝜔𝜔𝑤𝑤 𝑥𝑥2𝜔𝜔𝑤𝑤(23) (23)
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝

On theOn the hand,


other other ifhand, if an outlier
an outlier exists, exists,
we repeat we Equations
repeat Equations
17 and 1718 and 18
99 9 9
on the on the outliers
outliers only andonly andEquation
apply apply Equation 19 towhether
19 to detect detect whether the outlier
the outlier
still This
still exist. exist.process
This process continues
continues until alluntil all the outliers
the outliers are transformed
are transformed
into inliers. The inliers are merged to form the reweighted data setdata
into inliers. The inliers are merged to form the reweighted ̌ 𝑖𝑖 . set 𝑤𝑤
𝑤𝑤 ̌ 𝑖𝑖 .
The meanThe vectors,
mean vectors,
samplesample variance
variance ̌ 𝑖𝑖 , of
of 𝑤𝑤 are𝑤𝑤 𝑖𝑖 , are computed
̌computed similarlysimilarly
as as
Equations
Equations 18 to 1820 toand20 substituted
and substituted into Equations
into Equations 21 to 21 23, to 23,
respectively.
respectively.
The coefficient
The coefficient (𝜔𝜔) in in Equations
(𝜔𝜔)Equations 22 and22 23andis 23 is similar
similar to to
2 2 −1 −1
(∑𝑖𝑖=1 (∑𝑤𝑤𝑖𝑖 𝑔𝑔𝑖𝑖=1
𝑖𝑖 (𝑥𝑥𝑤𝑤 ))𝑖𝑖,𝑑𝑑×1in
𝑖𝑖 𝑔𝑔𝑖𝑖 (𝑥𝑥
𝑖𝑖,𝑑𝑑×1 )) Equation 13, and
in Equation 13,𝑤𝑤1and 𝑥𝑥1 and 𝑥𝑥2 in
𝑤𝑤1 𝑥𝑥1𝑤𝑤2and 𝑤𝑤2 𝑥𝑥2 in
Equations
Equations 22 and2223and is 23
similar to 𝑤𝑤𝑖𝑖 𝑔𝑔to
is similar 𝑖𝑖 (𝑥𝑥𝑤𝑤 in Equation
) 𝑖𝑖,𝑑𝑑×1
𝑖𝑖 𝑔𝑔𝑖𝑖 (𝑥𝑥
𝑖𝑖,𝑑𝑑×1 13.
) in Equation 13.
Therefore,
Therefore, Equations 22 and22
Equations 23and
mimic Equations
23 mimic Equations14 and 14 15.andHence,
15. Hence,
Equations
Equations 22 and2223and are 23
simply a lineara combination
are simply linear combination that allocates
that allocates
𝑤𝑤1 𝑥𝑥1 to ∆ or otherwise
𝑤𝑤1 𝑥𝑥11 to ∆1 or otherwise to ∆ if it is true. This procedure
2 to ∆2 if it is true. This procedure mimics mimics the the
FisherFisher
linear linear classification methodmethod
classification (Sheth,(Sheth,2019; Fisher, 1936). 1936).
2019; Fisher, The The
insensitivity
insensitivity of the SUC of themethod towardstowards
SUC method outliersoutliers and overfitting is due is due
and overfitting
to the F-weight
to the F-weight application to the data
application to thesets.
data sets.
To evaluate
To evaluate the performance
the
To evaluateperformance of the SUC
the performance of themethod, the group
SUC method, theevaluation
group evaluation
criteriacriteria
are obtained as in Equations
Equations 24
24 to
to 26.
26.
are obtained as in Equations 24 to 26.
−2 ` −2
− 𝑥𝑥̅(𝑥𝑥̅2 1)` 𝑆𝑆
(𝑥𝑥̅𝑥𝑥 1 =
𝑇𝑇𝑥𝑥1 = 𝑇𝑇 𝑥𝑥̅2 )𝑥𝑥̅1𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
−𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 = 𝜔𝜔𝑥𝑥̅ 𝑥𝑥̅11 = 𝜔𝜔𝑥𝑥̅1 (24) 9
(24)
1
−2 ` −2
𝑥𝑥2
− 𝑥𝑥̅(𝑥𝑥̅2 1)` 𝑆𝑆
(𝑥𝑥̅1 =
𝑇𝑇𝑥𝑥2 = 𝑇𝑇 𝑥𝑥̅2 )𝑥𝑥̅2𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
−𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 = 𝜔𝜔𝑥𝑥̅ 𝑥𝑥̅22= 𝜔𝜔𝑥𝑥̅2 (25) (25)
Fisher linear FisherFisherlinear
classification linear classification
classification
method (Sheth, method method2019; (Sheth, 2019;
(Sheth,
Fisher, Fisher,
2019;
1936). 1936).
Fisher,
The TheThe
1936).
insensitivity insensitivity
insensitivity
insensitivity
ofinsensitivity
the SUC of the of
of the
SUC
the
methodof SUC SUC
the method SUC
towards method
method towards
method
outliers towards
towards outliersoutliers
and
andoutliers
towards andand
and
outliers
overfitting overfitting
overfitting is dueisisdue
overfitting
is overfitting
due due
is due
to the to the
to the
to the F-weight F-weight
F-weight
F-weight
toapplication
the F-weight application
application
application
to the application to the
data to
to the
data
the
sets. to data data
sets.data
the sets.
sets.sets.
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
To
To evaluate theTo
evaluate
To evaluate
evaluate
To the performance
evaluate
performance theperformance
the performance
the ofperformance
the SUC of the of
of SUCthe
method, the
of SUC SUC
method,
the SUC
the method,
method,
method,
group thegroup
the group
the evaluation
group the evaluation
evaluation
evaluation
group evaluation
criteria
criteria are
criteria are
obtained
are
criteriaasare
criteria are obtained obtained
obtained as
in obtained in as as
Equationsas24intoEquations in
Equations
in Equations
Equations 26. 24 to 24
26.
24 to
to 26.
26.
24 to 26.
` −2 ` −2
𝑇𝑇𝑥𝑥1 𝑇𝑇
𝑇𝑇𝑥𝑥1 = (𝑥𝑥̅ = 𝑇𝑇 (𝑥𝑥̅== −
)1`1𝑆𝑆(𝑥𝑥̅
−2
(𝑥𝑥̅ 𝑥𝑥̅2− − 2))`𝑥𝑥̅𝑆𝑆
)𝑥𝑥̅𝑥𝑥̅𝑆𝑆𝑥𝑥̅𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆𝑥𝑥̅−2 𝑥𝑥̅𝑥𝑥̅1𝜔𝜔𝑥𝑥̅
)1`1𝑆𝑆=𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
−2 1= = 𝜔𝜔𝑥𝑥̅
𝑥𝑥̅11𝜔𝜔𝑥𝑥̅ (24) (24) (24)(24)
(24) (24)
1 −𝑥𝑥𝑥𝑥1𝑥𝑥̅1𝑇𝑇 2 𝑥𝑥 = 11(𝑥𝑥̅
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 11 − 2= 𝜔𝜔𝑥𝑥̅ 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 = 1𝜔𝜔𝑥𝑥̅
1
1
−2 ` −2
𝑇𝑇𝑥𝑥2 𝑇𝑇
𝑇𝑇𝑥𝑥2 = (𝑥𝑥̅ = 𝑇𝑇 (𝑥𝑥̅2== −
)1` 𝑆𝑆(𝑥𝑥̅
−2
(𝑥𝑥̅ 𝑥𝑥̅2− )𝑥𝑥̅`𝑥𝑥̅𝑆𝑆𝑥𝑥̅𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
− 2))`𝑥𝑥̅𝑆𝑆
𝑆𝑆𝑥𝑥̅−2
)2`2𝑆𝑆=𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑥𝑥̅𝑥𝑥̅2𝜔𝜔𝑥𝑥̅
−2
𝑥𝑥̅=22𝜔𝜔𝑥𝑥̅
𝜔𝜔𝑥𝑥̅ (25) (25)
(25) (25) (25)(25)
1 −𝑥𝑥𝑥𝑥2𝑥𝑥̅2𝑇𝑇 𝑥𝑥 = 11(𝑥𝑥̅
2 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 12 − 2= 𝜔𝜔𝑥𝑥̅ 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2= = 2𝜔𝜔𝑥𝑥̅
2
2
𝑇𝑇𝑥𝑥 + 𝑇𝑇𝑥𝑥𝑇𝑇𝑥𝑥1 +𝑇𝑇𝑇𝑇
𝑇𝑇𝑥𝑥𝑥𝑥 + 𝑇𝑇𝑥𝑥
𝑇𝑇𝑥𝑥1 𝑥𝑥2 =𝑇𝑇𝑥𝑥1 𝑥𝑥𝑇𝑇12𝑇𝑇𝑥𝑥2𝑑𝑑
𝑥𝑥= =
, 𝑑𝑑 𝑥𝑥112+ 𝑇𝑇,𝑥𝑥𝑇𝑇1𝑑𝑑
1 𝑥𝑥2 = , , 𝑥𝑥𝑑𝑑21
+2 𝑇𝑇 𝑑𝑑,==𝑑𝑑11= 1 (26) (26)
(26) (26) (26)(26)
𝑥𝑥22=
1𝑥𝑥𝑇𝑇
1 2𝑥𝑥1 𝑥𝑥2
2𝑑𝑑 ==2𝑑𝑑 2𝑑𝑑
2𝑑𝑑
Thus, Equation 26 gives the benchmark value used for classification tasks. To assign an object to the correct group, the following decision criterion was adopted based on Equations 24 to 26: classify $w_1 x_1$ to $\Delta_1$ if $w_1 x_1 > T_{x_1 x_2}$; otherwise, assign it to $\Delta_2$ if $w_1 x_1 < T_{x_1 x_2}$. In a simpler form, this decision criterion can be written as in Equation 27.

$\Delta = \begin{cases} w_1 x_1 > T_{x_1 x_2}, & \text{assign object to } \Delta_1 \\ w_1 x_1 < T_{x_1 x_2}, & \text{assign object to } \Delta_2 \end{cases}$   (27)
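To make the allocation rule in Equations 24 to 27 concrete, the following minimal sketch applies it to two univariate training samples. It assumes the samples have already been F-weighted, uses the pooled sample variance for the $S_{pool}$ term, and treats $\omega$ times the new score as the weighted quantity $w_1 x_1$ in Equation 27; the function and variable names are illustrative, not taken from the original implementation.

```python
import numpy as np

def suc_allocate(x1, x2, new_obs, d=1.0):
    """Sketch of the SUC allocation rule (Equations 24 to 27).

    x1, x2  : 1-D arrays of (already F-weighted) observations for groups 1 and 2.
    new_obs : raw score of the object to classify; omega * new_obs plays the
              role of w1*x1 in Equation 27 (an assumption of this sketch).
    """
    m1, m2 = x1.mean(), x2.mean()
    n1, n2 = len(x1), len(x2)
    # Pooled variance stands in for the S_pool term of Equations 24-25.
    s_pool = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)

    omega = (m1 - m2) / s_pool            # common coefficient in Equations 24-25
    t_x1 = omega * m1                     # Equation 24
    t_x2 = omega * m2                     # Equation 25
    t = (t_x1 + t_x2) / (2.0 * d)         # Equation 26: benchmark value (d = 1)

    # Equation 27: compare the weighted score against the benchmark.
    return 1 if omega * new_obs > t else 2

# Illustrative usage with synthetic data.
rng = np.random.default_rng(0)
g1, g2 = rng.normal(5.0, 1.0, 40), rng.normal(2.0, 1.0, 40)
print(suc_allocate(g1, g2, new_obs=4.8))   # expected to be assigned to group 1
```

With d = 1 the rule reduces to comparing the weighted score with the midpoint of the two weighted group means, which is what makes it mimic the Fisher linear classification method.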
Smart Univariate Bayes Classifier
The hybrid method (SUBC) is a combination of the SUC and BC procedures with two phases. The first phase utilises the SUC procedure (see Equations 17 to 23), while in the second phase, we apply the BC procedure as described in Equations 6 to 16 to build the SUBC model and assign the group membership. This implies that the proposed SUBC method mimics and retains the basic characteristics of the BC decision criteria. Therefore, the algorithm for the SUBC method can be explained in Algorithm 1.
Algorithm 1: Algorithm for SUBC method

Phase 1 - Utilise the SUC procedure as follows
Step 1: Determine the inliers or outliers (apply Equation 19).
Step 2: If outliers exist, transform the outliers into inliers.
Step 3: Repeat Step 1 and Step 2 to obtain a reweighted F-weight.
Step 4: Merge the inliers to form the reweighted data set $\check{w}_i$.
Step 5: Compute the mean vectors and sample variance of $\check{w}_i$ in a similar way as in Equations 18 to 20.
Step 6: Substitute all values obtained from Step 4 into Equations 21 to 23 to obtain the model's coefficient.

Phase 2 - Build the SUBC model as follows
Step 1: Apply the BC procedures as described in Equations 6 to 15 to build the SUBC model.
Step 2: Assign the group membership using the decision criteria in Equation 16.
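The two phases of Algorithm 1 can be sketched in a few lines. The snippet below is only an outline under stated assumptions: a robust z-score screen replaces the paper's F-weight of Equation 19 in Phase 1, and a univariate Gaussian Bayes rule with equal priors stands in for the BC procedure of Equations 6 to 16 in Phase 2; all names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def reweight_inliers(x, c=2.5):
    """Phase 1 sketch: pull flagged outliers back towards the centre.
    A robust z-score cut-off is used here instead of the paper's F-weight."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-12
    z = 0.6745 * (x - med) / mad
    w = np.minimum(1.0, c / np.maximum(np.abs(z), 1e-12))  # 1 for inliers, <1 for outliers
    return med + w * (x - med)                              # reweighted data set

def subc_classify(x1, x2, new_obs):
    """Phase 2 sketch: Bayes-type rule with equal priors fitted on the
    reweighted samples, mimicking the BC step of the SUBC method."""
    x1w, x2w = reweight_inliers(x1), reweight_inliers(x2)
    m1, s1 = x1w.mean(), x1w.std(ddof=1)
    m2, s2 = x2w.mean(), x2w.std(ddof=1)
    # Assign to the group with the larger Gaussian density (equal priors).
    return 1 if norm.pdf(new_obs, m1, s1) >= norm.pdf(new_obs, m2, s2) else 2

rng = np.random.default_rng(1)
g1 = np.append(rng.normal(10, 1, 50), [25.0, 30.0])   # group 1 with two gross outliers
g2 = rng.normal(6, 1, 50)
print(subc_classify(g1, g2, new_obs=9.2))              # expected: group 1
```

The reweighting in Phase 1 is what keeps the Phase 2 estimates, and hence the final decision rule, insensitive to contamination.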
Data Collection
This study aims to investigate the comparative classification performance of the proposed methods (SUC and SUBC) against the conventional methods (BC and PC). In addition, we examined the breakdown of the methods based on random outliers. Finally, we also studied whether strong, moderate, or weak correlations correspond to a minimum or maximum misclassification rate. To achieve these objectives, we applied two types of data sets (continuous and discrete): Part A consists of continuous data sets, while Part B consists of discrete data sets.
Part A: This section consists of four data sets. The first data set was based on the effect of sleep quality, short sleep (1-4 hours) and long sleep (5-8 hours), on undergraduate academic performance, with an emphasis on grade point average (GPA) categorisation using the Pittsburgh Sleep Quality Index (PSQI) Scale and the Perceived Stress Scale (PSS) (Lok, 2018). Using the methods discussed (BC, PC, SUC and SUBC), we applied the PSQI and PSS scales to classify students into graduate classes based on their corresponding GPA. The second data set consisted of the air quality index for three locations in Malaysia (Putrajaya, Kuala Lumpur and Petaling Jaya) before the outbreak of the Covid-19 pandemic in 2019 (14/10/2019-10/11/2019) and during the same period with the Covid-19 pandemic in 2020 (14/10/2020-10/11/2020) (Department of Environment, 2020). The data set was collected under two conditions: without a movement control order (2019) and with a movement control order (pandemic period, 2020). The third data set comprised the PH water level in Ampang Pecah and Kg. Timah (Department of Environment, 2020), while the fourth data set consisted of body weight measurements (mg) of wide and laboratory-bred female and male Aedes albopictus mosquitoes and body size measurements of Aedes albopictus mosquitoes (Okwonu et al., 2012).

Part B: This section covered discrete data and consisted of two data
sets. The first data set in this section covered the Covid-19 daily
report. The data set was paired as follows: confirmed and discharged,
confirmed and dead, discharged and dead patients. We used the
Covid-19 data set from Malaysia (Kementerian Kesihatan Malaysia,
2020) and Nigeria (NCDCgov, 2020). The second data set in this
section comprised road traffic accidents involving cars and motorcycles.
These data were categorised based on severe injury, slight injury, and
the annual summary of car/motorcycle road traffic accidents (Jabatan
Siasatan dan Penguatkuasaan Trafik, 2020).

Therefore, the performance of the classification methods for continuous and discrete data was investigated to distinguish these methods based on classification accuracy and correlation value.

Comparative Performance Analysis

In this section, the comparative performance analysis was carried out using the four methods. The performance of the proposed methods was evaluated using the optimum probability of exact classification (OPEC). The computed probabilities of exact classification (PEC) (Okwonu, Ahad, Ogini et al., 2022) of these methods were based on the hit ratio compared to the OPEC value. The misclassification error rate (ε) can be obtained by computing the difference between OPEC and PEC, which is ε = OPEC − PEC. The robustness of these methods can be inferred from the value of ε: a minimal value of ε implies that the methods are robust. The breakdown of these methods was investigated by introducing random outliers into the PSQI and PSS data sets. Therefore, the robustness and the breakdown capability of the methods discussed above can be determined by comparing the difference between OPEC and PEC.

The evaluation benchmark for this study was designed using the acceptable benchmark approach called the Optimum Probability of Exact Classification (OPEC), as shown in Equation 28.

$\nabla = \dfrac{\bar{x}_1 - \bar{x}_2}{s}, \qquad \delta = \theta(\nabla)$   (28)

where θ represents the standard normal distribution function. Equation 28 is used to obtain the OPEC value, while Equation 29 is applied to compute the probability of misclassification.

$\varepsilon = \theta(\nabla) - \mathrm{PEC} = \delta - \mathrm{PEC}$   (29)
Equation 30 below describes the optimal misclassification rule used to determine how robust the classifiers are. However, Equation 30 often leads to a high misclassification rate; hence, we developed an alternative optimal evaluation criterion for classification performance.

$\varepsilon\varepsilon = 1 - \mathrm{PEC}$   (30)
Equation 29 is also used to detect the overfitting of these methods. If ε has a negative value, it implies that overfitting has occurred. Equations 28 and 29 are applied to evaluate the performance of the classification methods by focusing on the proportion of correct group membership predictions (Jimoh et al., 2022; Huberty & Holmes, 1983; Alf & Abrahams, 1968; Levy, 1967). Another method similar to Equation 29 to analyse the performance of classifiers was discussed in Mohd Noor et al. (2020). The data sets used in this paper are applied to Equation 28 to obtain the evaluation benchmark.
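As a small numerical illustration of Equations 28 to 30, the sketch below computes the OPEC benchmark and the two error criteria for a reported hit ratio. It assumes θ is the standard normal distribution function (scipy's norm.cdf) and that a single common scale estimate s is formed from the two sample variances; these choices are assumptions of the sketch rather than a restatement of the paper's exact computation.

```python
import numpy as np
from scipy.stats import norm

def evaluation_criteria(x1, x2, pec):
    """OPEC benchmark (Eq. 28), misclassification error (Eq. 29) and the
    alternative criterion (Eq. 30) for a classifier whose hit ratio is `pec`."""
    s = np.sqrt((x1.var(ddof=1) + x2.var(ddof=1)) / 2.0)   # common scale (assumed)
    nabla = (x1.mean() - x2.mean()) / s                    # Equation 28
    opec = norm.cdf(nabla)                                 # delta = theta(nabla)
    eps = opec - pec                                       # Equation 29; eps < 0 signals overfitting
    eps_alt = 1.0 - pec                                    # Equation 30
    return opec, eps, eps_alt

rng = np.random.default_rng(2)
a, b = rng.normal(3, 1, 100), rng.normal(0, 1, 100)
opec, eps, eps_alt = evaluation_criteria(a, b, pec=0.93)
print(f"OPEC = {opec:.4f}, eps = {eps:.4f}, eps_alt = {eps_alt:.4f}")
```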

RESULTS AND DISCUSSION

Part A

Table 1 shows the performance analysis of the four methods using the
first continuous data set. The values reported in Table 1 are the PEC
values. In Table 1, the outliers are randomly introduced to the original
data to determine the robustness and the breakdown of the methods.
We observe that the BC and the SUBC are more robust than the other
methods because the PEC values of these two methods (0.9933) are
closest to the OPEC value (δ = 0.999) for the uncontaminated data set. In the
second row of Table 1, we introduce ten random outliers (RO) into
the original data set. Then, in the third row, we add eight outliers
in addition to the ten outliers in Row 2, making it 18. Finally, we
introduce outliers for the other rows in a similar procedure. As more
outliers are introduced, the proposed SUC method shows outstanding
performance over the other three methods since 0.8600 is the highest
among the four methods.
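The breakdown study behind Table 1 can be mimicked with a short loop that injects a growing number of random outliers into one group, refits a classifier, and records the resulting PEC. The sketch below uses a toy nearest-mean rule as a stand-in classifier and a generic additive contamination scheme; it does not reproduce the exact outlier replacements listed in Table 1, and all names are illustrative.

```python
import numpy as np

def nearest_mean_rule(x1, x2, x_test):
    """Toy stand-in classifier: assign each test score to the group with the nearer mean."""
    m1, m2 = x1.mean(), x2.mean()
    return np.where(np.abs(x_test - m1) <= np.abs(x_test - m2), 1, 2)

def breakdown_curve(x1, x2, x_test, y_test, outlier_counts, classify=nearest_mean_rule,
                    scale=10.0, seed=0):
    """PEC after contaminating group 1 with an increasing number of random outliers."""
    rng = np.random.default_rng(seed)
    pecs = []
    for k in outlier_counts:
        x1c = x1.copy()
        idx = rng.choice(len(x1c), size=k, replace=False)
        x1c[idx] += scale * rng.standard_normal(k)       # illustrative contamination
        pecs.append(np.mean(classify(x1c, x2, x_test) == y_test))
    return np.array(pecs)

rng = np.random.default_rng(3)
g1, g2 = rng.normal(4, 1, 75), rng.normal(1, 1, 75)
x_test = np.concatenate([rng.normal(4, 1, 40), rng.normal(1, 1, 40)])
y_test = np.array([1] * 40 + [2] * 40)
print(breakdown_curve(g1, g2, x_test, y_test, outlier_counts=[0, 10, 18, 24, 30]))
```

Comparing each PEC value with the OPEC benchmark, as in the tables that follow, shows how quickly a method breaks down as the contamination level rises.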

Table 1

Performance Analysis of PSQI/PSS Data for Graduate Categories and the Effect of Outliers (n=150)

BC      PC      SUBC    SUC     Random Outliers (RO)
0.9933  0.9133  0.9933  0.9300  -----
0.9667  0.9067  0.9667  0.9233  16=1, 20=2, 21=41, 17=57, 12=32, 9=19, 5=15, 2=12, 1=11, 3=13 (RO=10)
0.9267  0.8833  0.9267  0.9033  17=5, 30=3, 17=7, 20=1, 4=14, 11=31, 5=15, 7=27 (RO=18)
0.8867  0.8633  0.8867  0.8833  23=3, 27=2, 19=1, 5=25, 2=22, 6=26 (RO=24)
0.8600  0.8533  0.8600  0.8667  32=2, 22=2, 9=29, 16=46 (RO=28)
0.8467  0.8467  0.8467  0.8600  23=3, 9=69 (RO=30)
δ = 0.999, r = 0.283, r² = 0.08
Figure 1 displays the breakdown of the different methods based on the PEC and the number of random outliers introduced. The performance analysis in Figure 1 shows that SUBC performs as well as BC, and it remains consistent regardless of the number of outliers in the data set. However, when the number of random outliers increases, the SUC outperforms the other methods. Thus, it can be concluded that the SUC method is more robust when the data set comprises more outliers.

Figure 1

Comparative Breakdown Analysis of the Classification Methods

[Line chart of PEC (0.7 to 1.1) against the number of random outliers for the BC, PC, SUBC and SUC methods.]

Table 2

Probability of Correct Classifications

        BC      PC      SUBC    SUC
PCC     0.9933  0.9167  0.9933  0.9300
1-PCC   0.0067  0.0833  0.0067  0.0700
Table 2 describes the probability of correct classification (PCC) for each method. The probability shows the number of students correctly classified to their respective graduate classes. Most of the methods show that the majority of the objects belong to their actual groups, whereas a minority are misclassified to another group. From the analysis, there are misclassified individuals who fail to fit the classification of their graduate classes. Table 2 shows the observations that are correctly classified and misclassified based on the hit ratio. The hit ratio values for each method are discussed as follows. The BC and SUBC achieved the same classification accuracy of 99.33 percent, followed by the SUC with 93.0 percent correct classification. The PC reaches 91.67 percent prediction accuracy. The performance analysis (see Table 1) reveals that the correlation value (r = 0.28) is positively weak. The weak correlation value indicates that the minimum misclassification rate is associated with a weak positive correlation value. It can be further explained that exact group classification does not translate to a strong relationship of group membership. This means that the classification performance cannot be compared with the level of association of the data set. This is a new concept of relating the classification performance with the strength of the correlation value. The rest of the analysis for the other data sets is based on the comparison between these two criteria: the proportion of correct classification and the correlation among the methods.

Table 3

Performance Analysis of Cumulative Air Quality Index before (2019) and during Lockdown (2020) (n=84)

BC      PC      SUBC    SUC
0.5833  0.5833  0.61    0.593
δ = 0.702, r = −0.1481, r² = 0.0219

Table 3 to Table 6 contain the classification of the air quality index (AQI) before and during the Covid-19 pandemic in Malaysia. In Table 3, the SUBC method (86.9%) is the best classifier, followed by SUC (84.5%), while both BC and PC account for 83.1%. However, this data set is weakly negatively correlated (r = −0.15); therefore, the minimum misclassification error rates relate to a very weak and negative relationship.

Table 4

Performance Analysis of Putrajaya AQI before (2019) and during Lockdown (2020)

             BC      PC     SUBC   SUC
PEC          0.607   0.518  0.571  0.518
εε           0.393   0.482  0.429  0.482
ε = δ − PEC  −0.001  0.088  0.035  0.088
δ = 0.606, r = −0.063, r² = 0.0004
As shown in Table 4, an overfitting problem occurs for the BC method (ε = −0.001). It is considered overfitting in this aspect because the OPEC (δ) is used as the performance benchmark. Nevertheless, the BC is in order by applying the basic axiom of probability. The following criterion implies that the misclassification error for BC (refer to Equation 30) is equal to

$\varepsilon\varepsilon = 1 - \mathrm{PEC} = 1 - 0.607 = 0.393$

Applying this criterion to the classification problem suggests that the BC method performs poorly, with a significant misclassification error rate of 39.3 percent. Therefore, applying the OPEC as an evaluation and validation benchmark is justified.

The SUBC classifier correctly predicts a group membership of 94.2 percent, followed by PC and SUC with 85.5 percent each, based on the OPEC criterion. The correlation value obtained is negative and very weak (−0.063). The effect of outliers is generally negative in all statistical and probability models, which is why the proposed robust classification methods are essential.

Table 5

Performance Analysis of Kuala Lumpur AQI before (2019) and during Lockdown (2020) (n=28)

BC     PC     SUBC   SUC
0.679  0.643  0.679  0.696
δ = 0.905, r = −0.251, r² = 0.063

In Table 5, the SUC classifier accurately predicts the group membership with 76.9 percent, followed by BC and SUBC with 75 percent, and PC with 71 percent. The relationship shows a weak and negative correlation.

Table 6

Performance Analysis of Petaling Jaya AQI before (2019) and during Lockdown (2020) (n=28)

BC     PC     SUBC   SUC
0.536  0.518  0.500  0.519
δ = 0.539, r = −0.1960, r² = 0.0384

The performance analysis in Table 6 demonstrates that the BC achieves the highest accuracy of 99.4 percent in classifying the group membership, followed by SUC (96.3%) and PC (96.1%). Meanwhile, SUBC accounts for 92.8 percent, with a weakly negative relationship.

Table 7

Performance Analysis of PH Level of Water in Ampang Pecah and Kg. Timah (Mm) (n=39)

     BC      PC      SUBC    SUC
PEC  0.8611  0.875   0.8611  0.8333
ε    −0.001  −0.015  −0.001  —
δ = 0.8599, r = 0.0793, r² = 0.0063

The analysis shows that accurate group membership prediction does not reflect highly correlated data sets; instead, "the better the prediction power, the weaker the correlation value". This finding is relatively possible for continuous data sets. Table 7 contains the classification analysis of the PH level of water in two different locations. This was applied to observe whether the PH level of water quality in these locations is unique or varies. It was verified in Table 7 that the BC, PC and SUBC methods overfit (ε = −0.001, −0.015, −0.001) based on the OPEC benchmark. On the other hand, SUC shows a 96.9 percent accurate prediction of group membership. The correlation value displayed for this data set is positive and very weak (0.08).

Table 8

Performance Analysis of Wide Female and Male Aedes Albopictus Mosquitoes' Weight (mm) (n=96)

BC      PC      SUBC    SUC
0.4688  0.4896  0.5100  0.4896
δ = 0.536, r = 0.218, r² = 0.047
Based on Table 8, the proposed SUBC achieves the highest prediction
accuracy with 95.2 percent, followed by PC and SUC with 91.3 percent
prediction accuracy, respectively. Meanwhile, the BC records the
BC PC
BC BC
BC PC PC
PC SUBC SUBC
SUBC SUC SUC
SUC 0.4688 0.48
0.4688 0.4688
0.46880.4896 0.4896
0.4896 0.5100 0.5100
0.5100 0.4896
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30 0.4896
0.4896 2
𝛿𝛿=0.536, 𝑟𝑟 = 0.218, 𝑟𝑟 =
22
𝛿𝛿=0.536, 𝑟𝑟 𝛿𝛿=0.536,
= 0.218, 𝑟𝑟𝑟𝑟 2=
𝛿𝛿=0.536, ==0.218,
0.047 𝑟𝑟𝑟𝑟 =
0.218, = 0.047
0.047
Based on Table 8, the pro
Based
Based on
on Table
Table
Table 8,
8,
8,the
the
theproposed
proposed
proposed SUBC
SUBC
SUBC achieves
achieves
achieves the
the
thehighest
highest
highest
Based on Table 8, the proposed SUBC achieves the highest prediction accuracy prediction
prediction
predictionwith 95.2 percen
accuracy
accuracy with
with
with 95.2
95.2
95.2 percent,
percent,
percent, followed
followed
followed by
by by
PC
PC PC
and
and and
SUC
SUC SUC
with
with
accuracy with 95.2 percent, followed by PC and SUC with 91.3 percent prediction with
91.3
91.3 91.3 accuracy, resp
percent
percent
prediction
prediction
prediction percent accuracy,
accuracy,
prediction
accuracy, respectively.
respectively.
accuracy,
respectively. Meanwhile,
Meanwhile,
respectively.
Meanwhile, theMeanwhile,the
the BC
BC records BC records
thetherecords
BC the
the
records
lowest prediction accurac
lowest
lowest
the prediction
prediction
lowest
lowest prediction ataccuracy
prediction
accuracy accuracy
accuracy
87.5 at
at 87.5
87.5
percent. percent.
at 87.5
Thepercent. The
TheThe
percent.
relationship relationship
relationship
relationship
for for
the Aedes the
forfor
the
Albopictus mosqui
Aedes
Aedes
Aedes Albopictus
the Albopictus
Albopictus
Aedesmosquito mosquito
data
Albopictus is alsodata
mosquito
mosquitodata is
is also
also
positively
data positively
positively
weak.
is also weak.
positivelyweak.
weak.
Table 9
Table
Table999
Table 9 Table
Performance Analysis of
PerformancePerformance
Analysis ofAnalysis
Performance Analysis of
of Laboratory-reared
Female andFemale
Laboratory-reared
Laboratory-reared Female and
and Male
Male Aedes Male Aedes
Aedes Mosquitoes Bo
Albopictus
Performance Analysis of Laboratory-reared Female and Male Aedes
AlbopictusAlbopictus
MosquitoesMosquitoes
Albopictus Mosquitoes
Body Size (mm)Body Size
Size (mm)
Body (n=30) (mm) (n=30)
(n=30)
Albopictus Mosquitoes Body Size (mm) (n=30) BC
BC BC
BCPC PC
PC
SUBC SUBC
SUBC
SUC SUC
SUC
BC PC SUBC SUC 1.00
1.00 1.00
1.00
0.950 0.950
0.950
1.00 1.00
1.00
0.950 0.950
0.950
1.00 0.950 1.00 0.950 𝜀𝜀 -0.002
𝜀𝜀 𝜀𝜀𝜀𝜀
-0.002 -0.002
-0.002
-0.002 -0.002 -0.002
-0.002
-0.002 𝛿𝛿 =0.998, 𝑟𝑟 = −0.095, 𝑟𝑟 2 =
22
𝛿𝛿 =0.998, 𝑟𝑟𝛿𝛿𝛿𝛿==0.998,
−0.095,𝑟𝑟𝑟𝑟 =
=0.998, 𝑟𝑟=2 −0.095,
= 0.009 𝑟𝑟𝑟𝑟 =
−0.095, = 0.009
0.009
In Table 9, the BC and SU
In
In Table
Table
Table 9,
9,
9, the
the
the BC
BC
BC and
and
and SUBC
SUBC
SUBC methods
methods
methods predict
predict
predict the
the
the group
group
group membership
membership
membership
In Table 9, the BC and SUBC methods predict the group membership with 100% accuracy. H
with 100% with
with 100%
100%
100%
accuracy. accuracy.
accuracy.
accuracy. However,
However,
However,
However, both both both
methods
methods methods
overfittedoverfitted
both overfitted
methods overfitted (𝜀𝜀
(𝜀𝜀 = −0.002).(𝜀𝜀 =
=Meanwhile, the
−0.002). Meanwhile,
−0.002).
−0.002). Meanwhile,
Meanwhile, Meanwhile,
the
thePCPCand the
the
andSUC PC
SUC and SUC
PCmethods
and SUC
methods methods
methods
achieve
achieve achieve
achieve
95.2%
95.2% of95.2%
95.2%
of the group
the of
groupthe
ofmembership
the predic
group
group membership
group membership membership
membership prediction. prediction.
prediction.
Therefore,
prediction. Therefore,
Therefore, the
the association
the association
Therefore, shown
the association has shown
association
shown shown
a weak
has has
has aa
negative
a weak correlation
weak
weak
weak negative negative
negative
correlation.
negative correlation.
correlation.
correlation.
Table 10
Table 10 Table
Table 10
10
Table 10 Performance Analysis of
PerformancePerformance
Analysis ofAnalysis
Performance Analysis
Wide and of
of Wide
Wide and
and Laboratory-reared
Laboratory-reared
Laboratory-reared Female and Female
Female and
and Albopictus M
Male Aedes
Performance
Male AedesMale
Male Aedes
Albopictus Analysis
Aedes Albopictus
Albopictus of Wide
MosquitoesMosquitoes
Mosquitoesand
Wing Length Laboratory-reared
Wing
Wing
(mm)Length
Length
(n=20) (mm) Female
(mm) (n=20)
(n=20) and
Male Aedes Albopictus Mosquitoes Wing Length (mm) (n=20) BC
BC BC
BCPC PC
PC
SUBC SUBC
SUBC
SUC SUC
SUC𝛿𝛿 𝛿𝛿𝛿𝛿
BC PC SUBC SUC 1.00
1.00 1.00
1.00
0.925 0.925
0.925
1.00 1.00
0.901.00 0.90
1.00 0.90 0.90
1.00 0.925 𝜀𝜀 -0.007
𝜀𝜀 𝜀𝜀𝜀𝜀
-0.007 -0.007
-0.007
-0.007 -0.007 -0.007-0.007
-0.007 =0.993, 𝑟𝑟 = −0.038, 𝑟𝑟 2 =
22
=0.993, 𝑟𝑟 ==0.993,
−0.038,𝑟𝑟𝑟𝑟 =
=0.993, 𝑟𝑟= −0.038,
2
= 0.001 𝑟𝑟𝑟𝑟 =
−0.038, = 0.001
0.001 Similarly, the data set in T
Similarly,
data setthe
Similarly,
Similarly, the the data
data set
in Table 10in
set Table
Table 10
10 demonstrates
indemonstrates that the BCthat
demonstrates andthe
that the BC
BC and
SUBC and SUBC
SUBCobtain 100 pe
methods
methods
methods100
Similarly,
methods obtain obtain
obtain
the 100
data set
percent percent
100inaccuracy
percent
Table accuracy
accuracy
10 in in predicting
in that
demonstrates
predicting predicting
thethegroup
BC andthe group
theSUBC
group
methods obtain 100 percent accuracy in predicting the group and there is
membership,
membership, and there is also overfitting in both methods (𝜀𝜀 19 = −0.007). 19The accurate g
19
The accurate group predictions for the PC and SUCmethods methodsareare
93.2 percent and 9
93.2 percent and 90.6 percent, respectively. Even though the the
BCBC andand
SUBC show ove
SUBC show overfitting in Table 9 and Table 10, the methods can bebe accepted beca
methods can
accepted because the OPEC value is approximately one. one.Therefore,
Therefore, the correlation
the correlation for this data set is weak and negatively correlated.
correlated.

18 Part B

The results in this section demo


four methods on the two discr
the BC and SUBC show overfitting in Table 9 and Table 10, the
methodsBCcan be acceptedPCbecause the OPEC SUBC value is approximately
SUC
one. Therefore,
0.4688 the
Journalcorrelation
of ICT, 22,
0.4896 for
No. 1 this data
(January) set
2023,
0.5100 is weak
pp: 1– 30and negatively
0.4896
correlated. 2
𝛿𝛿=0.536, 𝑟𝑟 = 0.218, 𝑟𝑟 = 0.047
Part B
Based on Table 8, the proposed SUBC achieves the highest prediction
accuracy
The resultswith
in 95.2
in this
this percent,
section
section followed by
demonstrate
demonstrate thePC
the and SUC with
performance
performance 91.3 percent
analysis
analysis of
ofthe
the
prediction accuracy,
four methods on the the tworespectively.
two discrete
discrete dataMeanwhile,
data related
related to the BC
to Covid-19 records
Covid-19datadatasets.the
sets.
lowest
Table
Table 11 prediction
11 to
to Table
Table13 accuracy at
13demonstrate 87.5
demonstratethe percent. The
theperformance relationship
performanceanalysis
analysisbased for
basedon the
on
Aedes
the Albopictus
Malaysian mosquito
Covid-19 data data
sets is
foralso
sets for 189positively
189 days, weak.
days, while
while Tables
Tables14 14toto16
16
show the performance analysisanalysis of
of the
the Covid-19
Covid-19 datadatasets
setsfor
for163
163days
days
inTable 9
Nigeria.
Performance
Table 11 Analysis of Laboratory-reared Female and Male Aedes
Table
Table 11
11
Albopictus Mosquitoes Body Size (mm) (n=30)
Performance Analysisof
Performance Analysis
Analysis ofConfirmed
Confirmedand andDischarged
DischargedCases
Casesforfor
Performance BC of Confirmed
PC and Discharged
SUBC Cases
SUC for
Covid-19Virus,
Covid-19 Virus,Malaysia
Malaysiadata data(n=189)
(n=189)
Covid-19 Virus, Malaysia data (n=189)
1.00 0.950 1.00 0.950
BC
BC PC
PC SUBC
SUBC SUC
SUC
BC0.466 PC
-0.002
0.471 0.466SUBC -0.002 SUC
0.466 𝜀𝜀 0.471 0.466 0.466 0.492
0.492
0.466 0.471𝑟𝑟 2 = 0.009
𝛿𝛿 =0.998, 𝑟𝑟 = −0.095, 0.492
𝛿𝛿=0.512, 0.379, 𝑟𝑟𝑟𝑟22 == 0.144
𝛿𝛿=0.512, 𝑟𝑟𝑟𝑟 ==0.379, 0.144
In Table 9, the BC and SUBC methods predict the group membership
As
As shown
with 100%
shown in
in Table 11,
11, the
accuracy.
Table SUC
SUC predicts
However,
the both96.1
predicts percent
methods
96.1 correct
correctgroup
percentoverfitted (𝜀𝜀 =
group
membership,
Table 12 and
−0.002). Meanwhile,
membership, this is
and this isthefollowed by
PC and by
followed SUCthe PC method
themethods
PC method with
achieve 92 percent
with95.2% of the
92 percent
accuracy, while
while both
group membership
accuracy, BC
BC and
prediction.
both SUBC
SUBC achieve
andTherefore, 91
91 percent
the association
achieve correct
shown
percent has a
correct
Performance
weak negativeanalysis
classification.
classification. The of confirmed
correlation
correlation.
The correlation valueand
value death cases
isis positive
positive andfor
and Covid-19
relatively
relatively weak
weak
virus,
(𝑟𝑟 Malaysia data (n=189)
= 0.379).
Table 10
Table
Table 1212
Performance Analysis of Wide and Laboratory-reared Female and
Male Aedes Albopictus
Performance Mosquitoes Wing Length (mm) (n=20)
Performance analysis
analysis of
of confirmed
confirmed and
and death
death cases
cases for
for Covid-19
BC PC SUBC SUC
Covid-19
virus, Malaysia data
virus, Malaysia data (n=189)
0.889(n=189)0.725 0.889 0.646
BC PC SUBC SUC 𝛿𝛿
𝜀𝜀 -0.03
BC PC -0.03SUBC SUC
1.00 𝑟𝑟 2 = 0.434
0.889
𝛿𝛿 =0.864, 𝑟𝑟 = 0.659, 0.925
0.725 1.00
0.889 0.90
0.646
0.889 0.725 0.889 0.646
𝜀𝜀𝜀𝜀 -0.007
-0.03
-0.03
BC PC -0.007
-0.03
SUBC -0.03 SUC
2
𝛿𝛿=0.993,
=0.864,𝑟𝑟 𝑟𝑟==−0.038,
0.659, 𝑟𝑟 = 0.001
0.434
Table 13 the data set in Table 10 demonstrates that the BC and SUBC
Similarly, 20-
Table
methods
Table 12
12 shows
obtainthat
shows 100thethe
that BCBCandand
percent SUBC
accuracy
SUBC methods are overfitted
inmethods
predicting the (ɛ =
group
are overfitted
0.03),
(ɛ while both
Performance
= -0.03), while PCboth
Analysis andof SUC methods
Discharged
PC and SUC andpredict
Deaththe
methods group
Cases formembership
predict Covid-19
the group
correctly
Virus, at 83.9
Malaysia percent
Data and
(n=189) 74.8 percent, respectively.
membership correctly at 83.9 percent and 74.8 percent, respectively.19 Dissimilar to
other data sets,
Dissimilar the correlation
to other data sets, the of this data is positive
correlation and is
of this data moderately
positive
strong (r
BC = 0.659). This makes
PC sense as
SUBCit reveals a considerably
and moderately strong (r = 0.659). This makes sense as it reveals SUC strong
relationship
a considerably
0.825 between confirmed
strong0.741 and death
relationship between
0.825 cases, implying
confirmed that
0.685and death the
number of deaths would
cases, implying that the also increase if the confirmed cases
number of deaths would also increase if the increases.
2
𝛿𝛿= 0.867,
confirmed 𝑟𝑟 = 0.305, 𝑟𝑟 = 0.093
Table 13 cases increases.
19
Performance Analysis of Discharged and Death Cases for Covid-19
Virus, Malaysia Data (n=189)
1
accuracy
other datawith sets,95.2
thepercent, followed
correlation of thisbydata
PC and SUC with
is positive and91.3 percent
moderately
𝜀𝜀
prediction -0.03 respectively. Meanwhile,
accuracy, -0.03 the BC records the
strong (r = 0.659). This2 makes sense as it reveals a considerably strong
𝛿𝛿 =0.864,
lowest 𝑟𝑟 = 0.659,
prediction 𝑟𝑟 = 0.434
accuracy at 87.5 percent. The relationship for the
relationship Journal
between confirmed and death cases, implying that
of ICT, 22, No. 1 (January) 2023, pp: 1–30
Aedes
numberAlbopictus mosquito
of deaths would alsodata is also
increase positively
if the weak.
confirmed cases increases.
Table 913
Table
Table 13
Performance
Performance Analysis
Analysis of Laboratory-reared
of Discharged Female
and Death Casesand
forMale Aedes
Covid-19
Performance
Albopictus Analysis
Mosquitoes of Discharged
Body
Virus, Malaysia Data (n=189) Size and
(mm) Death
(n=30) Cases for Covid-19
Virus, Malaysia Data (n=189)
BC BC PC PC SUBCSUBC SUC
SUC
BC PC SUBC SUC
0.825
0.825 1.00 0.741 0.7410.950
0.825 1.00
0.825
0.685
0.950
0.685
𝛿𝛿= 0.867,
𝜀𝜀 𝑟𝑟 = 0.305, 𝑟𝑟 22 = 0.093
-0.002 -0.002
𝛿𝛿 =0.998, 𝑟𝑟 = −0.095, 𝑟𝑟 2 = 0.009
Table13
Table 13 also reveals that
that both
boththe theBCBCandandSUBC
SUBCmethods
methods achieve
achieve the
highest
In Table
the correct
9, the
highest predictions
BC and
correct at
SUBC methods
predictions 95.2 percent,
at 95.2 predict while
percent,the group PC and SUC
PCmembership
and SUC
correctly
with 100%
correctly predict the group
accuracy.
predict the group memberships
However,
memberships with85.5
both with
methods85.5percent
percentand
overfittedand(𝜀𝜀7979
=
percent accuracy,
accuracy, respectively. This data setshows
shows 1
−0.002).
percent Meanwhile, the PC and
respectively. ThisSUCdatamethods
set a aweak
achieve weak
95.2%positive
of the
positive
relationship.
group Theimplication
membership
relationship. The implication
prediction.isisTherefore,
thatthe
that themore
more peopleare
the association
people aredischarged,
discharged,
shown has a
the
weaklower the
negative death rate. Hence
correlation. the weak correlation value is
the lower the death rate. Hence the weak correlation value is justified. justified.

Table 10
Table14
14
Table
Performance Analysis ofofConfirmed
Performance Analysis Wide and and
Laboratory-reared Female
Discharged Cases for and
Performance
Male Aedes Analysis ofMosquitoes
Albopictus Confirmed Wing
and Discharged
Length Cases
(mm) for
(n=20)
Covid-19 Virus, Nigeria Data (n=163)
Covid-19 Virus, Nigeria Data (n=163)
BC
BC PCPC SUBC
SUBC SUC
SUC 𝛿𝛿
BC PC SUBC SUC
Table 14 1.00
0.798 0.925
0.647 1.00
0.798 0.90
0.497
0.798 0.647 0.798 0.497
𝜀𝜀
𝜀𝜀 -0.244
-0.007
-0.244 -0.099
-0.099 -0.244
-0.007
-0.244
Performance Analysis22of Confirmed and Discharged Cases for
=0.993,
𝛿𝛿= 0.548,𝑟𝑟 =
Covid-19 =−0.038,
𝑟𝑟Virus,
0.153, 𝑟𝑟𝑟𝑟 =
Nigeria 0.001
=Data
0.023(n=163)
Similarly, the data set in Table 10 demonstrates that the BC and SUBC
methods
Table obtain BC
14 reveals 100 BC,
that percent accuracy
PC,PCand SUBC inmethods
SUBC predicting
are the group
SUCoverfitted
(ɛ = -0.244, -0.099, -0.244), based on the OPEC benchmark values. In
0.798 0.647 0.798 0.497 21
contrast, the SUC method attains 90.7 percent accuracy in predicting 19
the group -0.244 with a-0.099
𝜀𝜀 membership, -0.244 correlation for this
very weak positive
2
𝛿𝛿= 0.548,
data set. 𝑟𝑟 = 0.153, 𝑟𝑟 = 0.023

Table
Table15
15

PerformanceAnalysis
Performance Analysisof
ofConfirmed
Confirmedand
andDeath
DeathCases
Casesfor
forCovid-19
Covid-19
Virus, Nigeria Data (n=163)
Virus, Nigeria Data (n=163)

BC
BC PC PC SUBCSUBC SUC SUC
0.834
0.834 0.804 0.8340.834 0.733
0.804 0.733
𝛿𝛿 =0.939, 𝑟𝑟 = 0.65, 𝑟𝑟 2 = 0.423

Table 15 illustrates that the BC and SUBC can predict the group
20
membership correctly with 88.8 percent, while the PC and SUC have
85.6 percent and 78.1 percent accuracy, respectively. Similar to the
Malaysian data set, a moderately strong correlation can also be found
BC 1.00 PC 0.950 SUBC 1.00 SUC0.950
0.834 0.804 0.834 0.733
𝜀𝜀 -0.002 -0.002
22, No. 1 (January) 2023, pp: 1–30
=0.939, 𝑟𝑟 =Journal
𝛿𝛿 =0.998, 0.65, 𝑟𝑟of2 ICT,
−0.095, =
𝑟𝑟 2 0.423
= 0.009

Table
In Table 15 illustrates that the BC and SUBC can predict the group
Table 15 9, the BC and
illustrates thatSUBC
the BC methods predictcan
and SUBC the predict
group membership
the group
membership correctly
with 100% correctly
accuracy. with 88.8 percent, while the PC and SUC have
membership withHowever, bothwhile
88.8 percent, methods
the PC overfitted
and SUC have (𝜀𝜀 =
85.6 percent
−0.002). and 78.1 the
Meanwhile, percent
PC accuracy,
and SUC respectively.
methods achieve Similar
95.2% to
of the
the
85.6 percent and 78.1 percent accuracy, respectively. Similar to the
Malaysian
group data set, prediction.
membership a moderately strong correlation
Therefore, the can also
association be found
Malaysian
in this data data
Nigeria set, aset
moderately
for strong
confirmed correlation
and death can also be found a
cases.
shown has
weak
in this negative correlation.
Nigeria data set for confirmed and death cases.
BC PC SUBC SUC
Table 10 16
Table 16 0.4688 0.4896 0.5100 0.4896
Performance Analysis
0.218, 𝑟𝑟 2ofof
𝛿𝛿=0.536, 𝑟𝑟 =Analysis Wide
0.047 and Laboratory-reared
=Discharged and Death Cases forFemale Covid-19and
Performance
Male Aedes Analysis
Albopictus
Virus, Nigeria Data (n=163) of Discharged
Mosquitoes and
Wing Death
Length Cases
(mm) for Covid-19
(n=20)
Virus,
Based Nigeria Data
on Table (n=163)
8, the proposed SUBC achieves the highest prediction
accuracy with 95.2 BCBC PCPC bySUBC
percent, followed and SUC SUC
PCSUBC withSUC
91.3 percent𝛿𝛿
prediction accuracy, BC respectively.
0.767 PC Meanwhile,
0.73 SUBC
0.761 the BC0.509SUC
records the
1.00 0.9250.73 1.00 0.90 0.509
lowest prediction0.767 accuracy at 87.5 percent. 0.761
The relationship for the
Aedes𝜀𝜀𝜀𝜀 Albopictus mosquito data
-0.145
-0.145
-0.007 is also-0.007
-0.108
-0.108 positively
-0.139 weak.
-0.139
=0.993,
𝛿𝛿 =0.622,𝑟𝑟 = 0.198, 𝑟𝑟𝑟𝑟22 =
−0.038,
𝑟𝑟 = = 0.001
0.039
Similarly,
Table 9 the data set in Table 10 demonstrates that the BC and SUBC
Again,
methods
Again, theobtain
the dataset
data set100
ininTable
Table 16also
percent
16 also revealsinthat
accuracy
reveals that BC,PCPCand
predicting
BC, and SUBC
theSUBC
group
Performance
methods are Analysis
overfitted of
(ɛ Laboratory-reared
= -0.145, -0.108, Female
-0.139), and
while
methods are overfitted (ɛ = -0.145, -0.108, -0.139), while the SUC Male
the Aedes
SUC
Albopictus
method Mosquitoes
predicts 81.8 Body
percent Size
group (mm) (n=30)
membership, and the
method predicts 81.8 percent group membership, and the data show19 data show an
extremely weak and positive correlation.
an extremely weak and positive correlation.
BC PC SUBC SUC
Tables 17 to 19 display the annual road traffic accident data
Tables 17 to 19 display the annual road traffic accident data categorised
categorised based 1.00 on the degrees 0.950 1.00
of fatalities. Based on 0.950
the results in
based on the degrees of fatalities. Based on the results in Table 17 to
Table 17 𝜀𝜀 to Table-0.002
19, the probability-based-0.002 methods (BC and SUBC)
Table 19, the probability-based methods (BC and SUBC) overfit the
overfit
OPEC the 𝑟𝑟OPEC
𝛿𝛿 =0.998,
value. = value.
−0.095,
However, 𝑟𝑟 2 However,
the =PC0.009
method theoutperforms
PC methodtheoutperforms
SUC method the
SUCamethod
with moderate withtoa very
moderate to very
strong strong relationship.
relationship. Based on the Based on the
adopted 22
In Table
adopted 9, the
benchmarkBC and SUBC
value, we methods
may predict
conclude the
that
benchmark value, we may conclude that these two methods (PC and group
these membership
two methods
withand
(PC
SUC) 100%
provide accuracy.
SUC)moreprovide However,
more
accurate accurateboth
predictions formethods
predictions foroverfitted
classification (𝜀𝜀 =
classification
than the
than the probability methods (BC and SUBC). The Pearson
probability methods (BC and SUBC). The Pearson correlation of thisthe
−0.002). Meanwhile, the PC and SUC methods achieve correlation
95.2% of
ofgroup
data this
setdata set is
membership
is very veryprediction.
strong strong andTherefore,
and positively positively associated,
the
associated, mainlymainly
association forshownforsets
data data
has a
sets
weak in Tables
negative
in Tables 17 and 19.17 and 19.
correlation.
Table17
Table 10
Table 17
PerformanceAnalysis
Performance AnalysisofofCar Wideandand Laboratory-reared
Motorbike Road Traffic Female and
Accident
Performance
Male Aedes Analysis
Albopictus of Mosquitoes
Car and Motorbike
Wing Road(mm)
Length Traffic Accident
(n=20)
(n=30)
(n=30)
BCBC PC PC SUBCSUBC SUC SUC 𝛿𝛿
1.00 BC 0.83 PC 1.00 1.00SUBC 0.82 0.90 SUC
1.00
1.00 0.925
0.83 1.00 0.82
𝜀𝜀 𝜀𝜀 -0.022
-0.007
-0.022 -0.007
-0.022 -0.022
2
𝛿𝛿=0.993,
=0.978, 𝑟𝑟𝑟𝑟==−0.038,
0.946, 𝑟𝑟𝑟𝑟2 ==0.895
0.001
Similarly, the data set in Table 10 demonstrates that the BC and SUBC
Table 17 reveals
methods obtainthat
100thepercent
PC and accuracy
SUC achieve
in 85.3 percentthe
predicting and group
83.4
21
percent correct group prediction, with a very strong positive
correlation value of 0.946. Meanwhile, the probability-based methods
(BC and SUBC) in Table 18 show overfitted values (ɛ = -0.007), in 19
Similarly, the data set in Table 10 demonstrates that the BC and SUBC methods obtain 100 percent accuracy in predicting the group membership, although both methods again overfit (ε = -0.007).

Table 10

Performance Analysis of Laboratory-reared Female and Male Aedes Albopictus Mosquitoes Wing Length (mm) (n=20)

Method    BC       PC       SUBC     SUC
PEC       1.00     0.925    1.00     0.90
ε (BC and SUBC) = -0.007; δ = 0.993; r = -0.038; r² = 0.001

Table 17

Performance Analysis of Car Wide and Motorbike Wide Road Traffic Accident (n=30)

Method    BC       PC       SUBC     SUC
PEC       1.00     0.83     1.00     0.82
ε (BC and SUBC) = -0.022; δ = 0.978; r = 0.946; r² = 0.895

Table 17 reveals that the PC and SUC methods achieve 85.3 percent and 83.4 percent correct group prediction, respectively, with a very strong positive correlation value of 0.946. Meanwhile, the probability-based methods (BC and SUBC) in Table 18 show overfitted values (ε = -0.007), in contrast to the PC and SUC methods, which accurately predict the group membership at 86.6 percent and 85.6 percent, respectively, with a moderately positive correlation of 0.613 for this data set.

Table 18

Performance Analysis of Car Severe and Motorbike Severe Road Traffic Accident Injuries (n=25)

Method    BC       PC       SUBC     SUC
PEC       1.00     0.86     1.00     0.84
ε (BC and SUBC) = -0.007; δ = 0.993; r = 0.613; r² = 0.376

Table 19

Performance Analysis of Car Slight and Motorbike Slight Road Traffic Accident Injuries (n=25)

Method    BC       PC       SUBC     SUC
PEC       1.00     0.76     1.00     0.70
ε (BC and SUBC) = -0.117; δ = 0.883; r = 0.968; r² = 0.937
Similar to the earlier results, in Table 19 the probability-based classifiers are overfitted, while the classification using the PC and SUC methods attains 86.1 percent and 79.3 percent accuracy in predicting the group membership. This data set reveals a very strong positive correlation of 0.968. Across these road traffic data sets, the PC method outperforms the SUC method with a moderate to very strong relationship. Based on the adopted benchmark value, we may conclude that these two methods (PC and SUC) provide more accurate predictions for classification than the probability methods (BC and SUBC). The Pearson correlation is very strong and positive, mainly for the data sets in Tables 17 and 19.

Table 20 summarises the best-performing method for each continuous data set based on the minimum misclassification rate (ε). The results show that the proposed methods outperformed the conventional methods. Table 20 also includes a comparative analysis of the different methods with respect to the Pearson correlation (r) and the coefficient of determination (r²). Based on the results in this table, we may conclude that a weak positive or a very weak negative correlation is associated with the minimum misclassification rate for a continuous data set. We also observe that very small r² values correspond to the minimum misclassification rate. The results of the data sets in Table 20 are depicted in Figure 2.

Table 20

Comparative Analysis of the Best Methods for Continuous Data

r        r²       PEC     OPEC (δ)   ε = OPEC - PEC   Best Methods
0.283    0.080    0.993   0.999      0.006            BC & SUBC
-0.148   0.022    0.61    0.702      0.092            SUBC
-0.063   0.0004   0.571   0.606      0.035            SUBC
-0.251   0.063    0.696   0.905      0.209            SUC
-0.196   0.038    0.536   0.539      0.003            BC
0.079    0.006    0.833   0.859      0.026            SUC
0.218    0.047    0.51    0.536      0.026            SUBC
-0.095   0.009    0.95    0.998      0.048            PC & SUC
-0.038   0.001    0.925   0.993      0.068            PC
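The derived columns in Tables 20 and 21 follow directly from the reported values: ε is the gap between the OPEC benchmark and the achieved PEC, and r² is the square of the Pearson correlation. The short sketch below recomputes the first two rows of Table 20 as a consistency check; it is illustrative only and not part of the original analysis.

# Recompute the derived columns of Table 20 from the reported r, PEC, and
# OPEC values (first two rows shown); a checking sketch, not the authors' code.
rows = [
    {"r": 0.283,  "PEC": 0.993, "OPEC": 0.999},
    {"r": -0.148, "PEC": 0.61,  "OPEC": 0.702},
]
for row in rows:
    r_squared = row["r"] ** 2            # coefficient of determination
    epsilon = row["OPEC"] - row["PEC"]   # misclassification rate
    print(f"r² ≈ {r_squared:.3f}   ε ≈ {epsilon:.3f}")
# Output matches the table: r² ≈ 0.080, ε ≈ 0.006 and r² ≈ 0.022, ε ≈ 0.092.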
Figure 2

Comparative Analysis of Continuous Data

[Two-panel plot of performance values against the number of experiments (1 to 9): panel (a) shows r and the misclassification rate; panel (b) shows the misclassification rate and r².]

The comparative analysis in Table 21 shows that the proposed methods are more robust than the conventional methods based on the misclassification error for the discrete data cases. Furthermore, the analysis reveals that a weak to strong positive correlation (r) is associated with the minimum misclassification rate. Moreover, r² shows very small to very large values associated with a relatively minimum misclassification rate. These results are illustrated in Figure 3.

Table 21

Comparative Analysis of the Best Methods for Discrete Data

r        r²       PEC     OPEC (δ)   ε = OPEC - PEC   Best Methods
0.379    0.144    0.492   0.512      0.02             SUC
0.659    0.434    0.725   0.864      0.139            PC
0.305    0.093    0.825   0.867      0.042            BC & SUBC
0.153    0.023    0.497   0.548      0.051            SUC
0.65     0.423    0.834   0.939      0.105            BC & SUBC
0.198    0.039    0.509   0.622      0.113            SUC
0.946    0.895    0.83    0.978      0.148            PC
0.613    0.376    0.86    0.993      0.133            PC
0.968    0.937    0.76    0.883      0.123            PC

Figure 3

Comparative Analysis of Discrete Data

[Two-panel plot of performance values against the number of experiments (1 to 9): panel (a) shows r and the misclassification rate; panel (b) shows the misclassification rate and r².]
DISCUSSION
Based on the comparative performance analysis, we observed that the methods performed differently depending on the characteristics of the data sets. We have shown that the SUC method did not overfit the data when the OPEC benchmark evaluation was applied. We have also revealed that all the methods performed differently depending on the data type, as summarised in Tables 20 and 21. For continuous data, as reported in Table 20, a relatively minimum misclassification rate could be associated with very weak negative and weak positive correlation values, and the same pattern applies to r². Meanwhile, for the discrete data sets (Table 21), a weak to strong positive correlation is related to a minimum misclassification rate, and a similar pattern is portrayed by r². The PEC results showed more overfitting for the discrete data sets than for the continuous ones. Thus, it can be inferred that the proposed SUC and SUBC methods performed better than the conventional methods (BC and PC) for both types of data sets. Based on the comparative analysis, the proposed methods were found to be more robust than the conventional methods.

The proposed methods penalise outliers as follows: they transform the outliers into inliers using the F-weight, and if any outliers still exist, the F-weight is reweighted until all outliers have been transformed into inliers. This study also showed that the proposed SUC method solved the overfitting problem associated with the BC method when OPEC was used as a performance benchmark. We may generalise that classification with continuous data yielded weak negative and weak positive correlations, whereas classification with discrete data yielded relatively weak to strong positive correlations. The strength and the direction of the correlation values do not imply robust or poor classification performance of the methods. This suggests that the classification performance is independent of the correlation values.
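To make the iterative reweighting concrete, the sketch below shows the general pattern of repeatedly shrinking flagged observations towards the bulk of the data until no outliers remain. The F-weight itself is not defined in this excerpt, so the MAD-based flagging rule, the 2.5 cut-off, the shrink factor, and the function name used here are illustrative stand-ins rather than the authors' actual weighting scheme.

import numpy as np

def shrink_outliers(x, cutoff=2.5, max_iter=20):
    """Iteratively pull flagged outliers towards the bulk of the data.

    Observations whose robust z-score (based on the median and the MAD)
    exceeds `cutoff` are moved halfway towards the median, and the step is
    repeated until no outliers remain or `max_iter` is reached. This is a
    generic stand-in for the F-weight reweighting described in the text.
    """
    x = np.asarray(x, dtype=float).copy()
    for _ in range(max_iter):
        med = np.median(x)
        mad = np.median(np.abs(x - med)) or 1e-12  # guard against a zero MAD
        z = 0.6745 * (x - med) / mad               # robust z-score
        outliers = np.abs(z) > cutoff
        if not outliers.any():
            break                                  # every point is now an inlier
        x[outliers] = med + 0.5 * (x[outliers] - med)
    return x

# Example with a contaminated univariate sample:
sample = np.array([4.1, 4.3, 3.9, 4.0, 4.2, 9.8, 4.1, 3.8])
print(shrink_outliers(sample))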

CONCLUSION

The analysis reveals that the performance of the investigated methods
is data-dependent. The results show that the proposed Smart Univariate
Classifier (SUC) is robust and has solved the problem of overfitting
associated with the conventional methods when OPEC is used as a
performance benchmark. The proposed SUBC method is robust but
still affected by the overfitting problem. In addition, both proposed
methods achieve high classification accuracy. The performance of
the proposed methods based on the introduction of random outliers
indicates that they are more robust and capable of resisting influential
observations than conventional methods. The findings also show that
a minimum misclassification error is independent of the strength of the
correlation values. Conclusively, the proposed methods are robust and
capable of overcoming the overfitting problem, which often occurs in
conventional methods.

ACKNOWLEDGMENT

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

