Professional Documents
Culture Documents
ABSTRACT
INTRODUCTION
4
phase of the proposed methods transformed the outlier data points into
inlier data points before
Journal the22,classifier
of ICT, was applied
No. 1 (January) 2023, pp:to1–determine
30 group
membership. Finally, the hybrid method applied the Bayes rule to
perform group classification on the transformed F-Weighted data set.
the
Thetwo proposed
main methods
contribution to solve
of this papertheisoverfitting problem
the robustness and two
of the the
transformation
proposed methods of outliers
to solveto inliers. Furthermore,
the overfitting we investigated
problem and the
the classification
transformation performance
of outliers of the
to inliers. proposed we
Furthermore, methods and the
investigated
conventional
classification classification
performance methods of the using continuous
proposed methods and and
discrete
the
conventional classification methods using continuous and discrete
variables.
variables.
The next part of this paper discusses the Predictive Classification
The next followed
method, part of this
by paper
the Bayesdiscusses the Predictive
Classifier. The proposedClassification
Smart
method, followed by the Bayes Classifier. The proposed Smart
Univariate Classifier and Smart Univariate Bayes Classifier are
Univariate Classifier and Smart Univariate Bayes Classifier are
outlined in the next section, followed by data collection and
outlined in the next section, followed by data collection and
comparative performance analysis. The performance comparison
comparative performance analysis. The performance comparison
through the probability of exact classification (hit ratio) and analysis
through the probability of exact classification (hit ratio) and analysis is
is then presented, and finally the conclusion is conferred in the last
then presented, and finally the conclusion is conferred in the last
section.
section.
METHODOLOGY
METHODOLOGY
In
In this
this section,
section, we
we discuss
discuss the
the conventional
conventional methods,
methods, Predictive
Predictive
Classifier
Classifier (PC) and Bayes Classifier (BC), with the proposed proposed
(PC) and Bayes Classifier (BC), with the methods,
methods, Smart Univariate
Smart Univariate ClassifierClassifier
(SUC) and (SUC) and Univariate
Smart Smart Univariate
Bayes
Bayes Classifier
Classifier (SUBC).
(SUBC). The The different
different classifiers
classifiers arearebased
based on
on step
step
computational
computational procedures defined in
procedures defined in the
the following
following sections.
sections.
Predictive Classifier
We describe
We describe the
the predictive
predictive classifier
classifier according
according to
to Huberty
Huberty and
and Holmes
Holmes
(1983). This technique classifies an object 𝑋𝑋𝑖𝑖 to Group 1 (∆1 )
according to Equation 1.
(𝑋𝑋̅1 +𝑋𝑋̅2 )
𝑋𝑋𝑖𝑖 < 2 (1)
where 𝑋𝑋̅𝑖𝑖 is the mean vectors of Group ∆𝑖𝑖 as shown in Equation 2.
𝑛𝑛
𝑖𝑖 𝑥𝑥
∑𝑘𝑘=1
𝑋𝑋̅𝑖𝑖 =
𝑘𝑘
, 𝑖𝑖 = 1, 2 (2)
𝑛𝑛𝑖𝑖
Otherwise
Otherwiseclassify
classify𝑋𝑋𝑋𝑋𝑖𝑖𝑖𝑖 to
toGroup
Group22(∆ Equation
Equation111is
(∆22).).Equation isisvery
very
verysensitive
sensitivetoto
outliers
outliersas
asthe
as themean
the meanisisis
mean easily
easilyperturbed
perturbed
easily perturbed with
with aaslight
with slight change
change
a slight in
inthe
change the the5
indata
data
set.
dataThe
set. The basic idea
basicbasic
set. The of
ideaidea of Equation
Equation
of Equation11 is
1isisvariable
variableswap,
variable swap,in
swap, in which
which the
which the
independent
independentvariable
variableisis swapped
swapped with
withthe
with thedependent
dependentvariable
variableand
andvice
vice
versa.
versa.This
This procedure
procedure mimics mimics thethepoint
the pointbiserial
point biserialcorrelation
correlationcoefficient
coefficient
in
inwhich
whichaavariable
variableswap
swapisisapplicable
applicableas asshown
shownin inEquation
Equation3. 3.
𝑟𝑟𝑟𝑟𝑝𝑝,𝑏𝑏
(𝑥𝑥̅ −𝑥𝑥̅11))√√𝑝𝑝×𝑞𝑞
(𝑥𝑥̅22−𝑥𝑥̅ 𝑝𝑝×𝑞𝑞
(3)
(3) 5
𝑝𝑝,𝑏𝑏 =
=
√𝑠𝑠𝑧𝑧2𝑧𝑧2
√𝑠𝑠
𝑛𝑛𝑛𝑛11 𝑛𝑛𝑛𝑛22
where
where𝑛𝑛𝑛𝑛 = +𝑛𝑛𝑛𝑛22,,𝑝𝑝𝑝𝑝 =
= 𝑛𝑛𝑛𝑛11+ = and
and 𝑞𝑞𝑞𝑞 =
= are
arethe
thegroup
groupprobabilities
probabilitiesof
of
set. outliers
The as theideamean is easily perturbed with aswap, slight in change in the
the data
outliers
outliers
set.set. asbasic
as the mean
thebasicmean isof Equation
ofeasily perturbed
perturbed 1 iswith
1 with
variable aa slight
slight change
change in in
which
inthethedata data
set.The
The
independent basic
The idea
basic
variable
ideaidea
is
of
swappedofEquation
Equation Equation with 1 is the 1is isvariable
variable variable
dependent swap, swap,
swap,
variable which
inwhich
and whichthethethe
vice
set.
set. The
The basic
independent
independent basic idea
variable
variable of Equation
is
ismimics
swapped swapped with 11with isthe
is variable
variable
the dependent
dependent swap,
swap, inin which
variable
variable which
andand the
the
vice vice
versa. independent
This procedure variable is swapped
the point with biserialthe dependent
correlation variable
coefficient and vice
independent
independent
versa.versa.This This variable
variable
procedure
Journal
procedure ofis swapped
ICT,
mimicsmimics
22, No.
the with
with
the
1 point point
(January) the
thebiserial dependent
dependent
biserial
2023, pp: variable
variable
–
correlation
1
correlation 30 and
and vice
vice
coefficient
coefficient
in versa.
which This
a variable procedure
swap mimics
is applicable the point biserial correlation coefficient
versa.
versa. in in
in which This
This
which procedure
procedure
a variable
a variable swapmimics
swap is the
is applicable applicablepointas
point as
showncorrelation
biserial
biserial
as
shown shown
in Equation
correlation
in inEquation
in Equation
3.
coefficient
coefficient
3. 3. 3.
which a variable swap is applicable as shown Equation
in
in which
which aa variable
variable swap is applicable applicable (𝑥𝑥̅2 −𝑥𝑥̅1 )as as
√ shown
shown
𝑝𝑝×𝑞𝑞 in
in Equation
Equation 3.3.
𝑟𝑟𝑝𝑝,𝑏𝑏 = (𝑥𝑥̅2(𝑥𝑥̅ −𝑥𝑥̅1𝑝𝑝×𝑞𝑞
1 )22√−𝑥𝑥̅
−𝑥𝑥̅2(𝑥𝑥̅ )√𝑝𝑝×𝑞𝑞 (3)
𝑟𝑟𝑝𝑝,𝑏𝑏𝑟𝑟𝑝𝑝,𝑏𝑏
=𝑟𝑟𝑝𝑝,𝑏𝑏= (𝑥𝑥̅2=
(𝑥𝑥̅ −𝑥𝑥̅ √𝑠𝑠
−𝑥𝑥̅ )
11 √)𝑧𝑧2
√√𝑠𝑠𝑝𝑝×𝑞𝑞
𝑝𝑝×𝑞𝑞
2
1 )√𝑝𝑝×𝑞𝑞 (3)(3)(3)
𝑟𝑟𝑝𝑝,𝑏𝑏
𝑛𝑛
= 2 √𝑠𝑠
𝑧𝑧 √𝑠𝑠𝑧𝑧
22
𝑧𝑧 2
(3)
(3) (3)
√𝑠𝑠𝑛𝑛
√𝑠𝑠
where 𝑛𝑛 = 𝑛𝑛1 + 𝑛𝑛2 , 𝑝𝑝 = 𝑛𝑛𝑛𝑛11 and 𝑛𝑛1𝑛𝑛 𝑞𝑞 = 𝑛𝑛 𝑧𝑧𝑧𝑧2 𝑛𝑛
are the group probabilities of
where where 𝑛𝑛 =𝑛𝑛 𝑛𝑛𝑛𝑛=1= 𝑛𝑛+1𝑛𝑛𝑛𝑛+2+ 𝑛𝑛2𝑛𝑛,=𝑝𝑝, 𝑛𝑛𝑝𝑝
= and and 𝑞𝑞 𝑛𝑛𝑛𝑛𝑞𝑞
= are𝑛𝑛are
2
2thethe group probabilities of ofof
1 , 𝑝𝑝 group probabilities
where 1 𝑞𝑞 = 𝑛𝑛2 =
2 𝑛𝑛11 2= 𝑛𝑛 𝑛𝑛 and 22 𝑛𝑛 are the group probabilities
Group
where 𝑛𝑛𝑛𝑛
where Group
1= and
= 𝑛𝑛𝑛𝑛11Group
1 and 𝑛𝑛22,2,𝑝𝑝 and
+Group = and
2,
𝑠𝑠𝑧𝑧2andis the
𝑠𝑠is𝑧𝑧2𝑠𝑠the
is
𝑞𝑞 =
2 the
pooled
= pooled 𝑛𝑛
arecovariance
are 𝑛𝑛the
the groupof
group
covariance
Groups
probabilities
probabilities
of Groups
1 andof
1of and
Group
2. To 1 and
Group
avoid Group
1having
and 2,negative
Group
a and 2,𝑛𝑛𝑛𝑛𝑠𝑠and
2𝑧𝑧 𝑧𝑧 ispooled
value the
of
𝑛𝑛𝑛𝑛 pooled
the covariance covariance
correlation ofcoefficient,
Groups
of Groups1 andthe 1 and
Group
Group
2. 2.
To To11
avoid and
and
avoid Group
Group
having having 2,
a and
a 𝑠𝑠
negative
negative
2 is
𝑧𝑧 valuevaluethe value pooled
pooled
of of
the covariance
covariance
the correlation
correlation of
of Groups
Groups
coefficient,
coefficient, 1 1 and
and
the thethe
mean 2.
of To the avoid having
firstfirst
group a negative
is always
𝑧𝑧
greater of
than the correlation
thethe mean of of coefficient,
thethe second
2.
2. To
To
mean
meanmean avoid
avoid of
of theoffirst having
having
the a negative
group is value
value
always ofof
greater the
the correlation
correlation
than coefficient,
meancoefficient, the
the
second
group forthe the group
efficiency firstand is always
group always greater
is classification greater than than the mean the In of theofsecond
mean the 𝑛𝑛 second
mean
mean
group groupof
offor the
for first
first group
efficiency
efficiency and isbetter
and always
better better greater
greater
classification
classification than
than purposes.
the
the mean
mean
purposes.
purposes. Inofofthis
Inthe
thisthe
thiscase,
second
second
case,case, 𝑛𝑛 1𝑛𝑛
refers group
to the forsmaller
efficiency group and and better 𝑛𝑛 classification
refers to the purposes.
larger In this The
group. case, 1 1𝑛𝑛1
group
group for
for
refers efficiency
to efficiency
to
thetothe and
and better
better classification
classification purposes.
purposes. In this
Inlarger
this case,
case, 𝑛𝑛
thesmaller group and 𝑛𝑛2𝑛𝑛 refers to tothe larger group. 𝑛𝑛1The
2
refers refers smaller smallergroup and 𝑛𝑛 2 refers to the larger group. The
concept
refers
refers toof
concept
to thevariable
theof smaller
variable
smaller swap group
swap
group cangroupcanbe
and
and be
and
reversed refers
𝑛𝑛𝑛𝑛reversedrefers
2byreferschanging
byto
to the
changing
the
the
thethe
larger
larger group.
group.
group.
classification The
classificationThe
1 The
concept
“inequality”. of
concept Basedvariable
of variable swap
on on can
swap
this be reversed
can be2 reversed
statement,
2
Equation by changing
by1changing
can the
be be classification
the classification
written as thethe
concept
concept
“inequality”. of variable
“inequality”.
of variable
Based Based swap
swap
on this can
can this be
statement, reversed
bestatement,
reversed Equationby
by changing
Equation
changing 1 can1 canthe
the
be classification
written
classification
written as astheas the
following “inequality”.
Equation Based
4, on
which this statement,
implies that Equation
the 1
conventionalcan be written
variable
“inequality”.
following
“inequality”.
following Based
Equation
Based
Equation on
on this
4,
4, this
which statement,
which
statement, implies implies Equation
thatthat
Equation the 11conventional
the can be
bewritten
conventional
can written as asthe
variable
variable the
position following
remains. Equation 4, which implies that the conventional variable
following
position
following
position Equation
remains.
Equation
remains. 4,
4, which
which implies
implies that
that the
the conventional
conventional variable
variable
position remains.
position remains.
position remains. (𝑋𝑋̅1 +𝑋𝑋 ̅ )
𝑋𝑋𝑖𝑖 𝑋𝑋 >> (𝑋𝑋̅1(𝑋𝑋 ̅̅12+𝑋𝑋 ̅2 )̅ (4)
+𝑋𝑋 (𝑋𝑋̅) +𝑋𝑋
(4)(4)(4) (4)
)
𝑋𝑋𝑖𝑖 >𝑖𝑖𝑋𝑋(𝑋𝑋 ̅ > 2 ̅ 22)1 2
+𝑋𝑋
𝑖𝑖 ̅1 +𝑋𝑋 ̅
𝑋𝑋𝑖𝑖 >
𝑋𝑋
(𝑋𝑋
> 1𝑋𝑋22 is
2 2)
2 2 (4)
(4) assign
Equation
Equation 4 implies
4 implies thatthat an 𝑖𝑖object
an object 𝑖𝑖 𝑋𝑋𝑖𝑖assigned
is assigned to ∆ to 1 ,∆otherwise,
, otherwise, assign
Equation Equation 4 implies 4 implies that an that object
an object 𝑋𝑋𝑖𝑖 is 𝑋𝑋 assigned is assigned to ∆1to , otherwise,
1∆1 , otherwise, assign assign
𝑋𝑋
Equationto ∆to .∆
4 Unfortunately,
implies
. that
Unfortunately, an Equation
object Equation 𝑋𝑋 is 1assigned
and𝑖𝑖
1 and Equationto ∆
Equation , 4 give
otherwise,
4 give theassign
the same
𝑖𝑖
Equation𝑋𝑋 2
𝑋𝑋𝑖𝑖 to𝑖𝑖𝑋𝑋∆𝑖𝑖 2to.4 Unfortunately,
implies
2 that an object
∆2 . Unfortunately, Equation 𝑋𝑋𝑖𝑖 is1 assigned
Equation𝑖𝑖 and1 Equation andtoEquation∆1 , otherwise,
1 4 give 4 give thesame
the assign
same same
classification result.
to ∆∆2.. Unfortunately,
Unfortunately, TheThe biserial
Equation point 1point and correlation
Equation that44 corresponds
give the same to to
𝑋𝑋𝑋𝑋
𝑖𝑖
classification
𝑖𝑖 to
classification
classification result.
result.
2 4 is given in Equation 5. The
result. biserial
The biserial
Equation biserial point 1 and
point correlation
Equation
correlation correlation that that givecorresponds
corresponds
that the
corresponds same to to
Equation
classification result. The biserial
Equation
Equation
classification
Equation 4 is4result.
is
given
4 isgiven in
The
given in in Equation
Equation
biserial
Equation 5.point
point 5. 5.correlation that corresponds to
correlation that corresponds to
Equation 4 is given in Equation 5. −𝑥𝑥̅ ) 𝑝𝑝×𝑞𝑞
Equation 4 is given in Equation(𝑥𝑥̅ 5.1 (𝑥𝑥̅ √2 )√𝑝𝑝×𝑞𝑞
𝑟𝑟𝑝𝑝,𝑏𝑏𝑟𝑟𝑝𝑝,𝑏𝑏
= (𝑥𝑥̅ 2−𝑥𝑥̅
(5)
2 )21√−𝑥𝑥̅
(5)(5)(5) (5)
=1= −𝑥𝑥̅1(𝑥𝑥̅ 𝑝𝑝×𝑞𝑞2 )√𝑝𝑝×𝑞𝑞
𝑟𝑟𝑝𝑝,𝑏𝑏 = 𝑟𝑟𝑝𝑝,𝑏𝑏(𝑥𝑥̅ 1 −𝑥𝑥̅√𝑠𝑠 2 )𝑧𝑧2
√√𝑠𝑠 𝑝𝑝×𝑞𝑞
2
𝑟𝑟𝑝𝑝,𝑏𝑏 = (𝑥𝑥̅1 −𝑥𝑥̅√𝑠𝑠 ) √ 𝑧𝑧 2
√𝑠𝑠
𝑝𝑝×𝑞𝑞 (5)
𝑟𝑟𝑝𝑝,𝑏𝑏 = 2 𝑧𝑧
√𝑠𝑠𝑧𝑧22
𝑧𝑧
(5)
Taking Taking thethe
the absolute
absolute
absolute value
value value of Equation
of Equation
of Equation √𝑠𝑠5 𝑧𝑧5 yields
yields
5 yields the corresponding
the corresponding value
Taking Takingthe absolute
the absolute valuevalue of Equation of Equation 5 yields 5 yieldsthethe corresponding
corresponding
the corresponding valuevalue value
of
Taking Equation
of 3.
the absolute
Equation At
At 3. this
this
At point,
point,
value
this of we
point, we
Equation wehave
have have addressed
addressed
5 yields addressed what we
the corresponding
what may
we may consider
value
consider
Taking ofthe
of Equation Equation 3. At3.
absolute Atpoint,
this
value thisofpoint, we have
Equation we have addressed
5 yieldsaddressed what
the what
we may
corresponding we may consider consider
value
the weakness
weakness
of Equation 3.ofof
Atvariable
variable
this point,interchange
we have in
interchange Equation
in
addressed 1. The
what 1.1.
we remaining
The
may remaining
consider part
ofthethe theweakness
weakness weakness of
of variable variable
of variable interchange
interchange interchange in in in
EquationEquation
Equation 1. The The
1. The remaining
remaining remaining part partpart
theEquation
of
part this
of of
thispaper
weakness
this 3. ofAtvariable
considers
paper
paper
this point,
considers
considers other
interchangewe
other other haveunivariate
univariate
univariate in addressed classifiers,
Equation what
classifiers,1. The
classifiers, we
such may
assuch
remaining
such the
as
consider
Bayes
as
the part
the
Bayes
of
the of
this
weakness this
paper paper
considers
of considers
variable other
interchange other
univariate univariate
in Equation classifiers,
classifiers, such
1.methods,
The such
as
remainingthe as the
Bayespart Bayes
classifier
of this
Bayes paper andand
classifier
classifier the
considers
and proposed
the the other
proposed
proposed univariate
univariate univariate
univariate classifier
classifiers, classifier
classifier such as the
methods,
methods, SUC Bayes
SUC SUCand
of thisclassifier
classifier paper and
andconsiders
the the
proposed proposed
other univariate
univariate univariate classifier classifier
classifiers, methods,
such methods,
as SUC
the andand
SUC
Bayes and
classifier
SUBC.
and SUBC.
SUBC. and the proposed univariate classifier methods, SUC and
SUBC.
classifier SUBC.
SUBC. and the proposed univariate classifier methods, SUC and
SUBC.
Bayes Bayes Classifier
Bayes Classifier
Classifier
Bayes Classifier
Bayes Classifier
Given
Bayes Given thatthat
Classifier𝑔𝑔1 (𝑥𝑥) and and𝑔𝑔and (𝑥𝑥) are the
are theprobability
probability density
density functions
functions and and
Given Given
that 𝑔𝑔that 𝑔𝑔1𝑔𝑔(𝑥𝑥)
(𝑥𝑥)
1(𝑥𝑥) (𝑥𝑥)
1and
𝑔𝑔2𝑔𝑔(𝑥𝑥)
𝑔𝑔22 (𝑥𝑥) 2are(𝑥𝑥) theare the
probability probability density density
functions functions andand
Given
𝑋𝑋 is thata 𝑔𝑔
random and 𝑔𝑔
variable (𝑥𝑥) forare the the twoprobability groups density
denoted functions
as ∆ and and
∆ , 2,
𝑋𝑋𝑑𝑑×1
Given
𝑋𝑋𝑑𝑑×1
𝑑𝑑×1𝑋𝑋isis
that
𝑑𝑑×1 aisrandom
a1random
is
𝑔𝑔 a random
(𝑥𝑥)
variable
variable
and 𝑔𝑔
2
variable
(𝑥𝑥) for are
forthe for
the
thetwothe two
probability two
groups groups
groups denoted
denoted
densitydenotedas as
functions 1∆and
∆11as∆and 1∆ and
and
2∆
2,, ∆2 ,
𝑋𝑋
any a random variable for
𝑑𝑑×1value of 𝑋𝑋𝑑𝑑×1 can be assigned to any of the groups.
1 2 the two groups denoted as ∆ and
1 In the ∆ 2
𝑑𝑑×1 is a random
𝑋𝑋following, we assume variable thatfor thethe priori two probabilities
groups denoted andasthe ∆1 cost and ∆ of2 ,
inaccurate classification for the two groups are equal. Let 𝐶𝐶𝑖𝑖 = 66 6 6
6
{𝑥𝑥𝑖𝑖,𝑑𝑑×1 , 𝑑𝑑 = 1} be the sample spaces of the univariate random variables.6
Suppose ∇1 is the values of 𝑥𝑥1,𝑑𝑑×1 to be classified into Group ∆1 , and
∇2 denotes the values of 𝑥𝑥2,𝑑𝑑×1 to be classified into Group ∆2 . The
aim is that each random value or object must only be classified into
one group (Johnson & Wichern, 1992). This univariate approach is
based on the conditional probability concept. Let us describe the
process of classifying the object as Equation 6 and Equation 7.
6 𝑃𝑃(1|2) = 𝑃𝑃(𝑥𝑥1,𝑑𝑑×1 𝜖𝜖∇1 |∆2 ) = ∫∇ 𝑔𝑔2 (𝑥𝑥)𝑑𝑑𝑑𝑑 (6)
1
Therefore, allocateallocate
Therefore, to𝑥𝑥𝑖𝑖 ∆1toif𝑥𝑥 ∆𝑃𝑃(∆
𝑥𝑥𝑖𝑖 allocate 1 |𝑥𝑥𝑖𝑖,𝑑𝑑×1 >
)𝑃𝑃(∆ 𝑃𝑃(∆ 2 |𝑥𝑥𝑃𝑃(∆ ), 𝑖𝑖,𝑑𝑑×1 ),
Therefore, 𝑖𝑖 1toif ∆𝑃𝑃(∆ 1 if 1 |𝑥𝑥 )>
1 |𝑥𝑥 )>2 |𝑥𝑥𝑃𝑃(∆ 2 |𝑥𝑥𝑖𝑖,𝑑𝑑×1 ),
𝑖𝑖,𝑑𝑑×1 𝑖𝑖,𝑑𝑑×1
𝑖𝑖,𝑑𝑑×1
otherwise, allocate
otherwise, 𝑥𝑥
allocate
otherwise, to ∆ if 𝑃𝑃(∆ |𝑥𝑥
𝑥𝑥𝑖𝑖2 to 𝑥𝑥∆𝑖𝑖 2 toif1 ∆𝑃𝑃(∆
𝑖𝑖 allocate if )
|𝑥𝑥 < 𝑃𝑃(∆) < |𝑥𝑥
𝑃𝑃(∆ ).
|𝑥𝑥 This ). This
𝑖𝑖,𝑑𝑑×1 𝑃𝑃(∆
2 1 𝑖𝑖,𝑑𝑑×1 |𝑥𝑥 2 )
𝑖𝑖,𝑑𝑑×1
1 𝑖𝑖,𝑑𝑑×1 2 𝑖𝑖,𝑑𝑑×1 < 𝑃𝑃(∆ 2 𝑖𝑖,𝑑𝑑×1 ). This
|𝑥𝑥
decision criteria
decision can
criteria
decision be compactly
can becan
criteria written
compactly
be compactly as shown
writtenwritten in
as shown Equation
as shown 16
in Equation 16
in Equation 16
below.below. below.
𝑥𝑥 to ∆ if 𝑃𝑃(∆ if|𝑥𝑥𝑃𝑃(∆ >
)𝑃𝑃(∆ 𝑃𝑃(∆ 2 |𝑥𝑥𝑃𝑃(∆ )𝑃𝑃(∆
𝜋𝜋 = { 𝜋𝜋𝑖𝑖 = {𝜋𝜋𝑥𝑥1 𝑖𝑖=to{ ∆𝑥𝑥1𝑖𝑖 1to ∆1 if1 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 1 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 >
) 𝑖𝑖,𝑑𝑑×1 )>
𝑖𝑖,𝑑𝑑×12 |𝑥𝑥 ) 𝑖𝑖,𝑑𝑑×1 ) (16)
2 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1
𝑥𝑥𝑖𝑖 to ∆𝑥𝑥2𝑖𝑖ifto𝑃𝑃(∆ |𝑥𝑥
∆𝑥𝑥2𝑖𝑖 1to ∆
if 𝑃𝑃(∆ 2 if
)
1 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 <
𝑃𝑃(∆ 𝑃𝑃(∆
1 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 < |𝑥𝑥
2 𝑃𝑃(∆
) 𝑖𝑖,𝑑𝑑×1 )< )𝑃𝑃(∆
2 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 2 |𝑥𝑥
𝑖𝑖,𝑑𝑑×1 ) 𝑖𝑖,𝑑𝑑×1 )
Smart Univariate Smart Classifier
Univariate Classifier
Smart SmartUnivariate Univariate Classifier Classifier
The SUC ThemethodThe SUC
SUC applied
method method the F-weight
applied applied to extract
the
the F-weight F-weight thetoinlier
to extract the from
extract inlierthethe inlierthe
from from the
The
data SUCmatrix methodand applied
detected
matrix the
and the F-weight
outliers.
data matrix and detected the outliers. The outliers detected were were
data detected to
The
the extract
outliers
outliers. the inlier
detected
The from
outliers werethedetected
data matrix
subjected subjected and
tosubjected
F-weight todetected
F-weight treatment,
to thetreatment,
F-weight outliers.
which The which
transformed
treatment,
which outliers
transformed the detected
outliers
transformed were
into
the outliers the outliers
into into
inliers.inliers.
subjected Subsequently,
toinliers.
F-weight
Subsequently, the
Subsequently, classifier
treatment, which
the
the classifier wasclassifier
applied
transformed
was applied to
was theapplied
transform
the
to outliers to the
the transform data
into
transform
data data
set to build
inliers. tosetthe
Subsequently,
set build toclassification
build
thethe the classifier
classification modelwas
classification andapplied
model the
model
and benchmark
theand
to the thevalue.
transform
benchmark benchmark This
data value.
value. This This
set to build the method classification
is called smart modelbecause and the of benchmark
its sensitivity value. This
in identifying inliers
method
method
methodmethod is
is called
called
is calledfrom smart
the
smart
is smart data
called because because
because sets
smart because andof
of its
its sensitivity
subsequent
sensitivity
of its sensitivity
of its sensitivity in
conversion
in identifying
identifying
in identifying of
in identifyinginliers
outliers
inliers
inliers to inliers.
8 inliers The
8
from the data sets and subsequent conversion of outliers to inliers. The 8
from the the
from data
data formulation
the sets
setsdata and and
sets of the
subsequent
and subsequent
subsequent SUC method
conversion
conversion conversion(Okwonu,
of outliers
of outliers Ahad,
of outliers Okoloko
to inliers.
to inliers. to inliers.
The Theet al., 2022)
formulation
The formulation
formulation ofisthe
formulation
of the SUC
asoffollows.
of
SUCthe theSUC method
SUCmethod
method (Okwonu,
method
(Okwonu, Ahad,Ahad,
(Okwonu,
(Okwonu, Ahad, Okoloko
Ahad,Okoloko
Okoloko et al.,
Okoloko
et al., 2022)
2022)
etetal.,
al., 2022)
is as
2022)
is follows.
is as follows.
as follows. Let 𝑥𝑥1 and 𝑥𝑥2 be univariate random observations from ∆1 and ∆2 such
Let 𝑥𝑥 and 𝑥𝑥
Let 𝑥𝑥1 and 𝑥𝑥12that
Let be univariate
and univariate
𝑥𝑥𝑛𝑛2 −be2univariate
≥ 𝑝𝑝, random 𝑝𝑝 =random
1. observations
We observations
assume from that∆ 1𝑥𝑥and
∆from 1 and
∆2and such
𝑥𝑥2 ∆are normally
1 𝑥𝑥 2 be random observations from 1 and ∆∆12 such 2 such
that 𝑛𝑛 −
that 𝑛𝑛 that
−2≥ 2 ≥
𝑛𝑛 − 𝑝𝑝,
distributed 𝑝𝑝 =
𝑝𝑝, 2𝑝𝑝≥=𝑝𝑝,1. 𝑝𝑝We 1. We
with
= 1.assume assume
mean
We assume that
and 𝑥𝑥
variance,
that 𝑥𝑥1that 1 and 𝑥𝑥
which
and𝑥𝑥1𝑥𝑥2and 2 are is normally
𝑥𝑥 2 ).
𝑖𝑖are normallyLet 𝑊𝑊
~𝑁𝑁(𝜇𝜇,
are2𝑥𝑥2normally 𝜎𝜎
distributed with
distributed
distributed denote
with mean mean the
with and and
F-weight variance,
meanvariance, as
and variance, which
shown in
which which is 𝑥𝑥
Equation ~𝑁𝑁(𝜇𝜇, 17.𝜎𝜎
is 𝑥𝑥𝑖𝑖 ~𝑁𝑁(𝜇𝜇,
is 𝑥𝑥𝑖𝑖 ~𝑁𝑁(𝜇𝜇,
𝑖𝑖 2 ).
𝜎𝜎 ). Let Let 𝑊𝑊Let 𝑊𝑊
𝜎𝜎 2 ).
𝑊𝑊
denote the
denote denote F-weight
the F-weightthe F-weight as
as shown shown as shown in Equation
𝑥𝑥 in Equation
in Equation 17.
17. 17.
𝑥𝑥𝑖𝑖
𝑤𝑤𝑖𝑖 = 𝑍𝑍𝑖𝑖 , 𝑤𝑤𝑖𝑖 ∈ 𝑊𝑊, 𝑖𝑖 = 1, 2
𝑤𝑤𝑖𝑖 = = 𝑥𝑥𝑍𝑍𝑤𝑤𝑖𝑖 , 𝑤𝑤𝑖𝑖𝑥𝑥∈ 𝑖𝑖 𝑊𝑊, 𝑖𝑖 = 1, 2
(17)
𝑤𝑤 𝑖𝑖 𝑍𝑍
,𝑖𝑖 =𝑤𝑤𝑖𝑖`𝑍𝑍∈, 𝑊𝑊, 𝑤𝑤𝑖𝑖 𝑖𝑖∈=𝑊𝑊, 1, 2𝑖𝑖 = 1, 2
where 𝑍𝑍 = 𝑋𝑋1 𝑋𝑋2 , 𝑥𝑥1 ∈ 𝑋𝑋1 , 𝑥𝑥2 ∈ 𝑋𝑋2 , and 𝑤𝑤𝑖𝑖 are the weight associated
`
where 𝑍𝑍 𝑍𝑍 = 𝑋𝑋
where 𝑋𝑋1with
` 𝑋𝑋2 , 𝑋𝑋 𝑥𝑥1` ∈group.
each 𝑋𝑋1,, 𝑥𝑥 ∈ 𝑋𝑋2,, and
𝑥𝑥2Applying andEquation are the
𝑤𝑤,𝑖𝑖 and 𝑤𝑤𝑖𝑖17, weight
are the associated
theF-weight
weight mean vector is
associated
where = 1𝑍𝑍𝑋𝑋= 2 , 𝑥𝑥11 𝑋𝑋∈2 ,𝑋𝑋𝑥𝑥 1 1 ∈2𝑋𝑋∈1 ,𝑋𝑋𝑥𝑥 2 2 ∈ 𝑋𝑋𝑤𝑤 2 𝑖𝑖 are the weight associated
with each
with each group.
withgroup. given
each Applying
group.as Equation
Applying Applying Equation
Equation 18.Equation 17, the
17, the17, F-weight
the F-weight
F-weight mean
mean vectorvector
mean vector is
is is
given as
given as Equation
given as Equation
Equation 18. 18.
18. 𝑛𝑛𝑖𝑖
∑𝑥𝑥 =1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖
∑𝑛𝑛𝑖𝑖=1 𝑤𝑤𝑖𝑖∑𝑥𝑥𝑥𝑥̅
𝑛𝑛
𝑛𝑛𝑖𝑖𝑖𝑖 =
𝑖𝑖
𝑖𝑖 (18)
∑𝑥𝑥𝑥𝑥𝑖𝑖𝑖𝑖=1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑥𝑥𝑖𝑖 𝑖𝑖 =1 𝑖𝑖 𝑥𝑥𝑖𝑖 𝑤𝑤𝑖𝑖
𝑤𝑤
𝑥𝑥̅ 𝑖𝑖 =
𝑥𝑥̅𝑖𝑖 = 𝑥𝑥̅𝑖𝑖𝑤𝑤 𝑖𝑖
𝑤𝑤=𝑖𝑖 𝑤𝑤𝑖𝑖
8 We determine 𝑖𝑖 the inliers from Equation 17 and Equation 18 as shown
We determine
We determine
We determine in the
Equation
the inliersinliersthe inliersfrom
19.
from Equation Equation
from Equation 17 and
17 and17 Equation
and Equation
Equation 18 as
18 as shown
shown
18 as shown
in Equation 19.
in Equation
in Equation 19. 19. 1 𝑖𝑖𝑖𝑖 𝑤𝑤𝑖𝑖 < 𝑥𝑥̅𝑖𝑖 , 𝑥𝑥̅𝑖𝑖 = 𝑥𝑥̅1 , 𝑥𝑥̅2
with each group. Applying Equation 17, the F-weight mean vector is
givengiven
given as Equation 18.
givenas
asEquation 18.
as Equation
Equation 18. 18.
𝑛𝑛𝑖𝑖 𝑛𝑛𝑖𝑖
∑𝑥𝑥𝑛𝑛=1 ∑
𝑥𝑥𝑖𝑖𝑤𝑤 𝑖𝑖 𝑥𝑥 𝑤𝑤𝑛𝑛𝑖𝑖𝑥𝑥
Journal of ICT,
∑𝑥𝑥𝑖𝑖 𝑖𝑖22, No.
=1 ∑𝑖𝑖𝑥𝑥𝑖𝑖1=1
=1 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖𝑖𝑖
𝑖𝑖 𝑤𝑤𝑖𝑖 𝑥𝑥𝑖𝑖
(January) 2023, pp: 1–30
𝑥𝑥̅𝑥𝑥̅𝑖𝑖 =𝑥𝑥̅𝑖𝑖 =𝑥𝑥̅𝑖𝑖𝑖𝑖 =𝑤𝑤
𝑖𝑖 = 𝑤𝑤𝑖𝑖 𝑖𝑖
𝑤𝑤𝑖𝑖 𝑤𝑤𝑖𝑖
From
From the the sample
sample variance
variance inin in Equation
Equation 20,20,
we we compute
compute the pooled F- F-
From the
From sample
From
the variance
the
sample sample
variance in Equation
variance 20,
20,we
in Equation
Equation wecompute
20, thepooled
the
we compute
compute pooled F-
the pooled
pooled F-
weight
weight
F-weight variance
variance
variance
weight asas as
shown
shown
variance shown
asinin in Equation
Equation
Equation
shown 21.
21. 21.
in Equation
weight variance as shown in Equation 21. 21.
2 2 (𝑛𝑛 2 2
2𝑆𝑆 2 (𝑛𝑛1(𝑛𝑛 1 −1)𝑆𝑆
−1)𝑆𝑆 1 + 2(𝑛𝑛
12 +
(𝑛𝑛 −1)𝑆𝑆 +2 −1)𝑆𝑆
2−1)𝑆𝑆(𝑛𝑛
22 −1)𝑆𝑆
2 2
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆=2 =(𝑛𝑛1=−1)𝑆𝑆11 + (𝑛𝑛12−1)𝑆𝑆22
2 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 =
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2
(𝑛𝑛1(𝑛𝑛
(21)
+𝑛𝑛 21)2
)(𝑛𝑛 −2
(𝑛𝑛1+𝑛𝑛
+𝑛𝑛12 −
2) − 2 2) − 2
+𝑛𝑛
Based on Equations
Based
Based on18
on Equationsto1821,
18weto18obtain
21,
towe the
wecoefficient
obtain the of the
the coefficient model ofas
of the model
BasedononEquations
Based 18toto21,
Equations
Equations 21,we
we obtain
21,
obtainthe
thecoefficient
obtain of
ofthe
themodel
coefficient
coefficient the model
model
can as
be referred
as
cancan
be
as be to Equations
referred
referred
can be to to 22
Equations
referred to and
Equations 23.
22 22
and
Equations and
23.
22 23.
and 23.
as can be referred to Equations 22 and 23.
(𝑥𝑥̅ −𝑥𝑥̅ )` )`
(𝑥𝑥̅ −𝑥𝑥̅ `
𝑤𝑤𝑤𝑤𝑥𝑥1𝑤𝑤=
𝑥𝑥= =
(𝑥𝑥̅1 −𝑥𝑥̅
1 𝑤𝑤𝑆𝑆𝑥𝑥1
12 (𝑥𝑥̅
21 𝑆𝑆=22 𝑤𝑤12𝑤𝑤 𝑥𝑥𝑥𝑥112 )𝑥𝑥==
)`2 1 −𝑥𝑥̅ = 𝑥𝑥1(𝑥𝑥̅
1𝑤𝑤(𝑥𝑥̅
(𝑥𝑥̅ −= 1 𝑥𝑥̅−
(𝑥𝑥̅ ` )−2
2 )𝑥𝑥̅1𝑆𝑆
`−
2
` −2` (𝑤𝑤
𝑆𝑆𝑥𝑥̅𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
−2 (𝑤𝑤𝑆𝑆1𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
2 )(𝑤𝑤
−2 =) 𝜔𝜔𝑤𝑤
𝑥𝑥11)𝑥𝑥(𝑤𝑤
1 =𝑥𝑥1𝜔𝜔𝑤𝑤
)1 𝑥𝑥 𝑥𝑥1 1 (22)
=11𝜔𝜔𝑤𝑤 𝑥𝑥1
𝑥𝑥1 𝑤𝑤
2 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 1 1 1 1 − 𝑥𝑥̅2 )
1 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 1 𝑥𝑥1 ) =1 𝜔𝜔𝑤𝑤 1 𝑥𝑥1
` 1 −𝑥𝑥̅ 2 )`
(𝑥𝑥̅1 −𝑥𝑥̅2 )(𝑥𝑥̅
2 𝑥𝑥2 (23)
` −2 (𝑤𝑤
𝑤𝑤𝑥𝑥2 =𝑤𝑤𝑥𝑥22 = 𝑤𝑤 22 𝑥𝑥2 = 𝑤𝑤2(𝑥𝑥̅
𝑥𝑥21 = 𝑥𝑥̅(𝑥𝑥̅
− 2)
` −−2
1 𝑆𝑆 𝑥𝑥̅2 )(𝑤𝑤
𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
𝑆𝑆2𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑥𝑥2 )2=
𝑥𝑥2 ) =2𝜔𝜔𝑤𝑤 𝑥𝑥2𝜔𝜔𝑤𝑤(23) (23)
𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 𝑆𝑆𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝
Data Collection
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
Data Collection
Part A: This section consists of four data sets. The first data set was
based on the effect of quality of sleep; short sleep (1-4 hours), and long
sleep (5-8 hours) in relation to undergraduate academic performance
with an emphasis on grade point average (GPA) categorisation using
Pittsburgh Sleep Quality Index (PSQI) Scale and Perceived Stress
Scale (PSS) (Lok, 2018). Then, using the methods discussed, which
were BC, PC, SUC and SUBC, we applied the PSQI and PSS scales
to classify students into graduate classes based on their corresponding
GPA using the methods discussed. The second data set consisted
of the air quality index for three locations in Malaysia (Putrajaya,
Kuala Lumpur and Petaling Jaya) before the outbreak of the Covid-19
pandemic in 2019 (14/10/2019-10/11/2019) and during the same
period with the Covid-19 pandemic in 2020 (14/10/2020-10/11/2020)
(Department of Environment, 2020). The data set was collected
with two conditions: without movement control order (2019) and
with movement control order (pandemic period 2020). The third
data comprised the PH water level in Ampang Pecah and Kg. Timah
(Department of Environment, 2020), while the fourth data consisted
of body weight measurement (mg) of wide and laboratory-bred female
and male Aedes albopictus mosquitoes and body size measurements
of Aedes albopictus mosquitoes (Okwonu et al., 2012).
Part B: This section covered discrete data and consisted of two data
sets. The first data set in this section covered the Covid-19 daily
report. The data set was paired as follows: confirmed and discharged,
confirmed and dead, discharged and dead patients. We used the
Covid-19 data set from Malaysia (Kementerian Kesihatan Malaysia,
2020) and Nigeria (NCDCgov, 2020). The second data set in this
section comprised road traffic accidents via car and motorcycle.
11
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
These data were categorised based on severe injury, slight injury, and
the annual summary of car/motorcycle road traffic accidents (Jabatan
Siasatan dan Penguatkuasaan Trafik, 2020).
Equation 29Equation 29 is
is also used to also used
detect thetooverfitting
detect the of
overfitting of theseIfmethods. If 𝜀𝜀
these methods.
has a negative value, it implies that overfitting has occurred. Equations
has a negative value, it implies that overfitting has occurred. Equations
28 and 29 are applied to evaluate the performance
28 and 29 are applied to evaluate the performance of the classification of the classification
methods by methods
focusing byon focusing on theofproportion
the proportion correct groupof correct group membership
membership
prediction
prediction (Jimoh (Jimoh
et al., 2022;etHuberty
al., 2022; Huberty &
& Holmes, Holmes,
1983; Alf &1983; Alf &
Abrahams,
Abrahams, 1968; Levy, 1968;
1967).Levy, 1967).
Another Another
method method
similar similar29
to Equation to Equation 29
toperformance
to analyse the analyse the performance
of classifiersof classifiers
was discussedwas discussed
in Mohd Noorin Mohd Noor
et al. (2020). The data sets used in this paper are applied to Equation
28 to obtain the evaluation benchmark. 13
Part A
Table 1 shows the performance analysis of the four methods using the
first continuous data set. The values reported in Table 1 are the PEC
values. In Table 1, the outliers are randomly introduced to the original
data to determine the robustness and the breakdown of the methods.
We observe that the BC and the SUBC are more robust than the other
methods because the PEC values of these two methods (0.9933) are
equal to the OPEC value () for the uncontaminated data set. In the
second row of Table 1, we introduce ten random outliers (RO) into
the original data set. Then, in the third row, we add eight outliers
in addition to the ten outliers in Row 2, making it 18. Finally, we
introduce outliers for the other rows in a similar procedure. As more
outliers are introduced, the proposed SUC method shows outstanding
performance over the other three methods since 0.8600 is the highest
among the four methods.
Table 1
Figure 1
Figure 1
Comparative Breakdown
Comparative BreakdownAnalysis
Analysisof
ofthe
theClassification
ClassificationMethods
Methods
1.1
1
PEC
0.9
0.8
0.7
1 2 3 4 5 6
Random outliers
BC PC SUBC SUC
Table 2
Table 2
Probability of Correct Classifications
Probability of Correct Classifications
Performance
Performance analysis of cumulative
analysis of cumulative air
air quality
qualityindex
indexbefore
before(2019)
(2019)
and during lockdown (2020) (n=84)
and during lockdown (2020) (n=84)
BC
BC PC PC SUBC
SUBC SUCSUC
0.5833
0.5833 0.5833
0.5833 0.610.61 0.593
0.593
𝛿𝛿 = 0.702, 𝑟𝑟 = −0.1481, 𝑟𝑟 2 = 0.0219
Table 33 to
Table to Table
Table66contain
containthe
theclassification
classificationofofthe
theairairquality
qualityindex
index
(AQI) before
(AQI) beforeand
andduring
duringthe
theCovid-19
Covid-19pandemic
pandemicininMalaysia.
Malaysia.InInTable
Table
3, the
3, the SUBC
SUBC method
method (86.9%)
(86.9%)isisthe
thebest
bestclassifier,
classifier,followed
followedby bySUC
SUC
(84.5%), while both BC and PC account for 83.1%. However,
(84.5%), while both BC and PC account for 83.1%. However, this this data
set is weakly negatively correlated (𝑟𝑟 = −0.15); therefore,
data set is weakly negatively correlated (); therefore, the minimum the
minimum misclassification error rates relate to a very
misclassification error rates relate to a very weak and negative weak and
negative relationship.
relationship.
Table 4 15
Table 4
BC
0.607 PC
0.518 SUBC
0.571 SUC
0.518
0.607
0.607 0.518
0.518 0.571
0.571 0.518
0.518
𝛆𝛆𝛆𝛆 0.607 0.518 0.571 0.518
𝛆𝛆𝛆𝛆
𝛆𝛆𝛆𝛆 0.393
0.393 0.482
0.482 0.429
0.429 0.482
0.482
0.393
0.393 0.4820.482 0.429
0.429 0.482
0.482
𝛆𝛆 =
= 𝜹𝜹𝜹𝜹 −
− 𝑷𝑷𝑷𝑷𝑷𝑷
𝑷𝑷𝑷𝑷𝑷𝑷 -0.001 0.088 0.035 0.088
𝛆𝛆𝛆𝛆 = 𝜹𝜹 − 𝑷𝑷𝑷𝑷𝑷𝑷 -0.001
-0.001
-0.001 0.0880.088
0.088 0.035
0.035
0.035 0.088
0.088
0.088
𝛿𝛿 =0.606,
𝛿𝛿 =0.606, 𝑟𝑟𝑟𝑟 = −0.063, 𝑟𝑟𝑟𝑟222 =
= −0.063, = 0.0004
0.0004
𝛿𝛿 =0.606, 𝑟𝑟 = −0.063, 𝑟𝑟 = 0.0004
As shown
As shown in in Table
Table 4, an
4,an overfitting
anoverfitting problem
problemoccurs
overfittingproblem occursfor forthe
theBCBCmethod
method
As shown
(ε = in TableIt is4, considered overfitting occurs for the BC method
(ε
(ε
= −0.001).
=
−0.001). It
−0.001). It is considered
is overfitting in
considered overfitting in this
in this aspect
this aspect because
aspect becausethe
because the
the
OPEC (𝛿𝛿)
OPEC (𝛿𝛿) is is used
used as as the
the performance
performance benchmark.
benchmark. Nevertheless,
Nevertheless, the
the
OPEC
BC is is (𝛿𝛿) is usedbyas applying
in order
order the performance
the basicbenchmark.
axiom of Nevertheless,
probability. the
The
BC
BC in
is in order by applying
by implies
applyingthat the
thethe basic
basic axiom of probability.
axiom of probability. The
The
following
following criterion
criterion implies thatthat the misclassification
misclassification error
the misclassification error forfor BCBC
following
criterion implies
(refer to Equation 30) is equal to error for BC
(refer to Equation 30)
(refer to Equation 30) is equal to is equal to
𝜀𝜀𝜀𝜀 = 1 − 𝑃𝑃𝑃𝑃𝑃𝑃 = 1 − 0.607 = 0.393
𝜀𝜀𝜀𝜀 =
𝜀𝜀𝜀𝜀 = 11− −𝑃𝑃𝑃𝑃𝑃𝑃
𝑃𝑃𝑃𝑃𝑃𝑃 == 11− −0.607
0.607 = = 0.393
0.393
Applying this criterion to the classification problem suggests suggests thatthat the
the
Applying
Applying
BC method this
this criterion to
criterion
performs to the
the classification
classification problem
problem suggests
suggests thatthat the
the
error
poorly, with a significant misclassification error
BC
BC method
ratemethod performs
of 39.3 performs poorly,
poorly, withwith a significant misclassification error
percent. Therefore, the ajustification
significant formisclassification
applying the error
the OPEC
OPEC
rate
rate of
of 39.3
39.3
as an evaluation percent.
percent. Therefore,
Therefore, the
the justification
justification for
for applying
applying the
the OPEC
OPEC
evaluation and validation benchmark benchmark is is adopted
adopted andand justified.
justified.
as an
as an evaluation
evaluation and and validation benchmark
benchmark isis adopted
adopted and justified.
justified.
The SUBC classifier validation
correctly predicts a group and
membership
membership of
of 94.2
94.2
The SUBC
The SUBC classifier
classifier correctly
by correctly
PC and SUC predicts
withaa group
predicts grouppercent,
85.5 membership
membership of 94.2
of
respectively,94.2
percent, followed percent, respectively,
percent, followed
percent, followed by by PCPC andand SUC
SUC withwith 85.5
85.5 percent,
percent, respectively,
respectively,
based on the OPEC criterion. The correlation value value obtained
obtained is is
based
based
negative onand
on thevery
the OPEC
OPEC weak criterion.
criterion.
(-0.063). The
The
The correlation
correlation
effect of value isobtained
value
outliers obtained
generally isis
negative and very weak (-0.063). The effect of outliers is generally
negative
negative
negative and and
in all very
very
all weak (-0.063).
weak (-0.063).
statistical and The effect
The effect
probability of outliers
of outliers
models. That is
is generally
is generally
why the
negative
negative in
in statistical
all statistical
statistical and probability
andmethods
probability models.
models. That That
That isis why
why the the
negative
proposed
proposed in all
robust
robust classification
classification and probability
methods are
are models.
essential.
essential. is why the
proposed robust classification methods
proposed robust classification methods are essential. are essential.
Table 5
Table 55
Table
Table 5
Performance Analysis of Kuala Lumpur AQI before (2019) and
Performance
Performance
Performance
during Lockdown Analysis
Analysis
Analysis(2020)ofofKuala
of KualaLumpur
Kuala
(n=28) LumpurAQI
Lumpur AQI before
AQI before (2019)
before (2019) and
(2019) and
and
during Lockdown
during Lockdown (2020) (n=28) (2020) (n=28)
BC PC SUBC SUC
BC
BC
BC PCPC
PC SUBC
SUBCSUBC SUC SUC
SUC
0.679
0.679 0.643
0.643 0.679
0.679 0.696 0.696
0.679
𝛿𝛿=0.905,
0.679 0.643
𝑟𝑟 = −0.251, 2
0.643 𝑟𝑟 = 0.063 0.679
0.679 0.696
0.696
𝛿𝛿=0.905, 𝑟𝑟𝑟𝑟 =
𝛿𝛿=0.905, −0.251, 𝑟𝑟𝑟𝑟22 =
= −0.251, = 0.063
0.063
Table
In Table 5, SUC
5, the the classifier
SUC classifier accurately
accurately predicts
predicts the the group
group membership
In
with
In Table
membership 5,with
the76.9
76.9 percent,
Table 5, the SUC
BC
SUC classifier
percent, BC and
andclassifier
SUBC accurately
SUBC
with predicts
withpredicts
75and
75 percent,
accurately PCthewith
percent,
the andgroup
PC
71
group
membership
with 71The
membership
percent. with
percent.
with 76.9
The
76.9 percent,
percent,
relationship BC
showsBC andshows
relationship
and
a weak SUBC
SUBC awith
with 75correlation.
weak
75
and negative percent, andPC
and negative
percent, and PC
with 71 percent.
percent. The
correlation.
with 71 The relationship
relationship shows
shows aa weak
weak andand negative
negative
16
correlation.
correlation.
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
Table
Table66
Table 6
Table
Table 66
Performance
PerformanceAnalysis
Analysisof Petaling Jaya AQI before (2019) and during
Performance Analysis ofofPetaling
PetalingJaya
JayaAQI
AQIbefore
before(2019)
(2019)and
andduring
during
Lockdown
Lockdown(2020)
Performance(2020) (n=28)
(n=28)
Analysis of
Lockdown (2020)
Performance
BC PC of Petaling
(n=28)
Analysis Petaling Jaya
SUBCJayaAQI
AQIbefore
before
SUC
(2019)
(2019)and
andduring
during
Lockdown
BC
BC (2020) (n=28)
PC SUBC SUC
Lockdown (2020)PC(n=28) SUBC SUC
BC
0.536
0.536 PC
0.518
0.518 SUBC
0.500 SUC
0.519
0.500 SUBC 0.519
0.519 SUC
BC0.536 0.518PC 22 0.500
𝜹𝜹𝜹𝜹=0.539,
=0.539, 𝑟𝑟𝑟𝑟==−0.1960,
−0.1960, 𝑟𝑟𝑟𝑟2 ==0.0384
0.0384
𝜹𝜹0.536 0.5180.518
=0.539, 𝑟𝑟 = −0.1960,
0.536 0.500 0.500
𝑟𝑟 = 0.0384 0.519 0.519
The
𝜹𝜹The performance
performance
=0.539, analysis
𝑟𝑟 =in
analysis
𝑟𝑟 = −0.1960, 2
in Table
Table 66 demonstrates
0.0384 demonstrates that
The performance analysis in Table 6 demonstrates that the BC that the
the BC
BC
achieves
achievesthe
achieves thehighest
the highestaccuracy
highest accuracyof
accuracy of 99.4
99.4percent
of99.4 percentin ininclassifying
classifying the
classifyingthe group
thegroup
group
The performance
performance
membership, followedanalysis
analysis
by SUCin Tablepercent
in (96.3%)
Table 66anddemonstrates
demonstrates
PC (96.1%). that
that the BC
the
Meanwhile, BC
membership, followed
membership, followedby bySUC
SUC(96.3%)
(96.3%)and andPC PC(96.1%).
(96.1%).Meanwhile,
Meanwhile,
achieves
SUBC the
the highest
highest accuracy
accuracy of
of 99.4
99.4 percent
percent in
in classifying
classifying the
thegroup
group
SUBCaccounts
SUBC accounts
accounts for
for92.8
for 92.8
92.8 percent,
percent, with
withaaaweakly
weakly negative
weaklynegative relationship.
negativerelationship.
relationship.
membership, followed
followed bybypercent,
SUC with
SUC(96.3%)
(96.3%) and
andPC PC(96.1%).
(96.1%).Meanwhile,
Meanwhile,
SUBC
SUBC accounts for 92.8 percent, with a weakly negative
Table77accounts for 92.8 percent, with a weakly negative relationship.
Table relationship.
Table 7
Table
Table 77
Performance
PerformanceAnalysis
Performance Analysisof
Analysis ofofPH
PHLevel
PH Levelof
Level ofofWater
Water
Waterin ininAmpang
Ampang Pecah
AmpangPecahPecahandand Kg.
andKg.
Kg.
Timah
Timah
Timah (Mm)
(Mm)
(Mm)
Performance (n=39)
(n=39)
(n=39)
Analysis of of PH
PHLevel
LevelofofWater
Waterinin Ampang Pecah
Performance Analysis Ampang Pecah andand
Kg.
Kg. Timah
Timah (Mm) (Mm) (n=39)
(n=39)
BC BC
BC PC PC
PC SUBCSUBC
SUBC SUC SUC
SUC
0.8611
0.8611
BC
0.8611 0.875
0.875
PC
0.875 0.8611
0.8611
SUBC
0.8611 0.8333
0.8333
SUC
0.8333
BC PC SUBC SUC
0.8611 0.875 0.8611 0.8333
0.8611
-0.001 0.875
-0.015 0.8611
-0.001 0.8333
𝜀𝜀𝜀𝜀𝜀𝜀 -0.001
-0.001
-0.001 -0.015
-0.015
-0.015 -0.001
-0.001
-0.001
2
𝛿𝛿𝛿𝛿𝛿𝛿=
= 0.8599,
0.8599,
= 0.8599, 𝑟𝑟𝑟𝑟== 0.0793,
0.0793,
𝑟𝑟 =-0.001 𝑟𝑟 22=
𝑟𝑟 = 0.0063
0.0063
0.0793, 𝑟𝑟 = -0.015
0.0063
𝜀𝜀 -0.001
2
𝛿𝛿 = analysis
The 0.8599, 𝑟𝑟 shows
= 0.0793,
that𝑟𝑟accurate
= 0.0063group membership prediction does
The analysis
The analysis shows shows that that accurate
accurategroup
groupmembership
membershippredictionpredictiondoes does
not
notreflect
not reflecthighly
reflect highlycorrelated
highly correlateddatadatasets. Instead
sets.Instead “the better
“thebetter
Instead“the betterthethe prediction
theprediction
prediction
The analysis shows thatcorrelation
accurate group membership prediction does
power, the
power,
power, the weaker
the weaker the
weaker the correlationvalue”. This
value”.This finding
findingisisisrelatively
Thisfinding relatively
relatively
not reflect
possible highly correlated data sets. Instead “the better the prediction
possiblefor
possible forcontinuous
for continuousdata
continuous datasets.
sets.Table contains
Table77contains
containsthe the classification
theclassification
classification
power,
analysis the weaker the correlation value”. This finding is relatively
analysisof
analysis ofthe
of thePH
the PHlevel
levelof ofwater
waterinintwo different
twodifferent locations.
differentlocations.
locations.This This was
Thiswas
was
possible
applied for continuous data sets. Table 7 contains the classification
appliedto
applied toobserve
to observeififthe
observe thePHPHlevel
levelofofwater quality
waterquality
qualityinininthese
these locations
locationsisisis
theselocations
analysisorof
unique the PH levelverifiedof water in two different locations. This was
unique
unique or orvaries.
varies.ItItwas
varies. was verifiedininTable that
Table77that the
thatthe BC,
theBC,BC,PC PC
PCandand SUBC
andSUBC
SUBC
applied
methods to observe if the PH level of water quality in these locations is
methods overfit
methods overfit (ε(ε==−0.001,
overfit −0.001,−0.015, −0.001)based
−0.001)
−0.015,−0.001) based
basedon ononthethe OPEC
theOPEC
OPEC
unique
benchmark. or varies. It was verified in Table 7 that the BC, PC and SUBC
benchmark. On
benchmark. On the
On the other
other hand,
hand,SUC shows
showsaaa96.9
SUCshows 96.9 percent
96.9percent accurate
percentaccurate
accurate
methods overfit
prediction of (ε = −0.001, −0.015, −0.001) based on the OPEC
prediction
predictionof ofaaagroup membership.The
groupmembership. Thecorrelation
correlationvalue
correlation valuedisplayed
value displayedfor
displayed for
for
benchmark.
this data On the other
isispositive
positive hand, SUC shows a 96.9 percent accurate
this
thisdata
datais positiveand veryweak,
andvery weak,0.08.
0.08.
prediction of a group membership. The correlation value displayed for
this
Table
Table
Table
Table data
888 is positive and very weak, 0.08.
Table 8
Performance Analysis ofof Wide
Wide
Performance
Performance Analysis
Analysis of WideFemale
Femaleand
Female andMale
and MaleAedes
Male AedesAlbopictus
Aedes Albopictus
Albopictus
Mosquitoes’
Mosquitoes’
Mosquitoes’ Weight
Weight
Mosquitoes’Weight (mm)
Weight(mm) (n=96)
(mm)(n=96)
(n=96)
Performance Analysis of Wide Female and Male Aedes Albopictus
BC PC SUBC SUC
Mosquitoes’
BC Weight (mm) PC(n=96) SUBC SUC
0.4688 0.4896 0.5100 0.4896 18
18
18
0.4688 0.4896 0.5100 0.4896
𝛿𝛿=0.536, 𝑟𝑟 = 0.218, 𝑟𝑟 2 = 0.047 18
17
Based on Table 8, the proposed SUBC achieves the highest prediction
accuracy with 95.2 percent, followed by PC and SUC with 91.3 percent
prediction accuracy, respectively. Meanwhile, the BC records the
BC PC
BC BC
BC PC PC
PC SUBC SUBC
SUBC SUC SUC
SUC 0.4688 0.48
0.4688 0.4688
0.46880.4896 0.4896
0.4896 0.5100 0.5100
0.5100 0.4896
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30 0.4896
0.4896 2
𝛿𝛿=0.536, 𝑟𝑟 = 0.218, 𝑟𝑟 =
22
𝛿𝛿=0.536, 𝑟𝑟 𝛿𝛿=0.536,
= 0.218, 𝑟𝑟𝑟𝑟 2=
𝛿𝛿=0.536, ==0.218,
0.047 𝑟𝑟𝑟𝑟 =
0.218, = 0.047
0.047
Based on Table 8, the pro
Based
Based on
on Table
Table
Table 8,
8,
8,the
the
theproposed
proposed
proposed SUBC
SUBC
SUBC achieves
achieves
achieves the
the
thehighest
highest
highest
Based on Table 8, the proposed SUBC achieves the highest prediction accuracy prediction
prediction
predictionwith 95.2 percen
accuracy
accuracy with
with
with 95.2
95.2
95.2 percent,
percent,
percent, followed
followed
followed by
by by
PC
PC PC
and
and and
SUC
SUC SUC
with
with
accuracy with 95.2 percent, followed by PC and SUC with 91.3 percent prediction with
91.3
91.3 91.3 accuracy, resp
percent
percent
prediction
prediction
prediction percent accuracy,
accuracy,
prediction
accuracy, respectively.
respectively.
accuracy,
respectively. Meanwhile,
Meanwhile,
respectively.
Meanwhile, theMeanwhile,the
the BC
BC records BC records
thetherecords
BC the
the
records
lowest prediction accurac
lowest
lowest
the prediction
prediction
lowest
lowest prediction ataccuracy
prediction
accuracy accuracy
accuracy
87.5 at
at 87.5
87.5
percent. percent.
at 87.5
Thepercent. The
TheThe
percent.
relationship relationship
relationship
relationship
for for
the Aedes the
forfor
the
Albopictus mosqui
Aedes
Aedes
Aedes Albopictus
the Albopictus
Albopictus
Aedesmosquito mosquito
data
Albopictus is alsodata
mosquito
mosquitodata is
is also
also
positively
data positively
positively
weak.
is also weak.
positivelyweak.
weak.
Table 9
Table
Table999
Table 9 Table
Performance Analysis of
PerformancePerformance
Analysis ofAnalysis
Performance Analysis of
of Laboratory-reared
Female andFemale
Laboratory-reared
Laboratory-reared Female and
and Male
Male Aedes Male Aedes
Aedes Mosquitoes Bo
Albopictus
Performance Analysis of Laboratory-reared Female and Male Aedes
AlbopictusAlbopictus
MosquitoesMosquitoes
Albopictus Mosquitoes
Body Size (mm)Body Size
Size (mm)
Body (n=30) (mm) (n=30)
(n=30)
Albopictus Mosquitoes Body Size (mm) (n=30) BC
BC BC
BCPC PC
PC
SUBC SUBC
SUBC
SUC SUC
SUC
BC PC SUBC SUC 1.00
1.00 1.00
1.00
0.950 0.950
0.950
1.00 1.00
1.00
0.950 0.950
0.950
1.00 0.950 1.00 0.950 𝜀𝜀 -0.002
𝜀𝜀 𝜀𝜀𝜀𝜀
-0.002 -0.002
-0.002
-0.002 -0.002 -0.002
-0.002
-0.002 𝛿𝛿 =0.998, 𝑟𝑟 = −0.095, 𝑟𝑟 2 =
22
𝛿𝛿 =0.998, 𝑟𝑟𝛿𝛿𝛿𝛿==0.998,
−0.095,𝑟𝑟𝑟𝑟 =
=0.998, 𝑟𝑟=2 −0.095,
= 0.009 𝑟𝑟𝑟𝑟 =
−0.095, = 0.009
0.009
In Table 9, the BC and SU
In
In Table
Table
Table 9,
9,
9, the
the
the BC
BC
BC and
and
and SUBC
SUBC
SUBC methods
methods
methods predict
predict
predict the
the
the group
group
group membership
membership
membership
In Table 9, the BC and SUBC methods predict the group membership with 100% accuracy. H
with 100% with
with 100%
100%
100%
accuracy. accuracy.
accuracy.
accuracy. However,
However,
However,
However, both both both
methods
methods methods
overfittedoverfitted
both overfitted
methods overfitted (𝜀𝜀
(𝜀𝜀 = −0.002).(𝜀𝜀 =
=Meanwhile, the
−0.002). Meanwhile,
−0.002).
−0.002). Meanwhile,
Meanwhile, Meanwhile,
the
thePCPCand the
the
andSUC PC
SUC and SUC
PCmethods
and SUC
methods methods
methods
achieve
achieve achieve
achieve
95.2%
95.2% of95.2%
95.2%
of the group
the of
groupthe
ofmembership
the predic
group
group membership
group membership membership
membership prediction. prediction.
prediction.
Therefore,
prediction. Therefore,
Therefore, the
the association
the association
Therefore, shown
the association has shown
association
shown shown
a weak
has has
has aa
negative
a weak correlation
weak
weak
weak negative negative
negative
correlation.
negative correlation.
correlation.
correlation.
Table 10
Table 10 Table
Table 10
10
Table 10 Performance Analysis of
PerformancePerformance
Analysis ofAnalysis
Performance Analysis
Wide and of
of Wide
Wide and
and Laboratory-reared
Laboratory-reared
Laboratory-reared Female and Female
Female and
and Albopictus M
Male Aedes
Performance
Male AedesMale
Male Aedes
Albopictus Analysis
Aedes Albopictus
Albopictus of Wide
MosquitoesMosquitoes
Mosquitoesand
Wing Length Laboratory-reared
Wing
Wing
(mm)Length
Length
(n=20) (mm) Female
(mm) (n=20)
(n=20) and
Male Aedes Albopictus Mosquitoes Wing Length (mm) (n=20) BC
BC BC
BCPC PC
PC
SUBC SUBC
SUBC
SUC SUC
SUC𝛿𝛿 𝛿𝛿𝛿𝛿
BC PC SUBC SUC 1.00
1.00 1.00
1.00
0.925 0.925
0.925
1.00 1.00
0.901.00 0.90
1.00 0.90 0.90
1.00 0.925 𝜀𝜀 -0.007
𝜀𝜀 𝜀𝜀𝜀𝜀
-0.007 -0.007
-0.007
-0.007 -0.007 -0.007-0.007
-0.007 =0.993, 𝑟𝑟 = −0.038, 𝑟𝑟 2 =
22
=0.993, 𝑟𝑟 ==0.993,
−0.038,𝑟𝑟𝑟𝑟 =
=0.993, 𝑟𝑟= −0.038,
2
= 0.001 𝑟𝑟𝑟𝑟 =
−0.038, = 0.001
0.001 Similarly, the data set in T
Similarly,
data setthe
Similarly,
Similarly, the the data
data set
in Table 10in
set Table
Table 10
10 demonstrates
indemonstrates that the BCthat
demonstrates andthe
that the BC
BC and
SUBC and SUBC
SUBCobtain 100 pe
methods
methods
methods100
Similarly,
methods obtain obtain
obtain
the 100
data set
percent percent
100inaccuracy
percent
Table accuracy
accuracy
10 in in predicting
in that
demonstrates
predicting predicting
thethegroup
BC andthe group
theSUBC
group
methods obtain 100 percent accuracy in predicting the group and there is
membership,
membership, and there is also overfitting in both methods (𝜀𝜀 19 = −0.007). 19The accurate g
19
The accurate group predictions for the PC and SUCmethods methodsareare
93.2 percent and 9
93.2 percent and 90.6 percent, respectively. Even though the the
BCBC andand
SUBC show ove
SUBC show overfitting in Table 9 and Table 10, the methods can bebe accepted beca
methods can
accepted because the OPEC value is approximately one. one.Therefore,
Therefore, the correlation
the correlation for this data set is weak and negatively correlated.
correlated.
18 Part B
Table 10
Table14
14
Table
Performance Analysis ofofConfirmed
Performance Analysis Wide and and
Laboratory-reared Female
Discharged Cases for and
Performance
Male Aedes Analysis ofMosquitoes
Albopictus Confirmed Wing
and Discharged
Length Cases
(mm) for
(n=20)
Covid-19 Virus, Nigeria Data (n=163)
Covid-19 Virus, Nigeria Data (n=163)
BC
BC PCPC SUBC
SUBC SUC
SUC 𝛿𝛿
BC PC SUBC SUC
Table 14 1.00
0.798 0.925
0.647 1.00
0.798 0.90
0.497
0.798 0.647 0.798 0.497
𝜀𝜀
𝜀𝜀 -0.244
-0.007
-0.244 -0.099
-0.099 -0.244
-0.007
-0.244
Performance Analysis22of Confirmed and Discharged Cases for
=0.993,
𝛿𝛿= 0.548,𝑟𝑟 =
Covid-19 =−0.038,
𝑟𝑟Virus,
0.153, 𝑟𝑟𝑟𝑟 =
Nigeria 0.001
=Data
0.023(n=163)
Similarly, the data set in Table 10 demonstrates that the BC and SUBC
methods
Table obtain BC
14 reveals 100 BC,
that percent accuracy
PC,PCand SUBC inmethods
SUBC predicting
are the group
SUCoverfitted
(ɛ = -0.244, -0.099, -0.244), based on the OPEC benchmark values. In
0.798 0.647 0.798 0.497 21
contrast, the SUC method attains 90.7 percent accuracy in predicting 19
the group -0.244 with a-0.099
𝜀𝜀 membership, -0.244 correlation for this
very weak positive
2
𝛿𝛿= 0.548,
data set. 𝑟𝑟 = 0.153, 𝑟𝑟 = 0.023
Table
Table15
15
PerformanceAnalysis
Performance Analysisof
ofConfirmed
Confirmedand
andDeath
DeathCases
Casesfor
forCovid-19
Covid-19
Virus, Nigeria Data (n=163)
Virus, Nigeria Data (n=163)
BC
BC PC PC SUBCSUBC SUC SUC
0.834
0.834 0.804 0.8340.834 0.733
0.804 0.733
𝛿𝛿 =0.939, 𝑟𝑟 = 0.65, 𝑟𝑟 2 = 0.423
Table 15 illustrates that the BC and SUBC can predict the group
20
membership correctly with 88.8 percent, while the PC and SUC have
85.6 percent and 78.1 percent accuracy, respectively. Similar to the
Malaysian data set, a moderately strong correlation can also be found
BC 1.00 PC 0.950 SUBC 1.00 SUC0.950
0.834 0.804 0.834 0.733
𝜀𝜀 -0.002 -0.002
22, No. 1 (January) 2023, pp: 1–30
=0.939, 𝑟𝑟 =Journal
𝛿𝛿 =0.998, 0.65, 𝑟𝑟of2 ICT,
−0.095, =
𝑟𝑟 2 0.423
= 0.009
Table
In Table 15 illustrates that the BC and SUBC can predict the group
Table 15 9, the BC and
illustrates thatSUBC
the BC methods predictcan
and SUBC the predict
group membership
the group
membership correctly
with 100% correctly
accuracy. with 88.8 percent, while the PC and SUC have
membership withHowever, bothwhile
88.8 percent, methods
the PC overfitted
and SUC have (𝜀𝜀 =
85.6 percent
−0.002). and 78.1 the
Meanwhile, percent
PC accuracy,
and SUC respectively.
methods achieve Similar
95.2% to
of the
the
85.6 percent and 78.1 percent accuracy, respectively. Similar to the
Malaysian
group data set, prediction.
membership a moderately strong correlation
Therefore, the can also
association be found
Malaysian
in this data data
Nigeria set, aset
moderately
for strong
confirmed correlation
and death can also be found a
cases.
shown has
weak
in this negative correlation.
Nigeria data set for confirmed and death cases.
BC PC SUBC SUC
Table 10 16
Table 16 0.4688 0.4896 0.5100 0.4896
Performance Analysis
0.218, 𝑟𝑟 2ofof
𝛿𝛿=0.536, 𝑟𝑟 =Analysis Wide
0.047 and Laboratory-reared
=Discharged and Death Cases forFemale Covid-19and
Performance
Male Aedes Analysis
Albopictus
Virus, Nigeria Data (n=163) of Discharged
Mosquitoes and
Wing Death
Length Cases
(mm) for Covid-19
(n=20)
Virus,
Based Nigeria Data
on Table (n=163)
8, the proposed SUBC achieves the highest prediction
accuracy with 95.2 BCBC PCPC bySUBC
percent, followed and SUC SUC
PCSUBC withSUC
91.3 percent𝛿𝛿
prediction accuracy, BC respectively.
0.767 PC Meanwhile,
0.73 SUBC
0.761 the BC0.509SUC
records the
1.00 0.9250.73 1.00 0.90 0.509
lowest prediction0.767 accuracy at 87.5 percent. 0.761
The relationship for the
Aedes𝜀𝜀𝜀𝜀 Albopictus mosquito data
-0.145
-0.145
-0.007 is also-0.007
-0.108
-0.108 positively
-0.139 weak.
-0.139
=0.993,
𝛿𝛿 =0.622,𝑟𝑟 = 0.198, 𝑟𝑟𝑟𝑟22 =
−0.038,
𝑟𝑟 = = 0.001
0.039
Similarly,
Table 9 the data set in Table 10 demonstrates that the BC and SUBC
Again,
methods
Again, theobtain
the dataset
data set100
ininTable
Table 16also
percent
16 also revealsinthat
accuracy
reveals that BC,PCPCand
predicting
BC, and SUBC
theSUBC
group
Performance
methods are Analysis
overfitted of
(ɛ Laboratory-reared
= -0.145, -0.108, Female
-0.139), and
while
methods are overfitted (ɛ = -0.145, -0.108, -0.139), while the SUC Male
the Aedes
SUC
Albopictus
method Mosquitoes
predicts 81.8 Body
percent Size
group (mm) (n=30)
membership, and the
method predicts 81.8 percent group membership, and the data show19 data show an
extremely weak and positive correlation.
an extremely weak and positive correlation.
BC PC SUBC SUC
Tables 17 to 19 display the annual road traffic accident data
Tables 17 to 19 display the annual road traffic accident data categorised
categorised based 1.00 on the degrees 0.950 1.00
of fatalities. Based on 0.950
the results in
based on the degrees of fatalities. Based on the results in Table 17 to
Table 17 𝜀𝜀 to Table-0.002
19, the probability-based-0.002 methods (BC and SUBC)
Table 19, the probability-based methods (BC and SUBC) overfit the
overfit
OPEC the 𝑟𝑟OPEC
𝛿𝛿 =0.998,
value. = value.
−0.095,
However, 𝑟𝑟 2 However,
the =PC0.009
method theoutperforms
PC methodtheoutperforms
SUC method the
SUCamethod
with moderate withtoa very
moderate to very
strong strong relationship.
relationship. Based on the Based on the
adopted 22
In Table
adopted 9, the
benchmarkBC and SUBC
value, we methods
may predict
conclude the
that
benchmark value, we may conclude that these two methods (PC and group
these membership
two methods
withand
(PC
SUC) 100%
provide accuracy.
SUC)moreprovide However,
more
accurate accurateboth
predictions formethods
predictions foroverfitted
classification (𝜀𝜀 =
classification
than the
than the probability methods (BC and SUBC). The Pearson
probability methods (BC and SUBC). The Pearson correlation of thisthe
−0.002). Meanwhile, the PC and SUC methods achieve correlation
95.2% of
ofgroup
data this
setdata set is
membership
is very veryprediction.
strong strong andTherefore,
and positively positively associated,
the
associated, mainlymainly
association forshownforsets
data data
has a
sets
weak in Tables
negative
in Tables 17 and 19.17 and 19.
correlation.
Table17
Table 10
Table 17
PerformanceAnalysis
Performance AnalysisofofCar Wideandand Laboratory-reared
Motorbike Road Traffic Female and
Accident
Performance
Male Aedes Analysis
Albopictus of Mosquitoes
Car and Motorbike
Wing Road(mm)
Length Traffic Accident
(n=20)
(n=30)
(n=30)
BCBC PC PC SUBCSUBC SUC SUC 𝛿𝛿
1.00 BC 0.83 PC 1.00 1.00SUBC 0.82 0.90 SUC
1.00
1.00 0.925
0.83 1.00 0.82
𝜀𝜀 𝜀𝜀 -0.022
-0.007
-0.022 -0.007
-0.022 -0.022
2
𝛿𝛿=0.993,
=0.978, 𝑟𝑟𝑟𝑟==−0.038,
0.946, 𝑟𝑟𝑟𝑟2 ==0.895
0.001
Similarly, the data set in Table 10 demonstrates that the BC and SUBC
Table 17 reveals
methods obtainthat
100thepercent
PC and accuracy
SUC achieve
in 85.3 percentthe
predicting and group
83.4
21
percent correct group prediction, with a very strong positive
correlation value of 0.946. Meanwhile, the probability-based methods
(BC and SUBC) in Table 18 show overfitted values (ɛ = -0.007), in 19
Based on TableBC 8, the proposed PC PC SUBC SUBCachieves the highest prediction
BC SUBC SUC SUC
accuracy with 95.2 1.00 percent, followed
0.83 by PC
1.00 and SUC
0.82with 91.3 percent
Journal
prediction accuracy, of ICT, 22, No. 1 (January) 2023, pp: 1–30BC records the
1.00 respectively. 0.950 Meanwhile, 1.00 the 0.950
lowest
𝜀𝜀 𝜀𝜀 prediction
-0.022 accuracy at 87.5 percent.
-0.022-0.002 The relationship for the
-0.0022
Aedes
𝛿𝛿𝛿𝛿=0.978, Albopictus mosquito data is also positively weak.
Table 17 𝑟𝑟reveals
=0.998, 𝑟𝑟==0.946, that𝑟𝑟 the
−0.095, 𝑟𝑟=2 0.895
=PC0.009and SUC achieve 85.3 percent and
83.4
Table percent
Table17 correct group prediction,
9 reveals that the PC and SUC achieve with a very strong positive
85.3 percent and 83.4
In Table
correlation 9, the
value BC
of and
0.946. SUBC methods
Meanwhile,
percent correct group prediction, with a very strong thepredict the group
probability-based membership
methods
positive
(BCwithand100%
Performance
correlation SUBC)
valueaccuracy.
Analysis
in
of Table
0.946. However,
of 18 both
Laboratory-reared
show overfitted
Meanwhile, methods
Female(ɛand
values
the probability-basedoverfitted
Male
= -0.007), (𝜀𝜀
Aedes
methodsin =
Albopictus
−0.002).
contrast
(BC andtoSUBC)Meanwhile,
Mosquitoes
the PCinand Tablethe
SUCPCSize
Body
18 and (mm)
methods
show SUC methods
(n=30)
which
overfitted achieve
accurately
values 95.2%
(ɛ =predict of
thethe
-0.007), in
contrast to the PC and SUC methods which accurately predict thea
group
group membership
membership atprediction.
86.6 percent Therefore,
and 85.6 the association
percent, shown
respectively, has
with
weak
agroup negative
moderately
membership correlation.
positiveBCat correlation
86.6 percent PCofand
0.613 SUBC
85.6for this data
percent, SUC
set.
respectively, with
aTable
moderately
10 positive correlation of 0.613 for this data set.
1.00 0.950 1.00 0.950
Table 18
Table 𝜀𝜀18 -0.002 -0.002
Performance Analysis 2of Wide and Laboratory-reared Female and
𝛿𝛿 =0.998, 𝑟𝑟 =Analysis
Performance −0.095, 𝑟𝑟of = Car0.009Severe and Motorbike Severe Road
Male Aedes Albopictus
Performance Analysis of Mosquitoes
Car Severe Wing
and Length
Motorbike (mm) (n=20)
Severe Road
Traffic Accident Injuries (n=25)
Traffic
In Table Accident
9, the BC Injuries
and SUBC (n=25)methods predict the group membership
BC PC SUBC SUC 𝛿𝛿
with 100% accuracy. BC However, PC both methodsSUBC overfitted
SUC (𝜀𝜀 =
−0.002). Meanwhile, BC PC SUC SUBC SUC
1.00the PC
1.00
1.00
and
0.925
0.86 0.86 methods
1.001.00achieve
1.00
0.9095.2%
0.84 of the
0.84 shown
group membership-0.007 prediction. Therefore, the-0.007 association has a
𝜀𝜀 -0.007 -0.007
weak𝜀𝜀negative -0.007 correlation.
22
-0.007
𝛿𝛿=0.993,
=0.993,𝑟𝑟𝑟𝑟==−0.038,
0.613, 𝑟𝑟𝑟𝑟 ==0.376
0.001
Similarly,
Table 10 the data set in Table 10 demonstrates that the BC and SUBC
methods
Table 19 obtain 100 percent accuracy in predicting the group
Performance
Table 19
Performance AnalysisofofCar
Analysis Wide andand
Slight Laboratory-reared
MotorbikeSlight Female and
SlightRoad
Road
Performance
Performance Analysis
Analysis of
of Car
Car Slight
Slight and
and Motorbike
Motorbike Slight Road
Performance
Male Aedes
Traffic Analysis
Albopictus
Accident of Mosquitoes
InjuriesCar Slight and
(n=25) Motorbike
Wing Slight(n=20)
Length (mm) Road Traffic19
Traffic
Traffic Accident
Accident Injuries (n=25)
Accident InjuriesInjuries
(n=25) (n=25)
BC
BC PC
PC SUBC
SUBC SUC
SUC 𝛿𝛿
BC
BC PC
PC SUBC
SUBC SUC
SUC
1.00 BC
1.00 0.76 PC
0.76 1.00 SUBC0.70
1.00 0.70 SUC 23
1.00
1.00 0.76
0.925 1.00
1.00 0.70
0.90
1.00 0.76 1.00 0.70
𝜀𝜀
𝜀𝜀𝜀𝜀 𝜀𝜀 -0.117
-0.007
-0.117 -0.117 -0.117
-0.007
-0.117 -0.117
-0.117 -0.117
𝛿𝛿𝛿𝛿𝛿𝛿=0.993, 0.968, 𝑟𝑟𝑟𝑟22 =
=0.883,𝑟𝑟 𝑟𝑟==−0.038, = 0.937
0.001
=0.883,
=0.883, 𝑟𝑟𝑟𝑟 = 0.968, 𝑟𝑟𝑟𝑟22 =
= 0.968, = 0.937
0.937
Similarly, the data set in Table 10 demonstrates that the BC and SUBC
Similar
methods
Similar
Similar
Similar to
to the
the results
obtain
tothe results
results 100
results ininin
in Table
percent
Table
Table
Table 19,
19,
19, the
the
the probability-based
accuracy
19, the in predicting
probability-based
probability-based
probability-based classifiers
the groupare
classifiers
classifiers
classifiers are
are
overfitted,
are overfitted,
overfitted,
overfitted, while
while
while the
while
the classification
the classification
the classification
classification using
using PC
usingusing
PC
PC and and
and SUC
PCSUC
SUC methods
and methods
SUC
methods attain
methods
attain
attain
86.1
attain
86.1 percent
86.1
percent and
percent
and 79.3
and
79.3 percent
79.3 percent
percent accuracy
accuracy
accuracy inin
86.1 percent and 79.3 percent accuracy in predicting the groupin predicting
predicting
predicting the
the
the group
group
group 19
membership.
membership. This
membership.
membership. This
This data
This data
data
data set set
set reveals a very
reveals aa very
set reveals strong
very strong
very positive
strong positive
strong correlation
positive correlation
positive correlation
correlation
whichisisis0.968.
which
which 0.968.
0.968.
Table20
Table
Table 2020summarises
summarisesthe
summarises thebest-performed
the best-performedmethod
best-performed methodfor
method forthe
for thecontinuous
the continuous
continuous
continuous
data
data sets
data sets based
sets based
based on on
on thethe minimum
the minimum misclassification
minimum misclassification
misclassification rate rate (𝜀𝜀).
rate (𝜀𝜀). The
(𝜀𝜀).The results
results
Theresults
The results
show
show
show that that
that the the proposed
the proposed
proposed methodsmethods outperformed
methods outperformed
outperformed the the conventional
the conventional
conventional
conventional
methods.
methods. Table 20 also includes the comparative
methods. Table 20 also includes the comparative analysis of
Table 20 also includes the comparative analysis
analysis of the
of the
the
the
performance
performance
performance of of different
of different methods with respect to the Pearson
different methods
methods with with respect
respect to to2 the Pearson
the Pearson
Pearson
correlation
correlation (𝑟𝑟) and
and thethe coefficient
the coefficient of determination 22 ) values. Based
(𝑟𝑟
correlation (𝑟𝑟)
(𝑟𝑟)and coefficient ofof determination
determination (𝑟𝑟 values.
(𝑟𝑟 ))values. Based
values.Based
Based
onthe
on theresults
resultsin inthis
thistable,
table,we wemay
mayconclude
concludethat thataaaweak
weakpositive
positiveor oraaa
on theresults
the resultsininthis thistable,
table,wewemaymayconclude
conclude thatthatweak
a weak positive or
positive
very
very weak
very weak negative
weak negative correlation
negative correlation
correlation is is associated
is associated
associated with with
with the the minimum
the minimum
minimum
misclassificationrate
22
misclassification
misclassification ratefor
rate foraaacontinuous
for continuousdata
continuous dataset.
data set.We
set. Wealso
We alsoobserve
also observethat
observe that
that
2
very
very small
very small 22 values correspond to the minimum misclassification rate.
𝑟𝑟
small 𝑟𝑟𝑟𝑟 values
values correspond
correspond to to the
the minimum
minimum misclassification
misclassification rate. rate.
Theresults
The
The resultsof
results ofofthe
thedata
the datasets
data setsin
sets inTable
in Table20
Table 20are
20 aredepicted
are depictedin
depicted inFigure
in Figure2.
Figure 2.2.
data
datasets
sets based
based on on thethe minimum
minimum misclassification
misclassification rate (𝜀𝜀). The
rate (𝜀𝜀). The results
results
show
showthat that
thatthe the proposed
theproposed
proposedmethods methods outperformed
methodsoutperformed
outperformedthe the conventional
theconventional
conventional
show
show that the proposed methods outperformed the conventional
methods.
methods.TableTable
Table20 20 also includes the comparative analysis of the
methods.
methods. Table 2020alsoalsoincludes
also includesthe
includes thecomparative
the comparativeanalysis
comparative analysisof
analysis ofofthe the
the
performance
performance of
of
Journal different
different
of ICT, 22, methods
methods
No. 1 (January)with
with respect
respect
2023, pp: 1 – toto the
30 the Pearson
Pearson
performance
performance of of different
different methods
methods with with respect
respect to to2 thethe Pearson
Pearson
correlation
correlation (𝑟𝑟)
(𝑟𝑟) and
and the
the coefficient
coefficient ofof determination
determination (𝑟𝑟22 2))values.
(𝑟𝑟 values. Based
Based
correlation
correlation (𝑟𝑟) and the coefficient of determination (𝑟𝑟 ) values.Based
(𝑟𝑟) and the coefficient of determination (𝑟𝑟 ) values. Based
on
on the
the results
results in
in this
this table,
table, we
we may
may conclude
conclude that
thata a weak
weak positive
positive or a
on the
on athe
or
results
veryresults
weak
in this
innegativetable,
this table, we may conclude
we may conclude
correlation is
that a weak
that awith
associated weak positive
thepositive
minimum ororaa a
or
very weak
veryweakweaknegativenegative correlation
negativecorrelation is associated
correlationisisisassociated
associatedwith withthe the minimum
theminimum
minimum
very
very weak negative correlation associated with the minimum
misclassification
misclassification
misclassification rate
rate
ratefor
foraaacontinuous
for continuous
continuous data
data
dataset.
set.
set.We
We
We also
also
also observe
observe
observe that
that
that
misclassification
misclassification 2 rate
rate for
for aa continuous
continuous data data set.
set. We
We also also observe
observe that that
very
very
very small
small
small 𝑟𝑟2𝑟𝑟2 values
2
values correspond
correspond
correspond to
totothe
the
the minimum
minimum
minimum misclassification
misclassification
misclassification rate.
rate.
rate.
very
verysmall
small𝑟𝑟𝑟𝑟 valuesvaluescorrespond
correspondto tothe theminimum
minimummisclassification
misclassificationrate. rate.
The
The results
results
Theresults of
resultsof ofthethe data
data sets
sets inTable
in Table 2020are
are depicted
depicted inininFigure
Figure 2.2.2.
The
The results ofofthe thedata
the datasets
data setsin
sets ininTable
Table20
Table 2020arearedepicted
are depictedin
depicted inFigure Figure2.
Figure 2.
Table
Table 20
20
Table20
Table
Table 2020
Comparative Analysis
ComparativeAnalysis ofthe
Analysisof the Best Methods for Continuous Data
Comparative
Comparative
Comparative Analysis
Analysis ofofthe
of theBest
the BestMethods
Best
Best Methodsfor
Methods
Methods forContinuous
for
for ContinuousData
Continuous
Continuous Data
Data
Data
r 𝑟𝑟 22 PEC OPEC(𝛿𝛿) 𝜀𝜀 Best Methods
rrr r PEC
PECPEC OPEC(𝛿𝛿)
PEC OPEC(𝛿𝛿) 𝜀𝜀𝜀𝜀 𝜀𝜀
𝑟𝑟𝑟𝑟22𝑟𝑟
OPEC(𝛿𝛿) Best
Best
Best
Best Methods
Methods
Methods
Methods
= 𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂 −−𝑃𝑃𝑃𝑃𝑃𝑃
=== 𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂
𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂
𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂−− 𝑃𝑃𝑃𝑃𝑃𝑃
𝑃𝑃𝑃𝑃𝑃𝑃
𝑃𝑃𝑃𝑃𝑃𝑃
0.283 0.080
0.283 0.080 0.993
0.080 0.993 0.999
0.993 0.999
0.999 0.006
0.006 BC
BC &&SUBC
0.283
0.283
0.283 0.080
0.080 0.993
0.993 0.999
0.999 0.006
0.006
0.006 BC
BCBC
&&& SUBC
SUBC
SUBC
SUBC
-0.148
-0.148 0.022
-0.148 0.022 0.61
0.022 0.61
0.61 0.702
0.702
0.702 0.092
0.092
0.092 SUBC
SUBC
SUBC
-0.148
-0.148 0.022
0.022 0.61
0.61 0.702
0.702 0.092
0.092 SUBC
SUBC
-0.063
-0.063 0.0004
-0.063 0.0004 0.571
0.0004 0.571
0.571 0.606
0.606
0.606 0.035
0.035
0.035 SUBC
SUBC
SUBC
-0.063
-0.063 0.0004
0.0004 0.571
0.571 0.606
0.606 0.035
0.035 SUBC
SUBC
-0.251
-0.251 0.063
-0.251 0.063 0.696
0.063 0.696
0.696 0.905
0.905
0.905 0.209
0.209
0.209 SUC
SUC
SUC
-0.251
-0.251 0.063
0.063 0.696
0.696 0.905
0.905 0.209
0.209 SUC
SUC
-0.196
-0.196 0.038
-0.196 0.038 0.536
0.038 0.536
0.536 0.539
0.539
0.539 0.003
0.003
0.003 BC
BC
BC
-0.196
-0.196 0.038
0.038 0.536
0.536 0.539
0.539 0.003
0.003 BC
BC
0.079 0.006 0.833 0.859 0.026 SUC
0.079 0.006 0.833 0.859 0.026 SUC
24
24
0.218 0.047 0.51 0.536 0.026 SUBC 24
24
0.218 0.047 0.51 0.536 0.026 SUBC
-0.095 0.009 0.95 0.998 0.048 PC & SUC
-0.095 0.009 0.95 0.998 0.048 PC & SUC
-0.038 0.001 0.925 0.993 0.068 PC
-0.038 0.001 0.925 0.993 0.068 PC
Figure 2
Figure 2
Comparative Analysis of Continuous Data
Comparative Analysis of Continuous Data
0.4 0.3
Performance values
Performance values
0.3 0.25
0.2 0.2
0.1 0.15
0 0.1
-0.1 1 2 3 4 5 6 7 8 9 0.05
0
-0.2
1 2 3 4 5 6 7 8 9
-0.3
Number of experiments
Number of experiments
r MISCLASSIFICATION RATE
MISCLASSIFICATION RATE r square
23
(a) (b)
rr rr PEC
PECPEC
PEC OPEC(𝛿𝛿)
OPEC(𝛿𝛿)
𝑟𝑟 2𝑟𝑟 𝑟𝑟2 2
OPEC(𝛿𝛿) 𝜀𝜀 𝜀𝜀 𝜀𝜀 Best
Best Methods
Best
Methods
BestMethods
Methods
== 𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂
25
=𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂
− 𝑃𝑃𝑃𝑃𝑃𝑃
−−𝑃𝑃𝑃𝑃𝑃𝑃
𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂 𝑃𝑃𝑃𝑃𝑃𝑃
0.283
0.283
0.379 0.080
0.080
0.080 0.993
0.2830.144 0.993
0.993 0.999
0.492 0.999
0.999
0.512 0.006
0.006
0.02 0.006 BCBC
&SUC
BC&
SUBC
&SUBC
SUBC
0.659-0.1480.434
-0.148
-0.148 0.022
0.022 0.725
0.022 0.61
0.61
0.61 0.864
0.702
0.702
0.702 0.139
0.092
0.092
0.092 PC
SUBC
SUBC
SUBC
0.305 0.093 0.825 0.867 0.042 BC & SUBC
-0.063
-0.063
0.153 0.0004
0.0004
0.00040.571
-0.0630.023 0.571
0.571
0.497 0.606
0.606
0.606
0.548 0.035
0.035
0.035
0.051 SUBC
SUBC
SUBC
SUC
0.65 0.423 0.834 0.939 0.105 BC & SUBC
-0.251
-0.251
-0.251 0.063
0.063
0.063 0.696
0.696
0.696 0.905
0.905
0.905 0.209
0.209
0.209 SUC
SUC
SUC
0.198 0.039 0.509 0.622 0.113 SUC
0.946-0.1960.895
-0.196
-0.196 0.038
0.038 0.83
0.038 0.536
0.536
0.536 0.978
0.539
0.539
0.539 0.148
0.003
0.003
0.003 PC
BCBC
BC
0.613 0.376 0.86 0.993 0.133 PC
0.968 0.937 0.76 0.883 0.123 PC
2424
24
Figure 3
1.2 1
Performance values
Performance values
1 0.8
0.8
0.6
0.6
0.4
0.4
0.2 0.2
0 0
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
Number of experiments Number of experiments
(a) (b)
24
DISCUSSION
86.1 percent and 79.3 percent accuracy in predicting the group
Table 20 summarises
membership. This data setthe best-performed
reveals a very strong methodpositivefor the continuous
correlation
data sets
which is 0.968. based on the minimum misclassification rate (𝜀𝜀). The results
show that Journal theof ICT,
proposed
22, No. 1methods pp: 1–30
outperformed
(January) 2023, the conventional
Tablemethods. Table 20thealso
20 summarises includes themethod
best-performed comparative for theanalysis
continuousof the
dataperformance
sets based onof the different
minimum methods with respect
misclassification rate (𝜀𝜀).toThe theresults
Pearson
correlation and theDISCUSSION
coefficient of determination 2
) values. Based
show that the(𝑟𝑟) proposed methods outperformed the(𝑟𝑟conventional
methods. Table 20 also includes the comparative analysispositive
on the results in this table, we may conclude that a weak of theor a
Based on
very the comparative
weak negative performance
correlation analysis,
is
performance of different methods with respect to the Pearson associated we observed
with the that
minimum
different methods
misclassification
correlation performed
(𝑟𝑟) and therate differently
for a continuous
coefficient based on the characteristics
data set.(𝑟𝑟We) values.
of determination 2 also observe
Basedthat
of on
thethedataresults
very sets. We
small hadtable,
𝑟𝑟 2this
in values shown wethat
maythe
correspond to SUC method
the minimum
conclude that a weakdid not overfitor arate.
misclassification
positive
thevery
data
The when the
results
weak OPEC benchmark
of the data
negative evaluation
sets in Table
correlation 20 arewas
is associated applied.
depicted
with in theWe have2.
Figure
minimum
also revealed that all the methods performed differently
misclassification rate for a continuous data set. We also observe that depending
on very
the Table
data
small types,
20 as summarised
𝑟𝑟 2 values correspondintoTables 20 and 21.
the minimum This study hasrate.
misclassification
shown that for continuous data, as reported
The results of the data sets in Table 20 are depicted in Figure in Table 20, a relatively
2.
minimum misclassification
Comparative Analysisrate could
of the BestbeMethods
associated with very weak
for Continuous Data
negative
Table 20 and weak positive correlation values. This analysis was also
applicabler to the 𝑟𝑟 2 Meanwhile, PEC for a discrete
OPEC(𝛿𝛿) 𝜀𝜀 data set (Table 21),
Best it
Methods
indicated that a Analysis
weak to of a strong positive = 𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂
correlation − 𝑃𝑃𝑃𝑃𝑃𝑃
is related to a
Comparative the Best Methods for Continuous Data
0.283 0.080 0.993 0.999 0.006 BC & SUBC
minimum misclassification rate. A similar performance analysis is
portrayed by 𝑟𝑟 2 0.022
r -0.148 The PEC results showed
0.61OPEC(𝛿𝛿) overfitting
0.702 𝜀𝜀 for
0.092 the discrete
Best data
Methods
SUBC
sets was more than for the continuous ones. Thus,
= 𝑂𝑂𝑂𝑂𝑂𝑂𝑂𝑂 it can be inferred
− 𝑃𝑃𝑃𝑃𝑃𝑃
0.283
that the -0.063 0.080
proposed 0.0004
SUC 0.9930.571
and SUBC0.9990.606 0.0060.035 better
methods performed BC &than SUBC
SUBC
the
conventional
-0.148 methods (BC and PC)
-0.2510.0220.063 0.610.696 0.7020.905 for both types
0.0920.209of data sets. Based
SUBCSUC
on the comparative analysis, the proposed methods were found to be
more-0.063 -0.1960.0004
robust than0.038the 0.571 0.536 0.6060.539
conventional methods. 0.0350.003 SUBCBC
CONCLUSION
25
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
ACKNOWLEDGMENT
REFERENCES
26
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
28
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
29
Journal of ICT, 22, No. 1 (January) 2023, pp: 1–30
30