Feature Selection For Classification Using Particle Swarm Optimization
Abstract—This paper proposes a method for the problem of processing high-dimensional data. When one has thousands of features (attributes) in a dataset, it is hard to achieve efficient feature selection. To cope with this problem, we propose the use of a binary particle swarm optimization algorithm combined with C4.5 as a classifier in the fitness function for the selection of informative attributes. The results obtained on 11 datasets were analyzed statistically and reveal that the proposed method, called BPSO+C4.5, outperforms known classifiers, i.e., C4.5, Naive Bayes, and SVM.

Keywords—swarm intelligence, particle swarm optimization, feature selection, classification

I. INTRODUCTION

We live in an era of big data, where data explosion has become ubiquitous. The reason for this is primarily ever-easier data acquisition with automated data collection tools and databases/repositories with practically no space limits. Because of that, researchers deal with data that is rich but information poor. Such data typically consists of a vast number of features (attributes). One might expect that the analysis of such high-dimensional data constitutes an advantage but, in practice, it turns out that this is far from the case. The manual identification and exhaustive search for an optimal subset of informative features in a given dataset is nearly impossible.

Due to the continuous growth of databases and the capacity of storage devices, data processing is struggling to keep up with data acquisition. Supervised machine learning is required in most real-world classification problems. The association between instances and classes is well known, but we do not have information about which attributes of the instances are necessary and which are not. Thus, for a better representation of a given domain, more attributes are used, which results in the presence of attributes that are irrelevant and redundant to the target concept. Therefore, their elimination is essential. An irrelevant attribute has no direct connection with the target concept but has an impact on the learning process of a classifier, and a redundant attribute does not add any value to the target concept. Because of the high dimensionality of the data, the construction of a suitable classifier based on the whole set of features is almost impossible.

The high-dimensional data problem is currently tackled with artificial intelligence methods. The proposed method BPSO+C4.5 falls within the scope of swarm intelligence and extends it with feature selection. The use of the latter has many advantages, primarily improved classification accuracy on high-dimensional data problems, decreased dimensionality of the data and, therefore, reduced computational time, and the elimination of unnecessary features.

This paper is divided into eight sections. The next section presents the curse of dimensionality, and Section III introduces related work. The proposed method is described in detail in Section IV. Section V comprises the experiment planning, execution, and presentation of the obtained results. The statistical analysis in Section VI is intended for the statistical processing of the results, in which we compare our method with established classification methods. In Section VII the results of the proposed method are compared with the results of other researchers who address the same problem. Finally, Section VIII summarizes this paper and outlines plans for further work.

II. CURSE OF DIMENSIONALITY

High-dimensional data is a serious problem for existing data mining and machine learning algorithms because of the so-called curse of dimensionality [1]. This refers to the known phenomenon that data in high-dimensional space becomes sparse. The presence of a vast number of features also affects learning models, which tend to overfit and, therefore, can cause performance degradation on unseen data. For example, in a large search space there is a total of $2^n$ feasible solutions (feature subsets), where $n$ is the number of features [2]. This means that a dataset with 10,000 features per instance has $2^{10000}$ candidate solutions; with each additional feature, the size of the search space doubles. To address the curse of dimensionality, researchers have begun studying techniques for dimensionality reduction.

Dimensionality reduction comprises two different techniques, i.e., feature selection and feature extraction. The result of the latter is a new feature space with low dimensionality, unlike the subset of relevant features produced by feature selection. The proposed method uses feature selection, since further analysis with feature extraction is problematic, as it builds a set of new features and thereby loses the physical meanings of the original features.

Feature selection from a set of features $X$ selects a subset $A \subseteq X$ via specific optimization techniques. It maintains the physical meanings of the features, which usually leads to better classification accuracy, lower complexity, and improved comprehensibility of the resulting classifier models.

The authors in [4], based on their review of feature relevance in related articles, classify features into three disjoint categories, i.e., strongly relevant, weakly relevant, and irrelevant features (Fig. 1).
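To make the scale of this search space concrete, the following Python sketch (an illustration, not part of the original study) counts candidate feature subsets and shows why exhaustive enumeration is only feasible for a handful of features:

from itertools import combinations

def n_subsets(n_features: int) -> int:
    # Size of the feature-subset search space: 2^n.
    return 2 ** n_features

print(n_subsets(10))   # 1024 -- still enumerable
print(n_subsets(35))   # 34359738368 -- already out of reach
# n_subsets(10000) has more than 3000 decimal digits.

# Exhaustive enumeration, feasible only for very small n:
features = ["f0", "f1", "f2"]
subsets = [set(c) for r in range(len(features) + 1)
           for c in combinations(features, r)]
assert len(subsets) == n_subsets(len(features))  # 8 subsets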
$$\mathrm{sigmoid}(v_{id}^{new}) = \frac{1}{1 + e^{-v_{id}^{new}}}, \qquad (4)$$

where $e$ is the base of the natural logarithm.

The value obtained is then compared with a uniformly distributed, randomly generated value between 0 and 1 (function $U(0,1)$). The decision on the new position of the particle $x_{id}$ is now probabilistic: the bigger $v_{id}^{new}$, the bigger the value of the sigmoid function and, therefore, the higher the probability that the value 1 is assigned to $x_{id}^{new}$. As $v_{id}^{new}$ increases, $\mathrm{sigmoid}(v_{id}^{new})$ approaches, but never reaches, 1. For example, if $v_{id}^{new} > 6$, the probability of $x_{id}^{new} = 1$ is almost 1, but not exactly 1. Thus, for $v_{id}^{new} = 6$ the probability of $x_{id}^{new} = 1$ is 0.998, and the probability of $x_{id}^{new} = 0$ is 0.002.

[Fig. 3: Sigmoid function, plotting $\mathrm{sigmoid}(v_{id}^{new})$ against $v_{id}^{new}$ over the interval $[-6, 6]$]

The proposed method proceeds as follows:

initialization of population()
while maximum iterations do
    for i = 1 to number of particles do
        f = fitness function of a particle
        fs = features selected in a particle
        if f(X_i) > f(pBest_i) then
            pBest_i = X_i                      // update personal best
        end if
        if f(X_i) > f(gBest) then
            gBest = X_i                        // update global best
        end if
        if f(X_i) = f(gBest) and fs(X_i) < fs(gBest) then
            gBest = X_i                        // update global best
        end if
    end for
    for i = 1 to number of particles do
        for d = 1 to number of features do
            v_id^new = ω · v_id^old
                     + c1 · r1 · (pBest_id^old − x_id^old)
                     + c2 · r2 · (gBest_d^old − x_id^old)
            if v_id^new > vmax then v_id^new = vmax end if
            if v_id^new < vmin then v_id^new = vmin end if
            if sigmoid(v_id^new) > U(0, 1) then
                x_id^new = 1
            else
                x_id^new = 0
            end if
        end for
    end for
end while
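The velocity and position update from the pseudocode can be sketched in Python as follows. This is a minimal sketch, assuming NumPy arrays for the particle state; the parameter defaults are the values reported in Section V, while the function names are ours, not the paper's.

import numpy as np

rng = np.random.default_rng(42)

def sigmoid(v):
    # Eq. (4): maps a velocity to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-v))

def bpso_update(x, v, p_best, g_best,
                w=0.8, c1=2.0, c2=2.0, v_min=-4.0, v_max=4.0):
    # One BPSO velocity/position update for a single particle;
    # x, v, p_best, g_best are arrays of length n_features.
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    v_new = np.clip(v_new, v_min, v_max)   # velocity bounds
    # Probabilistic binary position: bit d becomes 1 with
    # probability sigmoid(v_new[d]), cf. the U(0, 1) comparison.
    x_new = (sigmoid(v_new) > rng.random(x.shape)).astype(int)
    return x_new, v_new

# Example with 8 features:
x = rng.integers(0, 2, 8)
v = np.zeros(8)
x, v = bpso_update(x, v, p_best=x.copy(), g_best=rng.integers(0, 2, 8))
print(x)  # binary mask of selected features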
B. Fitness function

The goal of the fitness function is the assessment of each particle's quality. Fig. 2 presents a simplified diagram of the proposed method with the fitness function exposed. The input to the fitness function is a particle with selected features (features marked with 1). The classifier is then built based on the selected features. The output of the fitness function is the classification accuracy (5) of a classifier built with the C4.5 algorithm using the subset of features defined by the specific particle:

$$\text{accuracy} = \frac{\text{number of correctly classified instances}}{\text{number of instances}}. \qquad (5)$$

Classification accuracy is one of the most often used metrics for the assessment of classification methods. The C4.5 classification method was selected because it is a "white box" method and, therefore, easy to interpret. The output of the C4.5 classification method is a decision tree that is easily understandable by the expert who validates the obtained results.

In the case where the fitness function value of the i-th particle equals the fitness function value of gBest, a second evaluation takes place. In this evaluation, we compare the number of selected features in the i-th particle with the number of selected features in gBest. If the i-th particle has fewer selected features than gBest, then gBest is replaced with the i-th particle.
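A minimal sketch of such a fitness function is given below. It substitutes scikit-learn's DecisionTreeClassifier (a CART-style tree) for C4.5, since C4.5 itself is not available in scikit-learn; that substitution and the helper names are ours, not the paper's.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def fitness(particle, X_train, y_train, X_test, y_test):
    # Accuracy (Eq. 5) of a tree trained only on the features the
    # particle marks with 1. CART stands in for C4.5 here.
    mask = particle.astype(bool)
    if not mask.any():          # no features selected
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train[:, mask], y_train)
    predictions = clf.predict(X_test[:, mask])
    # Eq. (5): correctly classified instances / all instances.
    return accuracy_score(y_test, predictions)

def n_selected(particle):
    # Secondary criterion: on a fitness tie with gBest, the particle
    # with fewer selected features wins.
    return int(particle.sum())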
V. EXPERIMENT

The experimental approach was used in order to evaluate the performance of the proposed method. We did not apply any preprocessing to the selected datasets, in order to demonstrate the power of the proposed method.

In the research model, five latent variables were defined, of which three are exogenous (independent), namely Classification techniques, Evolutionary techniques, and Evolutionary parameters, and two are endogenous (dependent), namely Performance of classification and Complexity of the model. Indicators were defined for each of them, since a latent variable is an abstract idea that cannot be measured directly.

The first latent exogenous variable, called Classification techniques, covers the classification methods C4.5, Naive Bayes, and SVM, and the proposed method BPSO+C4.5. Evolutionary techniques are the second latent exogenous variable, of which only BPSO is used. The last exogenous variable is Evolutionary parameters. In the fitness function, we used the classification method C4.5.

The endogenous variable Performance of classification has the indicators AUC, F-measure, and accuracy. All three indicators take values in the interval [0, 1], but accuracy is usually expressed as a percentage. The last latent endogenous variable is Complexity of the model, which measures the number of features.

A. Used datasets and proposed method settings

We tested our method using 11 different datasets, as shown in Table II. The meaning of the columns in Table II is as follows: sequence number, name, and domain of the dataset, and the number of features, classes, and samples. All datasets are freely available on the Internet. We downloaded them from the UCI Machine Learning Repository [22] and the web page of Plymouth University [23].

Before running the proposed method, a cross-validation process was carried out on each dataset. The separation of the proposed method and the cross-validation process was used to obtain as much autonomy of the proposed method as possible. Firstly, 5-fold cross-validation was used to divide the initial data into 5 equal stratified parts. The classification model was then built on the training sets (4 parts), in which only the selected features are used, and then tested on the testing set (1 part).

In the experiment, the following settings were used. The social and cognitive coefficients c1 and c2 were both set to 2. The lower (vmin) and upper (vmax) bounds of the velocity were set to −4 and 4, respectively. The value 0.8 was assigned to the inertia weight (ω). The number of iterations and the number of particles in the swarm were set to 100 and 200, respectively.

TABLE II: Used datasets

     Datasets         Domain     Features  Classes  Samples
  1  Primary Tumor    Medicine         18       22      339
  2  Ionosphere       Physics          35        2      351
  3  Soybean          Biology          36       19      683
  4  Movement-libras  Medicine         91       15      360
  5  SRBCT            Medicine       2309        4       83
  6  Leukemia1        Medicine       5328        3       72
  7  DLBCL            Medicine       5470        2       77
  8  CNS              Medicine       7130        2       60
  9  Brain Tumor2     Medicine      10368        4       50
 10  Prostate Tumor   Medicine      10510        2      102
 11  Leukemia2        Medicine      11226        3       72

B. Results

From Table I it is evident that the proposed method BPSO+C4.5 is superior to all the compared classification methods. The most significant classification accuracy improvement over the C4.5 classification method was achieved on the CNS dataset, i.e., 51.00%. On two datasets (SRBCT and Leukemia1) the proposed method managed to improve the classification accuracy to 100%.

In all cases, on average, fewer than half of the features were selected from the original datasets. The most significant feature elimination was achieved on the Ionosphere dataset, i.e., 70.71%.

The proposed method was also compared to all the classification methods using F-measure and AUC (Table III). The F-measure results of our proposed method are better on all 11 datasets. The AUC results of our proposed method are better on eight datasets; on the remaining three, the NB classification method is slightly better (avg. 0.029). The best possible result for both metrics, i.e., 1, was obtained with the proposed method on three datasets: SRBCT, Leukemia1, and DLBCL.
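For reference, the three indicators can be computed with scikit-learn as sketched below. The averaging scheme for the multi-class datasets is our assumption; the paper does not state which one was used.

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def performance_indicators(y_true, y_pred, y_score):
    # y_score: per-class probabilities, shape (n_samples, n_classes);
    # for binary problems pass y_score[:, 1] to roc_auc_score instead.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f_measure": f1_score(y_true, y_pred, average="weighted"),
        "auc": roc_auc_score(y_true, y_score,
                             multi_class="ovr", average="weighted"),
    }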
TABLE III: Experiment results – F-measure and AUC of BPSO+C4.5

     Datasets         F-measure    AUC
  1  Primary Tumor        0.454  0.788
  2  Ionosphere           0.980  0.982
  3  Soybean              0.969  0.991
  4  Movement-libras      0.823  0.917
  5  SRBCT                1      1
  6  Leukemia1            1      1
  7  DLBCL                1      1
  8  CNS                  0.949  0.938
  9  Brain Tumor2         0.938  0.969
 10  Prostate Tumor       0.990  0.990
 11  Leukemia2            0.973  0.981

VI. STATISTICAL ANALYSIS

The purpose of the statistical analysis was to determine whether the proposed method is statistically significantly better than the other classification methods. The results of Friedman's Two-way Analysis of Variance by Ranks show that the proposed method BPSO+C4.5 (mean rank = 3.86) attains the highest rank, followed by the classification methods Naive Bayes (mean rank = 2.45), C4.5 (mean rank = 2.15), and SVM (mean rank = 1.55).
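The ranking analysis can be reproduced with SciPy's implementation of the Friedman test, sketched here on illustrative numbers (the real input would be the 11 per-dataset scores of the four classifiers):

import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Rows: datasets; columns: BPSO+C4.5, Naive Bayes, C4.5, SVM.
# Illustrative values only -- the paper uses its 11 datasets.
scores = np.array([
    [0.98, 0.90, 0.88, 0.85],
    [1.00, 0.93, 0.86, 0.80],
    [0.95, 0.91, 0.92, 0.84],
])

# Friedman's two-way analysis of variance by ranks.
stat, p_value = friedmanchisquare(*scores.T)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")

# Mean rank per classifier (higher score -> higher rank);
# cf. the reported ranks 3.86, 2.45, 2.15, 1.55.
ranks = rankdata(scores, axis=1)
print(ranks.mean(axis=0))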