
Abstract

Feature selection plays a significant role in improving the performance of machine learning algorithms, both by reducing the time needed to build the learning model and by increasing the accuracy of the learning process. Researchers therefore pay increasing attention to feature selection as a way to enhance the performance of machine learning algorithms. Identifying a suitable feature selection method is essential for a given machine learning task with high-dimensional data. Hence, a study of the various feature selection methods is needed by the research community, especially those dedicated to developing suitable feature selection methods for enhancing the performance of machine learning tasks on high-dimensional data.

INTRODUCTION
The problems addressed by machine learning techniques usually involve a large number of features, and it is difficult to find an optimal feature set while omitting redundant ones. In any dataset, some features are unimportant because they are redundant or irrelevant. Considering such features is not beneficial and typically leads to poor classification accuracy. Feature selection therefore seeks to enhance classification efficiency by selecting only a small subset of appropriate features from the initially wide range of features. Removing irrelevant and redundant features decreases the data dimensionality, accelerates the learning process by simplifying the learned model, and boosts performance [1], [2]. Feature selection is an NP-hard problem with 2^n candidate states, where n is the number of features, and its complexity grows dramatically as n increases with improvements in data collection methods in many fields. Unlike feature selection, feature extraction approaches such as Principal Component Analysis (PCA) [3] and Linear Discriminant Analysis (LDA) [4] create new features from the original ones, producing a new, reduced search space by merging or transforming the original features through some functional mapping. Hence, this review concentrates on feature selection methods.
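
To make the contrast concrete, the short sketch below counts the 2^n subset search space on a 30-feature dataset and shows PCA producing transformed features rather than a subset of the originals. It is a minimal illustration assuming scikit-learn is available; the dataset and component count are illustrative choices, not drawn from the reviewed papers.

# Feature selection searches over subsets; feature extraction (PCA) transforms.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)
n = X.shape[1]  # 30 features

# Feature selection must (implicitly) search a space of 2^n candidate subsets.
print(f"{n} features -> {2**n:,} candidate subsets")  # over one billion

# Feature extraction (PCA) instead merges the original features into a
# smaller set of new, derived features.
X_reduced = PCA(n_components=5).fit_transform(X)
print(X_reduced.shape)  # (569, 5): new features, not a subset of the originals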
Feature selection has two primary conflicting goals, namely, maximizing the classification performance and minimizing the number of features to overcome the curse of dimensionality. Feature selection can thus be treated as a multi-objective problem that balances the trade-off between these two conflicting goals. In recent years, research has progressed towards the development of multi-objective feature selection methods to solve these issues. Given the variety of techniques and methods that now exist, this paper aims to critically analyse the latest studies related to the problem, focusing on research outputs from 2012 to 2019. The research papers used for this review were collected from major databases and leading publishers such as Web of Science, ScienceDirect, IEEE Xplore, and Scopus. Our goal is to provide a comprehensive review of the latest work in the domain of multi-objective feature selection and to discuss the challenges and current issues for future work. This review also aims to attract researchers working on multi-objective optimization paradigms to further study effective methods for addressing new challenges in feature selection.
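
As a minimal sketch of the bi-objective view just described, the snippet below compares candidate subsets by Pareto dominance over the pair (error rate, number of features), which is the comparison most multi-objective methods build on. The candidate tuples are invented for illustration and do not come from any cited study.

# Bi-objective feature selection: minimize error rate AND feature count.
def dominates(a, b):
    """a, b are (error_rate, n_features); lower is better on both objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Hypothetical candidate subsets summarized as (error_rate, n_features).
candidates = [(0.08, 12), (0.10, 5), (0.08, 20), (0.15, 3)]

# The Pareto front keeps every candidate no other candidate dominates.
pareto_front = [c for c in candidates
                if not any(dominates(o, c) for o in candidates if o is not c)]
print(pareto_front)  # [(0.08, 12), (0.10, 5), (0.15, 3)]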

FEATURE SELECTION
Feature selection is considered an essential pre-processing step in most machine learning methods. Features that are irrelevant or redundant can adversely affect model performance [5]: with irrelevant features in the data, the accuracy of the models can decrease because the model learns from features that carry no useful signal [6]. Feature selection therefore refers to the method of acquiring a subset of the original features, selecting the appropriate features in the dataset [2]. Some of the benefits of feature selection are as follows:

I. Reducing overfitting: less redundant data means fewer opportunities to make decisions based on noise.
II. Improving accuracy: removing misleading data improves the precision of the resulting model.
III. Reducing training time: fewer data dimensions lower algorithm complexity and speed up training, as the sketch below illustrates.
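
A minimal illustration of benefits II and III, assuming scikit-learn is available: a univariate filter keeps the k most relevant features before training, which reduces training time and can improve accuracy on data with many irrelevant features. The synthetic dataset and k = 10 are illustrative choices.

# Compare training on all features vs. a filtered top-k subset.
import time
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)

for name, data in [("all 200 features", X),
                   ("top 10 by ANOVA F-score",
                    SelectKBest(f_classif, k=10).fit_transform(X, y))]:
    start = time.perf_counter()
    acc = cross_val_score(LogisticRegression(max_iter=1000), data, y, cv=5).mean()
    print(f"{name}: accuracy={acc:.3f}, time={time.perf_counter()-start:.2f}s")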

In the process of feature selection, irrelevant and redundant features or noise in the data can hinder learning in many situations, because such features are not relevant or important with respect to the class concept, as in microarray data analysis [3]. When the number of samples is much smaller than the number of features, machine learning becomes particularly difficult because the search space is sparsely populated, and the model is not able to differentiate accurately between noise and relevant data [4]. There are two major approaches to feature selection: Individual Evaluation and Subset Evaluation. Individual Evaluation, also known as feature ranking [5], assigns a weight to each individual feature according to its degree of relevance. In Subset Evaluation, candidate feature subsets are constructed using a search strategy. The general procedure for feature selection has four key steps, as shown in Figure 1: 1) subset generation, 2) subset evaluation, 3) stopping criteria, and 4) result validation.

Subset generation is a heuristic search in which each state specifies a candidate subset for evaluation in the search space. Two basic issues determine the nature of the subset generation process. First, successor generation decides the search starting point, which influences the search direction; forward, backward, compound, weighting, and random methods may be considered for deciding the starting point at each state [7]. Second, search organization drives the feature selection process with a specific strategy, such as sequential search, exponential search [9, 10], or random search [11]. A newly generated subset must then be evaluated by a certain evaluation criterion, and many evaluation criteria have been proposed in the literature to determine the goodness of a candidate feature subset. Based on their dependency on mining algorithms, evaluation criteria can be categorized into two groups: independent and dependent criteria [8]. Independent criteria exploit the intrinsic characteristics of the training data, without involving any mining algorithm, to evaluate the goodness of a feature or feature set, whereas dependent criteria involve a predetermined mining algorithm and select features based on the performance of that algorithm when applied to the selected subset. Finally, a stopping criterion must be defined to end the selection process. The feature selection process concludes with a validation procedure; although validation is not strictly part of feature selection, a feature selection method must be validated by carrying out different tests and comparisons, either against previously established results or against the results of competing methods, using artificial datasets, real-world datasets, or both.
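
The four-step procedure can be made concrete with a small sketch, assuming scikit-learn: sequential forward search for subset generation, a dependent criterion (cross-validated accuracy of a chosen classifier) for subset evaluation, lack of improvement as the stopping criterion, and final validation left to held-out testing. The classifier and dataset are illustrative choices, not a method proposed in the reviewed literature.

# Sequential forward selection with a dependent (wrapper-style) criterion.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
clf = KNeighborsClassifier()

selected, best_score = [], 0.0
remaining = list(range(X.shape[1]))

while remaining:
    # Subset generation: extend the current subset by each remaining feature.
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best = max(scores, key=scores.get)
    # Stopping criterion: halt when no candidate improves the evaluation.
    if scores[f_best] <= best_score:
        break
    best_score = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)

print(f"selected features {selected}, CV accuracy {best_score:.3f}")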

Figure 1: The general procedure for feature selection (subset generation, subset evaluation, stopping criteria, result validation).

Feature selection algorithms can also be categorized by their relationship with the inductive learning method used to infer a model. There are three general approaches. First, the filter approach exploits the general characteristics of the training data independently of the mining algorithm [6]. Second, the wrapper approach explores the relationship between relevance and optimal feature subset selection; it searches for an optimal feature subset adapted to the specific mining algorithm [12]. Third, the embedded approach works with a specific learning algorithm that performs feature selection during the process of training.
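
A minimal sketch of the three approaches on the same dataset, using common scikit-learn stand-ins rather than any specific method from the reviewed papers: mutual-information ranking as the filter, recursive feature elimination around a linear classifier as the wrapper, and L1-regularized logistic regression as the embedded method. All parameter choices are illustrative.

# Filter, wrapper, and embedded selection side by side.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# Filter: scores features from data characteristics alone, no classifier loop.
filt = SelectKBest(mutual_info_classif, k=5).fit(X, y)
print("filter:  ", np.where(filt.get_support())[0])

# Wrapper: repeatedly trains the mining algorithm to rank feature subsets.
wrap = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=5).fit(X, y)
print("wrapper: ", np.where(wrap.get_support())[0])

# Embedded: selection happens inside training via the L1 penalty.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("embedded:", np.where(emb.coef_[0] != 0)[0])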

Feature Selection: A Literature Review

Many feature selection methods have been proposed in the literature, and comparing them is a very difficult task. Without knowing the relevant features of a real dataset in advance, it is hard to assess the effectiveness of feature selection methods, because datasets may pose many challenges such as a huge number of irrelevant and redundant features, noisy data, and high dimensionality in terms of features or samples. The performance of a feature selection method therefore depends on the performance of the learning method. Many performance measures are mentioned in the literature, such as accuracy, computational resources, and the feature selection ratio. Most researchers agree that there is no single "best method" [6]. Consequently, new feature selection methods are constantly being proposed to tackle specific problems (as mentioned above) with different strategies, including:
- ensuring better feature selection behavior through ensemble methods [84, 85];
- combining feature selection with other techniques such as tree ensembles [86] and feature extraction [87];
- reinterpreting existing algorithms [88, 89];
- creating new methods to deal with still-unresolved problems [90, 91];
- combining several feature selection methods [92, 93].

Many comparative studies of existing feature selection methods have been carried out in the literature. For example, an experimental study of eight filter methods (based on mutual information) was conducted on 33 datasets [94], and 12 feature selection methods were compared for the text classification problem [95]. The capabilities of the survival ReliefF algorithm (sReliefF) and a tuned sReliefF approach are evaluated in [96]. Seven filters, two embedded methods, and two wrappers were applied to 11 synthetic datasets (tested with four classifiers) in a comparative study of feature selection performance in the presence of irrelevant features, noise in the data, redundancy, and a small ratio between the number of attributes and the number of samples [6]. For high-dimensional datasets (in both samples and attributes), the performance of feature selection methods has been studied for multi-class problems [90, 97, 98, 99]. From a theoretical perspective, guidelines for selecting feature selection algorithms have been presented, with algorithms categorized along three dimensions: search organization, evaluation criteria, and data mining tasks. In [2], characterizations of feature selection algorithms are presented together with their definitions of feature relevance. From an application perspective, many real-world applications have been considered, including intrusion detection [100, 101], text categorization [95, 102, 109, 110], DNA microarray analysis [103], music information retrieval [104], image retrieval [105], information retrieval [106], customer relationship management [107], genomic analysis [83, 103], and remote sensing [108].

LIMITATIONS:
Despite the applicability, achievements, and prospects of current multi-objective feature selection techniques, significant problems and difficulties remain; these are discussed below.

1) HIGH SCALABILITY OF DATA

The most demanding problem is that data sizes are becoming extremely large due to the trend of big data [30]. In past years, selecting features from a dataset with 30 features was considered large-scale/high-dimensional feature selection [31]. Yet the number of features in several fields today can easily reach into the millions. This increases the computational expense and requires a powerful search mechanism; since both aspects have their own difficulties, the issue cannot be resolved merely by increasing computing power. Developing new approaches and techniques will be a necessity.
2) THE COST OF COMPUTATION
Multi-objective feature selection methods involve a large number of evaluations, which leads to the serious problem of being computationally expensive. Wrapper methods are claimed to be less efficient than filter methods; nonetheless, experiments have demonstrated the opposite in most cases [69]. Rough set theory [9], [70], a filter-type approach, can require more time than a simple wrapper. Another example of a filter measure is mutual information [71], [72], known to take less execution time, although its accuracy generally falls below that of the majority of wrapper methods. Thus, proposing an effective measure for the multi-objective feature selection problem is still an open topic to be investigated. To decrease computational costs, two primary factors must be considered: 1) an effective search technique and 2) a rapid evaluation measure [1], as the sketch below illustrates.
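
The gap between these two factors can be sensed with a rough timing sketch, assuming scikit-learn: an independent measure (mutual information) scores all features once, while a dependent, classifier-in-the-loop measure pays a full cross-validation for every evaluated subset. Sizes and models are illustrative; actual timings depend on the machine.

# Compare the cost of one filter scoring pass vs. one wrapper subset evaluation.
import time
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=100, random_state=0)

t0 = time.perf_counter()
mi = mutual_info_classif(X, y, random_state=0)  # scores every feature once
t_filter = time.perf_counter() - t0

t0 = time.perf_counter()
acc = cross_val_score(SVC(), X, y, cv=5).mean()  # cost of ONE subset evaluation
t_wrapper = time.perf_counter() - t0

print(f"filter evaluation:  {t_filter:.2f}s")
print(f"wrapper evaluation: {t_wrapper:.2f}s per candidate subset (5-fold CV)")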
3) SEARCH MECHANISMS
Feature selection is an NP-hard problem with large, complicated solution spaces [73], [74]. Thus, a robust global search algorithm is needed, and the search mechanisms of recent multi-objective methods can still be enhanced. An improved search mechanism should be able to explore the entire search space and exploit promising local regions where necessary. Such mechanisms may also incorporate local search, hybridization of the search mechanisms of different multi-objective techniques, hybridization of conventional and multi-objective approaches [75], surrogate approaches [76], etc.
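
As a minimal sketch of the local-search component such hybrids might use, the snippet below runs a simple bit-flip hill climb over feature masks, scoring each mask with cross-validated accuracy. The evaluation function, classifier, and iteration budget are illustrative assumptions, not a method from the cited works.

# Bit-flip local search over binary feature masks.
import random
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = random.Random(0)

def evaluate(mask):
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset is worthless
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, cols], y, cv=5).mean()

mask = [rng.randint(0, 1) for _ in range(X.shape[1])]
score = evaluate(mask)
for _ in range(50):  # flip one bit at a time, keep improvements
    i = rng.randrange(len(mask))
    mask[i] ^= 1
    new = evaluate(mask)
    if new >= score:
        score = new
    else:
        mask[i] ^= 1  # revert a worsening flip
print(f"accuracy {score:.3f} with {sum(mask)} features")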
4) EVALUATION MEASURES
One of the main factors in multi-objective feature selection is the evaluation measure that shapes the fitness function; it greatly affects the computation time, the classification performance, and the search space. A large portion of the execution time is consumed by the assessment procedure of wrapper strategies and of several filter strategies [64]. Although some rapid assessment measures exist, mutual information being one example [71], [77], [78], they assess features individually rather than as groups of features.
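
The limitation of assessing features individually can be demonstrated with a small synthetic example, assuming scikit-learn: two features whose individual mutual information with the label is near zero, even though together they determine the label exactly (an XOR relationship).

# Per-feature mutual information misses interactions between features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 2)).astype(float)
y = X[:, 0].astype(int) ^ X[:, 1].astype(int)  # label = feature1 XOR feature2

mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(mi)  # both scores ~0: each feature alone looks irrelevant,
# yet the pair predicts y perfectly -- a per-feature measure cannot see this.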

RESEARCH PROBLEM:

RESEARCH QUESTIONS:
The following research questions are formulated:
1) What search techniques have been used to select the optimal features?
2) What evaluation measures have been used to evaluate the selected features?
3) How many objectives are used in multi-objective feature selection formulations?
