You are on page 1of 16

Knowledge-Based Systems 250 (2022) 109092

Contents lists available at ScienceDirect

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

A noise-aware fuzzy rough set approach for feature selection



Xiaoling Yang a,b , Hongmei Chen a,b , , Tianrui Li a,b , Chuan Luo c
a
School of Computing and Artificial Intelligence, Southwest Jiaotong University, China
b
National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, China
c
College of Computer Science, Sichuan University, China

article info a b s t r a c t

Article history: Feature selection has aroused extensive attention and aims at selecting features that are highly relevant
Received 22 December 2021 to classification from raw datasets to improve the performance of a learning model. Fuzzy rough set
Received in revised form 16 May 2022 theory is a powerful mathematical method for feature selection. The classical fuzzy rough set model
Accepted 17 May 2022
is very sensitive to the noise while the noise samples in classification data often appear. In addition,
Available online 26 May 2022
fuzzy rough set theory does not fit well when the density distribution of the samples in the dataset
Keywords: varies greatly. Thus, it is of great significance to improve the robustness of fuzzy rough set models
Fuzzy rough set and its adaptability to data for feature selection. Inspired by these issues, we focus on the robust
Robust feature selection fuzzy rough set approach for feature selection. We first propose a robust fuzzy rough set model
Noisy data based on data distribution to achieve the purpose of anti-noise i.e., Noise-aware Fuzzy Rough Sets
Density distribution (NFRS) model. This model proposes a novel search mechanism, which weakens the sensitivity of the
Dependency function approximation operator to noise by considering the distribution of samples in the decision classes to
weight the samples, further obtains three kinds of samples, i.e., intra-class samples, boundary samples,
and outlier samples. Then, the degrees of relevance of the feature for class is defined by the dependency
function based on the NFRS model to evaluate the significance of the feature subset. On this basis,
an evaluation function about feature significance is constructed, which simultaneously considers the
relevance and redundancy of a candidate feature provided for the selected subset and the remaining
feature subset. A novel forward greedy search algorithm is presented to select a feature sequence. The
selected features are subsequently evaluated with downstream classification tasks. Experimental using
real-world datasets demonstrate the effectiveness of the proposed model and its superiority against
comparison baseline methods.
© 2022 Elsevier B.V. All rights reserved.

1. Introduction evaluation function of feature for fuzzy rough feature selection al-
gorithm [17]. Fuzzy rough set theory can characterize uncertainty
Feature selection (or called attribute reduction in the fuzzy data and further select more informative features for downstream
rough set) is a process of select feature subset from the raw learning tasks. However, there may be shortcomings in applying
dataset for downstream learning tasks [1–3]. As only features FRS due to its sensitivity to noise in uncertainty data [18–21].
with more information are reserved through feature selection, For this reason, the sensitivity of FRS to noise receives certain
it is favorable to research raw feature space and then possible attention in recent years.
to construct an explicable the learning model. Feature selection Noise samples often appear in data so that is also one of
method has been widely used in cluster learning [4,5], pattern the reasons for data uncertainty. Generally speaking, there are
recognition [6,7], and classification learning [8,9]. Fuzzy rough set two types of noise samples in the data [19]. One is that the
(FRS) theory introduced by Dubois and Prade [10] is an important conditional attribute of the sample is anomalous (i.e., attribute
mathematical theory for feature selection by tackling uncertainty noise) [22], and the other is that the decision attribute of the
in data [11–14]. On this basis, the feature selection methods using sample is anomalous (i.e., class noise) [23]. Two types of noise
FRS have been extensively theoretically studied [15,16]. Wang have different impacts on the dependence degree and down-
et al. proposed an inner product dependency to construct the stream learning task [19,24]. Given a specific feature selection
algorithm, its performance depends largely on the quality of
∗ Corresponding author. the data. Accordingly, improving the quality of the data can be
E-mail addresses: yangxl@my.swjtu.edu.cn (X. Yang),
achieved by improving the anti-noise performance of the fuzzy
hmchen@swjtu.edu.cn (H. Chen), trli@swjtu.edu.cn (T. Li), cluo@scu.edu.cn rough set model [25,26]. The construction of approximation op-
(C. Luo). erations and information granulation is the main focus of many

https://doi.org/10.1016/j.knosys.2022.109092
0950-7051/© 2022 Elsevier B.V. All rights reserved.
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

studies to improve the adaptability and robustness of FRS. For operators and further built a feature selection algorithm based on
example, Hu et al. proposed some robust fuzzy rough sets models the kernelized fuzzy rough sets [40]. Naz et al. familiarized the
by analyzing the reason for noise sensitivity in FRS [19]. Cornelis idea of fuzzy bipolar soft sets and studied their algebraic struc-
et al. advocated a mitigated approach, in which the membership tures and their applications in decision-making problems [41].
of the lower and upper approximations is determined by means Malik et al. further developed the concept of rough fuzzy bipo-
of the ordered weighted average operators [27]. Wang et al. lar soft sets to handle decision-making problems by using the
further studied the equivalent expressions of granular variable uncertainty of the lower and upper approximations [42]. In this
precision fuzzy rough sets based on fuzzy (co)implications [28]. It paper, the separability of feature is measured by considering the
motivates further developments in fuzzy rough theory from the intra-class aggregation and between-class dispersion, which is
perspective of practice. estimated by using approximation operators. The relevance of the
As the robust fuzzy rough sets model has the advantage of feature can be measured by considering the intra-class aggrega-
dealing with fuzzy and uncertain information, this paper focuses tion and between-class dispersion of a candidate feature in the
on the robust fuzzy rough set approach for feature selection. The selected feature subsets. In addition, the redundancy of the fea-
robust fuzzy rough set methods have been applied effectively ture can be measured in the remaining feature subsets. In order to
in the feature selection [19,29–31]. For example, considering obtain a more comprehensive relation between feature and class
the proportion of samples belonging to different classes in the using the lower and upper approximations, we simultaneously
neighbor of a given sample, Li et al. proposed a different classes’ consider the relevance and redundancy of feature to better select
ratio fuzzy rough set model [32]. Hu et al. proposed the soft optimal feature subsets in the feature evaluation function. On this
fuzzy rough set model and the soft fuzzy dependency function in basis, a robust feature selection algorithm (FNIB) is designed by
noisy data for feature selection [29]. The optimal feature subset is applying the novel feature evaluation function in fuzzy rough set
selected by using FRS to estimate the classification performance theory. It can select features which have less redundancy between
of the feature subset [25,33,34]. Due to the fact that the fuzzy features and more relevance between features and class using the
rough set is very sensitive to the noise in the uncertainty data, the feature evaluation function. The comparison experiments with
evaluation of the uncertainty is not accurate [19]. Thus, An et al. other feature selection algorithms demonstrate the effectiveness
proposed a probabilistic fuzzy rough model by considering the and robustness of the proposed algorithm.
distribution information of data [35]. On this basis, the probability The rest of this paper is organized as follows. Section 2 intro-
granular distance-based fuzzy rough sets was proposed to reduce
duces the related concepts about this paper. Section 3 presents
the impact of noise data, whereas the amount of calculation is
the proposed noise-aware fuzzy rough sets model. The feature
very large and it ignores the relationship between the label of
selection algorithm including its computational complexity anal-
samples in the neighborhood of samples [36]. However, the data
ysis and the evaluation of feature significance are discussed in
distribution of most datasets is not known in practice and cannot
Section 4. Extensive experiments are conducted in Section 5.
be modeled by a single distribution. In addition, the certainty and
Finally, Section 6 concludes this paper.
possibility of a sample being correctly classified are estimated
by the degree of membership of the sample belonging to the
2. Preliminaries
upper and lower approximations. Therefore, both the upper and
lower approximations contain the uncertainty of data, which can
In this Section, we first review some notational conventions
be used as evaluation measure for feature selection. Inspired
[43], and then discuss the related concepts about the kernelized
by these ideas, we modify the method of calculating the fuzzy
relation to construct a novel fuzzy rough set for feature selection fuzzy rough sets [44].
by considering the effect of noise samples.
Considering the density and location of the sample with its 2.1. Notational conventions
neighbors from the same class, the samples can be categorized
into intra-class samples, boundary samples, and noise samples. Let ⟨U, C, D⟩ be expressed as a fuzzy decision information
We further introduce a weight of the sample based on the density system, where U is a nonempty universe, C = Cc ∪ Cn is the
and location of the sample in the fuzzy relation. Firstly, the set of the conditional attribute (i.e., Cn is a numerical attribute
information of intra-class samples is more reliable, and they subset and Cc is a categorical attribute subset), and D is called the
are assigned high weight. Then, the boundary sample has an decision attribute. For simplicity, we let D = {d}. V is the value
impact on the data uncertainty and has a certain possibility domain of attributes. Va is the domain of attribute a ∈ C ∪ D,
to become a noise sample, thus its information is not reliable V = ∪Va is the domain of attribute set C ∪ D. Va (x) is the value
enough, and it has a lower weight. Furthermore, the information of sample x in attribute a. Given a mapping X (·) : U → [0, 1], X
of the noise samples has a high deviation, thus we should reduce is called a fuzzy set on U, and X (x) is the membership degree
its influence on data uncertainty. Therefore, the weighted fuzzy of sample x ∈ U to X . Let F (U) be a fuzzy power set on U,
relation between samples is defined by considering the weight then X ∈ F (U). The crisp set is a special case of fuzzy set. Let
and neighborhood information of samples. We further propose a N : [0, 1] → [0, 1] be a monotonic decreasing function, if
Noise-aware Fuzzy Rough Sets (NFRS) model, which is robust by N (0) = 1, N (1) = 0, and N (N (a)) = a (∀a ∈ [0, 1]), N is called
reducing the influence of noisy samples. According to the defini- an involutive negator.
tion of lower approximations, the degree of dispersion between
classes is estimated by the lower approximation of fuzzy deci- 2.2. Fuzzy rough set
sions. Similarly, the upper approximation of fuzzy decision can
be also used to estimate the degrees of aggregation of intra-class Given ⟨U, C, D⟩, and B ⊆ C, RB : U × U → [0, 1] is a fuzzy
samples. relation on U induced by B. RB is also a fuzzy set on U × U, then
The efficient feature evaluation function is useful to select RB ∈ F (U × U). For a fuzzy set X ∈ F (U), ∀x, y ∈ U, kB (x, y)
features with more information and further improves data qual- is a fuzzy relation calculated by a kernel function on B, and the
ity [37,38]. Hu et al. designed a novel significance of feature upper and lower approximations of kernelized fuzzy rough set are
for feature selection by considering intra-class compactness and defined below [44],
between class sparsity [39]. Chen et al. explored the sample sim-
ilarity matrices in spectral graph theory by using approximation kB X (x) = inf max {N (kB (x, y)) , X (y)} , (1)
y∈U

2
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

kB X (x) = sup min {kB (x, y), X (y)} , (2) 3.1.2. Local density factor of sample
y∈U
The neighborhood information of the sample is depicted by
where kB X and kB X are a pair of fuzzy sets. the local density of the sample, and the data distribution of the
Assuming that D is a decision attribute, the universe U is sample can also be presented. Next, we define the local density of
divided into crisp equivalence classes, denoted as U/D = {D1 , D2 , the sample within the decision classes to describe the distribution
. . . , Dr }. The fuzzy positive⋃region of D on B in fuzzy rough set information of sample.
is defined as POSB (D) = X ∈U/D RB X . The dependency degree
of D on B is defined as γB (D) = B |POS (D)|
. The approximation Definition 2. Given ⟨U, C, D⟩ and B ⊆ C, for any x ∈ Di , NBU (x, k)
|U|
operator of the classical fuzzy rough set model adopts a sen- is the k-nearest neighbors samples of x in U with respect to B ⊆ C
sitive max–min statistical strategy, which is very sensitive to (k is a positive integer and y ̸ = x). NBS (x) is the neighborhood
noise in classification data. Therefore, considering the distribu- of sample in NBU (x, k), such that ∀y ∈ NBS (x), it has y ∈ Di and
tion information of data, An et al. proposed a probabilistic fuzzy y ∈ NBU (x, k). The local density of the sample x within the decision
rough model [35]. On this basis, combining data distribution classes is defined as
HE B (x, y)

and neighborhood information granular, the probability gran-
( )
1+ y∈NBS (x)
ular distance-based fuzzy rough sets was proposed to reduce LDB (x) = 1/ , (4)
the impact of noise data, whereas the amount of calculation is 1 + |NBU (x, k)|
very large [36]. Furthermore, the distance between samples is
where | · | is the cardinality of neighborhood.
transformed into a weighted distance between granules by using
the probability density value of the neighboring samples, but If the sample x is a noise, |NBS (x)| ≤ k (or the distance
it ignores the relationship between the label of samples in the between x and its neighboring samples in NBS (x) is larger), the
neighborhood of samples. The data distribution of most datasets value of the local density of the sample within the decision classes
is not known in practice and cannot be modeled by a single should be smaller. Therefore, the value of the local density of
distribution. The evaluation of the uncertainty is not accurate in the sample within the decision classes (local density for short)
the classical fuzzy rough set model. On this basis, the robust fuzzy could be used to reflect the distribution information of samples.
rough set model has been applied effectively, and the certainty In addition, local density of the sample does not fit well when the
and possibility of a sample being correctly classified can be better density distribution of the samples in the dataset varies greatly.
estimated by the degree of membership of the sample belonging If the value of the local density is used to identify noise, and the
to the robust upper and lower approximations.
difference of the local density of the sample in the neighborhood
3. Noise-aware fuzzy rough set model is ignored, which leads to incorrect conclusions. So, the value of
the local density does not directly reflect whether the sample is
Since the classical FRS model is sensitive to noisy information, noise. Then, we introduce local density factor to recognize noise
this Section presents the Noise-aware Fuzzy Rough Set model by using the local density of sample.
(NFRS). NFRS consists of the neighborhood information of sample,
approximations based on weighted fuzzy relation, and property anal- Definition 3. Given ⟨U, C, D⟩, ∀x ∈ Di , and LDB (x) is the local
ysis. The main theories of each component are presented in the density of the sample within the decision classes, the local density
following. factor of sample x in NBU (x, k) is defined as
∑ LDB (y)
3.1. The neighborhood information of sample y∈NBU (x,k)
LDB (x)
LDF(x) = . (5)
The neighborhood of a sample is a collection of samples that |NBU (x, k)|
are similar to the sample. The distance function between samples
The information of sample is related to the information of
is utilized to measure the similarity among different samples. In
its neighboring samples. Then, the local density of the sample
order to obtain the neighborhood information of sample, we first
is compared with its neighboring samples, and the difference is
define the distance function.
used to determine whether it is noise (see Eq. (5)). If the distribu-
3.1.1. Hybrid Euclidean distance function tion of the sample is significantly different from the distribution
Data in practical applications usually include numerical, cat- of the neighboring samples, the farther the local density factor
egorical, and mixed data. The hybrid Euclidean distance metric value of the sample x is from 1. That is, if the sample is a noise
(HE) is introduced to effectively deal with the mixed data. sample, the local density factor value of the sample x is greater
than 1. Thus, the larger the local density factor value of the
Definition 1. Given ⟨U, C, D⟩ and C = Cc ∪ Cn , for any x, y ∈ sample is more likely that this sample is a noise sample.
U and B ⊆ C ∪ D, the hybrid Euclidean distance function is
computed by 3.1.3. The weighted fuzzy relation between samples
√∑ The information provided by noise samples is unreliable. When
HEB (x, y) = d{a} (x, y)2 , (3) calculating the uncertainty of the information system, the influ-
∀a∈B ence of noise samples should be reduced. Next, we define the
weight of samples to perceive the noise in datasets by considering
where
the density distribution and neighborhood information of the
|V (x) − Va (y)| , if a ∈ Cn ,

⎨ a sample.
1, if a ∈ Cc and Va (x) ̸ = Va (y) ,

d{a} (x, y) =
⎩ 0,
⎪ if a ∈ Cc and Va (x) = Va (y) , Definition 4. Given ⟨U, C, D⟩, for any x ∈ U, the weight of
1, if Va (x) = ∗ or Va (y) = ∗. sample is⎧defined as follows.
According to Definition 1, the hybrid Euclidean distance func- ⎪0,
⎪ NBS (x) = 0 or LDF(x) > λ,
1, { NBS (x) = k and LDF(x) ≤ λ,

tion can not only measure the distance between samples in
WB (x) = } .
mixed datasets, but also measure the distance between samples S
LDF(x)∗|NB (x)|
⎩min , 1 , other .


in incomplete datasets. U |NB (x,k)|

3
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

Fig. 1. The visualization of two-dimensional datasets.

An example is given in Fig. 1 to illustrate the weight of samples between samples reflects the noise information in the dataset.
and possible noise samples in the two-dimension data. It has red Thus, the weighted fuzzy relation between samples can better
class that is randomly distributed data that satisfy the Gaussian measure the uncertainty of the information system by reducing
distribution with a mean value of (2, 5) in two datasets. Specif- the impact of unreliable samples on the information system, and
ically, Fig. 1(a) shows the blue class that satisfies the uniform it is further used in the following sections. Next, we introduce the
distribution, and Fig. 1(b) shows the blue class that satisfies the approximations based on the weighted fuzzy relation.
Gaussian distribution with a mean value of (6, 5). Intuitively,
the sample x is more likely to be noise than the sample y in 3.2. Approximations based on weighted fuzzy relation
Fig. 1(a) and the samples (i.e., x, y, and z) are probably noisy in
Fig. 1(b). If only the local density of the sample (see Definition 3)
As shown in Section 3.1, the weighted fuzzy relation is con-
is used, the sample y is more likely to be noise than the sample
structed by combining Gaussian kernel function with the weight
x in Fig. 1(a), which leads to incorrect conclusions. According
value of samples. The lower and upper approximations are mod-
to Definition 4, Figs. 1(c) and 1(d) show the weight value of
the samples in two datasets. Note that the weight value of the eled by using the weighted fuzzy relation to construct robust
unmarked sample is 1. From Figs. 1(c) and 1(d), the weight of the approximation operators. Next, we will define the noise-aware
samples defined by considering the distribution of the samples fuzzy rough set model (NFRS).
and the neighborhood information leads to correct conclusions,
and it can adapt to situations when the density distribution of the Definition 6. Given ⟨U, C, D⟩, for any x, y ∈ U and B ⊆ C.
samples in the dataset varies greatly. Furthermore, as shown in RWB (x, y) indicates the weighted fuzzy relation between samples x
Fig. 1, we can divide the samples into three parts, i.e., intra-class and y. The membership degree of the sample x belonging to the
samples, boundary samples, and outlier samples. Next, we make lower and upper approximations of X ∈ F (U) with respect to B
a detailed description as follows. are defined as

B (x, y) , X (y) ,
RB X (x) = inf max N RW
{ ( ) }
(1) When a sample is closer to its decision class and the distri- (8)
∀y∈U
bution of the sample in its neighborhood is less different,
the weight value of the sample should be 1 or very close B (x, y), X (y)}.
RB X (x) = sup min{RW (9)
to 1. This sample is called as an intra-class sample. ∀y∈U
(2) If a sample is located in the middle region between differ- For classification tasks, for any X ∈ U/D, the membership
ent decision class (we call this area as boundary region), the degree of the sample x belonging to the lower and upper approx-
weight value of sample should be less 1, and this sample is imations can be simplified as follows
called as a boundary sample.
B (x, y)
N RW ,
{ ( )}
(3) If a sample which is closer to other decision classes or RB X (x) = inf (10)
/X
∀y∈
it is a sample with anomalous distribution by using the
B (x, y) .
RB X (x) = sup RW
{ }
neighborhood information of sample, the weight value of (11)
sample should be 0. This sample is called as an outlier ∀y∈X

sample. The proposed noise-aware fuzzy rough set model reduces the
The weight value of the sample is determined by considering impact of the sample with unreliable information when comput-
the local density and location of the sample with its neighbors ing the lower and upper approximations. If the weighted fuzzy
from the same decision class. That is, the weight value of the relation is used, the noise sample is ignored in the process of
sample can perceive the noise in datasets. computing the lower approximation, thus the positive region of
the information system will grow in size. Similarly, the upper
Definition 5. Given ⟨U, C, D⟩, for any x, y ∈ U and B ⊆ C, the approximation is calculated by the proposed fuzzy relation, which
fuzzy relation of samples is defined as can be understood as the possibility of being correctly classi-
fied. Thus, the novel lower and upper approximations achieve
HEB (x, y)2
( )
RGB (x, y) = exp − . (6) robustness to noise by using the weighted fuzzy relation, which
2δ 2 considers the local density and location of the sample. The visu-
The fuzzy relation between samples is computed by Gaussian alization of Definition 6 on Dataset2 is shown in Fig. 2, where
kernel function. Combining Definitions 4 and 5, the weighted the membership values of the lower and upper approximations
fuzzy relation between samples is defined as are described in the legend. Note that the membership value of
the lower approximation is related to the nearest sample with a
B (x, y) = WB (y)RB (x, y).
RW G
(7) different decision (See Fig. 2(a)). The lower approximation further
Due to considering the local density and location of the sam- reflects the degree of dispersion between classes. Similarly, the
ple, the value of the weight reflects the reliability of the informa- upper approximation reflects the degree of aggregation of intra-
tion provided by the sample. Then, the weighted fuzzy relation class samples (See Fig. 2(b)). According to Figs. 1 and 2, these
4
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

Fig. 2. The visualization of Definition 6 on Dataset2.


(⋃ ) ⋃
Table 1 ii. RB X ∈U/D X = X ∈U/D RB X .
A fuzzy decision information system.
U c1 c2 c3 c4 d c1 c2 c3 c4 d
x1 0.2 0.4 0.8 0.6 1 0.500 1.000 0.875 0.625 1 ⋂ For any X ∈ U/D, x,{y ∈ U, and B ⊆ C, we have
Proof.
RB ( X ∈U/D X )(x) = infy∈U max N (RB (x, y)) , minX ∈U/D {X (y)} =
}
x2 0.3 0.3 0.4 0.6 1 1.000 0.667 0.375 0.625 1

) {N (RB (x, y)
( ) , X (y)} )= minX ∈U/D RB X (x)
x3 0.1 0.2 0.6 0.5 0 0.000 0.333 0.625 0.500 0
{ } { }
min(X ∈U/D infy∈U max
x4 0.1 0.2 0.3 0.3 0 0.000 0.333 0.250 0.250 0 ⋂ ⋂ ⋂
x5 0.3 0.3 0.1 0.9 1 1.000 0.667 0.000 1.000 1 = X ∈U/D RB X (x) . Hence, RB = X ∈U/D RB X . In a
X ∈U/D X
x6 0.15 0.1 0.9 0.1 0 0.250 0.000 1.000 0.000 0 (⋃ ) ⋃
similar way, we can obtain RB X ∈U/D X = X ∈U/D RB X . □

conclusions are easily understood. Next, this will be described in Property 2. Given ⟨U, C, D⟩, for any X ∈ U/D and B ⊆ C, there
detail in Section 4. Example 1 shows these the membership value are
of the lower and upper approximations.
i. N RB (X ) = RB (N (X )),
( )

ii. N RB (X ) = RB (N (X )).
( )
Example 1. A fuzzy decision information system ⟨U, C, D⟩ is
shown in the left of Table 1, where U = {x1 , x2 , . . . , x6 }, C =
{c1 , c2 , c3 , c4 } with V = [0, 1], D = {d}, U/D = {D1 , D2 }, where Proof. For any x, y ∈ U, and B ⊆ C, we have N RB (X ) (x) =
( )
D1 = {x1 , x2 , x5 } and D2 = {x3 , x4 , x6 }. 1 − RB (X ) (x) = 1 − infy∈U max {N (RB (x, y)) , X (y)} = supy∈U min
{RB (x, y), 1 − X (y)} = RB (N (X )) (x). Hence, N( RB (X )) = RB
( )
Let the radius of nearest neighbor samples k be 3 for the
fuzzy decision information system, δ is set to 1, λ is 1.5, and the (N (X )). In a similar way, we can obtain N RB (X ) = RB
min–max normalization method is used in order to normalize (N (X )). □
the data (see the right of Table 1). The weight of samples is
WC (U) = {0.573, 1, 1, 1, 1, 1}, calculated by using Definition 4. Property 3. Given ⟨U, C, D⟩, for any X1 , X2 ∈ U/D and X1 ⊆ X2 ,
The W W there are
⎛ fuzzy relation matrices RC is obtained as ⎞follows, RC =
0.573 0.737 0.680 0.542 0.531 0.480
i. RB (X1 ) ⊆ RB (X2 ),
⎜0.422 1.000 0.552 0.531 0.869 0.409⎟
⎜0.390 0.552 1.000 0.903 0.417 0.754⎟
⎜ ⎟ ii. RB (X1 ) ⊆ RB (X2 ).
⎜0.311 0.531 0.903 1.000 0.420 0.671⎟.
Proof. For any x ∈ U and y ∈ U, we have RB (X1 ) (x) =
⎜ ⎟
⎝0.304 0.869 0.417 0.420 1.000 0.222⎠
0.275 0.409 0.754 0.671 0.222 1.000 infy∈U max {N (RB (x, y)) , X1 (y)} ≤ infy∈U max {N (RB (x, y)) , X2 (y)}
The lower and upper approximations are calculated as follows. = RB (X2 ) (x). Hence, RB (X1 ) ⊆ RB (X2 ). In a similar way, we can
RC D1 = {0.320, 0.448, 0.000, 0.000, 0.580, 0.000}, obtain RB (X1 ) ⊆ RB (X2 ). □
RC D2 = {0.263, 0.000, 0.448, 0.469, 0.000, 0.591},
RC D1 = {0.737, 1.000, 0.552, 0.531, 1.000, 0.409}, The properties given above can provide a fundamental theory
RC D2 = {0.680, 0.552, 1.000, 1.000, 0.420, 1.000}. for the construction of feature selection algorithm.

3.3. Property analysis 4. Feature selection method based on NFRS

Some properties of the noise-aware fuzzy rough set model are


In this Section, we propose a novel feature selection algorithm
discussed in this Subsection.
based on the proposed NFRS. We call this algorithm FNIB (as
referred to Feature selection algorithm based on NFRS with the
Property 1. Given ⟨U, C, D⟩, for any X ∈ U/D, and B ⊆ C, there
are Intra-class aggregation and Between-class dispersion). This Sec-
(⋂ ) ⋂ tion includes three parts, i.e., uncertainty measures, evaluation
i. RB X ∈U/D X = X ∈U/D RB X , function, and the greedy feature selection algorithm.
5
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

4.1. Uncertainty measures Example 2. The continuation of Example 1, the uncertainty of


the fuzzy decision information system on the basis of the upper
The upper and lower approximations of the decision class can and lower approximations is calculated as follows.
reflect the intra-class aggregation and between-class dispersion The fuzzy positive and non-negative region of the decision
in the dataset (see Section 3.2). To this end, we next measure the D with respect to attribute set C are calculated as POSRC (D) =
uncertainty of the fuzzy decision information system on the basis {0.320, 0.448, 0.448, 0.469, 0.580, 0.591}, UPRC (D) = {0.737,
of the upper and lower approximations. 1.000, 1.000, 1.000, 1.000, 1.000}.
The fuzzy dependency of the decision D based on the fuzzy
Definition 7. Given ⟨U, C, D⟩ and B ⊆ C, for any X ∈ U/D positive and non-negative region with respect to C are calculated
and x ∈ U, RB (X ) (x) is the membership degree of the sample as γRC (D) = 0.476, γRC (D) = 0.956.
x belonging to the lower approximation of X . The membership
degree of the sample x belonging to the fuzzy positive region of 4.2. Evaluation function
the decision D with respect to attribute subset B is defined as

POSRB (D)(x) = sup RB (X ) (x). (12) Let B denote the selected condition attribute subset, and D
∀X ∈U/D denote the decision attribute set. γRB (D) and γRB (D) indicate the
degrees of between-classes dispersion and intra-class aggregation
POSRB (D)(x) means the degree to which the sample x is cor-
of the fuzzy decision information system, respectively. The fuzzy
rectly classified. It also describes the degree of between-classes
membership of the decision D based on the fuzzy positive and
dispersion provided by sample x. Next, we will define a evaluation
non-negative region of D is employed to calculate the significance
function based on the fuzzy positive region to measure the de- of B relative to the decision D.
gree of between-classes dispersion in fuzzy decision information
system. Definition 11. Given ⟨U, C, D⟩, γRB (D) and γRB (D) indicate the
degrees of between-classes dispersion and intra-class aggrega-
Definition 8. Given ⟨U, C, D⟩ and B ⊆ C, POSRB (D)(x) is the mem- tion of the fuzzy decision information system, respectively. The
bership degree of the sample x belonging to the fuzzy positive separability of B for data is defined as
region of the decision D with respect to B. The fuzzy dependency
of the decision D based on the positive region with respect to B SDRW (D) = γRB (D) + γRB (D). (16)
B
is defined as
∑ ⏐ ⏐ When B = φ , we set SDRW (D) = 0. The range of SDRW (D)
⏐POSR (D)(x)⏐ B B
∀x∈U B is [0, 2]. SDRW (D) is the measure of uncertainty, it describes the
γRB (D) = .
W (x)̸ =0
(13) B
separability of B relative to the decision D based on the intra-class
|U|
aggregation and between-class dispersion. In addition, SDRW (D)
γRB (D) is a dispersion measurement of between-class samples B
can describe the relevance of B. Next, we introduce the separabil-
under B. Based on Section 3.1.3, the sample x is an outlier sample,
ity of the candidate attribute c relative to the selected condition
when W (x) = 0. For this reason, the uncertainty information
attribute subset B for the decision D.
provided by x is unreliable, we should further exclude the effect of
x on the uncertainty measure. In practice, a large γRB (D) indicates Definition 12. Given ⟨U, C, D⟩, SDRW (D) is the separability of
that the dispersion of between-class samples is large. B
attribute subset B. The relevance of the candidate attribute c rel-
ative to the selected condition attribute subset B for the decision
Definition 9. Given ⟨U, C, D⟩ and B ⊆ C, for any X ∈ U/D
D is defined as
and x ∈ U, RB (X ) (x) is the membership degree of the sample
x belonging to the upper approximation of X . The membership Sep(c , B, D) = SDRW (D) − SDRW (D). (17)
B∪{c } B
degree of the sample x belonging to the fuzzy non-negative region
of the decision D with respect to B is defined as Given the selected condition attribute subset B, Sep(c , B, D)
describes the separability that can only be provided by candidate
UPRB (D)(x) = sup RB (X ) (x). (14) attribute c. In other words, a large Sep(c , B, D) indicates that
∀X ∈U/D
the relevance provided by c is large. The separability influence
UPRB (D)(x) means the degree to which the sample x may provided by B for the condition attribute set is defined as follows.
be correctly classified. It describes the degree of aggregation of
intra-class samples provided by sample x. Next, we will define Definition 13. Given ⟨U, C, D⟩ and B ⊆ C, the separability
a evaluation function to measure the degree of aggregation of influence provided by B for the condition attribute set is defined
intra-class samples. as
IN(B, D) = SDRW (D) − SDRW (D). (18)
Definition 10. Given ⟨U, C, D⟩ and B ⊆ C, UPRB (D)(x) is the C C− B

membership degree of the sample x belonging to the upper IN(B, D) describes the separability influence provided by B for
approximation of the decision D with respect to B. The fuzzy the condition attribute set. It also reflects the global redundancy
dependency of the decision D based on the fuzzy non-negative of B for the condition attribute set. In other words, a large IN(B, D)
region with respect to B is defined as indicates that the global redundancy provided by B is small. Next,
∑ we introduce the redundancy of the candidate features c relative
∀x∈U UPRB (D)(x)
to B for the decision D.
γRB (D) = .
W (x)̸ =0
(15)
|U|
Definition 14. Given ⟨U, C, D⟩, and B ⊆ C, the redundancy of
γRB (D) is a measure of the intra-class aggregation of all deci-
the candidate attribute c relative to B for the decision D is then
sion classes under B in the fuzzy decision information system. In
defined as follows.
practice, a large γRB (D) indicates that the intra-class aggregation
of all decision classes under B is large. Ind(c , B, D) = IN(B ∪ {c }, D) − IN(B, D). (19)
6
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

Ind(c , B, D) describes the separability influence that only the The lower and upper approximations are calculated as follows.
candidate features c can provide for the selected feature subset. Rc1 D1 = {0.031, 0.245, 0.000, 0.000, 0.245, 0.000},
It also reflects the redundancy of the candidate feature c in B. Rc1 D2 = {0.118, 0.000, 0.393, 0.393, 0.000, 0.245},
Namely, a large Ind(c , B, D) indicates that the redundancy pro- Rc2 D1 = {0.393, 0.199, 0.054, 0.054, 0.199, 0.000},
vided by B is small. That is, the separability of feature for data is Rc2 D2 = {0.000, 0.054, 0.199, 0.199, 0.054, 0.393},
used to describe the relevance and redundancy of feature. Given Rc3 D1 = {0.173, 0.315, 0.223, 0.371, 0.495, 0.167},
Definitions 13 and 14, the significance of an attribute is then
Rc3 D2 = {0.432, 0.223, 0.315, 0.192, 0.167, 0.495},
defined as follows.
Rc4 D1 = {0.068, 0.068, 0.031, 0.000, 0.245, 0.000},
Rc4 D2 = {0.068, 0.068, 0.118, 0.245, 0.000, 0.393}.
Definition 15. Given ⟨U, C, D⟩, and B ⊆ C, the significance of c
on B is defined as follows, Rc1 D1 = {0.882, 1.000, 0.607, 0.607, 1.000, 0.755},
Rc1 D2 = {0.969, 0.755, 1.000, 1.000, 0.755, 1.000},
Sig(c , B, D) = Sep(c , B, D) + Ind(c , B, D). (20)
Rc2 D1 = {1.000, 0.946, 0.801, 0.801, 0.946, 0.607},
The significance of the candidate attribute c handles two parts Rc2 D2 = {0.607, 0.801, 0.946, 0.946, 0.801, 1.000},
of information, i.e., the selected feature subset B and the remain- Rc3 D1 = {0.568, 0.777, 0.685, 0.808, 0.833, 0.505},
ing feature subset C − B −{c }. Given the selected feature subset B, Rc3 D2 = {0.827, 0.685, 0.777, 0.629, 0.505, 0.833},
Sep(c , B, D) evaluates the separability of the candidate feature c Rc4 D1 = {0.932, 0.932, 0.882, 0.755, 1.000, 0.607},
relative to the decision attribute D for the selected feature subset Rc4 D2 = {0.932, 0.932, 0.969, 1.000, 0.755, 1.000}.
B, and Ind(c , B, D) evaluates the separability provided by the The fuzzy positive and non-negative region of the decision D
candidate feature c for the remaining feature subsets. A feature with respect to a single feature are calculated as follows.
evaluation function in forward greedy searching is established as POSRc1 (D) = {0.118, 0.245, 0.393, 0.393, 0.245, 0.245},
below. POSRc2 (D) = {0.393, 0.199, 0.199, 0.199, 0.199, 0.393},
c ∗ = arg max Sig(c , B, D). (21) POSRc3 (D) = {0.432, 0.315, 0.315, 0.371, 0.495, 0.495},
c ∈C −B POSRc4 (D) = {0.068, 0.068, 0.118, 0.245, 0.245, 0.393}.
Such a feature selection method deal with both the relevance UPRc1 (D) = {0.969, 1.000, 1.000, 1.000, 1.000, 1.000},
and global redundancy of features by using the separability of the UPRc2 (D) = {1.000, 0.946, 0.946, 0.946, 0.946, 1.000},
candidate feature c in feature set. In each iteration, we further UPRc3 (D) = {0.827, 0.777, 0.777, 0.808, 0.833, 0.833},
consider dynamic relation between the selected features and the UPRc4 (D) = {0.932, 0.932, 0.969, 1.000, 1.000, 1.000}.
remaining features to make decision, namely, the separability The fuzzy dependency of the decision D based on the fuzzy
with respect to the decision attribute D and the separability with positive and non-negative region with respect to a single feature
respect to the remaining feature subset C − B. The detail of feature are calculated as follows.
evaluation is shown in Example 3. γRc1 (D) = 0.273, γRc2 (D) = 0.264, γRc3 (D) = 0.165, γRc4 (D) =
0.170.
Example 3. The continuation of Example 2, the details of FNIB γRc (D) = 0.995, γRc (D) = 0.964, γRc (D) = 0.278, γRc (D) =
1 2 3 4
are calculated as follows. 0.811.
The separability for a single feature is calculated as follows.
The weight of samples for a single feature are Wc1 (U) =
SDRW (D) = 1.268, SDRW (D) = 1.228, SDRW (D) = 0.443, and
{0.667, 1.000, 1.000, 1.000, 1.000, 1.000}, Wc2 (U) = {1.000, c1 c2 c3

0.667, 0.667, 0.667, 0.667, 1.000}, Wc3 (U) = {0.000, 0.000, SDRW (D) = 0.981.
c4
0.000, 0.000, 0.833, 0.833}, and Wc4 (U) = {0.667, 0.667, 0.000, The relevance for a single feature is calculated as follows.
1.000, 1.000, 1.000}, respectively. The fuzzy relation matrices are Sep(c1 , Φ , D) = 1.268, Sep(c2 , Φ , D) = 1.228, Sep(c3 , Φ , D) =
obtained
⎛ as follows. 0.443, and Sep(c4 , Φ , D) = 0.981.
0.667 0.882 0.882 0.882 0.882 0.969

The weight of samples for the remaining feature subset of
⎜0.588 1.000 0.607 0.607 1.000 0.755⎟ a single feature is WC −{c1 } (U) = {1, 1, 1, 1, 1, 1}, WC −{c2 } (U) =
⎜0.588 0.607 1.000 1.000 0.607 0.969⎟ {1, 1, 1, 1, 1, 1}, WC −{c3 } (U) = {1, 1, 1, 1, 1, 1}, and WC −{c4 } (U) =
⎜ ⎟
W
R c1 = ⎜ ⎟,
⎜0.588 0.607 1.000 1.000 0.607 0.969⎟ {1, 1, 1, 1, 1, 1}, respectively. The fuzzy relation matrices for the
⎝0.588 1.000 0.607 0.607 1.000 0.755⎠
remaining feature subset of a single feature are obtained as fol-
⎛0.646 0.755 0.969 0.969 0.755 1.000⎞ lows.
1.000 0.631 0.534 0.534 0.631 0.607 1.000 0.835 0.467 0.372 0.601 0.300
⎛ ⎞
⎜0.946 0.667 0.631 0.631 0.667 0.801⎟ ⎜0.835 1.000 0.552 0.531 0.869 0.329⎟
⎜0.801 0.631 0.667 0.667 0.631 0.946⎟ ⎜0.467 0.552 1.000 0.903 0.417 0.778⎟
⎜ ⎟ ⎜ ⎟
RW
c2 = ⎜0.801 ⎟, RWC −{c1 } = ⎜0.372 ⎟,
⎜ 0.631 0.667 0.667 0.631 0.946⎟ ⎜ 0.531 0.903 1.000 0.420 0.692⎟
⎝0.946 0.667 0.631 0.631 0.667 0.801⎠ ⎝0.601 0.869 0.417 0.420 1.000 0.179⎠
⎛0.607 0.534 0.631 0.631 0.534 1.000⎞ ⎛0.300 0.329 0.778 0.692 0.179 1.000⎞
0.000 0.000 0.000 0.000 0.568 0.827 1.000 0.779 0.515 0.410 0.561 0.480
⎜0.000 0.000 0.000 0.000 0.777 0.685⎟ ⎜0.779 1.000 0.354 0.340 0.869 0.310⎟
⎜0.000 0.000 0.000 0.000 0.685 0.777⎟ ⎜0.515 0.354 1.000 0.903 0.267 0.797⎟
⎜ ⎟ ⎜ ⎟
RW
c3 = ⎜0.000 ⎟, RWC −{c2 } = ⎜0.410 ⎟,
⎜ 0.000 0.000 0 .000 0. 808 0. 629⎟ ⎜ 0 .340 0.903 1.000 0. 269 0. 709 ⎟
⎝0.000 0.000 0.000 0.000 0.833 0.505⎠ ⎝0.561 0.869 0.267 0.269 1.000 0.168⎠
⎛0.000 0.000 0.000 0.000 0.505 0.833⎞ ⎛0.480 0.310 0.797 0.709 0.168 1.000⎞
0.667 0.667 0.000 0.932 0.932 0.823 1.000 0.835 0.425 0.400 0.778 0.293
⎜0.667 0.667 0.000 0.932 0.932 0.823⎟ ⎜0.835 1.000 0.345 0.324 0.932 0.302⎟
⎜0.661 0.661 0.000 0.969 0.882 0.882⎟ ⎜0.425 0.345 1.000 0.969 0.307 0.809⎟
⎜ ⎟ ⎜ ⎟
RW = ⎟. RW = ⎟,
c4
⎜0.621 0.621 0.000 1.000 0.755 0.969⎟ C −{c3 }
⎜0.400 0.324 0.969 1.000 0.263 0.889⎟
⎜ ⎜
⎝0.621 0.621 0.000 0.755 1.000 0.607⎠ ⎝0.778 0.932 0.307 0.263 1.000 0.222⎠
0.548 0.548 0.000 0.969 0.607 1.000 0.293 0.302 0.809 0.889 0.222 1.000
7
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

1.000 0.737 0.415 0.353 0.569 0.354


⎛ ⎞
Let B denote the selected feature subset using a forward
⎜0.737 1.000 0.337 0.345 0.932 0.302⎟ greedy algorithm, C is the conditional feature set, C − B is the
⎜0.415 0.337 1.000 0.932 0.286 0.855⎟
⎜ ⎟
RW =⎜ ⎟. remaining feature subset, and c is the current candidate features.
C −{c4 }
⎜0.353 0.345 0.932 1.000 0.337 0.692⎟ We select a candidate feature using the feature evaluation criteria
⎝0.569 0.932 0.286 0.337 1.000 0.222⎠ (i.e., Eq. (21)) in each iteration. Formally, the proposed feature
0.354 0.302 0.855 0.692 0.222 1.000 selection algorithm FNIB is listed in Algorithm 1.
The lower and upper approximations for the remaining feature
Algorithm 1 The FNIB algorithm
subset of a single feature are calculated as follows.
RC −{c1 } D1 = {0.533, 0.448, 0.000, 0.000, 0.580, 0.000}, Input: Fuzzy decision system ⟨U, C, D⟩, B ⊆ C, a predefined thresholds
RC −{c1 } D2 = {0.000, 0.000, 0.448, 0.469, 0.000, 0.671}, d, and three predefined parameters λ, k, and δ .
Output: A reduct of C, i.e., the selected features Re .
RC −{c2 } D1 = {0.485, 0.646, 0.000, 0.000, 0.731, 0.000},
1: Initialize Se , Re ← ∅, and B ← C − Se , where Se is an ordered
RC −{c2 } D2 = {0.000, 0.000, 0.485, 0.590, 0.000, 0.520}, sequence of selected features;
RC −{c3 } D1 = {0.575, 0.655, 0.000, 0.000, 0.693, 0.000}, 2: Calculate U/D;
RC −{c3 } D2 = {0.000, 0.000, 0.575, 0.600, 0.000, 0.698}, 3: while B ̸ = ∅ do
RC −{c4 } D1 = {0.585, 0.655, 0.000, 0.000, 0.663, 0.000}, 4: for any c ∈ B do
5: for any x ∈ U do
RC −{c4 } D2 = {0.000, 0.000, 0.585, 0.647, 0.000, 0.646}.
6: Calculate WC −Se −{c } (x) and WSe ∪{c } (x);
RC −{c1 } D1 = {1.000, 1.000, 0.552, 0.531, 1.000, 0.329}, 7: end for
RC −{c1 } D2 = {0.467, 0.552, 1.000, 1.000, 0.420, 1.000}, 8: for any x, y ∈ U do
Se ∪{c } (x, y) and RC −Se −{c } (x, y);
Calculate RW W
RC −{c2 } D1 = {1.000, 1.000, 0.515, 0.410, 1.000, 0.480}, 9:
10: end for
RC −{c2 } D2 = {0.515, 0.354, 1.000, 1.000, 0.269, 1.000},
11: Calculate Sep(c , Se , D) and Ind(c , Se , D);
RC −{c3 } D1 = {1.000, 1.000, 0.425, 0.400, 1.000, 0.302}, 12: Calculate Sig(c , Se , D) according to Eq. (20);
RC −{c3 } D2 = {0.425, 0.345, 1.000, 1.000, 0.307, 1.000}, 13: end for
RC −{c4 } D1 = {1.000, 1.000, 0.415, 0.353, 1.000, 0.354}, 14: Select the feature c ∗ with maximum value Sig(c ∗ , Se , D);
RC −{c4 } D2 = {0.415, 0.345, 1.000, 1.000, 0.337, 1.000} 15: Se ← Se ∪ {c ∗ };
The positive and non-negative region for the remaining feature 16: B ← B − c∗;
subset of a single feature can be calculated as follows. 17: end while
18: Select the feature subset Re from an ordered sequence of features
POSRC −{c } (D) = {0.533, 0.448, 0.448, 0.469, 0.580, 0.671},
1 Se with the highest classification accuracy by using classifiers;
POSRC −{c } (D) = {0.485, 0.646, 0.485, 0.590, 0.731, 0.520},
2 19: return Re .
POSRC −{c } (D) = {0.575, 0.655, 0.575, 0.600, 0.693, 0.698},
3
POSRC −{c } (D) = {0.585, 0.655, 0.585, 0.647, 0.663, 0.646}. The computational complexity of FNIB mainly consists of three
4
UPRC −{c } (D) = {1.000, 1.000, 1.000, 1.000, 1.000, 1.000}, parts. In the first part, Steps 5–7 are to obtain the weight of
1
UPRC −{c } (D) = {1.000, 1.000, 1.000, 1.000, 1.000, 1.000}, samples, its time complexity is O(kn2 ), and its space complexity
2 is O(n). In the second part, Steps 8–10 are to obtain the fuzzy
UPRC −{c } (D) = {1.000, 1.000, 1.000, 1.000, 1.000, 1.000},
3 relation between samples and its time complexity is O(n2 ), and
UPRC −{c } (D) = {1.000, 1.000, 1.000, 1.000, 1.000, 1.000}.
4 its space complexity is O(n2 ). In the three part, Steps 3–17 are
The fuzzy dependency of the decision D based on the fuzzy
to find the best candidate feature until Step 3 does not hold, its
positive and non-negative region for the remaining feature subset
time complexity is O(km2 n2 ), and its space complexity is O(mn2 ).
of a single feature are calculated as follows.
In summary, the overall time and space complexity of FNIB are
γRC −{c } (D) = 0.525, γRC −{c } (D) = 0.576,
1 2 O(km2 n2 ) and O(mn2 ), respectively.
γRC −{c3 } (D) = 0.633, γRC −{c4 } (D) = 0.630.
γRC −{c } (D) = 1.000, γRC −{c } (D) = 1.000, 5. Experiments
1 2
γRC −{c } (D) = 1.000, γRC −{c } (D) = 1.000.
3 4
The separability for the remaining feature subset of a single In this Section, the robustness and effectiveness of the pro-
feature is calculated as follows. posed NFRS model are evaluated by series of experiments, respec-
SDRW (D) = 1.525, SDRW (D) = 1.576, SDRW (D) = 1.633, tively. To verify the robustness of the NFRS model and differences
C −{c1 } C −{c2 } C −{c3 }
in the effect of noise distribution on models, we first contrast the
and SDRW (D) = 1.630.
C −{c4 } dependence degree. The classification performance of selected
The separability influence for a single feature is calculated as features of FNIB are then compared with other algorithms on real-
follows. world datasets. These experiments are performed on a Windows
INRW (D) = SDRW (D) − SDRW (D) = γRC (D) + γRC (D) − system Intel(R) Core(TM) i5-8500 CPU @ 3.00 GHz, 8 GB RAM and
C −{c1 } C C−{c1 }
SDRW (D) = −0.093, INRW (D) = −0.144, INRW (D ) = MATLAB development environment. That is, all experiments are
C−{c1 } C −{c2 } C −{c3 }
−0.200, and INRW (D) = −0.198. performed in the same environment.
C −{c4 }
Then, the redundancy for a single feature is calculated as 5.1. The effectiveness of the FNIB algorithm
follows.
Ind(c1 , Φ , D) = 0.093, Ind(c2 , Φ , D) = 0.144, Ind(c3 , Φ , D) = We experiment on twenty-four real-world datasets. All datasets
0.200, and Ind(c4 , Φ , D) = 0.198. are from UCI Machine Learning Repository,1 Arizona State Univer-
The significance for a single feature is calculated as follows. sity,2 and KEEL-Dataset Repository.3 The information of datasets
Sig(c1 , Φ , D) = 1.361, Sig(c1 , Φ , D) = 1.372, Sig(c1 , Φ , D) = is summarized in Table 2. Some datasets are incomplete, and
0.643, and Sig(c1 , Φ , D) = 1.178.
the missing values in the incomplete dataset are filled with the
most frequently values on those attributes. Categorical values in a
4.3. The greedy feature selection algorithm

In this Section, we present the forward greedy Feature se- 1 http://archive.ics.uci.edu/ml/index.php.


lection algorithm based on our NFRS model with the Intra-class 2 https://jundongl.github.io/scikit-feature/datasets.html.
aggregation and Between-class dispersion (FNIB). 3 https://sci2s.ugr.es/keel/datasets.php.

8
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

Table 2
The summary of the experimental datasets.
No. Datasets Abbreviation Samples Features Classes Data type
1 Glass identification Glass 214 9 6 Numerical
2 Sonar, mines vs. rocks Sonar 208 60 2 Numerical
3 Ecoli Dataset Ecoli 336 8 8 Numerical
4 Ionosphere Iono 351 34 2 Numerical
5 Pima Indians diabetes Pima 768 8 2 Mixed
6 Credit Approval Credit-A 690 16 2 Mixed
7 Car Evaluationn Car 1728 6 4 Categorical
8 Wilt Wilt 4889 5 2 Numerical
9 Statlog (German Credit Data) German 1000 20 2 Numerical
10 Dermatology Derma 366 34 6 Mixed
11 Lymphography Lymph 148 18 4 Categorical
12 Wisconsin prognostic breast cancer Wpbc 198 33 2 Numerical
13 Zoo Zoo 101 17 7 Categorical
14 Autos Autos 205 25 6 Mixed
15 Statlog(heart) Heart 270 13 2 Mixed
16 Hepatitis Hepati 155 19 2 Mixed
17 Parkinsons Parkin 197 23 2 Numerical
18 SPECT SPECT 267 22 2 Categorical
19 Sick Sick 3772 29 2 Mixed
20 Breast Cancer Wisconsin Wbc 699 10 2 Numerical
21 Lung_discrete Lung 73 325 7 Numerical
22 Colon Colon 62 2000 2 Numerical
23 Yale database Yale 165 1024 15 Numerical
24 ORL ORL 400 1024 40 Numerical

dataset are replaced by integers if a comparison algorithm cannot shown in Tables 3–4, and the best results are highlighted in bold.
process categorical data. Furthermore, in order to reduce the Two classifiers are executed on the features selected from each
influence of dimension on the experimental results, we perform dataset.
min–max normalization on each dataset. From the results in Tables 3–4, we can observe that the clas-
We compare FNIB with seven baseline algorithms including sification performance on the selected features using the FNIB
the minimal-redundancy maximal-relevance feature selection al- algorithm is better than the performance on the all features. This
gorithm (mRMR) [45], the mutual information-based feature se- shows that some features are redundant in the data and the
lection algorithm (MIFS) [46], the fast correlation-based filter selected features can help improve classification performance. For
(FCBF) [47], the rough fuzzy bipolar soft sets with the Intra- two classifiers, the FNIB algorithm achieves better results on 16
class aggregation and Between-class dispersion (RFBSIB) [42], and and 18 datasets, respectively. From the perspective of dimen-
the feature selection algorithm based on the fuzzy dependency sionality reduction, the average number of features selected by
function of some existing robust fuzzy rough set models, i.e. FRS- the FNIB algorithm is not the smallest, while the highest average
FS [10], PFRS-FS [35], PGDFRS-FS [36] and SFRS-FS [29]. Note that classification accuracy can be achieved.
RFBSIB is based on the rough fuzzy bipolar soft set model [42],
and adopts the evaluation function of the proposed features in 5.2. Robustness analysis of NFRS
order to better evaluate the effectiveness of the proposed NFRS
model. The selected features using each algorithm are then eval- To analyze the robustness of our NFRS model and differences
uated on downstream classification tasks. The raw features are in the effect of noise distribution on models, we compare the
also evaluated as a baseline. To have a comprehensive evaluation, dependency degree with different fuzzy rough sets models on
we use two classifiers to evaluate the performance of the selected two two-dimensional datasets (as shown in Figs. 1(a) and 1(b))
features against the raw features, i.e., K-Nearest Neighbor (KNN, and real-world datasets. Note that the dependency degree of our
K=3) and Support Vector Machine (SVM). NFRS model is calculated by Eq. (13). In order to achieve this goal,
In this experiment, each feature selection algorithm is first we analyze the sensitivity curve of dependence degree calculated
applied to each dataset. We adopt 5-fold cross-validation on each with different fuzzy rough sets models (i.e., FRS [10], PFRS [35],
dataset in the experiments. For each dataset, data are randomly PGDFRS [36] and SFRS [29]). First, we add a new sample to the
divided into five subsets. Each of the five subsets is used as two two-dimensional datasets and change the location of this
test set and the remaining four subsets are used as training sample, which is (x, 5) (i.e., x = −5, −4.65, −4.3, −3.95, . . . , 30
set. That is, every feature selection algorithm is executed five in Dataset1, and x = −2, −1.88, −1.76, . . . , 10 in Dataset2). As
times on each dataset. Then, the features selected by each feature shown in Figs. 1(a) and 1(b), the region where x ≤ 4 belongs to
selection algorithm are used to build a classifier on training set. red class and vice versa. Consequently, if x ≤ 4, this sample is
The performance of the classifier built on the selected features is marked as blue class in order to make the newly added sample
evaluated on test data. We record and calculate the average and noise, and vice versa as red class. Then, we add some samples
standard deviations of the classification accuracy of each method to the two two-dimensional datasets, and some random attribute
in Tables 3–8, (i.e., Acc±std). Note that the average accuracy value on the two-dimensional space of the Figs. 1(a) and 1(b) as
indicates the trend of concentration of the classification accuracy attribute noise [19]. Finally, we add some class noise into two
and the standard deviation indicates the degree of dispersion of datasets by randomly selecting some samples from the datasets
the classification accuracy. to change their class [19]. We calculate the dependence degree
For simplicity, the parameter k was set to 10. We also ad- with different fuzzy rough models when the location and noise
just the number of selected features and two other parameters level of the samples changes. For real-world datasets, we add
of the algorithm to compare the optimal performance of these some samples with some random attribute value and some class
feature selection algorithm for each dataset. Then, the optimal value to real-world datasets as noise. The experimental results
classification performance of these feature selection algorithm is are illustrated in Fig. 3 (two two-dimensional datasets) and Fig. 4
9
X. Yang, H. Chen, T. Li et al. Knowledge-Based Systems 250 (2022) 109092

Fig. 3. Dependency comparison of two visualization two-dimensional datasets.

Fig. 4. Comparison results in terms of dependency under different noise settings.

(four real-world datasets), where the vertical axis denotes the greater effect on the dependence degree when the location of the
fuzzy dependency degree with different fuzzy rough sets models noise sample changes by observing Dataset1. Furthermore, these
on datasets. dependence degree with different fuzzy rough set model decrease
These curves of the dependence degree reflect the robustness when the class noise level and attribute noise level increases in
performance of different fuzzy rough sets with noise samples Dataset1 and Dataset2. In fact, if the attribute noise (like x as
(as shown Figs. 3 and 4). The more robustness of fuzzy rough shown in Fig. 1) is increased, the dependence will increase as the
sets has a high degree of dependence and the curve is relatively fuzzy relation between samples decreases. This is the reason why
flat. First, the dependence degree calculated with different fuzzy these dependence degree with different fuzzy rough set model
rough set model changes significantly when the location of the increase slightly when the attribute noise level increases in real-
noise sample changes by observing FRS, PFRS, and PGDFRS. For world datasets. The randomly generated noise has a large number
Dataset1 and Dataset2, the curves of the dependence degree of samples like x. These curves show that the NFRS model has
calculated with the SFRS and NFRS models is relatively small good robustness performance for both attribute noise and class
when the location of the class-noisy sample changes. If the den- noise. Next, the robustness and effectiveness of the FNIB would
sity of the sample is more concentrated, the noise sample has a been verified in some real-world datasets.
10
X. Yang, H. Chen, T. Li et al.
Table 3
Classification performance in terms of accuracy (%) using classifier KNN on twenty-four datasets.
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 66.35 ± 2.25 9 71.96 ± 9.62 7 66.86 ± 6.13 9 66.86 ± 6.13 9 62.65 ± 7.88 6.6 66.86 ± 6.13 9 66.86 ± 6.13 9 66.86 ± 6.13 9 66.86 ± 6.13 9 70.56 ± 7.53 2
Sonar 81.25 ± 6.61 60 86.49 ± 3.85 39 84.13 ± 5.86 56 88.88 ± 6.64 47 79.83 ± 6.91 11.8 85.62 ± 4.97 43.4 84.58 ± 4.83 26.2 82.12 ± 6.29 10 85.07 ± 2.80 19 90.87 ± 2.65 35
Ecoli 85.10 ± 3.68 7 85.71 ± 2.26 7 85.71 ± 2.26 7 85.71 ± 2.26 7 85.95 ± 3.27 6 85.71 ± 2.26 6 85.71 ± 2.26 7 84.83 ± 4.05 6.6 85.71 ± 2.26 7 85.71 ± 2.26 5
Iono 85.19 ± 3.57 34 88.88 ± 5.30 7 88.30 ± 4.48 14 89.73 ± 6.36 18 89.74 ± 5.67 11.4 89.46 ± 4.58 16 88.03 ± 6.91 15.4 87.74 ± 5.13 21.2 91.44 ± 3.05 5 93.16 ± 3.97 4
Pima 73.96 ± 3.75 8 72.92 ± 1.01 8 74.23 ± 3.88 6 72.53 ± 2.81 8 72.53 ± 2.81 8 72.53 ± 2.81 8 72.53 ± 2.81 8 72.53 ± 2.81 8 72.53 ± 2.81 8 74.74 ± 3.20 2
Credit-A 83.48 ± 3.38 15 85.22 ± 3.28 15 85.22 ± 3.28 15 85.22 ± 3.28 15 85.80 ± 3.98 12.6 85.94 ± 3.56 12 85.37 ± 3.75 13 85.22 ± 3.28 13.6 85.22 ± 3.28 15 87.10 ± 3.37 9
Car 88.89 ± 2.72 6 92.88 ± 0.63 6 92.88 ± 0.63 6 92.88 ± 0.63 6 70.02 ± 0.11 2 79.29 ± 12.71 3.6 92.88 ± 0.63 6 83.51 ± 8.88 1.4 92.88 ± 0.63 6 92.88 ± 0.63 5
Wilt 95.14 ± 0.32 5 95.47 ± 0.29 5 96.80 ± 0.41 5 98.14 ± 0.15 3 94.98 ± 0.91 4 95.47 ± 0.29 5 95.47 ± 0.29 5 95.14 ± 0.32 5 98.31 ± 0.28 3 98.14 ± 0.15 1
German 69.90 ± 0.89 20 71.80 ± 1.15 15 70.60 ± 3.58 12 70.90 ± 2.43 20 71.90 ± 4.04 8 72.60 ± 1.56 7.6 69.40 ± 2.22 18.8 69.90 ± 1.78 19 69.40 ± 1.78 16 73.40 ± 3.15 12
Derma 96.99 ± 1.79 34 97.81 ± 0.77 28 96.71 ± 1.60 35 98.08 ± 1.58 29 96.72 ± 2.65 8 96.18 ± 1.78 19 97.26 ± 1.68 25.2 88.27 ± 5.09 14.4 96.98 ± 1.54 32 97.80 ± 2.09 28
Lymph 76.85 ± 8.84 18 85.04 ± 6.06 15 83.77 ± 6.02 12 80.54 ± 7.23 18 83.71 ± 5.32 7.6 81.66 ± 6.27 10 81.21 ± 6.51 16.8 79.75 ± 6.11 17.6 83.19 ± 5.00 18 85.78 ± 4.73 12
Wpbc 71.21 ± 1.50 33 78.78 ± 5.81 11 75.23 ± 6.57 33 75.23 ± 6.57 33 80.29 ± 6.01 12.4 78.79 ± 3.73 23.8 79.77 ± 8.58 16.4 77.31 ± 6.75 14 79.28 ± 4.21 6 82.32 ± 5.07 23
Zoo 91.05 ± 4.24 16 95.99 ± 5.50 16 94.04 ± 6.37 13 94.99 ± 5.03 15 92.93 ± 7.60 7.2 91.92 ± 2.94 4.4 94.09 ± 6.27 9.2 83.24 ± 1.75 3.2 95.04 ± 6.92 12 95.99 ± 5.50 12
Autos 54.49 ± 7.17 25 69.85 ± 11.11 6 71.34 ± 7.47 7 64.22 ± 6.86 24 70.11 ± 7.86 12.6 66.33 ± 12.02 17.4 66.23 ± 7.83 17 61.45 ± 5.95 13.6 74.02 ± 2.11 3 77.01 ± 8.23 4
Heart 78.89 ± 5.00 13 80.37 ± 8.03 9 81.48 ± 6.93 11 79.63 ± 7.05 11 81.48 ± 5.24 9.8 81.48 ± 5.40 5.6 79.63 ± 9.26 9.4 79.26 ± 8.43 12.6 80.37 ± 8.03 13 82.96 ± 3.56 5
Hepati 67.10 ± 4.21 19 69.03 ± 5.86 18 70.32 ± 7.36 9 67.10 ± 3.53 19 65.81 ± 6.29 13 70.32 ± 5.30 16.2 67.10 ± 2.70 17.2 65.81 ± 1.77 19 68.39 ± 2.70 18 72.90 ± 6.69 3
Parkin 93.33 ± 5.32 22 93.90 ± 5.75 6 94.37 ± 1.05 16 92.83 ± 2.07 22 95.37 ± 2.86 8.8 94.87 ± 2.57 11.2 95.37 ± 2.86 9.6 92.83 ± 2.07 20.6 93.37 ± 3.75 19 95.42 ± 4.12 16
SPECT 78.26 ± 4.01 22 79.77 ± 1.65 22 79.77 ± 1.65 22 79.77 ± 1.65 22 79.40 ± 0.21 2 78.28 ± 0.90 9.2 79.76 ± 1.75 10.8 79.77 ± 1.65 13.8 79.78 ± 0.78 22 80.14 ± 1.81 17
Sick 96.10 ± 0.95 29 96.47 ± 0.38 22 97.53 ± 0.63 9 96.39 ± 0.61 29 97.27 ± 0.48 2 96.18 ± 0.88 15.40 96.90 ± 0.70 19.4 93.88 ± 0.94 3.2 96.74 ± 0.53 24 97.35 ± 0.31 16
Wbc 96.42 ± 1.14 9 96.85 ± 1.39 9 96.85 ± 1.39 9 96.85 ± 1.39 9 96.42 ± 0.87 3.8 97.00 ± 1.37 8.8 97.14 ± 1.43 8 96.14 ± 0.83 9 97.28 ± 1.28 8 97.14 ± 1.33 6
Lung 84.80 ± 5.96 325 93.31 ± 6.73 102 87.77 ± 5.39 226 89.10 ± 3.47 208 84.80 ± 5.96 325 83.38 ± 8.24 263.8 70.45 ± 13.75 3.6 84.80 ± 5.96 325 90.81 ± 10.62 54 92.06 ± 8.44 155
Colon 75.77 ± 16.55 2000 83.97 ± 9.85 121 75.77 ± 16.55 361 79.10 ± 15.92 801 77.44 ± 12.28 4 77.56 ± 8.39 2 75.90 ± 10.93 12.6 67.56 ± 10.66 4.4 75.77 ± 16.55 301 82.18 ± 14.82 211
Yale 59.03 ± 6.37 1024 59.68 ± 6.54 881 58.63 ± 8.11 801 58.58 ± 7.29 1001 50.01 ± 7.93 11 57.15 ± 8.44 47.6 52.40 ± 10.49 50.2 36.27 ± 10.36 8.2 59.03 ± 6.37 1021 59.63 ± 5.86 666
ORL 86.25 ± 3.54 1024 86.00 ± 2.85 1001 85.75 ± 2.74 1001 86.50 ± 1.63 841 86.25 ± 3.54 16 86.00 ± 1.37 104.8 82.75 ± 4.54 46.8 60.00 ± 3.64 11.4 87.00 ± 1.90 1021 87.25 ± 2.24 1021
Avg. 80.66 ± 4.32 199.04 84.09 ± 4.40 99.00 83.09 ± 4.60 112.29 82.91 ± 4.27 133.96 81.31 ± 4.61 21.40 82.11 ± 4.52 27.91 81.53 ± 4.96 15.86 78.09 ± 4.61 24.33 83.52 ± 3.97 110.83 85.52 ± 4.24 94.58
Table 4
Classification performance in terms of accuracy (%) using classifier SVM on twenty-four datasets.
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 51.87 ± 9.36 9 56.85 ± 9.98 7 58.39 ± 6.05 9 54.91 ± 11.18 9 58.71 ± 8.61 7 56.81 ± 10.05 7.4 57.33 ± 10.55 7.8 54.91 ± 11.18 9 56.96 ± 6.12 8 59.59 ± 10.66 6
Sonar 77.38 ± 6.37 60 81.18 ± 5.88 39 80.71 ± 6.45 56 80.69 ± 6.29 47 78.33 ± 4.37 11.8 79.81 ± 4.39 32.4 79.33 ± 3.56 26.2 78.81 ± 8.55 10 80.69 ± 6.29 60 83.18 ± 2.91 26
Ecoli 81.82 ± 7.01 7 82.75 ± 5.22 7 82.42 ± 3.89 7 82.42 ± 3.89 7 82.75 ± 5.22 6 82.42 ± 3.89 6 82.42 ± 3.89 7 80.95 ± 6.51 0 82.42 ± 3.89 7 82.42 ± 3.89 5
Iono 88.03 ± 2.40 34 87.74 ± 3.16 7 87.45 ± 4.48 14 89.17 ± 2.99 18 88.04 ± 2.11 11.4 89.45 ± 2.98 24.6 88.03 ± 2.99 15.4 88.02 ± 4.72 6.6 87.45 ± 4.48 34 89.18 ± 2.96 27
Pima 76.55 ± 5.18 8 77.22 ± 2.15 8 77.35 ± 1.62 6 76.83 ± 3.09 8 76.83 ± 3.09 8 76.83 ± 3.09 8 76.83 ± 3.09 8 76.83 ± 3.09 21.2 77.22 ± 3.30 7 77.74 ± 1.56 4
Credit-A 85.51 ± 2.71 15 85.51 ± 2.32 16 85.51 ± 2.32 15 85.51 ± 2.32 15 85.51 ± 2.32 9.8 85.51 ± 2.32 9.2 85.51 ± 2.32 11.6 85.51 ± 2.32 8 85.51 ± 2.32 14 85.51 ± 2.32 1
Car 83.39 ± 0.80 6 84.20 ± 1.57 6 83.85 ± 1.76 6 84.32 ± 1.46 6 77.43 ± 1.09 2 80.50 ± 2.11 4.4 83.91 ± 1.69 6 81.19 ± 2.48 13.6 84.32 ± 1.46 5 83.56 ± 0.87 5
Wilt 94.61 ± 0.67 5 94.61 ± 0.04 5 94.61 ± 0.04 5 94.61 ± 0.04 3 94.61 ± 0.04 4 94.61 ± 0.04 1 94.61 ± 0.04 5 94.61 ± 0.67 5 94.61 ± 0.04 1 94.61 ± 0.04 1

German 73.80 ± 3.63 20 74.20 ± 1.60 15 72.90 ± 2.38 12 73.60 ± 1.98 20 72.90 ± 2.38 20 73.30 ± 2.68 18.8 72.70 ± 2.20 18.8 72.90 ± 2.38 20 72.90 ± 2.38 20 74.90 ± 2.33 12
Derma 97.00 ± 2.24 34 98.35 ± 1.16 28 97.82 ± 1.82 34 98.09 ± 0.74 29 96.46 ± 2.98 8 97.55 ± 1.49 19.6 97.82 ± 1.82 25.2 89.63 ± 3.22 14.4 98.09 ± 1.54 33 98.09 ± 0.74 28
Lymph 80.97 ± 9.50 18 87.23 ± 4.12 15 86.59 ± 4.95 12 85.19 ± 3.53 18 83.87 ± 5.69 7.6 86.59 ± 6.03 15.6 85.88 ± 4.74 16 84.50 ± 2.67 17.6 85.19 ± 3.53 18 87.97 ± 5.72 12
Wpbc 77.21 ± 6.76 33 78.82 ± 3.61 11 78.32 ± 3.99 33 78.32 ± 3.99 33 76.27 ± 1.18 5.6 77.81 ± 3.97 22.4 76.78 ± 1.88 20.4 78.32 ± 3.99 33 78.32 ± 3.99 33 78.82 ± 3.61 23
Zoo 95.99 ± 4.19 16 97.10 ± 4.29 16 94.14 ± 7.81 13 94.14 ± 7.81 15 93.98 ± 6.36 7.2 93.08 ± 4.55 7.2 94.09 ± 6.27 9.2 84.19 ± 1.47 5.4 95.04 ± 5.84 15 96.09 ± 6.23 12
Autos 60.49 ± 6.31 25 61.93 ± 7.91 6 61.93 ± 7.91 7 62.42 ± 7.12 24 61.42 ± 9.64 12.6 62.42 ± 9.48 17 61.93 ± 7.54 17.8 60.95 ± 7.75 21.4 62.42 ± 7.12 23 63.40 ± 6.02 4
Heart 84.07 ± 4.83 13 85.19 ± 4.54 9 84.81 ± 4.01 11 85.19 ± 5.24 11 84.44 ± 5.65 9.8 85.19 ± 5.71 8.4 84.81 ± 4.97 11.2 84.07 ± 5.94 13 84.44 ± 6.36 13 86.30 ± 4.83 5
Hepati 70.32 ± 6.61 19 70.32 ± 5.30 18 70.32 ± 5.30 9 71.61 ± 9.78 19 70.97 ± 5.10 8.6 70.32 ± 6.20 13.4 71.61 ± 5.77 17.4 70.97 ± 6.45 17.4 70.97 ± 3.95 16 72.26 ± 3.68 3
Parkin 86.67 ± 6.88 22 87.72 ± 6.05 6 86.69 ± 4.12 16 87.20 ± 5.05 22 87.20 ± 3.52 4.2 87.22 ± 6.43 9.4 87.20 ± 5.05 11.4 87.20 ± 3.52 8 87.20 ± 5.05 19 88.24 ± 6.35 16
SPECT 81.30 ± 4.64 22 83.53 ± 3.31 23 83.92 ± 4.75 23 81.65 ± 3.61 22 79.40 ± 0.21 2 83.52 ± 4.85 9.2 81.66 ± 3.31 12.8 81.28 ± 1.80 14.4 83.54 ± 3.95 16 82.78 ± 2.70 17
Sick 93.88 ± 0.94 29 93.88 ± 0.06 22 93.88 ± 0.06 9 93.88 ± 0.06 29 93.88 ± 0.06 2 93.88 ± 0.94 15.4 93.88 ± 0.06 22.4 93.88 ± 0.94 3.2 93.88 ± 0.06 1 93.88 ± 0.06 16
Wbc 96.57 ± 0.31 9 96.57 ± 2.16 9 96.57 ± 1.85 9 96.57 ± 1.85 9 96.86 ± 2.06 5.8 96.86 ± 2.75 4.8 96.71 ± 1.72 8.8 96.57 ± 0.31 9 96.57 ± 1.85 8 96.85 ± 1.99 6
Lung 87.76 ± 7.24 325 91.77 ± 5.01 102 89.09 ± 7.99 226 87.76 ± 6.47 208 87.76 ± 7.24 325 87.76 ± 7.24 263.8 77.30 ± 15.62 3.6 87.76 ± 7.24 325 90.52 ± 7.25 63 93.02 ± 5.08 72
Colon 79.23 ± 15.40 2000 82.44 ± 11.66 121 83.97 ± 7.89 361 80.90 ± 14.09 801 74.36 ± 5.79 4 80.51 ± 7.77 2 75.64 ± 13.41 16.4 68.85 ± 17.65 4.4 83.97 ± 7.89 441 83.97 ± 7.89 451
Yale 71.24 ± 7.62 1024 72.86 ± 3.53 881 71.88 ± 9.64 801 70.66 ± 7.23 1001 56.47 ± 7.32 11 62.28 ± 7.91 41.8 60.31 ± 8.05 50.2 34.85 ± 6.97 8.2 72.23 ± 4.14 781 72.82 ± 4.50 971
ORL 95.00 ± 3.54 1024 94.75 ± 4.37 1001 94.50 ± 3.49 1001 94.75 ± 3.58 841 95.00 ± 3.54 16 91.00 ± 3.47 104.8 87.75 ± 3.89 57.2 53.75 ± 6.79 11.4 95.25 ± 3.69 1001 95.25 ± 3.69 1001
Avg. 82.11 ± 5.21 199.04 83.61 ± 4.12 99.08 83.23 ± 4.36 112.29 82.93 ± 4.72 133.96 81.39 ± 3.98 21.23 82.30 ± 4.60 27.78 81.42 ± 4.77 16.91 77.94 ± 4.94 25.03 83.32 ± 4.02 109.88 84.18 ± 3.78 113.50
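To make the Acc ± std and num columns concrete, the following is a hedged sketch of how a selected feature subset could be scored with KNN and SVM using scikit-learn; the 10-fold cross-validation, the classifier hyperparameters, and the index list `selected` are illustrative assumptions and may differ from the protocol of Section 5.1.

```python
# Illustrative only: scoring a selected feature subset with KNN and SVM,
# in the spirit of the "Acc ± std" columns. The cross-validation setting,
# the classifier hyperparameters, and `selected` (indices of the chosen
# features) are assumptions, not the paper's exact protocol.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate_subset(X, y, selected, cv=10):
    X_sel = np.asarray(X)[:, selected]          # keep only selected features
    results = {}
    for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=3)),
                      ("SVM", SVC(kernel="rbf"))]:
        acc = cross_val_score(clf, X_sel, y, cv=cv)
        results[name] = (100 * acc.mean(), 100 * acc.std())
    return results  # e.g. {"KNN": (mean %, std %), "SVM": (mean %, std %)}
```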
Fig. 5. Performance comparison of average classification accuracy vs. the values of δ and λ.

5.3. The robustness analysis of the FNIB algorithm

To analyze the robustness of the FNIB algorithm, we contaminate the labels and attributes of all training subsets with a 20% noise level. The method for generating the contaminated training subsets is the same as that in Section 5.2, and the test subsets are left uncontaminated. For each dataset, the remaining experimental protocol follows Section 5.1.
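A minimal sketch of one plausible contamination step consistent with this description is given below; the exact noise generator of Section 5.2 may differ in detail, and the function name add_noise is only illustrative.

```python
# A plausible contamination step (assumed, may differ from Section 5.2):
# for a 20% noise level, randomly chosen training samples either get their
# class label flipped to a different label or get random attribute values.
import numpy as np

def add_noise(X_train, y_train, level=0.2, kind="class", seed=None):
    """X_train, y_train are NumPy arrays; returns contaminated copies."""
    rng = np.random.default_rng(seed)
    X, y = X_train.copy(), y_train.copy()
    idx = rng.choice(len(y), size=int(level * len(y)), replace=False)
    if kind == "class":
        labels = np.unique(y)
        for i in idx:  # flip each chosen label to a different class
            y[i] = rng.choice(labels[labels != y[i]])
    else:  # attribute noise: draw random values within each feature's range
        lo, hi = X.min(axis=0), X.max(axis=0)
        X[idx] = rng.uniform(lo, hi, size=(len(idx), X.shape[1]))
    return X, y
```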
experimental protocol is shown in Section 5.1.
demonstrates that the number of outlier samples con-
Tables 5–8 show the classification performance comparison
trolled by λ has an effect on the classification performance.
of different feature selection algorithm on test set with KNN
(2) For the same dataset, observing the contour map under the
and SVM, respectively. By comparing Tables 5–8 with Tables 3–
3D coloring surface, the contour density roughly shows an
4, the classification accuracy is reduced when there is noise
increasing relationship between the original datasets, 20%
in dataset. However, for the average classification accuracy on
attribute noise level, and 20% class noise level. It can be
datasets with class noise and attribute noise, the FNIB algorithm
seen from Fig. 5 that there are relatively more contours
obtains higher classification performance than other algorithms
parallel to the λ axis, so overall δ has a greater effect
in terms of accuracy using SVM and KNN. The results demonstrate on classification performance than λ within the range of
the FNIB algorithm is robust to noise with class and attribute, parameters studied.
due to the reason that the proposed algorithm considers the (3) When the range of δ is [0.1, 0.4] and the range of λ is
different influence of intra-class samples, boundary samples, and [1, 1.8] (as known from the contour map trend), the classi-
outlier samples with respect to the target sample, and the uncer- fication accuracy of the features selected is promising and
tainty measure information with the intra-class aggregation and the optimal classification accuracy can be achieved in most
between-class dispersion, simultaneously. datasets. This shows that the feature selected by the FNIB
algorithm correlates with the class.
5.4. Sensitivity analysis of FNIB
6. Conclusions
The proposed FNIB algorithm includes two parameter i.e., a
kernel parameter δ and a predefined parameter λ, which may Feature selection method has been widely used in cluster
affect the effectiveness of the algorithm FNIB, we conduct two learning, pattern recognition, and classification learning. This pa-
parameter analysis on this experiment. In order to verify the per presents a novel robust feature selection approach based on
range of parameters and its impact, we perform a grid search the noise-aware fuzzy rough set model for classification, called
(from 0.1 to 1.0 in a step of 0.1 for δ and from 1 to 3 in a the feature selection algorithm based on our NFRS model with
step of 0.2 for λ). Using the average classification accuracy under the intra-class aggregation and between-class dispersion (FNIB).
the corresponding parameters, the 3D shaded surface is created FNIB couples a robust fuzzy rough set model (i.e., the noise-
and a contour map is created under the 3D shaded surface. The aware fuzzy rough set model), the intra-class aggregation, the
experimental results of FNIB on four datasets are shown in Fig. 5. between-class dispersion, and feature selection into a joint frame-
We record the comparison results on the datasets Hepati, Lymph, work. The NFRS model considers the distribution information of
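As referenced above, the grid search behind Fig. 5 can be sketched as follows; run_fnib is a hypothetical stand-in for running FNIB with a given (δ, λ) pair and evaluating the selected features, and is not part of the paper.

```python
# Sketch of the grid search behind Fig. 5. `run_fnib` is a hypothetical
# stand-in that runs FNIB with the given (delta, lambda) pair and returns
# the average classification accuracy of the selected features.
import numpy as np

deltas = np.arange(0.1, 1.01, 0.1)   # kernel parameter delta: 0.1, ..., 1.0
lambdas = np.arange(1.0, 3.01, 0.2)  # predefined parameter lambda: 1.0, ..., 3.0

def grid_search(X, y, run_fnib):
    acc = np.zeros((len(deltas), len(lambdas)))
    for i, d in enumerate(deltas):
        for j, lam in enumerate(lambdas):
            acc[i, j] = run_fnib(X, y, delta=d, lam=lam)
    return acc  # the surface that is rendered as a 3D plot plus contour map
```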
Table 5
Classification performance in terms of accuracy (%) using classifier KNN on twenty-four datasets (Attribute noise level is 20%).
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 50.50 ± 6.26 9 57.32 ± 8.27 7 57.30 ± 9.82 7 54.14 ± 6.43 10 54.69 ± 4.68 8.4 55.04 ± 7.19 8.4 54.59 ± 6.93 8.8 54.14 ± 6.43 9 54.14 ± 6.43 9 61.55 ± 6.34 5
Sonar 71.20 ± 5.02 60 75.48 ± 8.55 42 72.11 ± 7.56 40 77.87 ± 4.38 46 69.76 ± 10.20 5 73.10 ± 9.33 45.4 73.63 ± 6.66 34.2 71.21 ± 8.63 8.8 74.48 ± 6.96 32 77.43 ± 3.44 22
Ecoli 64.07 ± 3.38 7 71.93 ± 3.46 6 67.12 ± 3.98 8 67.12 ± 3.98 8 69.80 ± 4.43 6 67.12 ± 3.98 7 67.12 ± 3.98 7 67.73 ± 4.12 6.8 67.22 ± 6.40 6 71.43 ± 4.24 4
Iono 77.77 ± 3.12 34 84.32 ± 3.95 4 82.90 ± 4.87 20 83.18 ± 7.62 8 79.75 ± 4.29 5.4 83.19 ± 4.78 11.2 80.06 ± 3.61 23.4 78.34 ± 3.87 12.2 82.62 ± 4.24 6 85.76 ± 3.87 4
Pima 65.24 ± 2.21 8 69.66 ± 2.82 8 69.01 ± 0.87 8 68.23 ± 2.14 8 67.84 ± 2.78 8 67.84 ± 2.78 8 67.84 ± 2.78 8 67.84 ± 2.78 8 69.66 ± 1.29 5 70.83 ± 1.54 4
Credit-A 68.41 ± 3.32 15 69.86 ± 3.59 8 70.30 ± 3.77 15 69.42 ± 1.87 8 69.27 ± 2.93 9.4 68.70 ± 3.09 10.4 68.41 ± 3.31 14 69.14 ± 2.86 14.4 70.30 ± 3.77 14 70.87 ± 1.71 10
Car 72.97 ± 2.48 6 72.51 ± 1.35 6 71.70 ± 0.82 7 72.51 ± 1.35 6 71.70 ± 0.82 6 71.70 ± 0.82 6 71.70 ± 0.82 6 71.70 ± 0.82 6 71.70 ± 0.82 6 72.97 ± 2.77 4
Wilt 94.03 ± 0.67 5 94.07 ± 0.40 5 94.11 ± 0.39 4 94.07 ± 0.40 5 94.07 ± 0.40 5 94.07 ± 0.40 5 94.07 ± 0.40 5 94.03 ± 0.75 5 94.07 ± 0.40 5 94.19 ± 0.87 3
German 66.50 ± 1.76 20 67.70 ± 5.33 4 66.20 ± 3.83 2 65.90 ± 2.90 21 65.90 ± 2.90 20 66.80 ± 3.29 19 65.90 ± 3.25 1 66.00 ± 3.22 14.6 65.90 ± 2.90 20 69.40 ± 2.75 4
Derma 74.31 ± 4.81 34 76.50 ± 2.95 33 74.04 ± 1.39 22 74.57 ± 3.94 33 73.47 ± 4.10 34 70.21 ± 6.38 18.6 63.14 ± 8.92 10 73.47 ± 4.10 34 74.58 ± 1.71 33 76.21 ± 2.50 26
Lymph 62.53 ± 5.19 18 69.68 ± 7.74 18 68.74 ± 7.91 14 68.23 ± 9.66 19 69.41 ± 13.65 12.8 68.78 ± 3.55 5 65.98 ± 6.19 9.2 67.26 ± 9.63 15 70.42 ± 7.21 14 74.53 ± 12.80 2
Wpbc 70.18 ± 7.17 33 76.78 ± 5.63 23 74.76 ± 2.96 27 75.28 ± 8.45 18 78.81 ± 4.07 14 77.28 ± 4.62 30 76.77 ± 6.21 24 74.79 ± 6.19 24.8 76.85 ± 8.19 22 78.32 ± 8.34 8
Zoo 61.30 ± 20.07 16 70.25 ± 5.09 12 70.58 ± 5.15 12 68.71 ± 10.97 17 68.71 ± 10.97 16 69.04 ± 9.81 10.8 57.24 ± 8.99 5.6 68.71 ± 10.97 16 69.51 ± 8.77 15 72.61 ± 12.23 14
Autos 31.71 ± 9.13 25 52.58 ± 7.31 3 51.03 ± 4.54 12 37.37 ± 7.52 26 50.55 ± 5.50 6 52.51 ± 3.18 1 45.63 ± 3.91 1 40.14 ± 10.78 13.8 54.74 ± 8.76 5 61.41 ± 6.37 1
Heart 64.07 ± 3.63 13 68.15 ± 6.33 6 71.48 ± 5.49 10 65.56 ± 7.36 8 70.00 ± 3.56 9.6 66.67 ± 6.14 4 64.07 ± 6.49 10 65.56 ± 3.84 11 71.48 ± 4.26 9 72.22 ± 2.27 3
Hepati 66.45 ± 9.70 19 67.74 ± 8.83 20 69.68 ± 8.10 18 67.74 ± 8.83 20 67.74 ± 3.23 6.8 66.45 ± 14.35 10.6 67.74 ± 4.56 16 69.03 ± 7.77 18.2 67.74 ± 8.83 19 72.90 ± 9.29 11
Parkin 84.62 ± 5.85 22 85.11 ± 3.45 21 84.56 ± 5.09 23 85.06 ± 4.83 19 83.07 ± 1.45 11.6 85.62 ± 1.58 19.8 83.60 ± 3.83 14.4 84.56 ± 5.09 22 84.57 ± 5.05 21 87.70 ± 5.51 6
SPECT 77.88 ± 5.03 22 80.51 ± 3.94 8 79.40 ± 0.21 2 79.78 ± 0.78 8 77.16 ± 2.42 22 77.16 ± 2.42 22 77.52 ± 1.99 10 77.16 ± 2.42 22 79.40 ± 0.21 1 79.40 ± 0.21 1
Sick 93.29 ± 0.82 29 94.94 ± 0.71 5 94.38 ± 0.42 10 93.27 ± 0.52 15 94.38 ± 0.65 3 93.43 ± 0.79 21.2 93.35 ± 0.32 17.4 93.85 ± 0.92 23.8 93.58 ± 0.26 20 94.22 ± 0.48 7
Wbc 92.13 ± 2.37 9 93.14 ± 3.29 8 92.70 ± 1.46 7 92.42 ± 3.01 10 92.28 ± 3.24 9 92.28 ± 3.24 9 93.14 ± 3.69 8.6 96.57 ± 0.93 9 92.99 ± 3.04 8 93.42 ± 2.91 6
Lung 82.29 ± 11.73 325 89.10 ± 3.10 56 87.57 ± 3.04 151 87.57 ± 3.04 221 86.23 ± 4.84 325 77.95 ± 14.84 199.6 57.81 ± 14.20 15.2 82.29 ± 11.73 325 89.48 ± 9.54 215 90.52 ± 7.25 73
Colon 72.69 ± 16.81 2000 83.97 ± 8.81 121 75.77 ± 14.80 121 82.31 ± 12.88 881 74.23 ± 10.61 3.8 75.90 ± 9.21 2 75.77 ± 5.99 13.6 56.79 ± 19.29 4.4 77.31 ± 12.50 281 82.18 ± 10.75 301
Yale 60.78 ± 5.63 1024 58.79 ± 5.69 921 59.72 ± 6.26 961 60.21 ± 4.59 1001 45.93 ± 10.34 11.4 47.88 ± 13.07 39.2 47.50 ± 5.26 59.8 30.79 ± 7.66 8.6 61.34 ± 4.71 1021 60.78 ± 5.63 1021
ORL 85.25 ± 2.05 1024 86.50 ± 2.29 1001 85.75 ± 0.61 1001 86.50 ± 0.94 921 85.25 ± 2.05 15.4 84.75 ± 5.55 122.6 79.50 ± 4.01 22.6 40.75 ± 6.41 10.6 86.00 ± 1.05 1001 85.75 ± 1.43 961
Avg. 71.26 ± 5.76 199.04 75.69 ± 4.70 97.79 74.62 ± 4.31 104.25 74.04 ± 4.93 139.08 73.33 ± 4.75 23.90 73.06 ± 5.60 26.49 70.50 ± 4.85 14.37 69.24 ± 5.88 26.79 75.00 ± 4.89 116.17 77.36 ± 4.81 104.21
Table 6
Classification performance in terms of accuracy (%) using classifier SVM on twenty-four datasets (Attribute noise level is 20%).
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 41.15 ± 6.03 9 53.22 ± 6.17 7 51.40 ± 3.99 7 50.91 ± 6.55 10 52.27 ± 7.00 8.4 51.82 ± 7.08 8.4 52.27 ± 7.00 8.8 50.91 ± 6.55 9 51.36 ± 6.55 8 53.71 ± 5.15 4
Sonar 68.28 ± 9.29 60 75.93 ± 4.95 42 73.64 ± 6.57 40 74.56 ± 5.82 46 69.79 ± 6.18 12.2 70.23 ± 8.47 45.4 69.82 ± 7.11 36.8 72.18 ± 5.09 8.8 73.64 ± 6.57 60 76.44 ± 6.39 7
Ecoli 61.60 ± 1.56 7 66.96 ± 1.30 6 66.93 ± 4.08 8 66.93 ± 4.08 8 66.93 ± 4.08 7 66.93 ± 4.08 7 66.93 ± 4.08 7 66.93 ± 4.08 6.8 66.93 ± 4.08 7 67.54 ± 1.61 4
Iono 81.48 ± 3.02 34 82.90 ± 2.30 4 81.76 ± 5.11 20 83.75 ± 3.77 8 82.04 ± 3.33 12.8 82.62 ± 4.34 25.2 82.62 ± 2.56 25.8 80.91 ± 4.68 12.2 81.75 ± 4.27 29 84.03 ± 4.49 27
Pima 71.36 ± 1.76 8 73.18 ± 1.11 8 72.92 ± 0.74 8 72.27 ± 0.74 8 71.87 ± 1.82 8 71.87 ± 1.82 8 71.87 ± 1.82 8 71.87 ± 1.82 8 73.18 ± 0.84 6 73.44 ± 0.66 3
Credit-A 70.58 ± 4.43 15 70.58 ± 2.37 8 70.58 ± 2.37 15 70.58 ± 2.37 8 70.58 ± 2.37 15 70.58 ± 2.37 13 70.58 ± 2.37 14 70.58 ± 2.37 14.4 70.58 ± 2.37 14 70.58 ± 2.37 12
Car 70.02 ± 2.18 6 70.02 ± 0.11 6 70.02 ± 0.11 7 70.02 ± 0.11 6 70.02 ± 2.18 6 70.02 ± 0.11 6 70.02 ± 0.11 1 70.02 ± 0.11 6 70.02 ± 0.11 1 70.02 ± 2.44 1
Wilt 94.61 ± 0.60 5 94.61 ± 0.04 5 94.61 ± 0.04 4 94.61 ± 0.04 5 94.61 ± 0.04 5 94.61 ± 0.04 1 94.61 ± 0.04 4.8 94.61 ± 0.67 5 94.61 ± 0.04 1 94.61 ± 0.67 1

German 70.00 ± 1.45 20 70.00 ± 0.00 4 70.00 ± 0.00 2 70.00 ± 0.00 21 70.00 ± 0.00 9.6 70.00 ± 0.00 1 70.00 ± 0.00 1 70.00 ± 0.00 14.6 70.00 ± 0.00 1 70.00 ± 0.00 4
Derma 80.58 ± 4.06 34 80.85 ± 4.33 33 80.05 ± 3.80 22 81.13 ± 1.97 33 80.58 ± 4.06 34 76.52 ± 5.63 18.8 80.58 ± 11.03 10 79.76 ± 5.28 34 80.88 ± 5.35 32 83.86 ± 4.05 26
Lymph 76.14 ± 4.55 18 77.80 ± 7.25 18 79.11 ± 6.07 14 75.49 ± 6.86 19 76.14 ± 4.55 12.8 74.87 ± 5.38 14.4 78.42 ± 6.69 6.4 72.96 ± 11.50 15 75.49 ± 6.86 18 80.47 ± 7.29 2
Wpbc 74.69 ± 5.59 33 76.78 ± 1.88 23 76.28 ± 2.65 27 76.28 ± 2.65 18 76.27 ± 1.18 5 76.28 ± 2.65 26.6 76.78 ± 1.88 26.4 76.28 ± 2.65 31.8 76.28 ± 2.65 33 76.78 ± 1.88 8
Zoo 69.36 ± 13.05 16 69.39 ± 10.71 12 69.92 ± 11.45 12 68.19 ± 9.65 17 68.19 ± 9.65 16 71.11 ± 8.01 10.8 57.61 ± 6.53 4.4 68.19 ± 9.65 16 69.92 ± 10.82 15 78.19 ± 6.91 14
Autos 46.34 ± 8.02 25 50.28 ± 5.32 3 50.28 ± 5.32 12 50.28 ± 6.52 26 50.77 ± 5.99 6 50.78 ± 7.05 20 50.24 ± 5.38 13 50.28 ± 5.32 25 51.27 ± 4.08 24 53.21 ± 5.31 1
Heart 73.70 ± 5.42 13 74.44 ± 4.22 6 74.44 ± 4.22 10 74.44 ± 4.22 8 74.44 ± 4.22 9.6 74.44 ± 4.22 12.8 73.70 ± 5.30 12.6 74.44 ± 4.22 13 74.44 ± 4.22 13 75.56 ± 6.06 3
Hepati 66.45 ± 5.62 19 70.97 ± 5.59 20 70.97 ± 5.59 18 70.97 ± 5.59 20 70.97 ± 6.03 6.8 70.97 ± 5.59 6.2 70.97 ± 5.59 12.2 70.32 ± 5.30 15 69.68 ± 4.89 16 72.26 ± 7.07 11
Parkin 85.13 ± 8.17 22 85.14 ± 4.89 21 84.58 ± 4.95 23 85.65 ± 5.27 19 85.15 ± 4.83 11.6 85.62 ± 5.39 19.6 85.66 ± 4.17 15.6 85.10 ± 5.98 21.4 84.58 ± 4.95 22 86.18 ± 5.22 6
SPECT 78.66 ± 1.82 22 79.40 ± 0.21 8 79.40 ± 0.21 2 79.40 ± 0.21 8 78.66 ± 1.82 22 79.40 ± 0.21 6.4 79.40 ± 0.21 10 77.89 ± 6.03 22 79.77 ± 0.97 12 80.13 ± 4.81 1
Sick 93.88 ± 0.84 29 93.88 ± 0.06 5 93.88 ± 0.06 10 93.88 ± 0.06 15 93.88 ± 0.06 3 93.88 ± 0.94 16.8 93.88 ± 0.06 16.6 93.88 ± 0.94 23.8 93.88 ± 0.06 1 93.88 ± 0.06 7
Wbc 93.42 ± 1.25 9 93.85 ± 3.59 8 93.42 ± 3.47 7 93.42 ± 3.47 10 93.42 ± 3.47 9 93.42 ± 3.47 9 93.57 ± 3.64 8.6 96.14 ± 1.39 9 93.57 ± 3.34 8 93.99 ± 3.44 6
Lung 86.23 ± 4.84 325 89.09 ± 7.99 196 84.88 ± 10.76 206 84.98 ± 2.01 281 82.29 ± 11.73 325 72.86 ± 16.04 199.6 68.77 ± 13.70 15.2 86.23 ± 4.84 325 86.78 ± 12.38 119 89.27 ± 9.45 98
Colon 80.64 ± 7.33 2000 80.64 ± 6.55 121 80.64 ± 3.90 121 82.18 ± 9.61 881 72.69 ± 12.20 3.8 85.64 ± 10.40 2.6 80.64 ± 19.14 12.4 58.33 ± 14.33 4.4 80.64 ± 4.36 401 83.72 ± 6.14 36
Yale 71.87 ± 8.12 1024 72.42 ± 7.32 921 71.87 ± 7.26 961 71.88 ± 8.40 1001 49.44 ± 13.29 11.4 55.63 ± 4.82 34.8 54.02 ± 6.94 52.6 37.07 ± 9.05 8.6 71.87 ± 8.12 961 72.42 ± 6.98 961
ORL 92.75 ± 3.89 1024 92.75 ± 3.89 1001 92.75 ± 3.00 1001 92.75 ± 3.89 921 92.75 ± 3.89 15.4 89.00 ± 4.87 115.6 84.50 ± 3.60 61.2 37.75 ± 5.18 10.6 92.75 ± 3.58 961 92.75 ± 3.58 961
Avg. 74.96 ± 4.71 199.04 76.88 ± 3.84 103.63 76.43 ± 3.99 106.54 76.44 ± 3.91 141.58 74.74 ± 4.75 23.98 74.96 ± 4.71 26.21 74.06 ± 4.96 16.02 71.36 ± 4.88 27.48 76.41 ± 4.23 115.13 78.04 ± 4.25 91.83
Table 7
Classification performance in terms of accuracy (%) using classifier KNN on twenty-four datasets (Class noise level is 20%).
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 60.82 ± 10.99 9 62.68 ± 9.16 7 64.91 ± 8.15 5 62.67 ± 5.79 9 62.67 ± 5.79 7.6 62.67 ± 5.79 9 62.67 ± 5.79 9 63.14 ± 5.56 9 63.61 ± 11.44 6 67.74 ± 4.54 4
Sonar 73.61 ± 8.53 60 78.87 ± 5.14 40 75.50 ± 7.09 55 78.79 ± 5.49 47 77.85 ± 9.24 13.8 77.40 ± 5.31 41 75.39 ± 6.99 24.6 73.58 ± 3.96 14.4 76.45 ± 5.20 50 82.20 ± 5.83 19
Ecoli 82.89 ± 5.33 7 83.20 ± 3.09 7 83.20 ± 3.09 8 83.20 ± 3.09 8 83.20 ± 3.09 6.2 83.20 ± 3.09 6 83.20 ± 3.09 7 74.38 ± 11.54 5.4 83.20 ± 3.09 7 83.20 ± 3.09 5
Iono 74.08 ± 5.21 34 80.06 ± 3.86 23 76.92 ± 2.76 8 80.06 ± 8.12 11 79.21 ± 6.49 10.8 79.48 ± 4.60 11.4 80.04 ± 6.02 17 75.22 ± 4.34 30.4 82.04 ± 2.67 6 83.19 ± 6.66 4
Pima 67.58 ± 3.09 8 67.58 ± 2.70 4 68.23 ± 4.40 5 63.81 ± 4.52 9 63.81 ± 4.52 8 63.81 ± 4.52 8 63.81 ± 4.52 8 63.81 ± 4.52 8 63.81 ± 4.52 8 69.27 ± 3.06 1
Credit-A 75.51 ± 1.91 15 76.09 ± 3.97 8 76.08 ± 2.91 16 76.08 ± 2.91 16 76.80 ± 2.37 15 77.53 ± 3.16 12.2 76.08 ± 3.21 12.8 77.39 ± 2.14 12.2 76.08 ± 2.91 15 76.95 ± 2.22 10
Car 84.67 ± 1.78 6 86.63 ± 1.26 6 84.08 ± 2.90 7 86.63 ± 1.26 6 84.67 ± 1.78 6 84.08 ± 2.90 6 81.82 ± 7.77 5.2 84.08 ± 2.90 6 84.08 ± 2.90 6 92.13 ± 0.86 5
Wilt 83.92 ± 1.26 5 84.38 ± 1.34 5 87.46 ± 0.79 5 87.58 ± 0.98 4 84.38 ± 1.34 5 84.38 ± 1.34 5 83.74 ± 1.32 4.6 84.38 ± 1.34 5 86.71 ± 1.47 4 97.95 ± 0.29 2
German 62.50 ± 1.61 20 66.20 ± 4.31 3 64.40 ± 3.13 12 67.60 ± 2.16 4 63.70 ± 4.96 8.4 64.30 ± 4.88 10.6 63.00 ± 4.34 1 62.80 ± 1.64 20 62.80 ± 2.05 17 71.90 ± 2.53 1
Derma 86.90 ± 4.25 34 89.90 ± 2.77 29 87.14 ± 3.24 26 90.71 ± 2.04 29 86.06 ± 6.66 15.4 86.89 ± 2.05 25 87.74 ± 8.68 10.8 86.10 ± 5.94 21 89.87 ± 2.73 29 90.16 ± 3.65 19
Lymph 68.38 ± 12.42 18 79.77 ± 8.06 4 75.85 ± 7.28 12 71.50 ± 2.93 11 73.62 ± 2.70 8.4 72.71 ± 1.36 15.8 70.51 ± 5.90 14.8 73.45 ± 8.26 16.6 73.29 ± 6.79 13 79.45 ± 7.50 4
Wpbc 65.67 ± 4.32 33 71.64 ± 8.39 10 67.72 ± 6.03 7 66.65 ± 7.97 34 71.17 ± 8.28 13.2 69.68 ± 8.48 16.2 69.64 ± 7.40 14.4 67.65 ± 8.40 30 66.65 ± 7.97 33 72.78 ± 7.38 19
Zoo 90.05 ± 4.53 16 92.09 ± 8.07 17 93.94 ± 6.73 11 93.04 ± 6.57 16 93.13 ± 5.58 9.8 91.88 ± 5.88 9.8 94.04 ± 6.37 12.8 92.09 ± 8.07 13.4 92.09 ± 8.07 15 93.08 ± 5.56 13
Autos 54.81 ± 14.42 25 65.89 ± 4.17 10 65.77 ± 4.62 6 61.99 ± 12.00 24 63.64 ± 10.82 9.6 63.70 ± 11.04 16.6 65.06 ± 11.50 16.6 62.34 ± 12.22 21.8 61.53 ± 12.06 25 68.90 ± 5.75 6
Heart 71.11 ± 3.99 13 75.93 ± 8.38 6 74.44 ± 0.83 7 72.59 ± 9.02 8 71.48 ± 10.11 7 73.70 ± 9.57 5.6 72.22 ± 9.44 11.6 71.48 ± 10.11 12.8 71.48 ± 11.39 12 76.67 ± 2.11 3
Hepati 58.71 ± 3.76 19 67.10 ± 4.21 8 60.65 ± 4.78 9 61.94 ± 6.99 18 62.58 ± 5.40 14 61.29 ± 2.28 10.8 61.94 ± 5.30 14.2 61.29 ± 5.10 18.4 59.35 ± 3.68 19 70.97 ± 6.03 2
Parkin 89.74 ± 3.24 22 90.78 ± 1.31 21 92.29 ± 1.91 13 90.24 ± 3.84 23 90.78 ± 3.84 9.6 90.79 ± 3.76 13.4 91.82 ± 3.70 12.8 90.24 ± 3.84 22 90.24 ± 3.84 22 92.31 ± 4.80 18
SPECT 69.24 ± 6.64 22 77.50 ± 4.16 6 79.40 ± 0.21 2 79.40 ± 0.21 2 69.24 ± 6.64 22 70.06 ± 5.83 9.8 66.61 ± 7.90 11.2 65.49 ± 7.12 22 77.90 ± 2.51 3 79.40 ± 0.21 2
Sick 86.37 ± 0.87 29 86.48 ± 0.91 22 93.88 ± 0.06 2 93.88 ± 0.06 2 85.95 ± 0.92 3 93.74 ± 0.91 23.2 87.35 ± 1.64 12.6 93.88 ± 0.94 6.6 93.88 ± 0.06 1 88.55 ± 2.55 6
Wbc 82.84 ± 4.50 9 89.42 ± 3.24 8 87.70 ± 5.27 7 86.41 ± 5.59 9 84.11 ± 5.82 9 84.97 ± 5.06 7 84.25 ± 5.96 9 96.14 ± 0.83 9 91.28 ± 3.04 6 90.13 ± 2.21 4
Lung 76.79 ± 5.78 325 82.71 ± 9.79 116 78.68 ± 12.70 301 76.90 ± 10.29 321 76.79 ± 5.78 325 76.79 ± 5.78 325 67.61 ± 9.81 3.6 76.79 ± 5.78 325 80.12 ± 14.35 148 85.09 ± 5.35 210
Colon 68.97 ± 18.53 2000 75.77 ± 5.99 321 69.23 ± 7.38 401 73.97 ± 9.60 601 77.05 ± 17.35 3.8 69.36 ± 3.40 2.2 78.72 ± 12.94 16.8 61.41 ± 5.52 4.8 69.23 ± 11.13 281 74.23 ± 13.11 126
Yale 52.13 ± 7.14 1024 52.73 ± 8.43 921 52.38 ± 4.94 841 54.60 ± 2.94 161 50.41 ± 4.44 11.4 45.45 ± 5.82 48.4 46.95 ± 8.82 18.2 23.05 ± 11.46 8.6 52.97 ± 7.22 421 53.43 ± 5.74 456
ORL 76.75 ± 1.90 1024 75.25 ± 2.85 961 74.00 ± 5.55 1001 77.75 ± 5.03 761 76.75 ± 1.90 419.6 75.00 ± 3.06 102.6 69.75 ± 6.27 47 45.75 ± 10.99 11.6 77.25 ± 3.79 1021 77.50 ± 3.54 1021
Avg. 73.50 ± 5.67 199.04 77.44 ± 4.81 106.83 76.41 ± 4.45 115.29 76.58 ± 4.98 88.96 75.38 ± 5.66 40.07 75.12 ± 4.58 30.86 74.33 ± 6.45 13.15 71.91 ± 5.94 27.25 76.08 ± 5.62 90.33 79.88 ± 4.36 81.67
Table 8
Classification performance in terms of accuracy (%) using classifier SVM on twenty-four datasets (Class noise level is 20%).
Datasets raw MRMR FCBF MIFS FRS-FS SFRS-FS PFRS-FS PGDFRS-FS RFBSIB FNIB
Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num Acc±std num
Glass 50.44 ± 5.98 9 53.24 ± 6.20 7 53.24 ± 6.20 5 52.75 ± 6.01 9 53.20 ± 6.46 7.6 52.75 ± 6.01 8.8 53.24 ± 6.20 2.8 52.75 ± 6.01 9 53.24 ± 6.20 8 54.10 ± 7.03 5
Sonar 71.13 ± 6.00 60 75.06 ± 6.90 40 75.06 ± 6.90 55 75.48 ± 6.98 47 72.11 ± 4.10 13.8 76.45 ± 4.58 36.2 74.97 ± 4.60 25.8 73.56 ± 5.87 14.4 76.96 ± 4.44 37 80.25 ± 3.39 46
Ecoli 80.32 ± 3.53 7 82.77 ± 5.48 7 80.65 ± 4.61 8 80.65 ± 4.61 8 81.24 ± 3.75 5.6 80.65 ± 4.61 6 80.65 ± 4.61 7 69.69 ± 9.84 5.4 80.65 ± 4.61 7 81.24 ± 3.75 5
Iono 84.92 ± 3.80 34 86.89 ± 2.35 23 84.89 ± 3.16 8 88.02 ± 3.61 11 86.89 ± 5.68 10.8 87.17 ± 5.07 26.4 87.17 ± 3.68 17 84.02 ± 5.52 30.4 84.91 ± 2.55 7 88.88 ± 5.11 13
Pima 76.03 ± 3.82 8 75.92 ± 2.66 4 75.52 ± 1.64 5 74.74 ± 2.41 9 74.74 ± 2.41 8 74.74 ± 2.41 8 74.74 ± 2.41 8 74.74 ± 2.41 8 74.74 ± 2.41 8 76.05 ± 2.50 5
Credit-A 85.51 ± 2.43 15 85.51 ± 2.32 8 85.51 ± 2.32 16 85.51 ± 2.32 16 85.51 ± 2.32 10.2 85.51 ± 2.32 10 85.51 ± 2.32 12.8 85.51 ± 2.32 12.2 85.51 ± 2.32 13 85.51 ± 2.32 1
Car 79.40 ± 1.63 6 79.86 ± 1.95 6 79.75 ± 1.84 6 79.75 ± 2.20 6 79.69 ± 1.93 6 79.69 ± 1.93 6 77.49 ± 4.46 5.2 79.69 ± 1.93 6 79.69 ± 1.93 6 83.56 ± 0.87 5
Wilt 94.61 ± 0.60 5 94.61 ± 0.04 5 94.61 ± 0.04 5 94.61 ± 0.04 4 94.61 ± 0.04 5 94.61 ± 0.04 1 94.61 ± 0.04 4.6 94.61 ± 0.04 5 94.61 ± 0.04 1 94.61 ± 0.67 1

German 72.60 ± 2.82 20 73.60 ± 3.83 3 72.40 ± 4.60 12 73.20 ± 4.16 4 72.40 ± 4.60 20 72.70 ± 3.29 9.4 72.40 ± 4.60 20 72.40 ± 4.60 20 72.40 ± 4.60 20 74.80 ± 3.68 1
Derma 94.54 ± 1.50 34 96.98 ± 1.82 29 95.90 ± 2.14 26 96.98 ± 1.54 29 94.24 ± 3.01 15.4 95.90 ± 1.38 27.4 94.81 ± 1.80 18 93.18 ± 2.11 23.6 95.62 ± 1.81 30 96.18 ± 1.77 19
Lymph 76.02 ± 9.51 18 80.55 ± 8.34 4 81.30 ± 9.49 12 77.98 ± 8.73 11 78.63 ± 6.52 7.4 78.33 ± 11.04 13 78.67 ± 8.31 14.8 76.60 ± 3.97 16.6 77.98 ± 8.73 18 81.34 ± 8.73 4
Wpbc 76.73 ± 5.04 33 78.78 ± 3.83 10 76.74 ± 4.97 7 76.74 ± 4.97 33 76.78 ± 1.88 6 78.27 ± 5.25 23.8 78.28 ± 4.18 19.6 76.76 ± 3.35 30 76.74 ± 4.97 33 79.31 ± 5.08 19
Zoo 91.84 ± 6.79 16 94.03 ± 6.75 16 92.98 ± 7.62 11 93.13 ± 5.58 16 92.88 ± 8.74 9.8 88.88 ± 11.22 9.8 92.88 ± 5.91 15 92.88 ± 5.91 13.4 92.98 ± 7.62 15 92.88 ± 5.91 13
Autos 53.66 ± 7.71 25 56.09 ± 9.78 10 55.62 ± 7.57 6 55.62 ± 7.57 24 56.08 ± 6.89 14.2 57.08 ± 8.97 17.6 56.08 ± 8.51 16.6 54.62 ± 9.09 21.8 55.62 ± 7.57 25 59.54 ± 6.63 6
Heart 80.74 ± 4.91 13 83.33 ± 7.05 6 82.96 ± 7.10 7 81.85 ± 8.43 8 82.22 ± 7.81 10.8 82.59 ± 8.55 10.4 82.22 ± 9.31 12.6 81.85 ± 8.82 13 81.85 ± 8.82 13 84.07 ± 6.36 3
Hepati 69.68 ± 8.31 19 70.32 ± 1.44 8 70.32 ± 1.44 9 70.32 ± 1.44 18 67.74 ± 2.28 14 68.39 ± 4.78 15.8 67.74 ± 3.23 15.4 69.68 ± 1.77 16.6 69.03 ± 2.89 18 70.97 ± 2.28 2
Parkin 81.54 ± 10.18 22 83.69 ± 9.54 21 84.14 ± 6.76 13 83.61 ± 6.58 23 83.64 ± 7.22 6.4 83.65 ± 7.60 10.2 84.12 ± 6.53 8.6 84.12 ± 6.53 20.8 84.14 ± 6.73 18 85.66 ± 5.53 18
SPECT 78.64 ± 3.57 22 79.40 ± 0.21 6 79.40 ± 0.21 2 79.40 ± 0.21 2 78.64 ± 1.78 22 79.40 ± 0.21 9.8 79.40 ± 0.21 11.2 78.64 ± 1.78 22 79.40 ± 0.21 1 79.78 ± 0.78 2
Sick 93.88 ± 0.84 29 93.88 ± 0.06 22 93.88 ± 0.06 2 93.88 ± 0.06 2 93.88 ± 0.06 3 93.88 ± 0.94 23.2 93.88 ± 0.06 12.6 93.88 ± 0.94 7.4 93.88 ± 0.06 1 93.88 ± 0.06 6
Wbc 96.14 ± 1.07 9 96.14 ± 2.12 8 95.85 ± 1.63 7 95.85 ± 1.63 9 95.85 ± 2.28 8.4 95.85 ± 2.28 8.4 95.85 ± 2.28 9 96.57 ± 0.31 9 95.85 ± 2.28 9 96.28 ± 2.10 4
Lung 78.33 ± 9.88 325 84.83 ± 10.19 91 80.43 ± 13.81 86 80.82 ± 10.39 186 78.33 ± 9.88 325 78.33 ± 9.88 325 68.68 ± 13.08 3.6 78.33 ± 9.88 325 81.00 ± 8.81 146 86.33 ± 9.77 150
Colon 71.15 ± 8.36 2000 72.82 ± 9.99 321 76.03 ± 7.07 401 75.90 ± 12.83 601 73.85 ± 19.22 3.8 70.90 ± 7.59 2.2 78.85 ± 17.33 15.6 64.23 ± 10.36 4.8 77.69 ± 9.72 441 77.56 ± 8.39 246
Yale 56.23 ± 10.36 1024 56.92 ± 10.55 921 56.07 ± 12.07 841 55.58 ± 11.82 161 53.37 ± 6.29 11.4 47.72 ± 7.31 44.2 45.26 ± 7.17 42 20.40 ± 12.07 8.6 57.34 ± 12.41 921 58.04 ± 13.98 906
ORL 79.75 ± 8.17 1024 80.75 ± 6.41 961 80.25 ± 7.57 1001 80.25 ± 6.81 761 79.75 ± 8.17 419.6 75.25 ± 5.26 102.6 72.75 ± 8.22 47 37.00 ± 10.70 11.6 81.00 ± 7.09 1001 81.00 ± 7.09 1001
Avg. 78.08 ± 5.28 199.04 79.83 ± 4.99 105.79 79.31 ± 5.04 106.33 79.28 ± 5.04 83.33 78.59 ± 5.14 40.18 78.28 ± 5.11 31.47 77.93 ± 5.38 15.20 74.41 ± 5.26 27.28 79.28 ± 4.95 116.54 80.91 ± 4.74 103.38
6. Conclusions

Feature selection has been widely used in clustering, pattern recognition, and classification learning. This paper presents a novel robust feature selection approach based on the noise-aware fuzzy rough set model for classification, called the feature selection algorithm based on our NFRS model with intra-class aggregation and between-class dispersion (FNIB). FNIB couples a robust fuzzy rough set model (i.e., the noise-aware fuzzy rough set model), the intra-class aggregation, the between-class dispersion, and feature selection into a joint framework. The NFRS model considers the distribution information of samples within the decision classes. It not only reduces the impact of noisy data (attribute noise and class noise) in a fuzzy decision information system, but its lower and upper approximations also reflect the degree of between-class dispersion and intra-class aggregation of the samples. In particular, FNIB uses a new evaluation function that simultaneously considers the separability a candidate feature provides for both the selected subset and the remaining feature subset, so as to describe the relevance and redundancy of a feature. The evaluation function can also integrate the uncertainty of the lower and upper approximations; thus, the uncertainty of the information system is decreased in NFRS. Experimental results on twenty-four real-world datasets demonstrate the robustness of the proposed NFRS model, as well as the robustness and superior classification performance of the feature selection algorithm built on it, compared with other baselines. However, the proposed method also has shortcomings. First, the NFRS model requires estimating the local density of the data, which increases the time complexity of the FNIB algorithm. Second, FNIB obtains feature subsets by using a downstream classification model, and thus the size of the feature subsets is influenced by the classification learning model. Our future work includes designing a novel framework that works on dynamic and ordered data by using dominance rough set theory. We are also interested in exploring techniques to speed up our method for large-scale data.

CRediT authorship contribution statement

Xiaoling Yang: Conceptualization, Methodology, Software, Writing – original draft, Validation, Formal analysis, Investigation, Visualization, Writing – review & editing. Hongmei Chen: Writing – review & editing, Supervision, Visualization, Investigation, Formal analysis, Validation. Tianrui Li: Writing – review & editing, Supervision, Resources. Chuan Luo: Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by grants from the National Science Foundation of China (Nos. 61976182, 62076171, 61876157, and 61976245), and the Sichuan Key R&D Project (No. 2020YFG0035).