Received June 10, 2015. Revised March 10, 2016. Accepted May 17, 2016.
Abstract
Reduction of feature vector dimension is a problem of selecting the most informative
features from an information system. Using rough set theory (RST) we can reduce the
feature vector dimension when all the attribute values are crisp or discrete. For any
information system or decision system, if attributes contain real-valued data, RST cannot
be applied directly. Fuzzy-rough set techniques may be applied on this kind of system to
reduce the dimension. But, Fuzzy-rough set uses the concept of fuzzy-equivalence relation,
which is not suitable to model approximate equality. In this paper we propose a new
alternative method to reduce the dimension of feature vectors of a decision system where
the attribute values may be discrete or real or even mixed in nature. To model approximate
equality we first consider the intuitive relationship between distance measure and equality.
Subsequently we fuzzify the distance measures to establish the degree of equality (or
closeness) among feature vectors (objects or points). Finally we use the concept of an α-cut
to obtain an equivalence relation, based on which the dimension of feature vectors can be
reduced. We also compare the performance of the present method in reducing the feature
vector dimension with those of principal component analysis (PCA), kernel principal
component analysis (KPCA) and independent component analysis (ICA). In most of the
cases the present method performs as well as, or better than, the other methods.
Keywords: Rough set, approximate equality, α-cut, feature dependencies, dimension
reduction, multilayer perceptron, support vector machine.
1. Introduction
Many classification problems involve high dimensional descriptions of input features. Reduction
of feature vector dimension helps to simplify the design and implementation of
classifiers. A technique that can reduce dimensionality using information contained within
the data set and preserving the meaning of the features is desirable in practice. Rough
set theory (RST) can be used to reduce the number of attributes contained in a data set
using only the data it contains and no additional information [17]. Given a data set with
discretized attribute values, it is possible to find a subset (termed a reduct) of the original
attributes using RST that is most informative; all other attributes can be removed from
the data set with minimal information loss.
However, it is most often the case that the values of the attributes may be both crisp and
real-valued. In that case the traditional rough set theory for ‘reduct’ encounters a problem.
It is not possible in the theory to say whether two attribute values are similar and to what
extent they are the same. For example, two very close values may only differ as a result of
noise, but in RST they are considered to be as different as two values of a different order
of magnitude.
© 2016 JPRR. All rights reserved. Permissions to make digital or hard copies of all or part of
this work for personal or classroom use may be granted by JPRR provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. To copy otherwise, or to republish, requires a fee and/or
special permission from JPRR.
Approximate Equality for Feature Dimension Reduction
To overcome this problem the fuzzy-rough set technique [7] may be applied. But the
fuzzy-rough set approach uses the concept of the fuzzy equivalence relation, which is not
suitable to model approximate equality, as the transitivity property of the fuzzy equivalence
relation is counter-intuitive [12].
We propose here a new methodology to reduce the feature vector dimension of a decision
system having attribute values that are crisp or fuzzy or mixed in nature. As the fuzzy
equivalence relation does not suitably model approximate equality, we use the intuitive
notion of a distance measure to model it: the closer two objects are (i.e. the smaller the
distance between them), the more they are approximately equal. Based on this intuitive
design philosophy we try to group similar points (i.e. points that are approximately equal)
in the pattern space. To model approximate equality between any two points (objects)
in the pattern space, we select an appropriate membership function. We then introduce
the notion of an α-cut on the fuzzy relation to obtain a new equivalence relation, which
maps the problem back into the rough set methodology.
The rest of this paper is structured as follows. Section 2 discusses the fundamentals of
rough set theory, in particular focusing on dimensionality reduction. Section 3 introduces a
new alternative method to reduce the feature vector dimension of a decision system having
attribute values crisp or fuzzy or mixed in nature. The proposed algorithm is described in
section 4. Section 5 presents a worked-out example. An application is given in section 6. A
comparative study with other methods is presented in section 7. Section 8 concludes the paper.
2. Background
A rough set is an approximation of a vague concept by a pair of precise concepts, called lower
and upper approximations [17]. The lower approximation is a description of the domain
objects which are known with certainty to belong to the subset of interest, whereas the upper
approximation is a description of the objects which possibly belong to the subset. For
further reading, interested readers are referred to [1, 2, 5, 6, 10, 13–16, 18, 20–28].
2.1 Rough Set Attribute Reduction
Rough sets have been employed to remove redundant conditional attributes from discrete-valued
data sets, while retaining their information content. Let I = (U, A) be an information
system, where U is a finite non-empty set of objects (the universe of discourse) and A is a
finite non-empty set of attributes such that a : U → Va ∀ a ∈ A, Va being the value set of
attribute a. In a decision system, A = C ∪ D, where C is the set of conditional attributes and
D is the set of decision attributes. With any P ⊆ A there is an associated equivalence relation
IND(P):
IND(P) = {(x, y) ∈ U² | ∀ a ∈ P, a(x) = a(y)} (1)
The partition of U generated by IND(P) is denoted U/P, and [x]P denotes the equivalence
class of x under IND(P). Let X ⊆ U; the P-lower approximation of X is then defined as:
P X = {x | [x]P ⊆ X} (4)
Ray and Kolay
Let P and Q be equivalence relations over U; then the positive region can be defined as:
POSP(Q) = ∪X∈U/Q P X
In terms of classification, the positive region contains all objects of U that can be classified
into classes of U/Q using the knowledge in the attributes of P. An important issue in data
analysis is discovering dependencies between attributes. Intuitively, a set of attributes Q
depends totally on a set of attributes P, denoted P ⇒ Q, if all attribute values from Q are
uniquely determined by values of attributes from P. Dependency can be defined in the
following way. For P, Q ⊆ A, Q depends on P in a degree k (0 ≤ k ≤ 1), denoted P ⇒k Q, if
k = γP(Q) = |POSP(Q)| / |U|
Consider, for example, the decision system below (conditional attributes a, b, c, d and
decision attribute q):
    a b c d q
x1 1 0 1 2 0
x2 2 1 2 2 2
x3 1 1 2 1 1
x4 1 2 2 1 2
x5 0 0 1 2 1
x6 1 2 0 1 2
Partitioning by the decision attribute gives U/Q = {X1, X2, X3}, where X1 = {x1},
X2 = {x3, x5} and X3 = {x2, x4, x6}. Taking P = {a}, the positive region is
POS{a}({q}) = ∪X∈U/Q P X = {x2, x5}. Hence, {q} depends on {a} in a degree
k = γ{a}({q}) = |POS{a}({q})| / |U| = 2/6.
In a similar fashion, we can calculate the dependencies for all possible subsets of C:
γ{a} ({q}) = 2/6, γ{b} ({q}) = 2/6, γ{c} ({q}) = 1/6, γ{d} ({q}) = 0,
γ{a,b} ({q}) = 1, γ{a,c} ({q}) = 4/6, γ{a,d} ({q}) = 3/6, γ{b,c} ({q}) = 2/6,
γ{b,d} ({q}) = 4/6, γ{c,d} ({q}) = 2/6, γ{a,b,c} ({q}) = 1, γ{a,b,d} ({q}) = 1,
γ{a,c,d} ({q}) = 4/6, γ{b,c,d} ({q}) = 4/6, γ{a,b,c,d} ({q}) = 1.
Here, {a, b} is the minimal subset of the conditional attributes for which the dependency
degree is equal to 1. Hence, {a, b} is one reduct.
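The dependency computations in this worked example can be sketched in a few lines of Python. This is our own minimal illustration, not the authors' implementation; the object and attribute names follow the table above:

```python
from fractions import Fraction
from itertools import combinations

# Decision system from the example above: columns a, b, c, d and decision q.
data = {
    "x1": (1, 0, 1, 2, 0), "x2": (2, 1, 2, 2, 2), "x3": (1, 1, 2, 1, 1),
    "x4": (1, 2, 2, 1, 2), "x5": (0, 0, 1, 2, 1), "x6": (1, 2, 0, 1, 2),
}
ATTRS = {"a": 0, "b": 1, "c": 2, "d": 3, "q": 4}
U = set(data)

def partition(attrs):
    """U/IND(attrs): equivalence classes of objects agreeing on attrs."""
    classes = {}
    for x in U:
        key = tuple(data[x][ATTRS[a]] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())

def gamma(P, Q):
    """Dependency degree k = |POS_P(Q)| / |U|."""
    pos = set()
    for X in partition(Q):
        # lower approximation of X: P-classes wholly contained in X
        pos |= {x for c in partition(P) if c <= X for x in c}
    return Fraction(len(pos), len(U))

print(gamma(["a"], ["q"]))       # 1/3, i.e. 2/6 as in the text
print(gamma(["a", "b"], ["q"]))  # 1: {a, b} gives full dependency

# brute-force search for the smallest subsets with full dependency
for r in range(1, 5):
    reducts = [s for s in combinations("abcd", r) if gamma(list(s), ["q"]) == 1]
    if reducts:
        print(reducts)           # [('a', 'b')]
        break
```

The brute-force search at the end confirms that {a, b} is the only reduct of size two; practical reduct search would use a greedy strategy instead of enumerating all subsets.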
3. The Proposed Method
To model the degree of approximate equality (closeness) between objects we fuzzify a
distance measure. For each attribute, a fuzzy relation r ⊆ U × U can be defined through a
membership function of the form
µr (xi , xj ) = 1 − d(xi , xj )/dmax ,
where d(xi , xj ) is the distance between two objects xi and xj , and dmax is the maximum
distance between any two objects.
A fuzzy τ -equivalence relation can be defined as below:
Definition. If a fuzzy relation r ⊆ U × U satisfies the following three conditions we call it
a fuzzy τ -equivalence relation [12]:
(a) ∀ xi ∈ U, µr (xi , xi ) = 1 (Reflexive)
(b) ∀ (xi , xj ) ∈ U × U, µr (xi , xj ) = µr (xj , xi ) (Symmetric)
(c) ∀ (xi , xj ), (xj , xk ), (xi , xk ) ∈ U × U, µr (xi , xk ) ≥ τ (µr (xi , xj ), µr (xj , xk )) (Transitive)
But the fuzzy equivalence relation is not suitable to model approximate equality [12]. Hence,
we introduce the notion of an α-cut on the fuzzy relation to obtain a new equivalence relation
R′ as below:
R′(P ) = {(xi , xj ) ∈ U² | ∀ a ∈ P, µr (xi , xj ) ≥ α}, (8)
where P ⊆ A.
If (xi , xj ) ∈ R′(P ), then xi and xj are indiscernible by the attributes from P. The equivalence
classes of the equivalence relation R′ of P are denoted [x]P . Let X ⊆ U; the P -lower
approximation of a set can now be defined as [7]
P X = {x | [x]P ⊆ X} (9)
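This α-cut construction can be sketched as follows. The sketch is our own illustration: it assumes the per-attribute distance is the absolute difference of attribute values and the fuzzified closeness is µr(xi, xj) = 1 − d(xi, xj)/dmax, with dmax taken per attribute:

```python
import numpy as np

def alpha_cut_relation(X, alpha):
    """Boolean relation of eq. (8): objects xi, xj are related iff, for every
    attribute, the fuzzified closeness mu = 1 - d/dmax is at least alpha."""
    n, m = X.shape
    R = np.ones((n, n), dtype=bool)
    for j in range(m):
        d = np.abs(X[:, j, None] - X[None, :, j])   # pairwise |a(xi) - a(xj)|
        dmax = d.max()
        mu = 1.0 - d / dmax if dmax > 0 else np.ones_like(d)
        R &= mu >= alpha
    return R

# four objects described by two real-valued attributes
X = np.array([[156.9, 33.75],
              [114.75, 0.0],
              [124.5, 0.0],
              [149.75, 0.0]])
R = alpha_cut_relation(X, alpha=0.7)
print(R[1, 2])  # True: rows 1 and 2 are close on both attributes
print(R[0, 1])  # False: rows 0 and 1 differ by nearly dmax on the first attribute
```

The resulting boolean matrix is reflexive and symmetric by construction; the pairs it relates are then grouped into the classes used in place of the crisp indiscernibility classes of equation (1).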
Let P and Q be equivalence relations over U; then the positive region can be defined as:
POSP(Q) = ∪X∈U/Q P X
5. A Worked-out Example
Let us consider an information (decision) system I = (U, A) of emotion classification where
U is the finite nonempty set of objects and A = C ∪D is the finite nonempty set of attributes
where C stands for the finite nonempty set of conditional attributes and D stands for the
finite nonempty set of decision attributes.
Table 1 represents a decision system with 30 objects, seven conditional attributes (a-g)
and one decision attribute q.
Table 1: The decision system for emotion classification.
     a b c d e f g q
x1 156.9 33.75 55 14 14.5 20 21 1
x2 114.75 0 55 8.75 12 5 8.5 2
x3 124.5 0 55 20.25 22.5 2 7.5 3
x4 123.75 24 37.25 22.75 19 7 7.25 4
x5 149.75 0 45 22 21.75 18 15.25 5
x6 179.75 32.25 79.5 16 17.25 19 21 1
x7 125.25 5.75 42.75 7 12.25 7.75 6 2
x8 111.75 25 40 21 18 0 4.25 3
x9 121.25 10.75 102 30.25 27.25 36.75 32 4
x10 122.25 0 63.25 24.25 22.5 15.5 11.5 5
x11 144 23.5 80.25 16.5 18.25 11.75 12.75 1
x12 116.08 12.25 45.25 3.75 5.5 5.75 6.5 2
x13 124.5 13.5 43 0 11.5 0 0 3
x14 125 11.25 61 31.25 29.75 15.25 13 4
x15 112.5 0 47.25 22.5 20.75 18.5 16 5
x16 137.75 31.25 75.5 19.5 17.5 13.5 11.75 1
x17 131.25 0 82 17.75 17.5 14.5 11.25 2
x18 94.5 25 69.75 13.75 18.75 6.25 7 3
x19 121.75 38.75 69.75 27 28 33.75 34.25 4
x20 115.25 7.75 45.75 17.25 17.75 0 2.75 5
x21 166 23.25 80.5 14 16 25 23.25 1
x22 102.25 0 58.75 22.75 23 1.75 2.75 2
x23 102.5 0 31.75 20.25 19.25 0 0 3
x24 114.25 30.25 41 20 22 14 15 4
x25 123.25 0 41.75 17.5 16.5 25 25 5
x26 174 18.25 63.25 14.2 16.75 5.5 5.5 1
x27 107.17 29.5 54.75 8.25 11.25 2.75 3.75 2
x28 126.5 26.5 53.75 14.75 10.5 0 0 3
x29 126 12 21 18.25 17.75 42.75 45.5 4
x30 141.5 0 33.75 19.75 19 15 18.75 5
The transformed decision system obtained under the relation R′ is given below:
R′ a b c d e f g q
x1 2 2 1 0 0 1 1 1
x2 1 0 1 0 0 0 0 2
x3 2 0 1 1 1 0 0 3
x4 2 2 0 1 0 0 0 4
x5 2 0 1 1 1 1 1 5
x6 2 2 1 0 0 1 1 1
x7 2 1 1 0 0 0 0 2
x8 1 2 0 1 0 0 0 3
x9 2 1 1 1 1 1 1 4
x10 2 0 1 1 1 1 0 5
x11 2 2 1 0 0 0 0 1
x12 1 1 1 0 0 0 0 2
x13 2 1 1 0 0 0 0 3
x14 2 1 1 1 1 1 0 4
x15 1 0 1 1 1 1 1 5
x16 2 2 1 0 0 0 0 1
x17 2 0 1 0 0 0 0 2
x18 0 2 1 0 0 0 0 3
x19 2 2 1 1 1 1 1 4
x20 1 1 1 0 0 0 0 5
x21 2 2 1 0 0 1 1 1
x22 1 0 1 1 1 0 0 2
x23 1 0 0 1 0 0 0 3
x24 1 2 1 0 1 0 0 4
x25 2 0 1 0 0 1 1 5
x26 2 1 1 0 0 0 0 1
x27 1 2 1 0 0 0 0 2
x28 2 2 1 0 0 0 0 3
x29 2 1 0 0 0 1 1 4
x30 2 0 0 0 0 0 1 5
Based on the above data, if we try to find the dependency factor taking all the conditional
attributes into account, we get the value k = 22/30 < 1. Hence, these attribute values cannot
classify all objects into their decision classes. So we have to define the characteristic functions
in such a way that they yield more distinct classes.
Now, we redefine the characteristic functions so that they yield more distinct classes.
Applying the redefined functions to the data of Table 1 gives the transformed decision
system R′′ below:
R′′ a b c d e f g q
x1 4 4 2 1 1 4 5 1
x2 2 0 2 0 1 1 2 2
x3 3 0 2 2 3 1 2 3
x4 3 3 1 2 2 2 2 4
x5 4 0 2 2 3 4 4 5
x6 4 4 3 1 2 4 5 1
x7 3 1 2 0 1 2 2 2
x8 2 3 1 2 2 0 1 3
x9 3 2 4 3 4 5 5 4
x10 3 0 3 2 3 4 3 5
x11 4 3 3 1 2 3 3 1
x12 2 2 2 0 0 2 2 2
x13 3 2 2 0 1 0 0 3
x14 3 2 3 3 4 4 3 4
x15 2 0 2 2 3 4 4 5
x16 4 4 3 1 2 3 3 1
x17 4 0 3 1 2 3 3 2
x18 0 3 3 1 2 2 2 3
x19 3 4 3 2 4 5 5 4
x20 2 1 2 1 2 0 1 5
x21 4 3 3 1 2 5 5 1
x22 1 0 2 2 3 1 1 2
x23 1 0 1 2 2 0 0 3
x24 2 4 2 1 3 3 3 4
x25 3 0 2 1 2 5 5 5
x26 4 2 3 1 2 2 2 1
x27 1 3 2 0 1 1 1 2
x28 3 3 2 1 1 0 0 3
x29 3 2 1 1 2 5 5 4
x30 4 0 1 1 2 3 4 5
6. Application
The information system shown in the appendix is taken from [11]. The system is composed
of 122 patients with duodenal ulcer, 11 pre-operating attributes and a long-term result in
the Visick grading [11]. Attributes 1-4 concern the anamnesis and the remaining attributes
are related to pre-operating gastric secretion. Attribute 1 represents ‘sex’, which takes the
value 0 for ‘male’ or 1 for ‘female’; attribute 4 represents ‘complication of ulcer’, which
takes the value 0 for ‘none’, 1 for ‘acute haemorrhage’, 2 for ‘multiple haemorrhages’, 3 for
‘perforation in the past’ or 4 for ‘pyloric stenosis’. All other attributes, except 1 and 4,
take arbitrary real values from intervals defined by extreme cases. Attribute 2 represents
‘age (years)’, attribute 3 represents ‘duration of disease (years)’, attribute 5 represents ‘HCl
concentration (mmol HCl 100 ml−1 )’, attribute 6 represents ‘volume of gastric juice per 1 h
(ml)’, attribute 7 represents ‘volume of residual gastric juice (ml)’, attribute 8 represents
‘basic acid output (BAO) (mmol HCl 100 ml−1 )’, attribute 9 represents ‘HCl concentration
(mmol HCl 100 ml−1 )’, attribute 10 represents ‘volume of gastric juice (ml)’, and attribute
11 represents ‘maximal acid output (MAO) (mmol HCl 100 ml−1 )’.
6.1 Experimental Results
From the appendix we have the decision system with 122 objects, 11 conditional attributes
and one decision attribute (the Visick grade).
7. Comparative Study
When the dimensionality of the feature vector is reduced, correlated information is
eliminated; as a result, some loss of accuracy in classification (using the feature vector with
reduced dimension) may be incurred. To test the performance of the present method we
compare it with the standard techniques of principal component analysis (PCA) [8], kernel
principal component analysis (KPCA) [19] and independent component analysis (ICA) [4]
for classification with reduced dimension using a multilayer perceptron (MLP) and a
support vector machine (SVM). A benchmark of performance is established by considering
classification with the MLP and SVM on the full data set, without the benefit of
dimensionality reduction. The Abalone, Horse colic and Pima Indians data sets, obtained
from the UCI machine learning repository [3], are used for the present study.
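The benchmark protocol can be sketched as follows. This is our illustration only: a plain numpy PCA applied to synthetic stand-in data; in the study the projected features are fed to the MLP and SVM classifiers, and KPCA and ICA substitute their own projections:

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n samples x m features) onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return Xc @ eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))               # stand-in for an 8-attribute data set
Z = pca_reduce(X, k=4)                      # reduced 4-dimensional features
print(Z.shape)                              # (100, 4)
```

The reduced features Z then replace the original attributes as classifier inputs, so that the classification scores with and without reduction can be compared on equal footing.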
Table 4: Details of data sets used.
Tables 5-7 show the performance of each method. From the results it is clear that the
proposed method produces results comparable to, and in several cases better than, those of
the other methods. Such a comparison of classification results on a set of benchmark data
gives only a rough estimate of the performance of the proposed method; nevertheless, it
suggests that the proposed method can be treated as an alternative tool for classification.

Table 5: Classification performance on the Abalone data set.
Method          Inputs  MLP Score  SVM Score
Full data set   8       91%        92%
PCA             4       90%        91%
KPCA            3       89%        87%
ICA             5       82%        83%
Present Method  3       90%        92%

Table 6: Classification performance on the Horse colic data set.
Method          Inputs  MLP Score  SVM Score
Full data set   27      91%        94%
PCA             10      89%        92%
KPCA            13      85%        89%
ICA             15      82%        83%
Present Method  8       90%        91%

Table 7: Classification performance on the Pima Indians data set.
Method          Inputs  MLP Score  SVM Score
Full data set   8       78%        79%
PCA             5       89%        88%
KPCA            6       82%        83%
ICA             7       81%        80%
Present Method  3       89%        90%

8. Conclusion
In this paper we have successfully applied the basic notion of modeling approximate equality
proposed in [9, 12] and developed an algorithm for optimal reduction of feature vector
dimension. The basic concept of the proposed method is essentially derived from rough set
theory (RST).
Though RST is a good tool for feature dimension reduction, it cannot be applied directly
to information systems containing real-valued data. To overcome this limitation, a
fuzzy-rough based approach may be used. But the fuzzy-rough approach is pivoted on
the fuzzy equivalence relation, which is characterized by the fuzzy transitivity property;
unfortunately, that property is counter-intuitive in nature [12].
Hence, in the proposed method we avoid the drawbacks discussed above and develop a tool
for optimal reduction of the feature vector dimension of any kind of information system
having discrete or real or mixed valued data.
In section 7 we compared the performance of the present approach with standard PCA,
KPCA and ICA and obtained very satisfactory results. As the performance of the proposed
method for classification on benchmark data is comparable with that of existing methods,
the proposed method can be treated as an alternative tool for classification.
References
[1] B. De Baets and R. Mesiar, Pseudo-metrics and T-equivalences, The Journal of Fuzzy Mathe-
matics, Vol. 5, No. 2, pp. 471-481, 1997.
[2] A.J. Bell and T.J. Sejnowski, The independent components of natural scenes are edge filters,
Vision Res., Vol. 37, No. 23, pp. 3327-3338, 1997.
[3] C.L. Blake and C.J. Merz, UCI Repository of Machine Learning Databases,
http://www.ics.uci.edu/mlearn/MLRepository.html.
[4] P. Comon, Independent component analysis, a new concept, Signal Process, Vol. 36, No. 3,
pp. 287-314, 1994.
[5] K.I. Diamantaras and S.Y. Kung, Principal Component Neural Networks, Wiley, New York,
1996.
[6] V. Janis, Resemblance is a nearness, Fuzzy Sets and Systems, Vol. 133, pp. 171-173, 2003.
[7] R. Jensen and Q. Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-
based approaches, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 12,
pp. 1457-1471, 2004.
[8] I.T. Jolliffe, Principal Component Analysis, Springer, New York, 1986.
[9] F. Klawonn, Should fuzzy equality and similarity satisfy transitivity? Comments on the paper by
M. De Cock and E. Kerre, Fuzzy Sets and Systems, Vol. 133, pp. 175-180, 2003.
[10] R. Kruse, J. Gebhardt, and F. Klawonn, Foundations of Fuzzy Systems, Wiley, Chichester,
1994.
[11] K. Słowiński, Sensitivity analysis of rough classification, Int. J. Man-Machine Studies, Vol. 32,
pp. 693-705, 1990.
[12] M. De Cock and E. Kerre, On (un)suitable fuzzy relations to model approximate equality, Fuzzy
Sets and Systems, Vol. 133, pp. 137-153, 2003.
[13] V. Murali, Fuzzy equivalence relations, Fuzzy Sets and Systems, Vol. 30, pp. 155-163, 1989.
[14] E. Oja, The nonlinear PCA learning rule and signal separation: mathematical analysis,
Neurocomputing, Vol. 17, pp. 25-45, 1997.
[15] S. Ovchinnikov, Similarity relations, fuzzy partitions, and fuzzy orderings, Fuzzy Sets and
Systems, Vol. 40, pp. 107-126, 1991.
[16] S. Ovchinnikov, Representations of transitive fuzzy relations, in: H.J. Skala, S. Termini, E.
Trillas (Eds.), Aspects of Vagueness, D. Reidel Publishing Company, Dordrecht, pp. 105-118,
1984.
[17] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic
Publishers, Dordrecht, 1991.
[18] H. Poincare, La Valeur de la Science, Flammarion, Paris, 1904.
[19] R. Rosipal, M. Girolami, L.J. Trejo, and A. Cichocki, Kernel PCA for feature extraction and
de-noising in non-linear regression, Neural Comput. Appl, Vol. 10, No. 3, pp. 231-243, 2001.
[20] E. Ruspini, A new approach to clustering, Inform. Control, Vol. 15, pp. 22-32, 1969.
[21] B. Scholkopf, A. Smola, and K.R. Muller, Nonlinear component analysis as a Kernel eigenvalue
problem, Neural Comput., Vol. 10, pp. 1299-1319, 1998.
[22] H. Thiele, On the mutual definability of fuzzy tolerance relations and fuzzy tolerance coverings,
in: Proc. 25th Internat. Symp. On Multiple Valued Logic, IEEE Computer Society, Los Alamitos,
CA, pp. 140-145, 1995.
[23] H. Thiele and N. Schmechel, The mutual definability of fuzzy equivalence relations and fuzzy
partitions, in: Proc. Internat. Joint Conf. of the Fourth IEEE International Conference on Fuzzy
Systems and the Second International Fuzzy Engineering Symposium, Yokohama, pp. 1383-1390,
1995.
[24] H. Thiele, On similarity based fuzzy clusterings, in: D. Dubois, E.P. Klement, H. Prade (Eds.),
Fuzzy Sets, Logics and Reasoning about Knowledge, Kluwer Academic Publishers, Dordrecht,
pp. 289-299, 1999.
[25] E. Tsiporkova and H.J. Zimmermann, Aggregation of Compatibility and Equality: A New Class
of Similarity Measures for Fuzzy Sets, In Proceedings IPMU, pp. 1769-1776, 1998.
[26] L.A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sci., Vol. 3, pp. 177-200, 1971.
[27] L.A. Zadeh, A Fuzzy-Sets-Theoretic Interpretation of Linguistic Hedges, Journal of Cybernetics,
Vol. 2, No. 3, pp. 4-34, 1972.
[28] L.A. Zadeh, Calculus of Fuzzy Restrictions, In Fuzzy Sets and Their Applications to Cognitive
and Decision Processes (L.A. Zadeh, K.S. Fu, K. Tanaka, M. Shimura (Eds.)), Academic Press,
New York, pp. 1-40, 1975.
Appendix