
Journal of Pattern Recognition Research 1 (2016) 26-40

Received June 10, 2015. Revised March 10, 2016. Accepted May 17, 2016.

Application of Approximate Equality for Reduction of Feature Vector Dimension

Kumar S. Ray ksray@isical.ac.in


Electronics and Communication Science Unit, Indian Statistical Institute
203 B.T. Road, Kolkata, West Bengal 700108, India
Srikanta Kolay kolaysrikanta@gmail.com
SMS India Pvt. Ltd., RDB Boulevard, Unit-D, Plot: K1, Block-EP & GP,
Sec-V, Salt Lake, Kolkata, West Bengal 700091, India

Abstract
Reduction of feature vector dimension is a problem of selecting the most informative
features from an information system. Using rough set theory (RST) we can reduce the
feature vector dimension when all the attribute values are crisp or discrete. For any
information system or decision system, if attributes contain real-valued data, RST cannot
be applied directly. Fuzzy-rough set techniques may be applied on this kind of system to
reduce the dimension. But, Fuzzy-rough set uses the concept of fuzzy-equivalence relation,
which is not suitable to model approximate equality. In this paper we propose a new
alternative method to reduce the dimension of feature vectors of a decision system where
the attribute values may be discrete or real or even mixed in nature. To model approximate
equality we first consider the intuitive relationship between distance measure and equality.
Subsequently we fuzzify the distance measures to establish the degree of equality (or
closeness) among feature vectors (objects or points). Finally we use the concept of α-
cut to obtain equivalence relation based on which dimension of feature vectors can be
reduced. We also compare the performance of the present method for reducing the feature
vector dimension with those of principal component analysis (PCA), kernel principal
component analysis (KPCA) and independent component analysis (ICA). In most cases
the present method performs as well as or better than the other methods.
Keywords: Rough set, approximate equality, α-cut, feature dependencies, dimension
reduction, multilayer perceptron, support vector machine.

1. Introduction
Many classification problems involve high-dimensional descriptions of input features. Reduction of feature vector dimension helps to simplify the design and implementation of
classifiers. A technique that can reduce dimensionality using information contained within
the data set and preserving the meaning of the features is desirable in practice. Rough
set theory (RST) can be used to reduce the number of attributes contained in a data set
using only the data it contains and no additional information [17]. Given a data set with
discretized attribute values, it is possible to find a subset (termed a reduct) of the original
attributes using RST that is most informative; all other attributes can be removed from
the data set with minimal information loss.
However, it is most often the case that attribute values are a mixture of crisp and
real-valued data. In that case the traditional rough set notion of a reduct encounters a
problem: the theory cannot say whether two attribute values are similar, or to what extent
they are the same. For example, two very close values may differ only as a result of noise,
but in RST they are considered as different as two values of a different order of magnitude.


To overcome this problem the fuzzy-rough set technique [7] may be applied. But fuzzy-rough sets use the concept of a fuzzy equivalence relation, which is not suitable for modeling
approximate equality, as the transitivity property of the fuzzy equivalence relation is
counter-intuitive [12].
We propose here a new methodology to reduce the feature vector dimension of a decision
system having attribute values crisp or fuzzy or mixed in nature. As the fuzzy equivalence
relation does not suitably model approximate equality, we consider the intuitive notion
of distance to model it instead: the closer two objects are (i.e. the smaller the distance
between them), the more they are approximately equal. Based on this intuitive design
philosophy we try to group similar points (i.e. points that are approximately equal) in the
pattern space. To model approximate equality between any two points (objects) in the
pattern space, we select an appropriate membership function. We then introduce the notion
of an α-cut on the fuzzy relation to obtain a new equivalence relation, which maps the
problem back into the rough set methodology.
The rest of this paper is structured as follows. Section 2 discusses the fundamentals of
rough set theory, in particular focusing on dimensionality reduction. Section 3 introduces a
new alternative method to reduce the feature vector dimension of a decision system having
attribute values crisp or fuzzy or mixed in nature. The proposed algorithm is given in
Section 4. Section 5 presents a worked-out example. An application is given in Section 6.
A comparative study with other methods is presented in Section 7. Section 8 concludes the paper.

2. Background
A rough set is an approximation of a vague concept by a pair of precise concepts, called lower
and upper approximations [17]. The lower approximation is a description of the domain
objects which are known with certainty to belong to the subset of interest, whereas the upper
approximation is a description of the objects which possibly belong to the subset. For
further reading, interested readers are referred to [1, 2, 5, 6, 10, 13–16, 18, 20–28].
2.1 Rough Set Attribute Reduction
Rough sets have been employed to remove redundant conditional attributes from discrete-
valued data sets, while retaining their information content. Let I = (U, A) be an information
system, where U is a non-empty finite set of objects (the universe of discourse) and A is a
finite nonempty set of attributes such that a : U → V_a for every a ∈ A, V_a being the value
set of attribute a. In a decision system, A = C ∪ D, where C is the set of conditional
attributes and D is the set of decision attributes. With any P ⊆ A there is an associated
equivalence relation IND(P):

IND(P) = {(x, y) ∈ U² | ∀a ∈ P, a(x) = a(y)}   (1)
The partition of U generated by IND(P) is denoted U/P and can be calculated as follows:

U/P = ⊗{a ∈ P : U/IND({a})},   (2)

where

A ⊗ B = {X ∩ Y : ∀X ∈ A, ∀Y ∈ B, X ∩ Y ≠ ∅}.   (3)
If (x, y) ∈ IND(P), then x and y are indiscernible by attributes from P. The equivalence
classes of the P-indiscernibility relation are denoted [x]_P. Let X ⊆ U; the P-lower
approximation of X can now be defined as:

PX = {x | [x]_P ⊆ X}   (4)


Let P and Q be equivalence relations over U; then the positive region can be defined as

POS_P(Q) = ∪_{X ∈ U/Q} PX   (5)

In terms of classification, the positive region contains all objects of U that can be classified
to classes of U/Q using the knowledge in attributes P. An important issue in data analysis
is discovering dependencies between attributes. Intuitively, a set of attributes Q depends
totally on a set of attributes P, denoted P ⇒ Q, if all attribute values from Q are uniquely
determined by values of attributes from P. Dependency can be defined in the following way:
For P, Q ⊆ A, Q depends on P in a degree k (0 ≤ k ≤ 1), denoted P ⇒_k Q, if

k = γ_P(Q) = |POS_P(Q)| / |U|,   (6)

where |U| stands for the cardinality of the set U.

If k = 1, Q depends totally on P; if 0 < k < 1, Q depends partially (in a degree k) on P;
and if k = 0, Q does not depend on P.
2.2 Reducts
The reduction of attributes is achieved by comparing equivalence relations generated by sets
of attributes. Attributes are removed so that the reduced set provides the same quality of
classification as the original. In the context of decision systems, a reduct is formally defined
as a minimal subset R of the conditional attribute set C such that γ_R(D) = γ_C(D).
2.3 Example
Consider an example decision system shown in the table below.

a b c d q
x1 1 0 1 2 0
x2 2 1 2 2 2
x3 1 1 2 1 1
x4 1 2 2 1 2
x5 0 0 1 2 1
x6 1 2 0 1 2

Here, U = {x1, x2, x3, x4, x5, x6} and A = {a, b, c, d, q}, where C = {a, b, c, d} and D = {q}.


Considering Q = {q} we get the partition of U as below:

U/Q = {X1, X2, X3},

where X1 = {x1}, X2 = {x2, x4, x6}, X3 = {x3, x5}.

Now, considering P = {a}, we get the equivalence classes as below:

U/P = {{x1, x3, x4, x6}, {x2}, {x5}}.

Let us now find the lower approximations:

PX1 = ∅, PX2 = {x2}, PX3 = {x5}.

The positive region is POS_P(Q) = ∪_{X ∈ U/Q} PX = {x2, x5}. Hence, {q} depends on {a} in a
degree k = γ_{a}({q}), where

γ_{a}({q}) = |POS_{a}({q})| / |U| = |{x2, x5}| / |{x1, x2, x3, x4, x5, x6}| = 2/6.


In a similar fashion, we can calculate the dependencies for all possible subsets of C:

γ_{a}({q}) = 2/6, γ_{b}({q}) = 0, γ_{c}({q}) = 1/6, γ_{d}({q}) = 0,
γ_{a,b}({q}) = 1, γ_{a,c}({q}) = 4/6, γ_{a,d}({q}) = 3/6, γ_{b,c}({q}) = 2/6,
γ_{b,d}({q}) = 4/6, γ_{c,d}({q}) = 2/6, γ_{a,b,c}({q}) = 1, γ_{a,b,d}({q}) = 1,
γ_{a,c,d}({q}) = 4/6, γ_{b,c,d}({q}) = 4/6, γ_{a,b,c,d}({q}) = 1.

Here, {a, b} is the minimal subset of the conditional attributes for which the dependency
factor equals 1. Hence, {a, b} is a reduct.
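
The whole computation above is mechanical and easy to script. Below is a minimal Python sketch (ours, not part of the original paper) that reproduces it: the dependency degree γ_P({q}) of Eq. (6) is computed for the six-object decision system, and every subset of C with dependency degree 1 is reported; the minimal one, {a, b}, appears first.

```python
from itertools import combinations

# Decision system of Section 2.3: conditional attributes a-d, decision q.
data = {
    'x1': {'a': 1, 'b': 0, 'c': 1, 'd': 2, 'q': 0},
    'x2': {'a': 2, 'b': 1, 'c': 2, 'd': 2, 'q': 2},
    'x3': {'a': 1, 'b': 1, 'c': 2, 'd': 1, 'q': 1},
    'x4': {'a': 1, 'b': 2, 'c': 2, 'd': 1, 'q': 2},
    'x5': {'a': 0, 'b': 0, 'c': 1, 'd': 2, 'q': 1},
    'x6': {'a': 1, 'b': 2, 'c': 0, 'd': 1, 'q': 2},
}

def partition(attrs):
    """Equivalence classes of IND(attrs), Eq. (1)."""
    blocks = {}
    for x, row in data.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(x)
    return list(blocks.values())

def gamma(P, Q=('q',)):
    """Dependency degree gamma_P(Q) = |POS_P(Q)| / |U|, Eqs. (4)-(6)."""
    pos = set()
    for X in partition(Q):               # decision classes of U/Q
        for block in partition(P):       # classes of U/P
            if block <= X:               # block lies in the P-lower approximation of X
                pos |= block
    return len(pos) / len(data)

print(gamma(('a',)))                     # 0.333... = 2/6, as computed above
for r in range(1, 5):
    for P in combinations('abcd', r):
        if gamma(P) == 1.0:
            print('dependency 1 for:', P)   # ('a', 'b') is printed first
```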

3. Feature Dimension Reduction When Attributes Contain Real-Valued Data
When the attributes contain real-valued data, it is not possible to find the reduct using
the rough set method. In a decision system, two domain values, say 3.2 and 3.25, may be
very close in reality, but they will still be treated as two entirely different values in RST.
To tackle this problem we propose a new methodology, which can be applied to systems
having discrete-valued data, real-valued data, or both.
3.1 Feature Dimension Reduction Using Approximate Equality and α-cut
Let us consider an information (decision) system I = (U, A), where U is the finite nonempty
set of objects and A = C ∪ D is the finite nonempty set of attributes where C stands for
the finite nonempty set of conditional attributes and D stands for the finite nonempty set
of decision attributes.
To model approximate equality between any two points (objects) in the pattern space,
we select an appropriate membership function, as given below:

µ_r(x_i, x_j) = 1 − d(x_i, x_j)/d_max,   (7)

where d(x_i, x_j) is the distance between two objects x_i and x_j, and d_max is the maximum
distance between any two objects.
A fuzzy τ-equivalence relation can be defined as below:
Definition. If a fuzzy relation r ⊆ U × U satisfies the following three conditions, we call it
a fuzzy τ-equivalence relation [12]:
(a) ∀x_i ∈ U, µ_r(x_i, x_i) = 1 (reflexive);
(b) ∀(x_i, x_j) ∈ U × U, µ_r(x_i, x_j) = µ ⇒ µ_r(x_j, x_i) = µ (symmetric);
(c) ∀(x_i, x_j), (x_j, x_k), (x_i, x_k) ∈ U × U, µ_r(x_i, x_k) ≥ τ(µ_r(x_i, x_j), µ_r(x_j, x_k)) (transitive).
But the fuzzy equivalence relation is not suitable for modeling approximate equality [12]. Hence,
we introduce the notion of an α-cut on the fuzzy relation to achieve a new equivalence relation R′
as below:

R′(P) = {(x_i, x_j) ∈ U² | ∀a ∈ P, µ_r(x_i, x_j) ≥ α},   (8)

where P ⊆ A.
If (x_i, x_j) ∈ R′(P), then x_i and x_j are indiscernible by attributes from P. The equivalence
classes of the equivalence relation R′ of P are denoted [x]_P. Let X ⊆ U; the P-lower
approximation of X can now be defined as [7]

PX = {x | [x]_P ⊆ X}   (9)

Let P and Q be equivalence relations over U; then the positive region can be defined as

POS_P(Q) = ∪_{X ∈ U/Q} PX   (10)


and the dependency factor can be calculated as

γ_P(Q) = |POS_P(Q)| / |U|   (11)
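
For illustration, here is a minimal Python sketch (an assumption of ours, not code from the paper) of one reading of Eqs. (7)–(8): distances per attribute are fuzzified into degrees of approximate equality, and an α-cut turns the fuzzy relation into a crisp one. Because the cut relation is reflexive and symmetric but need not be transitive on arbitrary data, the sketch closes it transitively with a union-find to obtain a partition.

```python
import numpy as np

def alpha_cut_classes(X, alpha):
    """X: (n_objects, n_attributes) array of real values; returns a partition of object indices."""
    n = len(X)
    close = np.ones((n, n), dtype=bool)
    for col in X.T:
        d = np.abs(col[:, None] - col[None, :])     # pairwise distances per attribute
        d_max = d.max() or 1.0                      # guard against constant columns
        close &= (1.0 - d / d_max) >= alpha         # mu_r of Eq. (7), cut at alpha (Eq. 8)
    parent = list(range(n))                         # union-find for the transitive closure
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if close[i, j]:
                parent[find(i)] = find(j)
    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())
```

The resulting partition then plays the role of U/P in Eqs. (9)–(11).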

3.1.1 Determination of the Value of α


The value of α is fixed at the minimum value for which γ_C(D) = 1. However, α can also
be set to a higher value.
3.1.2 Reduct
Considering the same α-value, the reduct can be defined as a minimal subset R of C such
that γ_R(D) = 1.

4. Algorithm to Find an Optimal Reduct: OPTREDUCT

C → set of all conditional features;
D → set of decision features.
(a) Taking R = C, find the minimum value of α, say α_min, for which γ_R(D) = 1.
(b) Set α to a value ≥ α_min.
(c) R ← { }
(d) do
(e)   T ← R
(f)   ∀x ∈ (C − R)
(g)     if γ_{R∪{x}}(D) > γ_T(D), where γ_R(D) = card(POS_R(D))/card(U)
(h)       T ← R ∪ {x}
(i)   R ← T
(j) until γ_R(D) = γ_C(D)
(k) return R
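
The steps translate directly into code. The following Python sketch is our transcription, not the authors' implementation; it assumes a dependency function gamma(P, D) evaluated under the chosen α as in Eq. (11), for instance by combining the two sketches above. The step labels from the pseudocode appear as comments.

```python
def optreduct(C, D, gamma):
    """Greedy forward selection of a reduct R of C (steps (c)-(k))."""
    target = gamma(C, D)                       # with alpha fixed as in step (a), gamma_C(D) = 1
    R = set()                                  # step (c)
    while gamma(sorted(R), D) < target:        # loop guard, step (j)
        best, best_gain = None, gamma(sorted(R), D)
        for x in set(C) - R:                   # step (f)
            g = gamma(sorted(R | {x}), D)      # step (g)
            if g > best_gain:
                best, best_gain = x, g         # step (h): remember the best extension
        if best is None:                       # no attribute improves gamma: stop
            break
        R.add(best)                            # step (i)
    return R                                   # step (k)
```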
4.1 Complexity of the Algorithm
The complexity of our algorithm is no longer of the NP-hard type: instead of exhaustively
examining all subsets of C, it greedily evaluates at most |C| candidate attributes per iteration.

5. A Worked-out Example
Let us consider an information (decision) system I = (U, A) of emotion classification, where
U is the finite nonempty set of objects and A = C ∪ D is the finite nonempty set of attributes,
C being the finite nonempty set of conditional attributes and D the finite nonempty set of
decision attributes.
Table 1 represents a decision system with

U = {x1 , x2 , x3 , . . . , x30 } and A = {a, b, c, d, e, f, g, q},

where C = {a, b, c, d, e, f, g} and D = {q}.


Conditional attribute a stands for 'mouth length', b for 'mouth opening', c for 'eyebrow
constriction', d for 'eye opening right', e for 'eye opening left', f for 'upper lid right', and
g for 'upper lid left'. The q-value 1 stands for 'happiness', 2 for 'disgust', 3 for 'anger',
4 for 'afraid', and 5 for 'sadness'.
5.1 Application of OPTREDUCT Algorithm
First, with R = C, we find the value α_min = 0.9 for which γ_R(D) = 1. We then set
α = 0.975 (a value higher than or equal to α_min).
Running the algorithm, we obtain the reduct through the path shown in Fig. 1.


Table 1: Decision system of emotion classification.

a b c d e f g q
x1 156.9 33.75 55 14 14.5 20 21 1
x2 114.75 0 55 8.75 12 5 8.5 2
x3 124.5 0 55 20.25 22.5 2 7.5 3
x4 123.75 24 37.25 22.75 19 7 7.25 4
x5 149.75 0 45 22 21.75 18 15.25 5
x6 179.75 32.25 79.5 16 17.25 19 21 1
x7 125.25 5.75 42.75 7 12.25 7.75 6 2
x8 111.75 25 40 21 18 0 4.25 3
x9 121.25 10.75 102 30.25 27.25 36.75 32 4
x10 122.25 0 63.25 24.25 22.5 15.5 11.5 5
x11 144 23.5 80.25 16.5 18.25 11.75 12.75 1
x12 116.08 12.25 45.25 3.75 5.5 5.75 6.5 2
x13 124.5 13.5 43 0 11.5 0 0 3
x14 125 11.25 61 31.25 29.75 15.25 13 4
x15 112.5 0 47.25 22.5 20.75 18.5 16 5
x16 137.75 31.25 75.5 19.5 17.5 13.5 11.75 1
x17 131.25 0 82 17.75 17.5 14.5 11.25 2
x18 94.5 25 69.75 13.75 18.75 6.25 7 3
x19 121.75 38.75 69.75 27 28 33.75 34.25 4
x20 115.25 7.75 45.75 17.25 17.75 0 2.75 5
x21 166 23.25 80.5 14 16 25 23.25 1
x22 102.25 0 58.75 22.75 23 1.75 2.75 2
x23 102.5 0 31.75 20.25 19.25 0 0 3
x24 114.25 30.25 41 20 22 14 15 4
x25 123.25 0 41.75 17.5 16.5 25 25 5
x26 174 18.25 63.25 14.2 16.75 5.5 5.5 1
x27 107.17 29.5 54.75 8.25 11.25 2.75 3.75 2
x28 126.5 26.5 53.75 14.75 10.5 0 0 3
x29 126 12 21 18.25 17.75 42.75 45.5 4
x30 141.5 0 33.75 19.75 19 15 18.75 5

Fig. 1: OPTREDUCT route.


5.2 Application of Shen’s QUICKREDUCT Algorithm


Now we apply Shen's QUICKREDUCT algorithm to the decision system shown in Table 1.
To find the reduct from this decision system, we cannot use rough sets directly, as the
conditional attributes contain real values [7]. Here, we can use rough sets after redefining
the data set through characteristic functions.
Let us define the characteristic functions as follows:

R′_a(x_i) = 0 if R_a(x_i) ≤ 100; = 1 if 100 < R_a(x_i) ≤ 120; = 2 if R_a(x_i) > 120;
R′_b(x_i) = 0 if R_b(x_i) ≤ 0; = 1 if 0 < R_b(x_i) ≤ 20; = 2 if R_b(x_i) > 20;
R′_c(x_i) = 0 if R_c(x_i) ≤ 40; = 1 if R_c(x_i) > 40;
R′_d(x_i) = 0 if R_d(x_i) ≤ 20; = 1 if R_d(x_i) > 20;
R′_e(x_i) = 0 if R_e(x_i) ≤ 20; = 1 if R_e(x_i) > 20;
R′_f(x_i) = 0 if R_f(x_i) ≤ 15; = 1 if R_f(x_i) > 15;
R′_g(x_i) = 0 if R_g(x_i) ≤ 15; = 1 if R_g(x_i) > 15.

Based on this we get the decision system shown in Table 2.


Table 2: Modified decision system of emotion classification.

R′ a b c d e f g q
x1 2 2 1 0 0 1 1 1
x2 1 0 1 0 0 0 0 2
x3 2 0 1 1 1 0 0 3
x4 2 2 0 1 0 0 0 4
x5 2 0 1 1 1 1 1 5
x6 2 2 1 0 0 1 1 1
x7 2 1 1 0 0 0 0 2
x8 1 2 0 1 0 0 0 3
x9 2 1 1 1 1 1 1 4
x10 2 0 1 1 1 1 0 5
x11 2 2 1 0 0 0 0 1
x12 1 1 1 0 0 0 0 2
x13 2 1 1 0 0 0 0 3
x14 2 1 1 1 1 1 0 4
x15 1 0 1 1 1 1 1 5
x16 2 2 1 0 0 0 0 1
x17 2 0 1 0 0 0 0 2
x18 0 2 1 0 0 0 0 3
x19 2 2 1 1 1 1 1 4
x20 1 1 1 0 0 0 0 5
x21 2 2 1 0 0 1 1 1
x22 1 0 1 1 1 0 0 2
x23 1 0 0 1 0 0 0 3
x24 1 2 1 0 1 0 0 4
x25 2 0 1 0 0 1 1 5
x26 2 1 1 0 0 0 0 1
x27 1 2 1 0 0 0 0 2
x28 2 2 1 0 0 0 0 3
x29 2 1 0 0 0 1 1 4
x30 2 0 0 0 0 0 1 5


Based on the above data, if we try to find the dependency factor taking all conditional
attributes into account, we get the value 22/30 < 1. Hence, the attribute values cannot
classify all objects to their decision classes, so we have to redefine the characteristic
functions in such a way that they produce more distinct classes.
Now, we redefine the functions as given below:

R″_a(x_i) = 0 if R_a(x_i) ≤ 100; = 1 if 100 < R_a(x_i) ≤ 110; = 2 if 110 < R_a(x_i) ≤ 120; = 3 if 120 < R_a(x_i) ≤ 130; = 4 if R_a(x_i) > 130;
R″_b(x_i) = 0 if R_b(x_i) ≤ 0; = 1 if 0 < R_b(x_i) ≤ 10; = 2 if 10 < R_b(x_i) ≤ 20; = 3 if 20 < R_b(x_i) ≤ 30; = 4 if R_b(x_i) > 30;
R″_c(x_i) = 0 if R_c(x_i) ≤ 20; = 1 if 20 < R_c(x_i) ≤ 40; = 2 if 40 < R_c(x_i) ≤ 60; = 3 if 60 < R_c(x_i) ≤ 100; = 4 if R_c(x_i) > 100;
R″_d(x_i) = 0 if R_d(x_i) ≤ 0; = 1 if 0 < R_d(x_i) ≤ 20; = 2 if 20 < R_d(x_i) ≤ 30; = 3 if R_d(x_i) > 30;
R″_e(x_i) = 0 if R_e(x_i) ≤ 10; = 1 if 10 < R_e(x_i) ≤ 15; = 2 if 15 < R_e(x_i) ≤ 20; = 3 if 20 < R_e(x_i) ≤ 25; = 4 if R_e(x_i) > 25;
R″_f(x_i) = 0 if R_f(x_i) ≤ 0; = 1 if 0 < R_f(x_i) ≤ 5; = 2 if 5 < R_f(x_i) ≤ 10; = 3 if 10 < R_f(x_i) ≤ 15; = 4 if 15 < R_f(x_i) ≤ 20; = 5 if R_f(x_i) > 20;
R″_g(x_i) = 0 if R_g(x_i) ≤ 0; = 1 if 0 < R_g(x_i) ≤ 5; = 2 if 5 < R_g(x_i) ≤ 10; = 3 if 10 < R_g(x_i) ≤ 15; = 4 if 15 < R_g(x_i) ≤ 20; = 5 if R_g(x_i) > 20.

Based on this we get the decision system shown in Table 3.

Table 3: Second modified decision system of emotion classification.

R″ a b c d e f g q
x1 4 4 2 1 1 4 5 1
x2 2 0 2 0 1 1 2 2
x3 3 0 2 2 3 1 2 3
x4 3 3 1 2 2 2 2 4
x5 4 0 2 2 3 4 4 5
x6 4 4 3 1 2 4 5 1
x7 3 1 2 0 1 2 2 2
x8 2 3 1 2 2 0 1 3
x9 3 2 4 3 4 5 5 4
x10 3 0 3 2 3 4 3 5
x11 4 3 3 1 2 3 3 1
x12 2 2 2 0 0 2 2 2
x13 3 2 2 0 1 0 0 3
x14 3 2 3 3 4 4 3 4
x15 2 0 2 2 3 4 4 5
x16 4 4 3 1 2 3 3 1
x17 4 0 3 1 2 3 3 2
x18 0 3 3 1 2 2 2 3
x19 3 4 3 2 4 5 5 4
x20 2 1 2 1 2 0 1 5
x21 4 3 3 1 2 5 5 1
x22 1 0 2 2 3 1 1 2
x23 1 0 1 2 2 0 0 3
x24 2 4 2 1 3 3 3 4
x25 3 0 2 1 2 5 5 5
x26 4 2 3 1 2 2 2 1
x27 1 3 2 0 1 1 1 2
x28 3 3 2 1 1 0 0 3
x29 3 2 1 1 2 5 5 4
x30 4 0 1 1 2 3 4 5

Based on these data, if we try to find the dependency factor taking all conditional
attributes into account, we get the value 30/30 = 1. Hence, the attributes can classify
all objects to their decision classes.
So now we can directly apply the rough set QUICKREDUCT algorithm [7], and we get the
reduct {a, b, g} by following the route shown in Fig. 2.

Fig. 2: QUICKREDUCT route.

5.3 Comparison
Both algorithms have the same order of complexity and yield reducts with the same number
of attributes. However, the OPTREDUCT algorithm is more intuitive and straightforward.

6. Application
The information system shown in the Appendix is taken from [11]. The system is composed
of 122 patients with duodenal ulcer, 11 pre-operating attributes and a long-term result in
the Visick grading [11]. Attributes 1–4 concern the anamnesis and the remaining attributes
are related to pre-operating gastric secretion. Attribute 1 represents 'sex', which takes the
value 0 for 'male' or 1 for 'female'; attribute 4 represents 'complication of ulcer', which takes
the value 0 for 'none', 1 for 'acute haemorrhage', 2 for 'multiple haemorrhages', 3 for
'perforation in the past', or 4 for 'pyloric stenosis'. All attributes other than 1 and 4 take
arbitrary real values from intervals defined by extreme cases. Attribute 2 represents 'age
(years)', attribute 3 'duration of disease (years)', attribute 5 'HCl concentration (mmol HCl
100 ml⁻¹)', attribute 6 'volume of gastric juice per 1 h (ml)', attribute 7 'volume of residual
gastric juice (ml)', attribute 8 'basic acid output (BAO) (mmol HCl 100 ml⁻¹)', attribute 9
'HCl concentration (mmol HCl 100 ml⁻¹)', attribute 10 'volume of gastric juice (ml)', and
attribute 11 'maximal acid output (MAO) (mmol HCl 100 ml⁻¹)'.
6.1 Experimental Results
From the Appendix we have the decision system with

U = {x1, x2, . . . , x122} and A = {1, 2, . . . , 11, d},

where C = {1, 2, . . . , 11} and D = {d}.

After running the OPTREDUCT algorithm on this decision system we get {5, 6, 7, 8, 10} as
an optimal reduct.

7. Comparative Study
When we reduce the dimensionality of the feature vector, correlated information is eliminated.
As a result, some loss of classification accuracy (using the feature vector with reduced
dimension) is incurred. To test the performance of the present method we compare it with
the standard techniques of principal component analysis (PCA) [8], kernel principal component
analysis (KPCA) [19] and independent component analysis (ICA) [4] for classification with
reduced dimension using a multilayer perceptron (MLP) and a support vector machine (SVM). A
benchmark of performance is established by considering classification with the MLP and SVM
on the full data set, without dimensionality reduction. The Abalone, Horse colic
and Pima Indians data sets, obtained from the UCI machine learning repository [3], are used
for the present study.
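
As a sketch of this comparison protocol (our assumption: scikit-learn implementations, with a feature matrix X and label vector y already loaded from the repository; the component counts mirror Table 5), each reduction technique feeds its reduced features to an MLP and an SVM:

```python
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Dimensionality reducers; n_components as in Table 5 (Abalone).
reducers = {
    'PCA': PCA(n_components=4),
    'KPCA': KernelPCA(n_components=3, kernel='rbf'),
    'ICA': FastICA(n_components=5),
}
classifiers = {'MLP': MLPClassifier(max_iter=1000), 'SVM': SVC()}

for r_name, reducer in reducers.items():
    for c_name, clf in classifiers.items():
        pipe = make_pipeline(reducer, clf)                # reduce, then classify
        score = cross_val_score(pipe, X, y, cv=5).mean()  # X, y: assumed preloaded
        print(f'{r_name} + {c_name}: {score:.2f}')
```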
Table 4: Details of data sets used.

Data set       Attributes (features)   Classes   Remark
Abalone        8                       3         Overlapping classes with unstructured domain
Horse colic    27                      2         Mixture of continuous, discrete and nominal attributes; 30% of attribute values are missing
Pima Indians   8                       2         Highly overlapping classes

Tables 5–7 show the performance of each method used. From the results it is clear that the
proposed method produces satisfactory results compared to the others, and is therefore
comparable with the conventional methods of classification. Such a comparison of
classification results on a set of benchmark data is, of course, only a rough estimate of the
performance of the proposed method. Hence, the proposed method can be treated as an
alternative tool for classification.

8. Conclusion
In this paper we have successfully applied the basic notion of modeling approximate equality
proposed in [9, 12] and developed an algorithm for optimal reduction of feature vector
dimension. The basic concept of the proposed method is essentially derived from rough set
theory (RST).
Though RST is a good tool for feature dimension reduction, it cannot be applied directly
to information systems having real-valued data.


Table 5: Results on the Abalone data set.

Method          Inputs   MLP Classification Score   SVM Classification Score
Full data set   8        91 %                       92 %
PCA             4        90 %                       91 %
KPCA            3        89 %                       87 %
ICA             5        82 %                       83 %
Present method  3        90 %                       92 %

Table 6: Results on the Horse colic data set.

Method          Inputs   MLP Classification Score   SVM Classification Score
Full data set   27       91 %                       94 %
PCA             10       89 %                       92 %
KPCA            13       85 %                       89 %
ICA             15       82 %                       83 %
Present method  8        90 %                       91 %

Table 7: Results on the Pima Indians data set.

Method          Inputs   MLP Classification Score   SVM Classification Score
Full data set   8        78 %                       79 %
PCA             5        89 %                       88 %
KPCA            6        82 %                       83 %
ICA             7        81 %                       80 %
Present method  3        89 %                       90 %

To overcome the limitation of RST, a fuzzy-rough based approach may be used. But the
fuzzy-rough based approach is pivoted on the fuzzy equivalence relation, which is characterized
by the fuzzy transitivity property; unfortunately, the fuzzy transitivity property is
counter-intuitive in nature [12].
Hence, the proposed method avoids all of the aforesaid drawbacks and provides a tool
for optimal reduction of the feature vector dimension of any kind of information system
having discrete, real or mixed valued data.
In Section 7 we compared the performance of the present approach with standard PCA,
KPCA and ICA and obtained very satisfactory results. As the performance of the proposed
method for classification on benchmark data is comparable with that of existing methods,
the proposed method can be treated as an alternative tool for classification.


References
[1] B. De Baets and R. Mesiar, Pseudo-metrics and T-equivalences, The Journal of Fuzzy
Mathematics, Vol. 5, No. 2, pp. 471-481, 1997.
[2] A.J. Bell and T.J. Sejnowski, The independent components of natural scenes are edge filters,
Vision Res., Vol. 37, No. 23, pp. 3327-3338, 1997.
[3] C.L. Blake and C.J. Merz, UCI Machine Learning Repository, http://www.ics.uci.edu/mlearn/.
[4] P. Comon, Independent component analysis, a new concept, Signal Process, Vol. 36, No. 3,
pp. 287-314, 1994.
[5] K.I. Diamantaras and S.Y. Kung, Principal Component Neural Networks, Wiley, New York,
1996.
[6] V. Janis, Resemblance is a nearness, Fuzzy Sets and Systems, Vol. 133, pp. 171-173, 2003.
[7] R. Jensen and Q. Shen, Semantics-preserving dimensionality reduction: rough and
fuzzy-rough-based approaches, IEEE Transactions on Knowledge and Data Engineering,
Vol. 16, No. 12, pp. 1457-1471, 2004.
[8] I.T. Jolliffe, Principal Component Analysis, Springer, New York, 1986.
[9] F. Klawonn, Should fuzzy equality and similarity satisfy transitivity? Comments on the paper by
M. De Cock and E. Kerre, Fuzzy Sets and Systems, Vol. 133, pp. 175-180, 2003.
[10] R. Kruse, J. Gebhardt, and F. Klawonn, Foundations of Fuzzy Systems, Wiley, Chichester,
1994.
[11] R. Slowinski, Sensitivity analysis of rough classification, Int. J. Man-Machine Studies,
Vol. 32, pp. 693-705, 1990.
[12] M. De Cock and E. Kerre, On (un)suitable fuzzy relations to model approximate equality, Fuzzy
Sets and Systems, Vol. 133, pp. 137-153, 2003.
[13] V. Murali, Fuzzy equivalence relations, Fuzzy Sets and Systems, Vol. 30, pp. 155-163, 1989.
[14] E. Oja, The nonlinear PCA learning rule and signal separation - mathematical analysis,
Neurocomputing, Vol. 17, pp. 25-45, 1997.
[15] S. Ovchinnikov, Similarity relations, fuzzy partitions, and fuzzy orderings, Fuzzy Sets and
Systems, Vol. 40, pp. 107-126, 1991.
[16] S. Ovchinnikov, Representations of transitive fuzzy relations, in: H.J. Skala, S. Termini,
E. Trillas (Eds.), Aspects of Vagueness, D. Reidel Publishing Company, Dordrecht, MA,
pp. 105-118, 1984.
[17] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic
Publishers, Dordrecht, 1991.
[18] H. Poincare, La Valeur de la Science, Flammarion, Paris, 1904.
[19] R. Rosipal, M. Girolami, L.J. Trejo, and A. Cichocki, Kernel PCA for feature extraction and
de-noising in non-linear regression, Neural Comput. Appl, Vol. 10, No. 3, pp. 231-243, 2001.
[20] E. Ruspini, A new approach to clustering, Inform. Control, Vol. 15, pp. 22-32, 1969.
[21] B. Scholkopf, A. Smola, and K.R. Muller, Nonlinear component analysis as a kernel
eigenvalue problem, Neural Comput., Vol. 10, pp. 1299-1319, 1998.
[22] H. Thiele, On the mutual definability of fuzzy tolerance relations and fuzzy tolerance coverings,
in: Proc. 25th Internat. Symp. On Multiple Valued Logic, IEEE Computer Society, Los Alamitos,
CA, pp. 140-145, 1995.
[23] H. Thiele and N. Schmechel, The mutual definability of fuzzy equivalence relations and fuzzy
partitions, in: Proc. Internat. Joint Conf. of the Fourth IEEE International Conference on Fuzzy
Systems and the Second International Fuzzy Engineering Symposium, Yokohama, pp. 1383-1390,
1995.
[24] H. Thiele, On similarity based fuzzy clusterings, in: D. Dubois, E.P. Klement, H. Prade (Eds.),
Fuzzy Sets, Logics and Reasoning about Knowledge, Kluwer Academic Publishers, Dordrecht,
pp. 289-299, 1999.
[25] E. Tsiporkova and H.J. Zimmermann, Aggregation of Compatibility and Equality: A New Class
of Similarity Measures for Fuzzy Sets, In Proceedings IPMU, pp. 1769-1776, 1998.
[26] L.A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sci., Vol. 3, pp. 177-200, 1971.
[27] L.A. Zadeh, A Fuzzy-Sets-Theoretic Interpretation of Linguistic Hedges, Journal of Cybernetics,
Vol. 2, No. 3, pp. 4-34, 1972.
[28] L.A. Zadeh, Calculus of Fuzzy Restrictions, In Fuzzy Sets and Their Applications to Cognitive
and Decision Processes (L.A. Zadeh, K.S. Fu, K. Tanaka, M. Shimura (Eds.)), Academic Press,
New York, pp. 1-40, 1975.


Appendix

Patient no.   Pre-operating attributes                               Result
              1    2    3    4    5    6    7    8    9    10    11   d
x1 1 46 12 0 5.6 79 50 4.4 19 119 22.6 1
x2 0 27 3 1 12.5 58 15 7.3 26 120 31.2 1
x3 0 25 6 0 11.5 77 15 8.9 16.1 93 15 1
x4 0 48 3 0 15.6 29 2 4.5 28.7 186 53.4 1
x5 1 26 0.5 0 7.6 80 45 6.1 17.1 101 17.2 3
x6 0 32 5 1 11.9 56 100 6.7 13.6 150 20.4 1
x7 0 26 2 1 6.1 19 8 1.2 14.8 58 8.6 1
x8 1 28 2 1 6 46 40 2.2 20.4 65 13.3 1
x9 0 55 30 0 16.8 118 12 19.8 40.4 172 69.6 1
x10 0 21 5 3 20.9 111 32 23.2 34.5 270 93.1 2
x11 0 37 2 0 12.6 152 30 19.2 38.7 202 78.2 1
x12 0 48 5 2 2.3 73 6 1.7 5.5 199 10.9 2
x13 0 43 20 3 8.1 97 32 7.8 11 120 13.2 2
x14 0 30 2 0 10 15 15 1.5 18.8 121 22.7 1
x15 0 49 14 2 11.7 118 38 13.8 23.2 266 52.5 1
x16 0 27 3 1 9.5 154 25 14.6 13.5 141 19.1 1
x17 0 28 10 0 20.9 178 26 36.1 23.3 214 49.8 1
x18 1 40 4 0 8.1 62 17 5 5.6 41 2.3 4
x19 0 60 20 0 13.4 107 27 14.3 19 335 63.5 1
x20 0 22 4 0 3.5 176 40 6.1 5.6 190 10.6 2
x21 0 21 4 0 1 155 66 1.6 2.6 160 4.2 1
x22 0 21 6 4 4 360 210 14.4 3.4 211 7.1 1
x23 0 28 0 1 6 152 15 9.2 9.8 227 22.3 1
x24 0 31 2 3 1.8 60 10 1.1 12.3 117 14.4 3
x25 0 37 3 0 8.5 94 20 8 17.3 188 32.6 1
x26 0 22 2 0 8.3 111 28 9.2 20.8 192 39.8 1
x27 0 43 5 0 1.9 401 53 7.5 16.3 94 15.2 1
x28 1 59 1 0 4.8 30 12 1.4 9.3 27 5.2 1
x29 0 32 3 0 2.8 164 35 4.5 10.3 178 18.3 1
x30 0 34 8 0 6.3 82 13 5.2 7.4 130 9.6 1
x31 0 51 1 0 8.6 87 25 7.5 13.7 230 31.4 1
x32 0 41 20 0 2.6 29 15 0.8 6.1 108 6.6 1
x33 1 50 5 1 2.5 44 120 1.1 4.2 49 2.1 1
x34 0 24 2 0 14.1 160 22 22.5 21.2 209 44.4 1
x35 0 32 3 0 9 122 45 10.9 15.7 223 35 1
x36 0 30 8 0 8.5 121 26 10.3 5.7 261 11.4 1
x37 0 63 2 0 5.8 60 34 3.5 8.7 133 11.5 1
x38 0 30 2 1 1.7 171 60 2.1 4.7 139 6.6 1
x39 0 21 4 0 14.7 182 31 26.8 77.5 379 104.2 4
x40 0 42 6 0 6.8 319 254 21.8 9.7 266 25.7 1
x41 0 71 4 2 2 34 27 1.1 4.2 185 7.8 4
x42 0 34 2 0 4.1 212 32 8.7 5.3 154 8.1 4
x43 0 54 2 3 5.3 166 124 8.7 6.8 236 16 3
x44 0 60 0 1 11.4 127 30 14.5 9.3 148 13.8 2
x45 0 33 2 2 8.7 135 54 11.8 29 186 53.8 2
x46 0 40 20 1 11.6 123 88 14.2 22 152 33.3 1
x47 1 32 10 1 10.3 120 20 12.3 11.9 135 16.1 1
x48 0 37 3 0 7.5 86 21 6.4 15 189 28.3 1
x49 1 31 5 3 4 56 43 2.2 7.4 137 10.2 1
x50 0 25 7 3 2.2 184 10 4.1 5.4 459 24.7 1
x51 1 27 1 3 3.1 140 60 4.4 6.6 167 11 2
x52 0 56 15 1 8.3 60 17 5 11.4 72 8.2 1
x53 0 23 2 0 6 133 26 8 11.5 113 13 1
x54 0 23 14 0 2.9 191 23 5.6 15.5 136 21.1 2
x55 0 36 6 3 5.6 140 35 7.9 12.5 129 16.1 1
x56 1 27 7 1 7.1 270 180 19.1 11.2 345 38.7 1
x57 0 51 3 1 3.5 111 50 3.8 15.1 212 32 1
x58 0 31 0.5 3 4.7 525 105 24.7 10.8 627 67.7 1
x59 0 50 8 4 10.6 185 21 19.6 25.3 224 56.6 4
x60 1 31 12 0 2 45 63 0.9 7.1 165 11.7 1
x61 1 47 2 0 26.1 68 46 17.7 28 307 86 1
x62 0 34 4 2 8.8 95 32 8.3 11.8 183 21.6 2
x63 0 42 1 3 3.7 514 75 19.2 12.5 312 39.1 4
x64 0 27 2 2 4 96 14 3.8 14.9 69 10.3 4
x65 1 32 0.5 0 7.8 69 78 5.4 16.7 51 8.5 3
x66 1 35 3 0 2.3 43 28 1 8.3 90 7.5 1
x67 0 36 10 0 3.2 79 38 2.6 9.2 165 15.2 1
x68 0 34 2 0 5.5 108 80 6 11.1 121 13.4 1
x69 0 27 4 0 3.3 159 72 5.2 5 127 6.3 1
x70 1 32 7 0 6.1 43 74 2.6 10.8 326 35.1 1
x71 1 37 15 2 2.2 112 35 2.4 16.7 53 8.7 1
x72 0 35 7 0 4.4 118 38 5.2 5.7 129 7.4 1
x73 0 28 15 4 7.3 23 110 1.7 9.8 21 20.6 2
x74 0 45 24 0 1.4 60 28 0.9 7.1 146 10.3 1
x75 1 27 10 0 21 187 225 39.1 39.1 387 151.4 4
x76 0 27 4 0 10.6 127 30 14 11 430 45.6 1
x77 0 26 3 0 3.8 283 43 11 11.7 260 30.3 1
x78 0 27 4 0 4.6 79 20 3.6 8.7 184 16.1 1
x79 0 28 1 1 1 214 40 2.1 8.6 442 37.9 3
x80 0 50 32 0 5.1 171 30 8.8 5.1 135 7 2
x81 0 28 11 0 4.3 145 65 6.3 6 196 11.8 4
x82 0 27 4 0 6 225 50 13.6 18.8 129 24.2 3
x83 0 48 10 0 11 102 20 11.2 16.3 142 23.2 1
x84 0 30 10 0 9.4 249 70 23.5 18.6 194 36.1 1
x85 1 34 15 0 15.9 136 60 21.6 17.8 184 32.8 1
x86 0 22 3 0 10.6 198 30 20.9 11.9 188 22.4 2
x87 0 30 5 0 8.6 155 37 13.3 13.9 232 32.1 2
x88 0 51 1 1 14.9 80 20 11.9 20.7 128 26.5 1
x89 1 30 10 0 6.8 136 100 9.3 20.7 128 26.5 1
x90 0 30 5 0 7.4 213 90 15.7 10.5 266 28 2
x91 0 35 4 0 3.8 57 116 2.2 10.4 191 19.8 1
x92 0 30 10 0 7.6 158 22 12 12.1 169 20.4 4
x93 0 43 6 0 3.1 122 15 3.8 1.6 208 3.4 1
x94 0 42 10 0 11.7 159 132 18.6 19.6 127 24.9 1
x95 1 45 12 0 5.2 53 32 2.7 13.8 286 39.5 4
x96 0 34 1.5 1 4.5 104 70 4.6 12.4 263 32.6 2
x97 0 36 5 0 7.1 110 26 7.9 13.5 277 37.4 1
x98 0 30 9 0 4.3 134 55 5.8 8.8 336 29.6 1
x99 0 31 5 2 2.5 19 134 0.48 9.1 149 13.5 1
x100 0 25 9 0 8.2 60 78 4.9 14.2 151 21.4 1
x101 0 30 10 1 1.5 122 80 1.9 5.3 220 11.6 4
x102 0 33 5 2 5.7 68 10 3.9 6.4 245 15.6 1
x103 0 32 2 0 6 187 60 11.2 11 285 31.4 2
x104 0 45 22 2 8.7 80 90 7 42.3 270 114.3 1
x105 0 38 2 0 5.8 58 8 3.4 7.1 148 10.6 4
x106 1 56 0.83 0 8.8 73 30 6.4 20 68 13.7 1
x107 1 45 11 0 6.3 50 105 3.1 13.2 91 12 4
x108 0 32 2 1 8.9 143 75 12.8 10.9 280 30.4 1
x109 0 60 2 0 4.2 195 50 8.1 6.5 265 17.3 3
x110 1 44 3 2 3.7 86 5 3.2 7.7 170 13.1 2
x111 0 49 4 0 6.3 180 15 11.4 21 115 84.2 1
x112 1 28 10 0 9.5 98 60 9.3 14.7 134 19.7 1
x113 0 26 2 0 8.3 82 60 6.8 26.3 330 86.9 1
x114 0 39 5 0 7.5 137 14 10.3 10.7 160 17.1 1
x115 0 49 9 1 3.1 150 40 4.6 7 261 18.4 1
x116 0 30 1 1 17.4 76 29 13.2 24.8 229 56.7 1
x117 0 52 4 1 5.7 45 27 2.6 15.4 242 37.2 1
x118 0 45 3 0 5.2 67 128 3.5 11.8 230 27.1 3
x119 0 53 7 0 7.4 68 30 5 8.7 140 12.2 1
x120 0 29 6 0 15.7 120 40 18.8 12.3 220 27 2
x121 0 28 4 0 8.9 88 28 7.8 12.3 163 20 2
x122 0 38 5 2 1 128 6 1.3 5.8 145 8.4 1
