A. S. Salama
Mathematics Department, Faculty of Science, Tanta University, Egypt.
e-mail: amgadsalama2003@yahoo.com
H. M. Abu-Donia
Mathematics Department, Faculty of Science, Zagazig University, Egypt.
e-mail: donia_1000@yahoo.com
Abstract
In this paper we study some topological properties of information systems and
introduce three new approaches to data reduction. The topological approach to
data reduction is a new method that deals with general types of relations. The reducts
of an information system here have orders (first order, second order, and so
on), together with the core. The second approach depends on comparing the values
of each subset of the set of condition attributes with the decision attribute. The
evaluation of reducts and the core by the second approach is a quicker and more
efficient method of data reduction than the classical methods. The last approach
depends on the notion of topological covering.
1. Introduction
Information systems, introduced by Z. Pawlak in 1982 [12,14,15,16], are excellent
tools to handle granularity of data. They may be used to describe dependencies between
attributes, to evaluate the significance of attributes, and to deal with inconsistent data, to
name just a few of the many ways of applying information systems to real
world problems. Most importantly, they offer an approach to handling imperfect data.
The calculus of these systems is based on an objective viewpoint of the world; i.e., all
computations of the calculus are based on existing data characteristics. Many other ways of
handling imperfect data are subjective, relying on expert assessments of world situations.
The notion of indiscernibility, the main idea of rough set theory, is closely related to
data granularity. The notion of indiscernibility may be introduced in two different ways.
First, it may be discussed in the more general but less intuitive form of a partition of the universe.
Second, it may also be presented in the form of a table, called an information table or
an information system. We will use the latter approach in this paper because it is more
application oriented.
Rough set theory is also based on complete information systems [10,11,13]. It
classifies objects using the upper approximation and lower approximation defined on the
indiscernibility relation. In order to process incomplete information systems, rough set
theory needs to be extended; in particular, the indiscernibility relation needs to be
extended to certain non-equivalence relations. There are several extensions of the
indiscernibility relation at present [7,9], such as tolerance relations, non-symmetric
similarity relations and complementarity relations.
2. Topological properties of information systems
Studying the topological applications of information systems appeared in [5,7,9]. An
information system can be defined by a quadruple S = (Ob, At, {Va : a ∈ At}, fa) where:
− Ob is a finite non-empty set of objects,
− At is a finite non-empty set of attributes,
− Va is a finite non-empty set of values of a ∈ At,
− fa : Ob −→ P(Va) is an information function.
For any object x ∈ Ob, when fa(x) ∈ Va for all a ∈ At, the information system S is
called a single-valued information system (a Pawlak system). On the other hand, when
fa(x) ∈ P(Va) it is called a set-valued information system. An information system S is
complete if for all a ∈ At and for all x ∈ Ob, fa(x) ≠ φ.
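These definitions translate directly into a small data structure. The sketch below (with invented objects, attribute names and values, not drawn from the paper's tables) checks the single-valued and completeness conditions:

```python
# Sketch of a set-valued information system S = (Ob, At, {Va}, fa).
# The objects and values here are hypothetical.
Ob = {"x1", "x2", "x3"}
At = {"color", "size"}
f = {
    "color": {"x1": {"red"}, "x2": {"red", "blue"}, "x3": {"blue"}},
    "size":  {"x1": {"small"}, "x2": {"large"}, "x3": {"large"}},
}

def is_complete(Ob, At, f):
    """S is complete iff fa(x) is non-empty for all a in At and x in Ob."""
    return all(f[a][x] for a in At for x in Ob)

def is_single_valued(Ob, At, f):
    """Pawlak systems: every fa(x) is a single value of Va."""
    return all(len(f[a][x]) == 1 for a in At for x in Ob)

print(is_complete(Ob, At, f))       # True
print(is_single_valued(Ob, At, f))  # False: color of x2 is a two-element set
```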
A special type of information system, called a nominal information system, is
defined as S = (Ob, Σ, {Infn}n∈M) where:
− M is a finite set of positive integers,
− Ob is a finite non-empty set of objects,
− Σ is a finite non-empty alphabet of information symbols,
− Infn : Ob −→ Σ is an information mapping for each n ∈ M.
Ob Inf1 Inf2 Inf3
x1 a d i
x2 a e h
x3 b e g
x4 c f g
x5 b d i
Table (2.1)
According to Table (2.1) we have
Proof We only prove (ii) because (i) is trivial. Let x ∈ Clτ(X); then for every open set G
containing x, X ∩ G ≠ φ. But G = ⋃_{B∈β} B, so there exists B0 ∈ β such that x ∈ B0 ⊆ G.
Since B0 is an open set containing x, B0 ∩ X ≠ φ, and hence x ∈ ⋃{B ∈ β : B ∩ X ≠ φ}.
Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any subset Y of Ob
and A ⊆ M we define: L(Y) = ⋃{X_B^A : X_B^A ⊆ Y} and U(Y) = ⋃{X_B^A : X_B^A ∩ Y ≠ φ}.
For a given subset B of Σ, L(Y) and U(Y) are called the lower and the upper approximations
of Y in S respectively. According to Example 2.1, if A = {1} and B = {a, b, c} then
L(Y) = {x4} and U(Y) = {x1, x2, x4} for the subset Y = {x1, x4}.
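The lower and upper approximations can be computed mechanically from the blocks of the partition. The sketch below reproduces the numbers of the example (A = {1}, B = {a, b, c}, Y = {x1, x4}):

```python
def approximations(blocks, Y):
    """L(Y): union of blocks contained in Y; U(Y): union of blocks meeting Y."""
    Y = set(Y)
    lower = set().union(*[B for B in blocks if B <= Y])
    upper = set().union(*[B for B in blocks if B & Y])
    return lower, upper

# Blocks of the partition induced by Inf1 in Table (2.1).
blocks = [{"x1", "x2"}, {"x3", "x5"}, {"x4"}]
low, up = approximations(blocks, {"x1", "x4"})
print(sorted(low))  # ['x4']
print(sorted(up))   # ['x1', 'x2', 'x4']
```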
Theorem 2.1 Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any
According to Example 2.1, the following are bases for topologies on Ob:
β1 = {X_a^1, X_b^1, X_c^1} = {{x1, x2}, {x3, x5}, {x4}},
β2 = {X_d^2, X_e^2, X_f^2} = {{x1, x5}, {x2, x3}, {x4}},
β3 = {X_i^3, X_h^3, X_g^3} = {{x1, x5}, {x2}, {x3, x4}},
β4 = {X_{d,i}^{2,3}, X_{e,h}^{2,3}, X_{e,g}^{2,3}, X_{f,g}^{2,3}} = {{x1, x5}, {x2}, {x3}, {x4}}.
If Y = {x2, x3, x5} is a subset of Ob, then with respect to the base β3 we have L(Y) = {x2},
U(Y) = {x1, x2, x3, x4, x5}, Int_{τβ3}(Y) = {x2} and Cl_{τβ3}(Y) = {x1, x2, x3, x4, x5}.
Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any two levels
n, m ∈ M and any values w, w′ ∈ Σ, let {X_w^n} and {X_{w′}^m} be two partitions of the set of
objects Ob defined by the equivalence relations Infn and Infm respectively. Then we say
that the partition {X_{w′}^m} depends on the partition {X_w^n}, denoted {X_w^n} ≤ {X_{w′}^m}, if and
only if X_{w′}^m = ⋃_w X_w^n (a union of blocks of {X_w^n}) for all X_{w′}^m ∈ {X_{w′}^m}.
Theorem 2.3 Let τn and τm be the topologies induced by the partitions {X_w^n} and {X_{w′}^m}
respectively. Then {X_w^n} ≤ {X_{w′}^m} iff τm ⊆ τn.
Proof Let G ∈ τm be an open set; then G = ⋃ X_{w′}^m for some members X_{w′}^m ⊆ G of
the base {X_{w′}^m} of τm. But each X_{w′}^m = ⋃_w X_w^n, hence G is a union of members of
{X_w^n}, which implies that G ∈ τn; hence τm ⊆ τn. Conversely, if τm ⊆ τn, then every
G ∈ τm also belongs to τn; in particular each X_{w′}^m ∈ τn, so there exists a subfamily of
{X_w^n} such that X_{w′}^m = ⋃ X_{w0}^n. Hence {X_w^n} ≤ {X_{w′}^m}.
Example 2.2 Consider the partitions β1 = {{x1, x2}, {x3}, {x4}} and β2 = {{x1, x2}, {x3, x4}}
of the set U = {x1, x2, x3, x4}. Then β1 ≤ β2 and τ2 ⊆ τ1, where
τ1 = {U, φ, {x3}, {x4}, {x1, x2}, {x3, x4}, {x1, x2, x3}, {x1, x2, x4}} and
τ2 = {U, φ, {x1, x2}, {x3, x4}} are the topologies generated by β1 and β2 respectively.
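Theorem 2.3 can be verified mechanically on Example 2.2 by generating each topology as the family of all unions of blocks; a minimal sketch:

```python
from itertools import combinations

def topology_of(partition):
    """The topology induced by a partition: all unions of its blocks."""
    opens = {frozenset()}  # the empty set is always open
    for r in range(1, len(partition) + 1):
        for combo in combinations(partition, r):
            opens.add(frozenset().union(*combo))
    return opens

beta1 = [frozenset(b) for b in ({"x1", "x2"}, {"x3"}, {"x4"})]
beta2 = [frozenset(b) for b in ({"x1", "x2"}, {"x3", "x4"})]
tau1, tau2 = topology_of(beta1), topology_of(beta2)

# beta1 <= beta2 (beta1 refines beta2), hence tau2 is contained in tau1:
print(tau2 <= tau1)           # True
print(len(tau1), len(tau2))   # 8 4
```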
For any topological space (U, τ ) , we define the equivalence relation E(τ ) on the set U
by: (x, y) ∈ E(τ) iff Clτ({x}) = Clτ({y}), for x, y ∈ U. The set of all equivalence classes of
E(τ ) is denoted by U/E(τ ).
Theorem 2.4 Let S = (U, Σ, {Infn}n∈M) be a nominal information system, let τn be the
topology induced by the partition {X_w^n}, and let (U, τ) be a quasi-discrete topological
space having U/E(τ) as a base. Then τn = τ iff for all x ∈ X_w^n there
exists B ∈ U/E(τ) such that x ∈ B.
Proof If for all x ∈ X_w^n there exists B ∈ U/E(τ) with x ∈ B, then X_w^n = B; hence
B ∈ U/E(τ) and τn = τ.
Lemma 2.2 [9] For any topology τ on a set U , and for all x, y ∈ U , if x ∈ Clτ ({y})
and y ∈ Clτ ({x}) then Clτ ({x}) = Clτ ({y}).
Lemma 2.3 [9] If τ is a quasi-discrete topology on a set U, then y ∈ Clτ({x}) implies
x ∈ Clτ({y}) for all x, y ∈ U.
Lemma 2.4 [9] If τ is a quasi-discrete topology on a set U , then the family {Clτ ({x}) :
x ∈ U } is a partition of U .
Proposition 2.3 Let τ be the topology induced by the partition βn = {X_w^n : n ∈ M, w ∈ Σ}
on the set Ob, where S = (Ob, Σ, {Infn}n∈M) is a nominal information system. Then
βn = Ob/E(τ).
Proof
x ∈ B, B ∈ βn ⇔ x ∈ Clτ(B) = ⋃_{y∈B} Clτ({y})
Theorem 2.5 For any nominal information system S = (Ob, Σ, {Infn}n∈M), τn ⊆ τind,
where τn and τind are the topologies generated by the partitions Ob/E(τn) and Ob/E(τind)
respectively.
Proof Since Ob/E(τind ) ≤ Ob/E(τn ) for all n ∈ M then τn ⊆ τind (Theorem 2.3).
Example 2.3 Consider the topological space (U, τ ) where U = {x1 , x2 , x3 , x4 } and
β = {{x1 }, {x2 , x3 }, {x4 }} is the base of τ , then τ is a quasi-discrete topology and:
Clτ ({x1 }) = {x1 }, Clτ ({x2 }) = {x2 , x3 }, Clτ ({x3 }) = {x2 , x3 }, Clτ ({x4 }) = {x4 }.
Then U/E(τ ) = {{x1 }, {x2 , x3 }, {x4 }} = β.
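In the quasi-discrete case the closure of a singleton is the partition block containing it, so the E(τ)-classes recover the base; a sketch of Example 2.3:

```python
def block_of(base, x):
    """In the quasi-discrete topology generated by a partition base,
    Cl({x}) is exactly the block of the partition containing x."""
    for B in base:
        if x in B:
            return frozenset(B)

base = [{"x1"}, {"x2", "x3"}, {"x4"}]
U = {"x1", "x2", "x3", "x4"}
# E(tau)-classes: objects are equivalent iff their singleton closures agree.
classes = {block_of(base, x) for x in U}
print(sorted(sorted(c) for c in classes))  # [['x1'], ['x2', 'x3'], ['x4']]
```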
For any subset X of Ob, we define two mappings: Int, Cl : P (Ob) −→ P (Ob) as follows:
terminology used if closure topologies or neighborhood topologies is replaced).
Now if τIAt is the topology induced by {IntAt(X) : X ⊆ Ob} (τCAt or τNAt can be used
alternately), then when τi,j = τIAt the set {ai, aj} is a second order reduct of At in
S. On the other hand, if τi,j ≠ τIAt for all i, j = 1, 2, ..., n, we must calculate the
higher topologies τ1,2,3, ..., τn−2,n−1,n, and the subset {ai, aj, ak} is a third order reduct of At
in S when τi,j,k = τIAt. In the same manner, we can define higher order reducts of At in S.
In each case, the topological core of At in S is the intersection of all reducts (the
intersection of all reducts of the same order). This core is called the interior core and is
denoted CoreInt(At). By the same terminology, we can define the closure core (CoreCl(At))
and the neighborhood core (CoreN(At)).
and if we choose r = 2, then Nai(x, r) = {y ∈ Ob : |fai(x) − fai(y)| ≤ 2}; hence we
have the following subbases: ζ1 = {{x1, x2, x3}, {x1, x2, x3, x4}, {x2, x3, x4, x5}, {x4, x5}},
ζ2 = {{x1, x2, x4}, {x3, x5}}, ζ3 = {{x1}, {x3, x4, x5}, {x2, x5}, {x3, x4}, {x2, x3, x5}} and
ζ4 = {{x2, x3, x4, x5}, Ob}.
The corresponding bases are:
β1 = {{x1 , x2 , x3 }, {x1 , x2 , x3 , x4 }, {x2 , x3 , x4 , x5 }, {x4 , x5 }, {x4 }, {x2 , x3 }, {x2 , x3 , x4 }},
β2 = {{x1 , x2 , x4 }, {x3 , x5 }},
β3 = {{x1 }, {x3 , x4 , x5 }, {x2 , x5 }, {x3 , x4 }, {x2 , x3 , x5 }, {x5 }, {x3 }, {x3 , x5 }} and
β4 = {{x2 , x3 , x4 , x5 }, {x1 , x5 }, {x5 }, Ob}.
The corresponding topologies are:
τ1 = {Ob, φ, {x1 , x2 , x3 }, {x1 , x2 , x3 , x4 }, {x2 , x3 , x4 , x5 }, {x4 , x5 }, {x4 }, {x2 , x3 }, {x2 , x3 , x4 }},
τ2 = {Ob, φ, {x1 , x2 , x4 }, {x3 , x5 }},
τ3 = {Ob, φ, {x1 }, {x3 , x4 , x5 }, {x2 , x5 }, {x3 , x4 }, {x2 , x3 , x5 }, {x5 }, {x3 }, {x3 , x5 }, {x1 , x2 , x5 },
{x1 , x3 , x4 , x5 }, {x1 , x2 , x3 , x5 }, {x1 , x3 , x4 }, {x1 , x5 }, {x1 , x3 , x5 }, {x1 , x3 }, {x2 , x3 , x4 , x5 }} and
τ4 = {Ob, φ, {x2 , x3 , x4 , x5 }, {x1 , x5 }, {x5 }}.
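The passage from a subbase ζ to a base β (all non-empty finite intersections) can be sketched as follows; run on ζ3 it reproduces the eight members of β3 listed above:

```python
from itertools import combinations

def base_from_subbase(subbase):
    """Base = all non-empty finite intersections of subbase members."""
    members = [frozenset(s) for s in subbase]
    base = set()
    for r in range(1, len(members) + 1):
        for combo in combinations(members, r):
            inter = frozenset.intersection(*combo)
            if inter:  # keep only non-empty intersections
                base.add(inter)
    return base

zeta3 = [{"x1"}, {"x3", "x4", "x5"}, {"x2", "x5"}, {"x3", "x4"}, {"x2", "x3", "x5"}]
beta3 = base_from_subbase(zeta3)
print(len(beta3))  # 8: the five subbase sets plus {x5}, {x3} and {x3, x5}
```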
If we consider the set of all attributes, then τNAt is the discrete topology, but for the
second order topologies we have: τ1,2 ≠ τNAt, τ1,3 = τNAt, τ1,4 ≠ τNAt, τ2,3 = τNAt,
τ2,4 ≠ τNAt and τ3,4 ≠ τNAt. Then {a1, a3} and {a2, a3} are second order reducts of At and
the second order core is given by CoreN(At) = {a3}.
Our method for calculating the reducts and the core is briefly described by the following
three steps:
Step 1: calculate the cardinality of each attribute a ∈ At, denoted by | a |.
Step 2: determine max(| a |) over all a ∈ At. There are two cases:
Case I: if there is a unique attribute of maximum cardinality, then this attribute
is the core.
Case II: if there exists more than one attribute of maximum cardinality, then we
test the deviation factor of these attributes, and the set of attributes with the
highest deviation factor is the core.
The deviation factor of an attribute a is a measure of how often equal values of
that attribute lead to different decision values. This factor is denoted by η and
defined as: η(B) = | {d ∈ D : ν1 = ν2 ∈ a and d(ν1) ≠ d(ν2), ∀a ∈ B} |, B ⊆ C,
where D is the decision attribute and C is the set of condition attributes.
Step 3: add the core to each subset of the set of condition attributes after removing
the core from them. We take the subsets with the lowest deviation factor as reducts.
Example 4.1 Consider the same information system given in Table 4.1; then we have:
| Muscle pain | = 2, | Headache | = 2, | Temperature | = 3, so we take Temperature as
the core, and the residual set of condition attributes is {Muscle pain, Headache}. Now
according to Step 3 we add Temperature to Muscle pain to obtain {Muscle pain, Temperature}
and to Headache to obtain {Headache, Temperature}, and we find that:
η({Muscle pain, Temperature}) = 0 and η({Headache, Temperature}) = 0. Hence the
two subsets {Muscle pain, Temperature} and {Headache, Temperature} are the reducts.
U a b c D
u1 a0 b1 c1 y
u2 a1 b1 c0 n
u3 a0 b2 c1 n
u4 a1 b1 c1 y
Table 4.4
where C = {a, b, c} is the set of condition attributes and D is the decision attribute. Then we
have: | a | = 2, | b | = 2 and | c | = 2, which gives no core, and η(a) = 2, η(b) = 2 and
η(c) = 2; testing the two-element subsets, {b, c} has the lowest deviation factor
(η({b, c}) = 0), so {b, c} is the unique reduct.
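The three steps can be sketched as follows; here the deviation factor η(B) is implemented as the number of object pairs that agree on every attribute of B but take different decision values (one reading of the definition above), which reproduces the values for Table 4.4:

```python
from itertools import combinations

# Table 4.4: condition attributes a, b, c and decision attribute D.
rows = [
    {"a": "a0", "b": "b1", "c": "c1", "D": "y"},
    {"a": "a1", "b": "b1", "c": "c0", "D": "n"},
    {"a": "a0", "b": "b2", "c": "c1", "D": "n"},
    {"a": "a1", "b": "b1", "c": "c1", "D": "y"},
]

def cardinality(attr):
    """Step 1: the number of distinct values taken by an attribute."""
    return len({row[attr] for row in rows})

def eta(B):
    """Deviation factor: object pairs equal on every attribute of B
    but with different decision values."""
    return sum(
        1
        for r1, r2 in combinations(rows, 2)
        if all(r1[a] == r2[a] for a in B) and r1["D"] != r2["D"]
    )

print([cardinality(a) for a in "abc"])      # [2, 2, 2] -- no unique maximum
print(eta(["a"]), eta(["b"]), eta(["c"]))   # 2 2 2
print(eta(["b", "c"]))                      # 0 -> {b, c} is the reduct
```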
can make a covering of Ob by adding the negative set defined by this relation.
Let C = {CR(x) : CR(x) ⊆ Ob} be the covering of Ob by R when R is reflexive, and
C = {CR(x), CR^n(x)}, where CR(x) = R(x) = {y : xRy} and CR^n(x) = Ob − ⋃_{x∈Ob} CR(x),
when R is not reflexive.
We recall again that any collection of data specified as a structure (Ob, At, {Va :
a ∈ At}, fa) such that Ob is a nonempty set of objects, At is a nonempty set of attributes,
Va is a nonempty set of values with V = ⋃_{a∈At} Va, and fa is a function of Ob into
2^{Va} \ {φ}, is referred to as a multi-valued information system.
In this section we assume that with every attribute a ∈ At is associated a reflexive relation
Ra. For simplicity, this relation shall be defined in the following way. Let a ∈ At and
B ⊆ At; then xRa y iff fa(x) ∩ fa(y) ≠ φ, and xRB y iff xRa y for all a ∈ B. Also, the
relation R_B^W is defined by: xR_B^W y iff xRa y for some a ∈ B. The relation R_B^W is a
reflexive relation, called the weak relation derived from the strong relation RB. The
coverings C_{R_B} and C_{R_B^W} are subbases of two topologies, called the strong and weak
topologies and denoted by τ_{C_{R_B}} and τ_{C_{R_B^W}} respectively. The class {fa(x) : a ∈ At}
shall be called the information about the object x, or a record of x. We shall say that two
records determined by x, y are strongly similar iff fa(x) ∩ fa(y) ≠ φ for all a ∈ At. Also
two records {fa(x) : a ∈ B ⊆ At} and {fa(y) : a ∈ B ⊆ At} are τ_{C_{R_B}} strongly similar
with respect to the set B ⊆ At iff fa(x) ∩ fa(y) ≠ φ for all a ∈ B. Two objects x, y are
weakly τ_{C_{R_B^W}} similar if fa(x) ∩ fa(y) ≠ φ for some a ∈ B.
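The strong relation RB and the weak relation R_B^W can be illustrated on a small hypothetical multi-valued system (the objects and value sets below are invented for illustration, not taken from Table 5.1):

```python
# Hypothetical multi-valued system: each fa(x) is a non-empty set of values.
f = {
    "a1": {"x": {1, 2}, "y": {2, 3}, "z": {4}},
    "a2": {"x": {5}, "y": {5, 6}, "z": {6}},
}
Ob = ["x", "y", "z"]

def strong(B, u, v):
    """u R_B v iff fa(u) and fa(v) overlap for EVERY a in B."""
    return all(f[a][u] & f[a][v] for a in B)

def weak(B, u, v):
    """u R_B^W v iff fa(u) and fa(v) overlap for SOME a in B."""
    return any(f[a][u] & f[a][v] for a in B)

B = ["a1", "a2"]
# Strong neighbourhoods C_{R_B}(u) = {v : u R_B v} and their weak analogues;
# e.g. z is weakly but not strongly similar to y.
print({u: [v for v in Ob if strong(B, u, v)] for u in Ob})
print({u: [v for v in Ob if weak(B, u, v)] for u in Ob})
```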
The set of attributes Y ⊆ At depends on the set X ⊆ At with respect to the strong
similarity topology if and only if C_{R_X} ≤ C_{R_Y}. In the same way we can define dependency
of attributes with respect to the weak similarity topology τ_{C_{R_B^W}}: X −→ Y iff
C_{R_X^W} ≤ C_{R_Y^W}.
Now for any subset X ⊆ At we say that X is F-independent iff for every set Y ⊂ X it
holds that F(Y) ≠ F(X). Otherwise, we say that X is F-dependent. The set Y ⊂ X is called
an F-reduct of X iff F(Y) = F(X) and Y is F-independent. The class of all reducts of any
subset B ⊆ At will be denoted by RedF(B) and the class of all reducts of At will be denoted
by RedF.
Ob M1 M2 M3 M4 M5
Table 5.1
C_{R_{M2}} = {{P1, P2, P4, P5, P7}, {P1, P2, P4, P5, P6, P7, P8}, {P3, P6}, {P2, P3, P4, P6, P8},
{P1, P2, P4, P5, P7, P8}, {P2, P5, P6, P7, P8}}.
C_{R_{M3}} = {{P1, P3, P6}, {P2, P4, P8}, {P1, P3, P5, P7}, {P2, P4, P6, P8}, {P3, P5, P7}, {P1, P4, P6},
{P3, P5, P7}, {P2, P4, P8}}.
C_{R_{M4}} = {{P1, P4}, {P2, P3, P4}, {P2, P3, P4, P7, P8}, {P1, P2, P3, P4}, {P5}, {P6, P8}, {P3, P7, P8},
{P3, P6, P7, P8}}.
C_{R_{M5}} = {{P1}, {P2, P4, P8}, {P3, P5, P7}, {P2, P4, P6, P7}, {P3, P5, P7}, {P4, P6, P7, P8},
{P3, P4, P5, P6, P7}, {P2, P6, P8}}. Let B = {M1, M2}; then
C_{R_B} = {{P1, P4, P5, P7}, {P2, P4, P6, P8}, {P3}, {P1, P2, P4}, {P1, P2, P5, P7, P8}, {P2, P6},
{P1, P5, P7}, {P2, P5, P8}}.
References
[1] Abd El-Monsef, M. E. (1980): Studies on some pretopological concepts, Ph. D. Thesis,
Tanta University.
[2] Andrijevic, D. (1987): On the topology generated by preopen sets, Mat. Vesnik, 39, 367-376.
[3] Ahlqvist O., Keukelaar J., and Oukbir K. (2000). Rough classification and accuracy
assessment, International Journal of Geographical Information Science, 14(5): 475-496.
[5] Flapan, E. (2000): When topology meets Chemistry, Cambridge University Press.
[6] Jelonek J., Krawiec K., Slowinski R. (2002). Rough set reduction of attributes and their
domains for neural networks. In: [CI], pp. 339-347.
[7] Marcus S. (1994). Tolerance rough sets, Čech topologies, learning processes, Bull. Polish
Acad. Sci. Tech. Sci. 42/3, pp. 471-487.
[8] Nagata, J. I. : Modern general topology, North Holland Pub. Co., Amsterdam (1968).
[9] Wiweger, A. (1989): On topological rough sets, Bull. Pol. Ac. Mat., 37, 51-62.
[11] Pawlak Z. (1982). Rough sets. International Journal of Computer and Information
Sciences 11, pp. 341 - 356.
[12] Pawlak Z. (1998). Reasoning about data - a rough set perspective. In: Polkowski
and Skowron, pp. 25-34.
[13] Pawlak, Z.; Marek, W. (1981): Rough sets and information systems, ICS. PAS.
Reports, 481-485.
[14] Pawlak, Z. (1982): Rough sets, algebraic and topological approach, ICS. PAS. Reports,
99-104.
[15] Pawlak, Z. (1986): On rough relations, Bull. Pol. Ac. Tec. sciences, vol. 34. 9-10.
[16] Pawlak, Z. (1996): Rough sets, rough relations and rough functions, Bull. Pol. Ac.
Math, 15-19.