A. S. Salama
Mathematics Department, Faculty of Science, Tanta University, Egypt.
e-mail: amgadsalama2003@yahoo.com
H. M. Abu-Donia
Mathematics Department, Faculty of Science, Zagazig University, Egypt.
e-mail: donia_1000@yahoo.com
Abstract
In this paper we study some topological properties of information systems and
introduce three new approaches to data reduction. The topological approach to
data reduction is a new method that deals with general types of relations. The reducts
of an information system here have orders (first order, second order, and so
on), together with the core. The second approach depends on comparing the values
of each subset of the set of condition attributes with the decision attribute. The
evaluation of reducts and the core by the second approach is a quicker and more
efficient method of data reduction than the classical methods. The last approach
depends on the notion of topological covering.
1. Introduction
Information systems, introduced by Z. Pawlak in 1982 [12,14,15,16], are excellent
tools to handle granularity of data. They may be used to describe dependencies between
attributes, to evaluate the significance of attributes, and to deal with inconsistent data, to
name just a few of the many ways of applying information systems to real
world problems. Most importantly, they offer an approach to handling imperfect data.
The calculus of these systems is based on an objective viewpoint of the world; i.e., all
computations of the calculus are based on existing data characteristics. Many other ways of
handling imperfect data are subjective, relying on expert assessments of world situations.
The notion of indiscernibility, the main idea of rough set theory, is closely related to
data granularity. The notion of indiscernibility may be introduced in two different ways.
First, it may be discussed in the more general but less intuitive form of a partition of the universe.
Second, it may also be presented in the form of a table, called an information table or
an information system. We will use the latter approach in this paper because it is more
application oriented.
Rough set theory is also based on complete information systems [10,11,13]. It
classifies objects using the upper approximation and lower approximation defined on the
indiscernibility relation. In order to process incomplete information systems, rough set
theory needs to be extended; in particular, the indiscernibility relation needs to be
extended to certain non-equivalence relations. There are several extensions of the
indiscernibility relation at present [7,9], such as tolerance relations, non-symmetric
similarity relations and complementarity relations.
2. Topological properties of information systems
Studying the topological applications of information systems appeared in [5,7,9]. An
information system can be defined by a quadruple S = (Ob, At, {Va : a ∈ At}, fa) where:
− Ob is a finite non-empty set of objects,
− At is a finite non-empty set of attributes,
− Va is a finite non-empty set of values of a ∈ At,
− fa : Ob −→ P(Va) is an information function.
For any object x ∈ Ob, when fa(x) ∈ Va for all a ∈ At, the information system S is
called a single-valued information system (a Pawlak system). On the other hand, when
fa(x) ∈ P(Va) it is called a set-valued information system. An information system S is
complete if for all a ∈ At and for all x ∈ Ob, fa(x) ≠ φ.
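These definitions translate directly into a small data structure. The sketch below (with invented objects, attribute names and values, not drawn from the paper's tables) checks the single-valued and completeness conditions:

```python
# Sketch of a set-valued information system S = (Ob, At, {Va}, fa).
# The objects and values here are hypothetical.
Ob = {"x1", "x2", "x3"}
At = {"color", "size"}
f = {
    "color": {"x1": {"red"}, "x2": {"red", "blue"}, "x3": {"blue"}},
    "size":  {"x1": {"small"}, "x2": {"large"}, "x3": {"large"}},
}

def is_complete(Ob, At, f):
    """S is complete iff fa(x) is non-empty for all a in At and x in Ob."""
    return all(f[a][x] for a in At for x in Ob)

def is_single_valued(Ob, At, f):
    """Pawlak systems: every fa(x) is a single value of Va."""
    return all(len(f[a][x]) == 1 for a in At for x in Ob)

print(is_complete(Ob, At, f))       # True
print(is_single_valued(Ob, At, f))  # False: color of x2 is a two-element set
```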
A special type of information system, called a nominal information system, is
defined as S = (Ob, Σ, {Infn}n∈M) where:
− M is a finite set of positive integers,
− Ob is a finite non-empty set of objects,
− Σ is a finite non-empty alphabet of information symbols,
− Infn : Ob −→ Σ is an information mapping for each n ∈ M.
Ob Inf1 Inf2 Inf3
x1 a d i
x2 a e h
x3 b e g
x4 c f g
x5 b d i
Table (2.1)
According to Table (2.1) we have
Proof We only prove (ii) because (i) is trivial. Let x ∈ Clτ(X); then for every open set G
containing x, X ∩ G ≠ φ. But G = ⋃_{B∈β} B, so there exists B0 ∈ β such that x ∈ B0 ⊆ G.
Since B0 is an open set containing x, B0 ∩ X ≠ φ, and hence x ∈ ⋃{B ∈ β : B ∩ X ≠ φ}.
Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any subset Y of Ob
and A ⊆ M we define: L(Y) = ⋃{X_B^A : X_B^A ⊆ Y} and U(Y) = ⋃{X_B^A : X_B^A ∩ Y ≠ φ}.
For a given subset B of Σ, L(Y) and U(Y) are called the lower and the upper approximations
of Y in S respectively. According to Example 2.1, if A = {1} and B = {a, b, c} then
L(Y) = {x4} and U(Y) = {x1, x2, x4} for the subset Y = {x1, x4}.
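The lower and upper approximations can be computed mechanically from the blocks of the partition. The sketch below reproduces the numbers of the example (A = {1}, B = {a, b, c}, Y = {x1, x4}):

```python
def approximations(blocks, Y):
    """L(Y): union of blocks contained in Y; U(Y): union of blocks meeting Y."""
    Y = set(Y)
    lower = set().union(*[B for B in blocks if B <= Y])
    upper = set().union(*[B for B in blocks if B & Y])
    return lower, upper

# Blocks of the partition induced by Inf1 in Table (2.1).
blocks = [{"x1", "x2"}, {"x3", "x5"}, {"x4"}]
low, up = approximations(blocks, {"x1", "x4"})
print(sorted(low))  # ['x4']
print(sorted(up))   # ['x1', 'x2', 'x4']
```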
Theorem 2.1 Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any
According to Example 2.1, the following are bases for topologies on Ob:
β1 = {X_a^1, X_b^1, X_c^1} = {{x1, x2}, {x3, x5}, {x4}},
β2 = {X_d^2, X_e^2, X_f^2} = {{x1, x5}, {x2, x3}, {x4}},
β3 = {X_i^3, X_h^3, X_g^3} = {{x1, x5}, {x2}, {x3, x4}},
β4 = {X_{d,i}^{2,3}, X_{e,h}^{2,3}, X_{e,g}^{2,3}, X_{f,g}^{2,3}} = {{x1, x5}, {x2}, {x3}, {x4}}.
If Y = {x2, x3, x5} is a subset of Ob, then with respect to the base β3 we have L(Y) = {x2},
U(Y) = {x1, x2, x3, x4, x5}, Int_{τβ3}(Y) = {x2} and Cl_{τβ3}(Y) = {x1, x2, x3, x4, x5}.
Let S = (Ob, Σ, {Infn}n∈M) be a nominal information system. For any two levels
n, m ∈ M and any values w, w′ ∈ Σ, let {X_w^n} and {X_{w′}^m} be two partitions of the set of
objects Ob defined by the equivalence relations Infn and Infm respectively. Then we say
that the partition {X_{w′}^m} depends on the partition {X_w^n}, denoted {X_w^n} ≤ {X_{w′}^m}, if and
only if X_{w′}^m = ⋃_w X_w^n (a union of blocks of {X_w^n}) for all X_{w′}^m ∈ {X_{w′}^m}.
Theorem 2.3 Let τn and τm be the topologies induced by the partitions {X_w^n} and {X_{w′}^m}
respectively. Then {X_w^n} ≤ {X_{w′}^m} iff τm ⊆ τn.
Proof Let G ∈ τm be an open set; then G = ⋃ X_{w′}^m for some members X_{w′}^m ⊆ G of
the base {X_{w′}^m} of τm. But each X_{w′}^m = ⋃_w X_w^n, hence G is a union of members of
{X_w^n}, which implies that G ∈ τn; hence τm ⊆ τn. Conversely, if τm ⊆ τn, then every
G ∈ τm also belongs to τn; in particular each X_{w′}^m ∈ τn, so there exists a subfamily of
{X_w^n} such that X_{w′}^m = ⋃ X_{w0}^n. Hence {X_w^n} ≤ {X_{w′}^m}.
Example 2.2 Consider the partitions β1 = {{x1, x2}, {x3}, {x4}} and β2 = {{x1, x2}, {x3, x4}}
of the set U = {x1, x2, x3, x4}. Then β1 ≤ β2 and τ2 ⊆ τ1, where
τ1 = {U, φ, {x3}, {x4}, {x1, x2}, {x3, x4}, {x1, x2, x3}, {x1, x2, x4}} and
τ2 = {U, φ, {x1, x2}, {x3, x4}} are the topologies generated by β1 and β2 respectively.
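Theorem 2.3 can be verified mechanically on Example 2.2 by generating each topology as the family of all unions of blocks; a minimal sketch:

```python
from itertools import combinations

def topology_of(partition):
    """The topology induced by a partition: all unions of its blocks."""
    opens = {frozenset()}  # the empty set is always open
    for r in range(1, len(partition) + 1):
        for combo in combinations(partition, r):
            opens.add(frozenset().union(*combo))
    return opens

beta1 = [frozenset(b) for b in ({"x1", "x2"}, {"x3"}, {"x4"})]
beta2 = [frozenset(b) for b in ({"x1", "x2"}, {"x3", "x4"})]
tau1, tau2 = topology_of(beta1), topology_of(beta2)

# beta1 <= beta2 (beta1 refines beta2), hence tau2 is contained in tau1:
print(tau2 <= tau1)           # True
print(len(tau1), len(tau2))   # 8 4
```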
For any topological space (U, τ ) , we define the equivalence relation E(τ ) on the set U
by: (x, y) ∈ E(τ) iff Clτ({x}) = Clτ({y}), for x, y ∈ U. The set of all equivalence classes of
E(τ ) is denoted by U/E(τ ).
Theorem 2.4 Let S = (U, Σ, {Infn}n∈M) be a nominal information system, let τn be the
topology induced by the partition {X_w^n}, and let (U, τ) be a quasi-discrete topological
space having U/E(τ) as a base. Then τn = τ iff for all x ∈ X_w^n there
exists B ∈ U/E(τ) such that x ∈ B.
Proof If for all x ∈ X_w^n there exists B ∈ U/E(τ) with x ∈ B, then X_w^n = B; hence
B ∈ U/E(τ) and τn = τ.
Lemma 2.2 [9] For any topology τ on a set U , and for all x, y ∈ U , if x ∈ Clτ ({y})
and y ∈ Clτ ({x}) then Clτ ({x}) = Clτ ({y}).
Lemma 2.3 [9] If τ is a quasi-discrete topology on a set U, then y ∈ Clτ({x}) implies
x ∈ Clτ({y}) for all x, y ∈ U.
Lemma 2.4 [9] If τ is a quasi-discrete topology on a set U , then the family {Clτ ({x}) :
x ∈ U } is a partition of U .
Proposition 2.3 Let τ be the topology induced by the partition βn = {X_w^n : n ∈ M, w ∈ Σ}
on the set Ob, where S = (Ob, Σ, {Infn}n∈M) is a nominal information system. Then
βn = Ob/E(τ).
Proof
x ∈ B, B ∈ βn ⇔ x ∈ Clτ(B) = ⋃_{y∈B} Clτ({y})
Theorem 2.5 For any nominal information system S = (Ob, Σ, {Infn}n∈M), τn ⊆ τind,
where τn and τind are the topologies generated by the partitions Ob/E(τn) and Ob/E(τind)
respectively.
Proof Since Ob/E(τind ) ≤ Ob/E(τn ) for all n ∈ M then τn ⊆ τind (Theorem 2.3).
Example 2.3 Consider the topological space (U, τ ) where U = {x1 , x2 , x3 , x4 } and
β = {{x1 }, {x2 , x3 }, {x4 }} is the base of τ , then τ is a quasi-discrete topology and:
Clτ ({x1 }) = {x1 }, Clτ ({x2 }) = {x2 , x3 }, Clτ ({x3 }) = {x2 , x3 }, Clτ ({x4 }) = {x4 }.
Then U/E(τ ) = {{x1 }, {x2 , x3 }, {x4 }} = β.
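In the quasi-discrete case the closure of a singleton is the partition block containing it, so the E(τ)-classes recover the base; a sketch of Example 2.3:

```python
def block_of(base, x):
    """In the quasi-discrete topology generated by a partition base,
    Cl({x}) is exactly the block of the partition containing x."""
    for B in base:
        if x in B:
            return frozenset(B)

base = [{"x1"}, {"x2", "x3"}, {"x4"}]
U = {"x1", "x2", "x3", "x4"}
# E(tau)-classes: objects are equivalent iff their singleton closures agree.
classes = {block_of(base, x) for x in U}
print(sorted(sorted(c) for c in classes))  # [['x1'], ['x2', 'x3'], ['x4']]
```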
For any subset X of Ob, we define two mappings: Int, Cl : P (Ob) −→ P (Ob) as follows:
terminology used if closure topologies or neighborhood topologies is replaced).
Now if τIAt is the topology induced by {IntAt(X) : X ⊆ Ob} (τCAt or τNAt can be used
alternately), then when τi,j = τIAt the set {ai, aj} is a second order reduct of At in
S. On the other hand, if τi,j ≠ τIAt for all i, j = 1, 2, ..., n, we must calculate the
higher topologies τ1,2,3, ..., τn−2,n−1,n, and the subset {ai, aj, ak} is a third order reduct of At
in S when τi,j,k = τIAt. In the same manner, we can define higher order reducts of At in S.
In each case, the topological core of At in S is the intersection of all reducts (the
intersection of all reducts of the same order). This core is called the interior core and is
denoted CoreInt(At). By the same terminology, we can define the closure core (CoreCl(At))
and the neighborhood core (CoreN(At)).
and if we choose r = 2, then Nai(x, r) = {y ∈ Ob : |fai(x) − fai(y)| ≤ 2}; hence we
have the following subbases: ζ1 = {{x1, x2, x3}, {x1, x2, x3, x4}, {x2, x3, x4, x5}, {x4, x5}},
ζ2 = {{x1, x2, x4}, {x3, x5}}, ζ3 = {{x1}, {x3, x4, x5}, {x2, x5}, {x3, x4}, {x2, x3, x5}} and
ζ4 = {{x2, x3, x4, x5}, Ob}.
The corresponding bases are:
β1 = {{x1 , x2 , x3 }, {x1 , x2 , x3 , x4 }, {x2 , x3 , x4 , x5 }, {x4 , x5 }, {x4 }, {x2 , x3 }, {x2 , x3 , x4 }},
β2 = {{x1 , x2 , x4 }, {x3 , x5 }},
β3 = {{x1 }, {x3 , x4 , x5 }, {x2 , x5 }, {x3 , x4 }, {x2 , x3 , x5 }, {x5 }, {x3 }, {x3 , x5 }} and
β4 = {{x2 , x3 , x4 , x5 }, {x1 , x5 }, {x5 }, Ob}.
The corresponding topologies are:
τ1 = {Ob, φ, {x1 , x2 , x3 }, {x1 , x2 , x3 , x4 }, {x2 , x3 , x4 , x5 }, {x4 , x5 }, {x4 }, {x2 , x3 }, {x2 , x3 , x4 }},
τ2 = {Ob, φ, {x1 , x2 , x4 }, {x3 , x5 }},
τ3 = {Ob, φ, {x1 }, {x3 , x4 , x5 }, {x2 , x5 }, {x3 , x4 }, {x2 , x3 , x5 }, {x5 }, {x3 }, {x3 , x5 }, {x1 , x2 , x5 },
{x1 , x3 , x4 , x5 }, {x1 , x2 , x3 , x5 }, {x1 , x3 , x4 }, {x1 , x5 }, {x1 , x3 , x5 }, {x1 , x3 }, {x2 , x3 , x4 , x5 }} and
τ4 = {Ob, φ, {x2 , x3 , x4 , x5 }, {x1 , x5 }, {x5 }}.
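The passage from a subbase ζ to a base β (all non-empty finite intersections) can be sketched as follows; run on ζ3 it reproduces the eight members of β3 listed above:

```python
from itertools import combinations

def base_from_subbase(subbase):
    """Base = all non-empty finite intersections of subbase members."""
    members = [frozenset(s) for s in subbase]
    base = set()
    for r in range(1, len(members) + 1):
        for combo in combinations(members, r):
            inter = frozenset.intersection(*combo)
            if inter:  # keep only non-empty intersections
                base.add(inter)
    return base

zeta3 = [{"x1"}, {"x3", "x4", "x5"}, {"x2", "x5"}, {"x3", "x4"}, {"x2", "x3", "x5"}]
beta3 = base_from_subbase(zeta3)
print(len(beta3))  # 8: the five subbase sets plus {x5}, {x3} and {x3, x5}
```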
If we consider the set of all attributes, then τNAt is the discrete topology, but for the
second order topologies we have: τ1,2 ≠ τNAt, τ1,3 = τNAt, τ1,4 ≠ τNAt, τ2,3 = τNAt,
τ2,4 ≠ τNAt and τ3,4 ≠ τNAt. Then {a1, a3} and {a2, a3} are second order reducts of At and
the second order core is given by CoreN(At) = {a3}.
Our method for calculating the reducts and the core is briefly described by the following
three steps:
Step 1: calculate the cardinality of each attribute a ∈ At, denoted by | a |.
Step 2: determine max(| a |) over all a ∈ At. There are two cases:
Case I: if there is a unique attribute of maximum cardinality, then this attribute
is the core.
Case II: if there exists more than one attribute of maximum cardinality, then we
test the deviation factor of these attributes, and the set of attributes with the
highest deviation factor is the core.
The deviation factor of an attribute a is a measure of how often equal values of
that attribute lead to different decision values. This factor is denoted by η and
defined as: η(B) = | {d ∈ D : ν1 = ν2 ∈ a and d(ν1) ≠ d(ν2), ∀a ∈ B} |, B ⊆ C,
where D is the decision attribute and C is the set of condition attributes.
Step 3: add the core to each subset of the set of condition attributes after removing
the core from them. We take the subsets with the lowest deviation factor as reducts.
Example 4.1 Consider the same information system given in Table 4.1; then we have:
| Muscle pain | = 2, | Headache | = 2, | Temperature | = 3, so we take Temperature as
the core, and the residual set of condition attributes is {Muscle pain, Headache}. Now
according to Step 3 we add Temperature to Muscle pain to obtain {Muscle pain, Temperature}
and to Headache to obtain {Headache, Temperature}, and we find that:
η({Muscle pain, Temperature}) = 0 and η({Headache, Temperature}) = 0. Hence the
two subsets {Muscle pain, Temperature} and {Headache, Temperature} are the reducts.
U a b c D
u1 a0 b1 c1 y
u2 a1 b1 c0 n
u3 a0 b2 c1 n
u4 a1 b1 c1 y
Table 4.4
where C = {a, b, c} is the set of condition attributes and D is the decision attribute. Then we
have: | a | = 2, | b | = 2 and | c | = 2, which gives no core, and η(a) = 2, η(b) = 2 and
η(c) = 2; testing the two-element subsets, {b, c} has the lowest deviation factor
(η({b, c}) = 0), so {b, c} is the unique reduct.
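The three steps can be sketched as follows; here the deviation factor η(B) is implemented as the number of object pairs that agree on every attribute of B but take different decision values (one reading of the definition above), which reproduces the values for Table 4.4:

```python
from itertools import combinations

# Table 4.4: condition attributes a, b, c and decision attribute D.
rows = [
    {"a": "a0", "b": "b1", "c": "c1", "D": "y"},
    {"a": "a1", "b": "b1", "c": "c0", "D": "n"},
    {"a": "a0", "b": "b2", "c": "c1", "D": "n"},
    {"a": "a1", "b": "b1", "c": "c1", "D": "y"},
]

def cardinality(attr):
    """Step 1: the number of distinct values taken by an attribute."""
    return len({row[attr] for row in rows})

def eta(B):
    """Deviation factor: object pairs equal on every attribute of B
    but with different decision values."""
    return sum(
        1
        for r1, r2 in combinations(rows, 2)
        if all(r1[a] == r2[a] for a in B) and r1["D"] != r2["D"]
    )

print([cardinality(a) for a in "abc"])      # [2, 2, 2] -- no unique maximum
print(eta(["a"]), eta(["b"]), eta(["c"]))   # 2 2 2
print(eta(["b", "c"]))                      # 0 -> {b, c} is the reduct
```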
can make a covering of Ob by adding the negative set defined by this relation.
Let C = {CR(x) : CR(x) ⊆ Ob} be the covering of Ob by R when R is reflexive, and
C = {CR(x), CR^n(x)}, where CR(x) = R(x) = {y : xRy} and CR^n(x) = Ob − ⋃_{x∈Ob} CR(x),
when R is not reflexive.
We recall again that any collection of data specified as a structure (Ob, At, {Va :
a ∈ At}, fa) such that Ob is a nonempty set of objects, At is a nonempty set of attributes,
Va is a nonempty set of values with V = ⋃_{a∈At} Va, and fa is a function of Ob into
2^{Va} \ {φ}, is referred to as a multi-valued information system.
In this section we assume that with every attribute a ∈ At is associated a reflexive relation
Ra. For simplicity, this relation shall be defined in the following way. Let a ∈ At and
B ⊆ At; then xRa y iff fa(x) ∩ fa(y) ≠ φ, and xRB y iff xRa y for all a ∈ B. Also, the
relation R_B^W is defined by: xR_B^W y iff xRa y for some a ∈ B. The relation R_B^W is a
reflexive relation, called the weak relation derived from the strong relation RB. The
coverings C_{R_B} and C_{R_B^W} are subbases of two topologies, called the strong and weak
topologies and denoted by τ_{C_{R_B}} and τ_{C_{R_B^W}} respectively. The class {fa(x) : a ∈ At}
shall be called the information about the object x, or a record of x. We shall say that two
records determined by x, y are strongly similar iff fa(x) ∩ fa(y) ≠ φ for all a ∈ At. Also
two records {fa(x) : a ∈ B ⊆ At} and {fa(y) : a ∈ B ⊆ At} are τ_{C_{R_B}} strongly similar
with respect to the set B ⊆ At iff fa(x) ∩ fa(y) ≠ φ for all a ∈ B. Two objects x, y are
weakly τ_{C_{R_B^W}} similar if fa(x) ∩ fa(y) ≠ φ for some a ∈ B.
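The strong relation RB and the weak relation R_B^W can be illustrated on a small hypothetical multi-valued system (the objects and value sets below are invented for illustration, not taken from Table 5.1):

```python
# Hypothetical multi-valued system: each fa(x) is a non-empty set of values.
f = {
    "a1": {"x": {1, 2}, "y": {2, 3}, "z": {4}},
    "a2": {"x": {5}, "y": {5, 6}, "z": {6}},
}
Ob = ["x", "y", "z"]

def strong(B, u, v):
    """u R_B v iff fa(u) and fa(v) overlap for EVERY a in B."""
    return all(f[a][u] & f[a][v] for a in B)

def weak(B, u, v):
    """u R_B^W v iff fa(u) and fa(v) overlap for SOME a in B."""
    return any(f[a][u] & f[a][v] for a in B)

B = ["a1", "a2"]
# Strong neighbourhoods C_{R_B}(u) = {v : u R_B v} and their weak analogues;
# e.g. z is weakly but not strongly similar to y.
print({u: [v for v in Ob if strong(B, u, v)] for u in Ob})
print({u: [v for v in Ob if weak(B, u, v)] for u in Ob})
```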
The set of attributes Y ⊆ At depends on the set X ⊆ At with respect to the strong
similarity topology if and only if C_{R_X} ≤ C_{R_Y}. In the same way we can define dependency
of attributes with respect to the weak similarity topology τ_{C_{R_B^W}}: X −→ Y iff
C_{R_X^W} ≤ C_{R_Y^W}.
Now for any subset X ⊆ At we say that X is F-independent iff for every set Y ⊂ X it
holds that F(Y) ≠ F(X). Otherwise, we say that X is F-dependent. The set Y ⊂ X is called
an F-reduct of X iff F(Y) = F(X) and Y is F-independent. The class of all reducts of any
subset B ⊆ At will be denoted by RedF(B) and the class of all reducts of At will be denoted
by RedF.
Ob M1 M2 M3 M4 M5
Table 5.1
C_{R_{M2}} = {{P1, P2, P4, P5, P7}, {P1, P2, P4, P5, P6, P7, P8}, {P3, P6}, {P2, P3, P4, P6, P8},
{P1, P2, P4, P5, P7, P8}, {P2, P5, P6, P7, P8}}.
C_{R_{M3}} = {{P1, P3, P6}, {P2, P4, P8}, {P1, P3, P5, P7}, {P2, P4, P6, P8}, {P3, P5, P7}, {P1, P4, P6},
{P3, P5, P7}, {P2, P4, P8}}.
C_{R_{M4}} = {{P1, P4}, {P2, P3, P4}, {P2, P3, P4, P7, P8}, {P1, P2, P3, P4}, {P5}, {P6, P8}, {P3, P7, P8},
{P3, P6, P7, P8}}.
C_{R_{M5}} = {{P1}, {P2, P4, P8}, {P3, P5, P7}, {P2, P4, P6, P7}, {P3, P5, P7}, {P4, P6, P7, P8},
{P3, P4, P5, P6, P7}, {P2, P6, P8}}. Let B = {M1, M2}; then
C_{R_B} = {{P1, P4, P5, P7}, {P2, P4, P6, P8}, {P3}, {P1, P2, P4}, {P1, P2, P5, P7, P8}, {P2, P6},
{P1, P5, P7}, {P2, P5, P8}}.
References
[1] Abd El-Monsef, M. E. (1980): Studies on some pretopological concepts, Ph. D. Thesis,
Tanta University.
[2] Andrijevic, D. (1987): On the topology generated by preopen sets, Mat. Vesnik, 39, 367-376.
[3] Ahlqvist O., Keukelaar J., and Oukbir K. (2000). Rough classification and accuracy
assessment, International Journal of Geographical Information Science, 14(5): 475-496.
[5] Flapan, E. (2000): When topology meets Chemistry, Cambridge University Press.
[6] Jelonek J., Krawiec K., Slowinski R. (2002). Rough set reduction of attributes and their
domains for neural networks. In: [CI], pp. 339-347.
[7] Marcus S. (1994). Tolerance rough sets, Čech topologies, learning processes, Bull. Polish
Acad. Sci. Tech. Sci. 42/3, pp. 471-487.
[8] Nagata, J. I. : Modern general topology, North Holland Pub. Co., Amsterdam (1968).
[9] Wiweger, A. (1989): On topological rough sets, Bull. Pol. Ac. Mat., 37, 51-62.
[11] Pawlak Z. (1982). Rough sets. International Journal of Computer and Information
Sciences 11, pp. 341 - 356.
[12] Pawlak Z. (1998). Reasoning about data - a rough set perspective. In: Polkowski
and Skowron, pp. 25-34.
[13] Pawlak, Z.; Marek, W. (1981): Rough sets and information systems, ICS. PAS.
Reports, 481-485.
[14] Pawlak, Z. (1982): Rough sets, algebraic and topological approach, ICS. PAS. Reports,
99-104.
[15] Pawlak, Z. (1986): On rough relations, Bull. Pol. Ac. Tec. sciences, vol. 34. 9-10.
[16] Pawlak, Z. (1996): Rough sets, rough relations and rough functions, Bull. Pol. Ac.
Math, 15-19.