
ASSIGNMENT
ON
ROUGH SET THEORY

Submitted by
Rosemelyne Wartde
MTech IT, 2nd Semester
Roll No.: 20MTechIT02

Dept. of Information Technology, NEHU, Shillong


Introduction
 Rough set theory was introduced by Z. Pawlak in his seminal paper of 1982.
 It is a formal theory derived from fundamental research on the logical properties of information systems.
 It is a methodology of database mining, or knowledge discovery, in relational databases.
 In its abstract form, it is a new area of uncertainty mathematics closely related to fuzzy set theory.
 Rough sets and fuzzy sets are complementary generalizations of classical sets.
 The approximation spaces of rough set theory are sets with multiple memberships, while fuzzy sets are concerned with partial memberships.
Basic problems in data analysis solved by Rough Set Theory

 Characterization of a set of objects in terms of attribute values.
 Finding dependencies between attributes.
 Reduction of superfluous attributes.
 Finding the most significant attributes.
 Decision rule generation.
Goals of Rough Set Theory

 The main goal of rough set analysis is the induction (learning) of approximations of concepts. Rough sets constitute a sound basis for KDD, offering mathematical tools to discover patterns hidden in data.
 It can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules).
 It identifies partial or total dependencies in data, eliminates redundant data, and provides approaches to handling null values, missing data, dynamic data, and more.
Information system
 In the rough set data model, information is stored in a table.

 Each tuple represents a fact or an object.

 In rough set terminology, such a data table is called an information system.

 Thus, the information table represents input data gathered from any domain. The columns of the table are the attributes:

Case | Temperature | Headache | Nausea | Cough
-----|-------------|----------|--------|------
1    | high        | yes      | no     | yes
2    | very_high   | yes      | yes    | no
3    | high        | no       | no     | no
4    | high        | yes      | yes    | yes
5    | normal      | yes      | no     | no
6    | normal      | no       | yes    | yes


 An information system is a pair (U, A), where U is a non-empty finite set of objects and A is a non-empty finite set of attributes.

 The elements of A are called conditional attributes.

 An information table extended with a decision attribute is sometimes called a decision table.

 A decision system is a pair (U, A ∪ {d}), where d ∉ A is the decision attribute.
Case | Temperature | Headache | Nausea | Cough | Flu (decision)
-----|-------------|----------|--------|-------|---------------
1    | high        | yes      | no     | yes   | yes
2    | very_high   | yes      | yes    | no    | no
3    | high        | no       | no     | no    | no
4    | high        | yes      | yes    | yes   | yes
5    | normal      | yes      | no     | no    | no
6    | normal      | no       | yes    | yes   | yes
Indiscernibility

 A way of reducing table size is to store only one representative object for every set of objects with the same feature values.

 With any P ⊆ A there is an associated equivalence relation IND(P):

      IND(P) = {(x, y) ∈ U × U : ∀ a ∈ P, a(x) = a(y)}

 IND(P) is called the P-indiscernibility relation. If (x, y) ∈ IND(P), then x and y are indiscernible from each other by the attributes in P.
 For example, consider a table of objects O1, …, O10 with an attribute p1:

      IND({p1}) = {{O1, O2}, {O3, O5, O7, O9, O10}, {O4, O6, O8}}

   O1 and O2 are characterized by the same value of attribute p1, namely 1.
   O3, O5, O7, O9 and O10 are characterized by the same value of attribute p1, namely 2.
   O4, O6 and O8 are characterized by the same value of attribute p1, namely 0.
 The indiscernibility relation is an equivalence relation. Its equivalence classes (sets of mutually indiscernible objects) are called elementary sets.
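
To make this concrete, here is a minimal Python sketch that partitions the symptom table above into the equivalence classes of IND(P). The dictionary encoding and the function name ind are my own; the attribute values follow the slides.

from collections import defaultdict

# The information table (U, A) from the slides, encoded as a dict of dicts.
table = {
    1: {"Temperature": "high",      "Headache": "yes", "Nausea": "no",  "Cough": "yes"},
    2: {"Temperature": "very_high", "Headache": "yes", "Nausea": "yes", "Cough": "no"},
    3: {"Temperature": "high",      "Headache": "no",  "Nausea": "no",  "Cough": "no"},
    4: {"Temperature": "high",      "Headache": "yes", "Nausea": "yes", "Cough": "yes"},
    5: {"Temperature": "normal",    "Headache": "yes", "Nausea": "no",  "Cough": "no"},
    6: {"Temperature": "normal",    "Headache": "no",  "Nausea": "yes", "Cough": "yes"},
}

def ind(table, attrs):
    # Group objects by their tuple of values on attrs; each group is one
    # equivalence class (elementary set) of IND(attrs).
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

print(ind(table, ["Temperature"]))
# [{1, 3, 4}, {2}, {5, 6}] -- cases 1, 3 and 4 are indiscernible by Temperature alone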
Approximations
 A rough set is a formal approximation of a crisp target set X ⊆ U, defined by two approximations with respect to an attribute set P:

1). Upper approximation: the set of objects which possibly belong to the target set,

      P̄(X) = {x ∈ U : [x]P ∩ X ≠ ∅}.

2). Lower approximation: the set of objects which certainly (positively) belong to the target set,

      P̲(X) = {x ∈ U : [x]P ⊆ X}.

 The boundary region of X is P̄(X) − P̲(X). A set is said to be rough if its boundary region is non-empty; otherwise the set is crisp.
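
Continuing the Python sketch from the indiscernibility section (reusing table and ind), the two approximations can be computed directly from the equivalence classes; the function names are my own. Take X = {1, 4, 6}, the flu cases of the decision table:

def lower_approx(table, attrs, target):
    # Union of the equivalence classes fully contained in the target set.
    return {x for c in ind(table, attrs) if c <= target for x in c}

def upper_approx(table, attrs, target):
    # Union of the equivalence classes that intersect the target set.
    return {x for c in ind(table, attrs) if c & target for x in c}

X = {1, 4, 6}                                   # cases with Flu = yes
print(lower_approx(table, ["Temperature"], X))  # set(): no class lies wholly in X
print(upper_approx(table, ["Temperature"], X))  # {1, 3, 4, 5, 6}
# The boundary region {1, 3, 4, 5, 6} is non-empty, so X is rough w.r.t. {Temperature}.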
Four Basic Classes of Rough Sets
 X is roughly P-definable if P̲(X) ≠ ∅ and P̄(X) ≠ U.
 X is internally P-undefinable if P̲(X) = ∅ and P̄(X) ≠ U.
 X is externally P-undefinable if P̲(X) ≠ ∅ and P̄(X) = U.
 X is totally P-undefinable if P̲(X) = ∅ and P̄(X) = U.
Accuracy
 The accuracy of the rough set approximation of a set X, which measures how closely the rough set approximates the target set X, is given by

      αP(X) = |P̲(X)| / |P̄(X)|

 where |·| denotes cardinality and the upper approximation is non-empty. Obviously, αP(X) lies in [0, 1]:
 If αP(X) = 1, the upper and lower approximations are equal and X is a crisp set with respect to P.
 If αP(X) < 1, X is rough with respect to P.
 If αP(X) = 0, the lower approximation is empty (regardless of the size of the upper approximation).
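
As a sketch, the accuracy measure follows directly from the two approximation functions defined earlier:

def accuracy(table, attrs, target):
    lo = lower_approx(table, attrs, target)
    up = upper_approx(table, attrs, target)
    return len(lo) / len(up)   # assumes the upper approximation is non-empty

print(accuracy(table, ["Temperature"], X))              # 0.0: the lower approximation is empty
print(accuracy(table, ["Temperature", "Headache"], X))  # 1.0: X is crisp w.r.t. these two attributes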
Attribute Dependency
 Dependency describes which variables are strongly related to which other variables.
 A set of attributes Q depends totally on a set of attributes P, denoted P ⇒ Q, if all values of attributes from Q are uniquely determined by values of attributes from P.
 Let us take two disjoint sets of attributes, P and Q. Each attribute set induces an indiscernibility (equivalence-class) structure: the equivalence classes induced by P are denoted [x]P, and the equivalence classes induced by Q are denoted [x]Q.
 Let Qi be a given equivalence class from the equivalence-class structure induced by attribute set Q. The dependency of attribute set Q on attribute set P, k or γ(P, Q), is given by

      γ(P, Q) = ( Σi |P̲(Qi)| ) / |U|

 i.e., the fraction of objects of U that can be unambiguously assigned to the classes of Q using only the attributes of P.
 Note that:
   If k = γ(P, Q) = 1, Q depends totally on P.
   If k = γ(P, Q) < 1, Q depends partially (to degree k) on P.
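
Below is a sketch of γ(P, Q) for the common case where Q is a single decision attribute, continuing the earlier Python snippets. The flu dictionary encodes the decision column of the decision table above.

flu = {1: "yes", 2: "no", 3: "no", 4: "yes", 5: "no", 6: "yes"}

def gamma(table, P, decision):
    # Equivalence classes of the decision attribute (the classes Qi).
    q_classes = defaultdict(set)
    for obj, d in decision.items():
        q_classes[d].add(obj)
    # Positive region: union of the lower approximations of every decision class.
    pos = set()
    for qi in q_classes.values():
        pos |= lower_approx(table, P, qi)
    return len(pos) / len(table)

print(gamma(table, ["Temperature"], flu))  # 0.1666...: only case 2 is classified unambiguously
print(gamma(table, ["Temperature", "Headache", "Nausea", "Cough"], flu))  # 1.0: total dependency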
Reduct
 The same (indiscernible) objects may be represented several times, and some of the attributes may be superfluous or redundant.
 We should keep only those attributes that preserve the indiscernibility relation and, consequently, the set approximations.
 There are usually several such subsets of attributes; those which are minimal are called reducts.
 A reduct is a sufficient set of features which by itself can fully characterize the knowledge in the database.
 Some of the important properties of a reduct are:
-> It produces the same equivalence-class structure as that expressed by the full attribute set, i.e., [x]RED = [x]P.
-> It is minimal.
-> It is not unique.
Algorithm for Reduct Calculation

 Input:
   C, the set of all conditional features
   D, the set of all decisional features
 Output: R, a feature subset

1. T := { }, R := { }
2. repeat
3.   T := R
4.   ∀ x ∈ (C − R)
5.     if γR∪{x}(D) > γT(D)
6.       T := R ∪ {x}
7.   R := T
8. until γR(D) = γC(D)
9. return R
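
A minimal Python rendering of this greedy loop (it follows the QuickReduct scheme), reusing the gamma function from the dependency section; the explicit guard against a pass with no improvement is my own addition.

def quick_reduct(table, C, decision):
    R = set()
    g_full = gamma(table, list(C), decision)
    while gamma(table, list(R), decision) < g_full:
        best, best_g = None, gamma(table, list(R), decision)
        for x in set(C) - R:                        # for all x in (C - R)
            g = gamma(table, list(R | {x}), decision)
            if g > best_g:                          # keep the attribute with the highest gain
                best, best_g = x, g
        if best is None:                            # no single attribute improves gamma; stop
            break
        R.add(best)
    return R

print(quick_reduct(table, ["Temperature", "Headache", "Nausea", "Cough"], flu))
# {'Cough'} -- in this particular table, Cough alone already determines Flu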
Core
 The core is the set of attributes common to all reducts, denoted CORE(P) = ∩ RED(P).

 Some of the important properties of the core are:

-> It consists of the attributes which cannot be removed without causing a collapse of the equivalence-class structure.
-> It may be empty.
-> It is the set of necessary attributes: removing a core attribute from the information table results in data inconsistency.
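
For tiny attribute sets, all reducts (and hence the core) can be found by brute-force search, as in this sketch built on the earlier gamma function; practical systems use discernibility matrices instead.

from itertools import combinations

def all_reducts(table, C, decision):
    g_full = gamma(table, list(C), decision)
    # All attribute subsets that preserve the dependency degree of the full set C...
    preserving = [set(s)
                  for r in range(1, len(C) + 1)
                  for s in combinations(C, r)
                  if gamma(table, list(s), decision) == g_full]
    # ...restricted to the minimal ones (no preserving proper subset).
    return [s for s in preserving if not any(p < s for p in preserving)]

def core(reducts):
    # The core is the intersection of all reducts.
    return set.intersection(*reducts)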
An Example of Reducts & Core
U  | Headache | Muscle Pain | Temp.     | Flu
---|----------|-------------|-----------|----
U1 | yes      | yes         | normal    | no
U2 | yes      | yes         | high      | yes
U3 | yes      | yes         | very_high | yes
U4 | no       | yes         | normal    | no
U5 | no       | no          | high      | no
U6 | no       | no          | very_high | yes

 Reduct Calculation:
 The set {Muscle_pain, Temp.} is a reduct of the original set of attributes {Headache, Muscle_pain, Temp.}. So, Reduct1 = {Muscle_pain, Temp.}. A new information table based on Reduct1:

U      | Muscle Pain | Temp.     | Flu
-------|-------------|-----------|----
U1, U4 | yes         | normal    | no
U2     | yes         | high      | yes
U3, U6 | yes         | very_high | yes
U5     | no          | high      | no
An Example of Reducts & Core (contd.)
 The set {Headache, Temp.} is a reduct of the original set of attributes {Headache, Muscle_pain, Temp.}. So, Reduct2 = {Headache, Temp.}. A new information table based on Reduct2:

U  | Headache | Temp.     | Flu
---|----------|-----------|----
U1 | yes      | normal    | no
U2 | yes      | high      | yes
U3 | yes      | very_high | yes
U4 | no       | normal    | no
U5 | no       | high      | no
U6 | no       | very_high | yes

 So the core is the intersection of all the reducts: CORE = {Headache, Temp.} ∩ {Muscle_pain, Temp.} = {Temp.}.
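
The brute-force helpers from the previous sections reproduce this worked example; the dictionary encoding of the table is my own.

table2 = {
    "U1": {"Headache": "yes", "Muscle_pain": "yes", "Temp": "normal"},
    "U2": {"Headache": "yes", "Muscle_pain": "yes", "Temp": "high"},
    "U3": {"Headache": "yes", "Muscle_pain": "yes", "Temp": "very_high"},
    "U4": {"Headache": "no",  "Muscle_pain": "yes", "Temp": "normal"},
    "U5": {"Headache": "no",  "Muscle_pain": "no",  "Temp": "high"},
    "U6": {"Headache": "no",  "Muscle_pain": "no",  "Temp": "very_high"},
}
flu2 = {"U1": "no", "U2": "yes", "U3": "yes", "U4": "no", "U5": "no", "U6": "yes"}

reducts = all_reducts(table2, ["Headache", "Muscle_pain", "Temp"], flu2)
print(reducts)        # the two reducts: {'Headache', 'Temp'} and {'Muscle_pain', 'Temp'}
print(core(reducts))  # {'Temp'} -- matching the intersection computed above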
THANK YOU
