
ROUGH SET THEORY AND PRACTICE

OVERVIEW

INFORMATION & DECISION SYSTEMS

INDISCERNIBILITY RELATION & EQUIVALENCE CLASS

APPROXIMATION OF CLASSES & PROPERTIES

DISCERNIBILITY RELATION, FUNCTION & REDUCTS

RULES GENERATION & FRAMEWORK

CLASSIFICATION

DEFAULT RULES GENERATION FRAMEWORK

AN OVERVIEW

The rough set concept was introduced by the Polish logician Professor Zdzisław Pawlak in the early eighties.

Today it is one of the most actively developed AI techniques for the identification and recognition of common patterns in data, especially in the case of uncertain and incomplete data. The mathematical foundations of the method are based on the set approximation of the classification space.

The rough set philosophy is founded on the assumption that with every object of the universe of discourse we associate some information (data, knowledge). E.g., if objects are patients suffering from a certain disease, symptoms of the disease form information about the patients.

Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible (similar) objects is called an elementary set, and forms a basic granule (atom) of knowledge about the universe. Any union of elementary sets is referred to as a crisp (precise) set; otherwise a set is rough (imprecise, vague).

[Concept map: objects with the same data/knowledge are indiscernible; indiscernible objects form an elementary set; unions of elementary sets are crisp sets, otherwise the set is rough]

Precise concept: objects can be characterized through the information available.

Vague concept: cannot be characterized in terms of information about its elements; it is replaced by a pair of precise concepts, called the lower and the upper approximation of the vague concept.

Lower approximation: consists of all objects which certainly belong to the concept.

Upper approximation: contains all objects which possibly belong to the concept.

The difference between the upper and the lower approximation constitutes the boundary region of the vague concept.

$X$: the set $X$
$\underline{B}X$: the B-lower approximation of $X$
$\overline{B}X$: the B-upper approximation of $X$
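For concreteness, a minimal Python sketch of how the two approximations can be computed from the indiscernibility partition; the toy data and attribute names here are illustrative assumptions, not part of the worked example that follows:

```python
# Sketch: B-lower and B-upper approximations (toy data; names are assumptions).

def partition(universe, B):
    """Equivalence classes of IND(B): group objects by their values on B."""
    blocks = {}
    for obj, row in universe.items():
        key = tuple(row[a] for a in B)
        blocks.setdefault(key, set()).add(obj)
    return blocks.values()

def approximations(universe, B, X):
    """Return (lower, upper) approximations of the concept X under IND(B)."""
    lower, upper = set(), set()
    for block in partition(universe, B):
        if block <= X:        # block certainly inside X
            lower |= block
        if block & X:         # block possibly inside X
            upper |= block
    return lower, upper

# Objects 1 and 2 are indiscernible on {a, b} but decide differently.
U = {1: {'a': 0, 'b': 0, 'd': 1},
     2: {'a': 0, 'b': 0, 'd': 0},
     3: {'a': 1, 'b': 0, 'd': 1}}
X = {o for o, r in U.items() if r['d'] == 1}
low, up = approximations(U, ['a', 'b'], X)
print(low, up, up - low)   # {3} {1, 2, 3} {1, 2}  (boundary region)
```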

The knowledge base for rough set processing is stored as a table containing condition and decision attributes. The table represents the given knowledge in the form of IF-THEN rules. However, some elements of knowledge can be uncertain and inconsistent; in such a case the decision table becomes inconsistent and some rules become uncertain. Rough set reasoning is made over discrete data, therefore discretization is used for data preprocessing.

A method of knowledge representation is very important for rough set data processing. Data are stored in a decision table as follows: columns represent attributes, rows represent objects, and every cell contains the attribute value for the corresponding object and attribute. Decision tables are also called information systems.

[Framework diagram: Raw Data → Preprocessing/Discretization → Split into Training Set and Test Set → Projection of Classes → Reducer → Reducts → Generate Rules → Testing]

Important Features:
- Upper and lower approximation
- Membership values
- Number of rules
- Length of rules
- Accuracy
- Coverage
- Support

ROUGH SET THEORY

AN EXAMPLE

Decision System:

Obj   Studies    Education  Works  Income (D)
1     Poor       SPM        Poor   None
2     Poor       SPM        Good   Low
3     Moderate   SPM        Poor   Low
4     Moderate   Diploma    Poor   Low
5     Poor       SPM        Poor   None
6     Moderate   Diploma    Poor   Low
7     Good       MSC        Good   Medium
:
99    Poor       SPM        Good   Low
100   Moderate   Diploma    Poor   Low

Indiscernibility Relation → Equivalence Classes → Discernibility Matrix → Discernibility Matrix Modulo → Discernibility Function → Reducts

The concept of indiscernibility captures the discernibility phenomenon and allows us to partition the universe into disjoint subsets, where each subset contains objects that are equal to, or indiscernible from, each other using the selected attribute subset.

$IND_A(B) = \{(x_i, x_j) \in U^2 \mid \forall a \in B,\ a(x_i) = a(x_j)\}$

Projecting the decision system onto its equivalence classes:

Class  Studies   Education  Works
E1     Poor      SPM        Poor
E2     Poor      SPM        Good
E3     Moderate  SPM        Poor
E4     Moderate  Diploma    Poor
E5,1   Good      MSC        Good
E5,2   Good      MSC        Good

Equivalence Class decisions and sizes:

Class  Income (D)  num_obj
E1     None        50
E2     Low         5
E3     Low         30
E4     Low         10
E5,1   Medium      4
E5,2   High        1

Equivalence Classes (abstract coding):

Class  a  b  c  dec  num_obj
E1     1  2  3  1    50
E2     1  2  1  2    5
E3     2  2  3  2    30
E4     2  3  3  2    10
E5,1   3  5  1  3    4
E5,2   3  5  1  4    1
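A small Python sketch (an illustration, not the authors' implementation) of how IND({a, b, c}) groups the table above into the classes E1 to E5:

```python
# Sketch: equivalence classes of IND({a, b, c}) for the example table above.
from collections import defaultdict

rows = [  # (a, b, c, dec, num_obj), one representative row per class
    (1, 2, 3, 1, 50), (1, 2, 1, 2, 5), (2, 2, 3, 2, 30),
    (2, 3, 3, 2, 10), (3, 5, 1, 3, 4), (3, 5, 1, 4, 1),
]

classes = defaultdict(list)
for a, b, c, dec, n in rows:
    classes[(a, b, c)].append((dec, n))   # equal (a, b, c) => indiscernible

for values, members in classes.items():
    print(values, members)
# (3, 5, 1) carries two decisions (dec 3 on 4 objects, dec 4 on 1 object):
# the class E5 is inconsistent, which later yields uncertain rules.
```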

Discernibility Matrix:

      E1         E2         E3         E4         E5
E1    x          {c}        {a}        {a,b}      {a,b,c}
E2    {c}        x          {a,c}      {a,b,c}    {a,b}
E3    {a}        {a,c}      x          {b}        {a,b,c}
E4    {a,b}      {a,b,c}    {b}        x          {a,b,c}
E5    {a,b,c}    {a,b}      {a,b,c}    {a,b,c}    x
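A sketch of how each cell can be derived; the modulo variant (next slide) blanks out pairs whose decisions already agree. The class representation is an assumption for illustration:

```python
# Sketch: discernibility matrix entries for classes E1..E5 (x = empty cell).
atts = ('a', 'b', 'c')
E = {  # class -> (attribute values, set of decisions)
    'E1': ((1, 2, 3), {1}), 'E2': ((1, 2, 1), {2}), 'E3': ((2, 2, 3), {2}),
    'E4': ((2, 3, 3), {2}), 'E5': ((3, 5, 1), {3, 4}),
}

def entry(ci, cj, modulo=False):
    (vi, di), (vj, dj) = E[ci], E[cj]
    if ci == cj or (modulo and di == dj):  # modulo: equal decisions need no discerning
        return 'x'
    return {a for a, x, y in zip(atts, vi, vj) if x != y} or 'x'

for ci in E:
    print(ci, [entry(ci, cj, modulo=True) for cj in E])
```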


Discernibility Matrix Modulo (decision):

      E1         E2         E3         E4         E5
E1    x          {c}        {a}        {a,b}      {a,b,c}
E2    {c}        x          x          x          {a,b}
E3    {a}        x          x          x          {a,b,c}
E4    {a,b}      x          x          x          {a,b,c}
E5    {a,b,c}    {a,b}      {a,b,c}    {a,b,c}    x

Discernibility Function (from the discernibility matrix):

      E1         E2         E3         E4         E5         f
E1    x          {c}        {a}        {a,b}      {a,b,c}    {a,c}
E2    {c}        x          {a,c}      {a,b,c}    {a,b}      {a,c}, {b,c}
E3    {a}        {a,c}      x          {b}        {a,b,c}    {a,b}
E4    {a,b}      {a,b,c}    {b}        x          {a,b,c}    {b}
E5    {a,b,c}    {a,b}      {a,b,c}    {a,b,c}    x          {a}, {b}

Discernibility Function Modulo (from the discernibility matrix modulo):

      E1         E2         E3         E4         E5         f
E1    x          {c}        {a}        {a,b}      {a,b,c}    {a,c}
E2    {c}        x          x          x          {a,b}      {a,c}, {b,c}
E3    {a}        x          x          x          {a,b,c}    {a}
E4    {a,b}      x          x          x          {a,b,c}    {a}, {b}
E5    {a,b,c}    {a,b}      {a,b,c}    {a,b,c}    x          {a}, {b}

Class  CNF of Boolean function              Disc. function   Reducts
E1     c ∧ a ∧ (a∨b) ∧ (a∨b∨c)              a ∧ c            {a,c}
E2     c ∧ (a∨b)                            c ∧ (a∨b)        {a,c}, {b,c}
E3     a ∧ (a∨b∨c)                          a                {a}
E4     (a∨b) ∧ (a∨b∨c)                      a ∨ b            {a}, {b}
E5     (a∨b∨c) ∧ (a∨b)                      a ∨ b            {a}, {b}

Reducts are minimal selections of attributes within the IS/DS that are more important/interesting than the others, based on the concept of the discernibility relation over the classes of the DS. Object reduct: a reduct of every class in the IS/DS. Full reduct: a reduct of the whole system, i.e., the most important attributes in the IS/DS.


Reduct of the system? A set of selected attributes that can represent the IS. In this example, the reduct of the system is {a, c}.

Reducers:
- Genetic Reducer
- Dynamic Reducer
- Johnson Reducer
- Holte1R
- Exhaustive
- SIP/DRIP

Per-class reducts: [E1, {a,c}], [E2, {a,c}, {b,c}], [E3, {a}], [E4, {a}, {b}], [E5, {a}, {b}]


Rules generated from the reducts (a1c3 → d1 reads: IF a = 1 AND c = 3 THEN dec = 1):

a1c3 → d1
a1c1 → d2, b2c1 → d2
a2 → d2
b3 → d2
a3 → d3, a3 → d4
b5 → d3, b5 → d4

Equivalence Classes → Rules:

Class   Rule        Membership Degree
E1      a1c3 → d1   50/50 = 1
E2      a1c1 → d2   5/5 = 1
E2      b2c1 → d2   5/5 = 1
E3,E4   a2 → d2     40/40 = 1
E4      b3 → d2     10/10 = 1
E5      a3 → d3     4/5 = 0.8
E5      a3 → d4     1/5 = 0.2
E5      b5 → d3     4/5 = 0.8
E5      b5 → d4     1/5 = 0.2
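A sketch of the membership computation behind the last column: the objects matching the rule body that agree with its decision, divided by all objects matching the body, weighted by num_obj. The representation is an assumption:

```python
# Sketch: membership degree = support(E and F) / support(E), using num_obj.
classes = [  # (a, b, c, dec, num_obj)
    (1, 2, 3, 1, 50), (1, 2, 1, 2, 5), (2, 2, 3, 2, 30),
    (2, 3, 3, 2, 10), (3, 5, 1, 3, 4), (3, 5, 1, 4, 1),
]
names = ('a', 'b', 'c')

def membership(cond, dec):
    """cond like {'a': 3}: objects matching the rule body; dec: its decision."""
    covered = [r for r in classes
               if all(r[names.index(k)] == v for k, v in cond.items())]
    agree = sum(n for *_, d, n in covered if d == dec)
    return agree / sum(n for *_, d, n in covered)

print(membership({'a': 1, 'c': 3}, 1))  # a1c3 -> d1 : 50/50 = 1.0
print(membership({'a': 3}, 3))          # a3 -> d3  : 4/5  = 0.8
```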

Default rules use more default knowledge for situations where the information is incomplete. They are simpler in structure (a few condition attributes) and better when handling unseen cases with missing values. A preliminary conclusion can be made: even if not with full certainty, a decision can be made whether to uphold the decision or to gather more information.

[Framework: START → Decision System → Equivalence Classes → Projection of Classes → Discernibility Relation → Reducts Computation → Rules Generation (Definite/Default) → END]

Default rules are generated by creating indeterminacy in a DS. The underlying idea is to search for default rules by destroying reducts of the original system. The indeterminacy is generated by selecting projections over the condition attributes, i.e., allowing certain attributes to be excluded from consideration.


Projection (Cpr)   Remove (Ccut)   Indeterminacy / Joining
{a,b,c}            (none)          E1, E2, E3, E4, E5
{b,c}              a               {E1,E3}, E2, E4, E5
{a,c}              b               no indeterminacy
{a,b}              c               {E1,E2}, E3, E4, E5
{c}                a, b            {E1,E3,E4}, {E2,E5}
{b}                a, c            {E1,E2,E3}, E4, E5
{}                 a, b, c         {U}

[Lattice of projections over {a, b, c}, each node annotated with the classes joined there: {b,c} → E[1,3]; {a,b} → E[1,2]; {c} → E[1,3,4], E[2,5]; {b} → E[1,2,3]; {} → U = E[1,2,3,4,5]]
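A sketch of the projection step: drop attributes, re-partition the classes, and flag joinings whose decisions conflict (indeterminacy). The class representation is an assumption:

```python
# Sketch: project the classes onto an attribute subset and flag indeterminacy.
from collections import defaultdict

atts = ('a', 'b', 'c')
E = {'E1': ((1, 2, 3), {1}), 'E2': ((1, 2, 1), {2}), 'E3': ((2, 2, 3), {2}),
     'E4': ((2, 3, 3), {2}), 'E5': ((3, 5, 1), {3, 4})}

def project(keep):
    joined = defaultdict(list)
    for name, (vals, decs) in E.items():
        key = tuple(v for a, v in zip(atts, vals) if a in keep)
        joined[key].append((name, decs))
    return joined

for key, members in project({'b', 'c'}).items():     # Cpr = {b,c}, Ccut = {a}
    decisions = set().union(*(d for _, d in members))
    print(key, [n for n, _ in members],
          'indeterminate' if len(decisions) > 1 else 'determinate')
# (2, 3) joins E1 and E3 with decisions {1, 2}: indeterminacy created.
```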

Definite Rules:

Rule        Membership (Q)
a1c3 → d1   50/50 = 1
a1c1 → d2   5/5 = 1
b2c1 → d2   5/5 = 1
a2 → d2     40/40 = 1
b3 → d2     10/10 = 1

Default Rules:

Rule       Membership (Q)
a3 → d3    4/5 = 0.8
a3 → d4    1/5 = 0.2
b5 → d3    4/5 = 0.8
b5 → d4    1/5 = 0.2
b2 → d1    50/80 = 0.625
b2 → d2    30/80 = 0.375
c1 → d2    5/10 = 0.5
c1 → d3    4/10 = 0.4
c1 → d4    1/10 = 0.1
a1 → d1    50/55 = 0.91
a1 → d2    5/55 = 0.09

Given a description with a conditional part E and a decision part F, denote the decision rule by E → F. The support of the pattern E is the number of objects in the information system A that have the property described by E.

Likewise, the support of F is the number of objects in the IS A that have the decision described by F, and the support of the decision rule E → F is the number of objects matching both the description and the decision:

$\mathrm{support}(E) = |E|$, $\mathrm{support}(F) = |F|$, $\mathrm{support}(E \rightarrow F) = \mathrm{support}(E \wedge F)$

The quantity accuracy(E → F) gives a measure of how trustworthy the rule is in predicting the decision F. It is the probability that an arbitrary object covered by the description belongs to the class. It is identical to the value of the rough membership function applied to an object x that matches E; thus accuracy measures the degree of membership of x in X using the attributes B.

$\mathrm{Accuracy}(E \rightarrow F) = \dfrac{\mathrm{support}(E \wedge F)}{\mathrm{support}(E)}$

Coverage gives a measure of how well the pattern E describes the decision class defined through F. It is the probability that an arbitrary object belonging to the class is covered by the description.

$\mathrm{Coverage}(E \rightarrow F) = \dfrac{\mathrm{support}(E \wedge F)}{\mathrm{support}(F)}$

Rules are said to be complete if every object belonging to the class is covered by the description (coverage is 1), while deterministic rules are rules whose accuracy is 1. Correct rules are rules with both coverage and accuracy equal to 1.
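All three measures can be read off directly from object counts; a minimal sketch, where the object representation and the toy rule are assumptions:

```python
# Sketch: support, accuracy and coverage of a rule E -> F over labelled objects.
def support(objs, pred):
    return sum(1 for o in objs if pred(o))

def accuracy(objs, E, F):   # support(E ^ F) / support(E)
    return support(objs, lambda o: E(o) and F(o)) / support(objs, E)

def coverage(objs, E, F):   # support(E ^ F) / support(F)
    return support(objs, lambda o: E(o) and F(o)) / support(objs, F)

data = [{'studies': 'Poor', 'income': 'None'},
        {'studies': 'Poor', 'income': 'Low'},
        {'studies': 'Good', 'income': 'Medium'}]
E = lambda o: o['studies'] == 'Poor'   # rule body
F = lambda o: o['income'] == 'None'    # rule decision
print(accuracy(data, E, F), coverage(data, E, F))   # 0.5 1.0
```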

Classification is used in two ways:
- to determine the performance of classification on another test set
- to make decisions toward new cases without a decision

Classification rules applied to test data:

Studies    Education  Works  Income (D)
Moderate   Diploma    Poor   Low
Poor       SPM        Poor   None
Moderate   Diploma    Poor   Low
Good       MSC        Good   Medium
:

New data: studies = Poor and works = Poor → classify

When a classifier is presented with a new case, the rule set is scanned to find applicable rules, i.e., rules whose antecedents match the case. If no rule is found, the most frequent outcome in the training data is chosen. If more than one rule matches, these may in turn indicate more than one possible outcome. A voting process is then performed among the matching rules in order to resolve conflicts and to rank the predicted outcomes.

Voting strategies:
- First Rule
- Highest Accuracy
- Simple Voting
- Quadratic Voting
- Exponential Voting
- Weight of Evidence
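A sketch of the simple-voting strategy from the list above; the rules, attribute names and fallback are illustrative assumptions:

```python
# Sketch: simple voting among all rules whose antecedents match the case.
def classify(case, rules, default):
    """rules: list of (condition_dict, decision). One vote per matching rule."""
    votes = {}
    for cond, dec in rules:
        if all(case.get(k) == v for k, v in cond.items()):   # antecedent matches
            votes[dec] = votes.get(dec, 0) + 1
    if not votes:                     # no applicable rule: most frequent outcome
        return default
    return max(votes, key=votes.get)  # outcome with the most votes wins

rules = [({'studies': 'Poor', 'works': 'Poor'}, 'None'),
         ({'studies': 'Poor'}, 'None'),
         ({'works': 'Poor'}, 'Low')]
print(classify({'studies': 'Poor', 'works': 'Poor'}, rules, default='Low'))  # None
```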

[Framework diagram (recap): Raw Data → Preprocessing/Discretization → Split into Training Set and Test Set → Projection of Classes → Reducer → Reducts → Generate Rules → Testing]

ROSETTA (ROugh SET Toolkit for data Analysis) implements the default rules generation framework introduced by Mollestad (1997) and provides utilities for data preprocessing, data discretization, rule-based knowledge and statistical analysis; it can be embedded with C++, Prolog and MATLAB programs.

EXPERIMENTAL RESULTS ON SEVERAL ROUGH DATA MINING SYSTEMS
(NR = number of rules, ACC = accuracy (%), M_L = rule length)

             AUS                    CLEV                   LYM
Method       NR     ACC    M_L      NR     ACC    M_L      NR      ACC    M_L
SIP/DRIP     118    98.48  3        71     82.35  3        111     88.76  6
GA           767    84.44  6        947    83.82  4        1014    87.64  7
Johnson      87     82.93  4        40     77.45  2        68      84.27  4
Holte1R      56     85.52  1        57     80.88  1        43      86.51  1
Dynamic      2305   85.09  7        4195   84.31  7        106140  86.51  9
Exhaustive   786    84.88  7        1749   82.84  6        1106    86.51  7

             BCO                    GERM
Method       NR     ACC    M_L      NR     ACC     M_L
SIP/DRIP     34     95.10  2        52     75.417  4
GA           297    95.10  4        4511   74.250  6
Johnson      28     94.49  2        69     74.250  4
Holte1R      81     84.69  1        48     71.750  1
Dynamic      444    90.61  4        -      -       -
Exhaustive   297    95.10  4        -      -       -

SEE ASSIGNMENT 2: HOW TO PERFORM MODELING?

Confusion matrix:

                    Predicted C1      Predicted C2
Actual C1           True positive     False negative
Actual C2           False positive    True negative

classes               buy_computer = yes   buy_computer = no   total   recognition (%)
buy_computer = yes    6954                 46                  7000    99.34
buy_computer = no     412                  2588                3000    86.27
total                 7366                 2634                10000   95.42

Accuracy of a classifier M, acc(M): the percentage of test set tuples that are correctly classified by the model M.
- Error rate (misclassification rate) of M = 1 - acc(M)
- Given m classes, M(i, j), an entry in the confusion matrix, indicates the number of tuples in class i that are labeled by the classifier as class j
- Alternative accuracy measures (e.g., for cancer diagnosis):
  - sensitivity = t-pos/pos (true positive recognition rate)
  - specificity = t-neg/neg (true negative recognition rate)
  - precision = t-pos/(t-pos + f-pos)
  - accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
- This model can also be used for cost-benefit analysis

Measuring predictor accuracy: measure how far off the predicted value is from the actual known value.
- Loss function: measures the error between $y_i$ and the predicted value $y_i'$
  - Absolute error: $|y_i - y_i'|$
  - Squared error: $(y_i - y_i')^2$
- Test error (generalization error): the average loss over the test set
  - Mean absolute error: $\frac{1}{d}\sum_{i=1}^{d}|y_i - y_i'|$
  - Mean squared error: $\frac{1}{d}\sum_{i=1}^{d}(y_i - y_i')^2$
  - Relative absolute error: $\sum_{i=1}^{d}|y_i - y_i'| \,/\, \sum_{i=1}^{d}|y_i - \bar{y}|$
  - Relative squared error: $\sum_{i=1}^{d}(y_i - y_i')^2 \,/\, \sum_{i=1}^{d}(y_i - \bar{y})^2$
- The mean squared error exaggerates the presence of outliers
- Popularly used: the (square) root mean squared error and, similarly, the root relative squared error


Holdout method:
- The given data is randomly partitioned into two independent sets: a training set (e.g., 2/3) for model construction and a test set (e.g., 1/3) for accuracy estimation
- Random sampling: a variation of holdout; repeat holdout k times, accuracy = average of the accuracies obtained

Cross-validation (k-fold, where k = 10 is most popular):
- Randomly partition the data into k mutually exclusive subsets, each of approximately equal size
- At the i-th iteration, use Di as the test set and the others as the training set
- Leave-one-out: k folds where k = number of tuples, for small-sized data
- Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
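A model-agnostic sketch of k-fold cross-validation, assuming `train(data) -> model` and `test(model, data) -> accuracy` callables supplied by the caller:

```python
# Sketch: k-fold cross-validation over a list of tuples.
import random

def k_fold_accuracy(data, k, train, test, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]        # k mutually exclusive subsets
    accs = []
    for i in range(k):                            # i-th fold is the test set
        train_set = [x for j in range(k) if j != i for x in folds[j]]
        accs.append(test(train(train_set), folds[i]))
    return sum(accs) / k                          # average accuracy over folds
```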

Bootstrap:
- Works well with small data sets
- Samples the given training tuples uniformly with replacement, i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set
- There are several bootstrap methods; a common one is the .632 bootstrap:
  - Suppose we are given a data set of d tuples. The data set is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original data will end up in the bootstrap, and the remaining 36.8% will form the test set (since $(1 - 1/d)^d \approx e^{-1} = 0.368$)
  - Repeat the sampling procedure k times; the overall accuracy of the model is:

$acc(M) = \sum_{i=1}^{k} \left(0.632 \times acc(M_i)_{test\_set} + 0.368 \times acc(M_i)_{train\_set}\right)$
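A sketch of one .632 bootstrap round plus the combined estimate; averaging over the k rounds is how the estimate is usually reported, an assumption on top of the summation formula above:

```python
# Sketch: .632 bootstrap split and combined accuracy estimate.
import random

def bootstrap_split(data, seed=0):
    rng = random.Random(seed)
    d = len(data)
    picked = [rng.randrange(d) for _ in range(d)]     # d draws, with replacement
    train = [data[i] for i in picked]
    test = [data[i] for i in range(d) if i not in set(picked)]   # out-of-bag
    return train, test

def acc_632(rounds):
    """rounds: list of (test_acc, train_acc) pairs from k bootstrap samples."""
    return sum(0.632 * t + 0.368 * r for t, r in rounds) / len(rounds)
```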

Ensemble methods: use a combination of models to increase accuracy.
- Combine a series of k learned models, M1, M2, ..., Mk, with the aim of creating an improved model M*
- Popular ensemble methods:
  - Bagging: averaging the prediction over a collection of classifiers
  - Boosting: weighted vote with a collection of classifiers
  - Ensemble: combining a set of heterogeneous classifiers

Bagging:
- Analogy: diagnosis based on multiple doctors' majority vote
- Training: given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., bootstrap), and a classifier model Mi is learned for each training set Di
- Classification (classify an unknown sample X): each classifier Mi returns its class prediction; the bagged classifier M* counts the votes and assigns the class with the most votes to X
- Prediction: can be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple
- Accuracy: often significantly better than a single classifier derived from D; for noisy data, not considerably worse and more robust; proven improved accuracy in prediction

Boosting:
- Analogy: consult several doctors, based on a combination of weighted diagnoses, the weight assigned according to the previous diagnosis accuracy
- How boosting works: weights are assigned to each training tuple; a series of k classifiers is iteratively learned; after a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi; the final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
- The boosting algorithm can be extended for the prediction of continuous values
- Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data

Boosting algorithm (AdaBoost):
- Given a set of d class-labeled tuples, (X1, y1), ..., (Xd, yd)
- Initially, all tuple weights are set the same (1/d)
- Generate k classifiers in k rounds. At round i:
  - Tuples from D are sampled (with replacement) to form a training set Di of the same size; each tuple's chance of being selected is based on its weight
  - A classification model Mi is derived from Di and its error rate is calculated using Di as a test set
  - If a tuple is misclassified, its weight is increased; otherwise it is decreased
- Error rate: err(Xj) is the misclassification error of tuple Xj. The error rate of classifier Mi is the sum of the weights of the misclassified tuples:

$error(M_i) = \sum_{j} w_j \times err(X_j)$

- The weight of classifier Mi's vote is $\log \dfrac{1 - error(M_i)}{error(M_i)}$
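A sketch of the boosting loop above; data is a list of (x, y) pairs and `train` returns a model with `predict(x)`, which are interface assumptions, not a specific library API:

```python
# Sketch: AdaBoost-style weight bookkeeping and weighted voting.
import math, random

def boost(data, k, train, seed=0):
    d = len(data)
    w = [1.0 / d] * d                               # equal initial tuple weights
    ensemble = []
    rng = random.Random(seed)
    for _ in range(k):
        idx = rng.choices(range(d), weights=w, k=d)     # sample by tuple weight
        model = train([data[i] for i in idx])
        miss = {i for i in range(d) if model.predict(data[i][0]) != data[i][1]}
        error = sum(w[i] for i in miss)             # sum of misclassified weights
        if error == 0 or error >= 0.5:              # degenerate round: skip it
            continue
        alpha = math.log((1 - error) / error)       # this classifier's vote weight
        for i in miss:                              # boost misclassified tuples
            w[i] *= (1 - error) / error
        total = sum(w)
        w = [x / total for x in w]                  # renormalize to sum to 1
        ensemble.append((alpha, model))
    def predict(x):                                 # weighted vote of classifiers
        votes = {}
        for alpha, m in ensemble:
            c = m.predict(x)
            votes[c] = votes.get(c, 0.0) + alpha
        return max(votes, key=votes.get)
    return predict
```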

ROC (Receiver Operating Characteristic) curves: for visual comparison of classification models.
- Originated from signal detection theory
- Shows the trade-off between the true positive rate and the false positive rate
- The area under the ROC curve is a measure of the accuracy of the model
- Rank the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list
- The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model
- The vertical axis represents the true positive rate; the horizontal axis represents the false positive rate; the plot also shows a diagonal line
- A model with perfect accuracy will have an area of 1.0
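A sketch of how the ROC points are produced from the ranked test tuples; the toy scores are assumptions:

```python
# Sketch: ROC curve points from scored test tuples.
def roc_points(scored):
    """scored: list of (score, is_positive); higher score = more likely positive."""
    scored = sorted(scored, reverse=True)        # rank in decreasing order
    P = sum(1 for _, y in scored if y)
    N = len(scored) - P
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in scored:                          # sweep the ranked list
        if y: tp += 1
        else: fp += 1
        points.append((fp / N, tp / P))          # (FPR on x-axis, TPR on y-axis)
    return points                                # area under these points = AUC

print(roc_points([(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]))
```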


THANK YOU
