jalali@mshdiua.ac.ir Jalali.mshdiau.ac.ir
Data Mining
Bayesian Classification
P(H) (prior probability): the initial probability that a hypothesis H is correct
When to use:
- A moderate or large training set is available
- The attributes that describe instances are conditionally independent given the classification
P(X): the probability that the sample data is observed
P(X|H) (posteriori probability): the probability of observing the sample X, given that the hypothesis holds
E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
Bayesian Theorem
Given training data X, the posteriori probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H|X) = P(X|H) P(H) / P(X)
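As a quick numeric illustration of the theorem (all probabilities here are hypothetical, chosen only to exercise the formula):

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# Hypothetical numbers: H = "buys a computer", X = "customer is a student".
p_h = 0.6          # prior P(H)
p_x = 0.5          # evidence P(X)
p_x_given_h = 0.7  # likelihood P(X|H)

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(H|X)
print(round(p_h_given_x, 2))  # 0.84
```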
This greatly reduces the computation cost: only the class distribution needs to be counted.
If attribute Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))

and P(xk|Ci) = g(xk, μCi, σCi), where μCi and σCi are the mean and standard deviation of Ak over the tuples of class Ci.
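The Gaussian estimate for a continuous attribute can be sketched in Python (the mean and standard deviation here are hypothetical class statistics, not values from the slides):

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma), used to estimate P(x_k | C_i)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical: within class C_i, attribute "age" has mean 38 and std dev 12,
# so P(age = 35 | C_i) is approximated by:
p = gaussian(35, mu=38, sigma=12)
```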
Class:
C1: buys_computer = yes
C2: buys_computer = no
Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair)
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(age <= 30 | buys_computer = no) = 3/5 = 0.6
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(income = medium | buys_computer = no) = 2/5 = 0.4
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(student = yes | buys_computer = no) = 1/5 = 0.2
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.4
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(Ci):
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357
P(X|Ci):
P(X | buys_computer = yes) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = no) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) P(Ci):
P(X | buys_computer = yes) P(buys_computer = yes) = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.007
Therefore, X belongs to class (buys_computer = yes)
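The computation above can be reproduced in a few lines of Python; the priors and conditional probabilities are the counts taken from the training data:

```python
# Class priors from the 14 training tuples: 9 "yes", 5 "no".
priors = {"yes": 9 / 14, "no": 5 / 14}

# Conditional probabilities P(attribute value | class) for the test tuple
# X = (age <= 30, income = medium, student = yes, credit_rating = fair).
cond = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],
}

scores = {}
for c in priors:
    p_x_given_c = 1.0
    for p in cond[c]:        # naive independence: multiply per-attribute terms
        p_x_given_c *= p
    scores[c] = p_x_given_c * priors[c]

prediction = max(scores, key=scores.get)
print(prediction)  # yes
```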
Ex. Suppose a training set with 1000 tuples of a class, with income = low (0 tuples), income = medium (990), and income = high (10). The zero count would make the whole product P(X|Ci) zero. Use the Laplacian correction (or Laplacian estimator): add 1 to each case:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The corrected probability estimates are close to the uncorrected ones, but none of them is zero.
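A sketch of the correction in Python, using the value counts from the example:

```python
# Counts of the income attribute within one class: 1000 tuples total.
counts = {"low": 0, "medium": 990, "high": 10}
n = sum(counts.values())  # 1000

# Laplacian correction: add 1 to each count; the denominator grows by the
# number of distinct values (3), giving 1003.
smoothed = {v: (c + 1) / (n + len(counts)) for v, c in counts.items()}
# smoothed["low"] = 1/1003: no estimate is zero any more
```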
Disadvantages
The class conditional independence assumption rarely holds in practice, so accuracy is lost.
E.g., hospital patient data:
- Profile: age, family history, etc.
- Symptoms: fever, cough, etc.
- Disease: lung cancer, diabetes, etc.
Dependencies among these cannot be modeled by a Naïve Bayesian classifier. How to deal with these dependencies?
Bayesian Belief Networks
[Figure: example directed acyclic graphs, one over the nodes X, Y, Z, and P, and one over the nodes X1, X2, X3, and X4; nodes are random variables and links represent dependencies.]
[Figure: Bayesian belief network over the variables FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, and Dyspnea; FamilyHistory (FH) and Smoker (S) are the parents of LungCancer (LC).]

The conditional probability table (CPT) for the variable LungCancer:

       (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC       0.8       0.5        0.7        0.1
~LC      0.2       0.5        0.3        0.9

The CPT shows the conditional probability for each possible combination of values of its parents; e.g., P(LungCancer = yes | FamilyHistory = yes, Smoker = yes) = 0.8.
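Given the network, the probability of a complete tuple factorizes as P(x1, ..., xn) = Π P(xi | Parents(xi)). A minimal sketch using the LungCancer CPT; the priors for FamilyHistory and Smoker are hypothetical, since only the CPT is given:

```python
# CPT for LungCancer given its parents (FamilyHistory, Smoker),
# taken from the table above.
cpt_lc = {
    (True, True): 0.8,
    (True, False): 0.5,
    (False, True): 0.7,
    (False, False): 0.1,
}

# Hypothetical priors for the root nodes, only to make the joint computable.
p_fh, p_smoker = 0.3, 0.4

# P(FH = yes, Smoker = yes, LungCancer = yes)
# = P(FH) * P(Smoker) * P(LC | FH, Smoker)
joint = p_fh * p_smoker * cpt_lc[(True, True)]
```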
Text Classification - 2
Example: classify the document x = (sqrt=25, bit=30, chip=25, web=15, limit=20, cpu=12, equal=30), where each entry gives a word's frequency in the document.
Word    Doc1  Doc2  Doc3  Doc4  Doc5  Doc6
sqrt     42    10    11    33    28     8
bit      25    28    25    40    32    22
chip      7    45    22     8     9    30
log      56     2     4    48    60     1
web       2    69    40     3     6    29
limit    66     8     9    40    29     3
cpu       1    56    44     4     7    40
equal    89    40    34    77    60    33
class   Math  Comp  Comp  Math  Math  Comp

(The original table also carried a "Meaning" column annotating words as "Maths related" or "Computers related".)