Bina Nusantara
Acknowledgments
These slides have been adapted from Han, J.,
Kamber, M., & Pei, J., Data Mining: Concepts
and Techniques, and Tan, P.-N., Steinbach, M.,
& Kumar, V., Introduction to Data Mining.
Material Outline
Bayesian classification
Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e., predicts
class membership probabilities
Foundation: Based on Bayes' theorem.
Performance: A simple Bayesian classifier, the naive Bayesian classifier,
has performance comparable to decision tree and selected neural
network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct; prior
knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$
Informally, this can be written as
posterior = likelihood × prior / evidence
Predicts that X belongs to Ci iff the probability P(Ci|X) is the highest among
all the P(Ck|X) for all the k classes
Practical difficulty: requires initial knowledge of many probabilities
and incurs significant computational cost
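The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a full classifier: the function name `predict_class`, the class labels, and all the probability values are hypothetical and chosen only to show the arithmetic. Since P(X) is the same for every class, the class with the largest P(X|Ci)P(Ci) also has the largest posterior.

```python
def predict_class(likelihoods, priors):
    """Return the class with the highest posterior P(Ci|X), plus all posteriors.

    likelihoods: dict mapping class label -> P(X|Ci)
    priors:      dict mapping class label -> P(Ci)
    """
    # Unnormalized scores: likelihood x prior for each class.
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(X), obtained by summing over all classes.
    evidence = sum(scores.values())
    posteriors = {c: s / evidence for c, s in scores.items()}
    # Pick the class whose posterior is the highest.
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical numbers, for illustration only.
likelihoods = {"C1": 0.02, "C2": 0.05}  # P(X|Ci)
priors = {"C1": 0.7, "C2": 0.3}         # P(Ci)

best, posteriors = predict_class(likelihoods, priors)
print(best, posteriors)
```

With these made-up values the smaller prior of C2 is outweighed by its larger likelihood, so C2 is predicted.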
If a patient has a stiff neck, what's the probability he/she has meningitis?
$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
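The meningitis calculation can be checked with a short Python sketch. It uses the three probabilities from the example, P(S|M) = 0.5, P(M) = 1/50000 and P(S) = 1/20; the helper name `posterior` and the variable names are illustrative choices. Exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence


p_s_given_m = Fraction(1, 2)   # P(S|M): stiff neck given meningitis
p_m = Fraction(1, 50000)       # P(M): prior probability of meningitis
p_s = Fraction(1, 20)          # P(S): probability of a stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s, float(p_m_given_s))  # 1/5000 0.0002
```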
Bayesian Classifiers
Consider each attribute and class label as random variables
$$P(A_1, A_2, \ldots, A_n)$$
Tid  Refund  Marital Status  Income  Class
3    No      Single          70K     No
4    Yes     Married         120K    No
5    No      Divorced        95K     Yes
6    No      Married         60K     No
7    Yes     Divorced        220K    No
8    No      Single          85K     Yes
9    No      Married         75K     No
10   No      Single          90K     Yes

One for each (Ai,ci) pair
For (Income, Class=No):
If Class=No
sample mean = 110
sample variance = 2975
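$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi \cdot 2975}} \exp\left(-\frac{(120 - 110)^2}{2 \cdot 2975}\right) \approx 0.0072$$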
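As a rough check of the continuous-attribute case, the sketch below evaluates a normal density at Income = 120 using the sample mean (110) and sample variance (2975) stated for Class=No. It assumes the attribute is modeled with a normal distribution and that income is measured in thousands; the function name `gaussian_density` is an illustrative choice.

```python
import math


def gaussian_density(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    exponent = -((x - mean) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)


# Sample statistics for (Income, Class=No), income in thousands (assumed unit).
mean_no = 110
variance_no = 2975

density = gaussian_density(120, mean_no, variance_no)
print(round(density, 4))  # 0.0072
```

The value is a density, not a probability, so it is only meaningful when compared against the corresponding density for the other class.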