Bayesian Methods
• The focus of this lecture.
• Learning and classification methods based on probability theory.
• Bayes' theorem plays a critical role in probabilistic learning and classification.
• Uses the prior probability of each category given no information about an item.
• Categorization produces a posterior probability distribution over the possible categories, given a description of an item.
Basic Probability Formulas
• Product rule
  $$P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$
• Sum rule
  $$P(A \vee B) = P(A) + P(B) - P(A \wedge B)$$
• Bayes' theorem
  $$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
• Theorem of total probability: if the events $A_1, \dots, A_n$ are mutually exclusive and their probabilities sum to 1, then
  $$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$$
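These rules are easy to verify numerically. Below is a minimal sketch, assuming two hypothetical hypotheses h1 and h2 with made-up prior and likelihood values (none of these numbers come from the lecture):

```python
# A minimal numerical check of the rules above; all numbers are
# hypothetical, chosen only for illustration.

# Two mutually exclusive hypotheses whose prior probabilities sum to 1.
prior = {"h1": 0.3, "h2": 0.7}        # P(h)
likelihood = {"h1": 0.9, "h2": 0.2}   # P(D | h)

# Theorem of total probability: P(D) = sum_i P(D | h_i) P(h_i)
p_d = sum(likelihood[h] * prior[h] for h in prior)   # 0.9*0.3 + 0.2*0.7 = 0.41

# Bayes' theorem: P(h | D) = P(D | h) P(h) / P(D)
posterior = {h: likelihood[h] * prior[h] / p_d for h in prior}

print(p_d)        # 0.41
print(posterior)  # h1: ~0.659, h2: ~0.341 -- the posteriors sum to 1
```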
Probability Basics
• Quiz: We have two six-sided dice. When they are rolled, the following events can occur: (A) die 1 lands on side "3", (B) die 2 lands on side "1", and (C) the two dice sum to eight. Answer the following questions (a brute-force check follows the list):
1) $P(A) = ?$
2) $P(B) = ?$
3) $P(C) = ?$
4) $P(A \mid B) = ?$
5) $P(C \mid A) = ?$
6) $P(A, B) = ?$
7) $P(A, C) = ?$
8) Is $P(A, C)$ equal to $P(A)\,P(C)$?
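The answers can be checked by enumerating all 36 equally likely outcomes; a minimal sketch (the event encodings are mine):

```python
# Brute-force check of the quiz over all 36 outcomes of two dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (die 1, die 2)

def prob(event):
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] == 3           # die 1 lands on "3"
B = lambda o: o[1] == 1           # die 2 lands on "1"
C = lambda o: o[0] + o[1] == 8    # the two dice sum to eight

p_A, p_B, p_C = prob(A), prob(B), prob(C)   # 1/6, 1/6, 5/36
p_AB = prob(lambda o: A(o) and B(o))        # 1/36
p_AC = prob(lambda o: A(o) and C(o))        # 1/36
print(p_AB / p_B)        # P(A|B) = 1/6: B tells us nothing about A
print(p_AC / p_A)        # P(C|A) = 1/6: given die 1 is 3, die 2 must show 5
print(p_AC, p_A * p_C)   # 1/36 vs 5/216 -- not equal, so A and C are dependent
```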
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model
$$P(C \mid X), \quad C \in \{c_1, \dots, c_L\},\; X = (X_1, \dots, X_n)$$
[Figure: a discriminative probabilistic classifier maps the input $x = (x_1, x_2, \dots, x_n)$ directly to the posteriors $P(c_1 \mid x), \dots, P(c_L \mid x)$.]
Probabilistic Classification
• Establishing a probabilistic model for classification (cont.)
– Generative model
[Figure: one generative model per class; each evaluates the class-conditional likelihood $P(x \mid c_1), P(x \mid c_2), \dots, P(x \mid c_L)$ of the input $x = (x_1, x_2, \dots, x_n)$.]
Probabilistic Classification
• MAP classification rule
– MAP: Maximum A Posteriori
– Assign $x$ to $c^*$ if
  $$P(C = c^* \mid X = x) > P(C = c \mid X = x), \quad \forall\, c \neq c^*,\; c \in \{c_1, \dots, c_L\}$$
Naïve Bayes
• Bayes classification
  $$P(C \mid X) \propto P(X \mid C)\,P(C) = P(X_1, \dots, X_n \mid C)\,P(C)$$
  Difficulty: learning the joint probability $P(X_1, \dots, X_n \mid C)$.
• Naïve Bayes classification
  – Assumption: all input features are conditionally independent given the class!
  $$\begin{aligned} P(X_1, X_2, \dots, X_n \mid C) &= P(X_1 \mid X_2, \dots, X_n, C)\,P(X_2, \dots, X_n \mid C) \\ &= P(X_1 \mid C)\,P(X_2, \dots, X_n \mid C) \\ &= P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C) \end{aligned}$$
  – MAP classification rule: for $x = (x_1, x_2, \dots, x_n)$, assign the label $c^*$ if
  $$[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c), \quad c \neq c^*,\; c \in \{c_1, \dots, c_L\}$$
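A minimal sketch of this factorised MAP rule in code (the function and parameter names are mine, and the log transform is a standard trick not spelled out on the slide: maximising a product of probabilities is the same as maximising the sum of their logs):

```python
import math

def naive_bayes_predict(x, prior, likelihood):
    """MAP rule under the naive independence assumption.

    x          -- observed feature values (x1, ..., xn)
    prior      -- {class c: P(c)}
    likelihood -- {class c: [{value: P(Xj = value | c)} for each feature j]}
    """
    best_class, best_score = None, -math.inf
    for c in prior:
        # log P(c) + sum_j log P(xj | c); sums of logs avoid
        # numerical underflow when the number of features is large.
        score = math.log(prior[c])
        for j, xj in enumerate(x):
            score += math.log(likelihood[c][j][xj])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```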
Naïve Bayes
– For an unseen instance $X' = (a_1, \dots, a_n)$, assign the label $c^*$ if
  $$[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c), \quad c \neq c^*,\; c \in \{c_1, \dots, c_L\}$$
  where $\hat{P}(\cdot)$ denotes a probability estimated from the training data.
Example
• Example: Play Tennis
  – Training data: 14 examples (9 with Play=Yes, 5 with Play=No), each described by the features Outlook, Temperature, Humidity and Wind.
Example
• Learning Phase
Outlook   Play=Yes  Play=No      Temperature  Play=Yes  Play=No
Sunny     2/9       3/5          Hot          2/9       2/5
Overcast  4/9       0/5          Mild         4/9       2/5
Rain      3/9       2/5          Cool         3/9       1/5

Humidity  Play=Yes  Play=No      Wind         Play=Yes  Play=No
High      3/9       4/5          Strong       3/9       3/5
Normal    6/9       1/5          Weak         6/9       2/5

Note that, for each feature, the conditional probabilities within a class sum to 1.
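Each entry is just a frequency ratio, e.g. P(Outlook=Sunny | Play=Yes) = 2/9 because 2 of the 9 Play=Yes examples have Outlook=Sunny. A sketch of how such a table could be estimated from data (the toy record list below is hypothetical, not the lecture's dataset):

```python
from collections import Counter

# Hypothetical (feature value, class label) pairs for one feature column.
records = [("Sunny", "Yes"), ("Sunny", "No"), ("Overcast", "Yes"),
           ("Rain", "Yes"), ("Rain", "No")]

class_count = Counter(label for _, label in records)
pair_count = Counter(records)

# Maximum-likelihood estimate: P(value | class) = count(value, class) / count(class)
cpt = {(v, c): pair_count[(v, c)] / class_count[c] for (v, c) in pair_count}
print(cpt[("Sunny", "Yes")])   # 1/3 on this toy list
```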
Example
• Test Phase
– Given a new instance, predict its label:
  x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up the tables obtained in the learning phase:
  P(Outlook=Sunny | Play=Yes) = 2/9        P(Outlook=Sunny | Play=No) = 3/5
  P(Temperature=Cool | Play=Yes) = 3/9     P(Temperature=Cool | Play=No) = 1/5
  P(Humidity=High | Play=Yes) = 3/9        P(Humidity=High | Play=No) = 4/5
  P(Wind=Strong | Play=Yes) = 3/9          P(Wind=Strong | Play=No) = 3/5
  P(Play=Yes) = 9/14                       P(Play=No) = 5/14
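Multiplying the looked-up values according to the MAP rule gives the decision (the arithmetic below follows directly from the numbers above):

$$P(\text{Yes} \mid x') \propto \tfrac{2}{9} \cdot \tfrac{3}{9} \cdot \tfrac{3}{9} \cdot \tfrac{3}{9} \cdot \tfrac{9}{14} \approx 0.0053$$
$$P(\text{No} \mid x') \propto \tfrac{3}{5} \cdot \tfrac{1}{5} \cdot \tfrac{4}{5} \cdot \tfrac{3}{5} \cdot \tfrac{5}{14} \approx 0.0206$$

Since 0.0206 > 0.0053, the MAP rule labels x' with Play=No.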
Naïve Bayes
• Algorithm: Continuous-valued Features
– A continuous-valued feature takes infinitely many values, so its conditional probabilities cannot be tabulated from counts.
– The conditional probability is instead often modeled with the normal (Gaussian) distribution:
  $$\hat{P}(X_j \mid C = c_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ji}} \exp\!\left(-\frac{(X_j - \mu_{ji})^2}{2\sigma_{ji}^2}\right)$$
  $\mu_{ji}$: mean (average) of the feature values $X_j$ of the examples for which $C = c_i$
  $\sigma_{ji}$: standard deviation of the feature values $X_j$ of the examples for which $C = c_i$
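A minimal sketch of this model in code (function names and the sample values are mine; note the Gaussian density can exceed 1 and is used as a likelihood score, not a probability mass):

```python
import math

def gaussian_likelihood(xj, mu, sigma):
    # Normal-density model of P^(Xj | C = ci) from the formula above.
    return math.exp(-(xj - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_feature(values):
    # Estimate (mu_ji, sigma_ji) from the values of feature Xj
    # among the training examples for which C = ci.
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

# Hypothetical feature values for one (feature, class) pair:
mu, sigma = fit_feature([2.0, 2.5, 3.5, 4.0])
print(gaussian_likelihood(3.0, mu, sigma))   # density at Xj = 3.0 (~0.50)
```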