Ke Chen
http://intranet.cs.man.ac.uk/mlo/comp20411/
• Background
• Probability Basics
• Probabilistic Classification
• Naïve Bayes
• Example: Play Tennis
• Relevant Issues
• Conclusions
Background
• There are three approaches to building a classifier
a) Model a classification rule directly
Examples: k-NN, decision trees, perceptron, SVM
b) Model the probability of class memberships given input data
Example: multi-layered perceptron with the cross-entropy cost
c) Make a probabilistic model of data within each class
Examples: naïve Bayes, model-based classifiers
• a) and b) are examples of discriminative classification
• c) is an example of generative classification
• b) and c) are both examples of probabilistic classification
Probability Basics
• Prior, conditional and joint probability
– Prior probability: P(X)
– Conditional probability: P(X1 | X2), P(X2 | X1)
– Joint probability: X = (X1, X2), P(X) = P(X1, X2)
– Relationship: P(X1, X2) = P(X2 | X1) P(X1) = P(X1 | X2) P(X2)
– Independence: P(X2 | X1) = P(X2), P(X1 | X2) = P(X1), P(X1, X2) = P(X1) P(X2)
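A quick numeric sanity check of the product rule and of independence, using a made-up joint distribution over two binary variables (the table is an assumption, chosen so that X1 and X2 come out independent):

# Hypothetical joint distribution P(X1, X2) over two binary variables,
# constructed so that X1 and X2 are independent.
P = {(0, 0): 0.3, (0, 1): 0.3,
     (1, 0): 0.2, (1, 1): 0.2}

def marginal_x1(x1):
    # P(X1 = x1), summing the joint over X2
    return sum(p for (a, _), p in P.items() if a == x1)

def cond_x2_given_x1(x2, x1):
    # P(X2 = x2 | X1 = x1) = P(X1, X2) / P(X1)
    return P[(x1, x2)] / marginal_x1(x1)

# Product rule: P(X1, X2) = P(X2 | X1) P(X1)
assert abs(P[(1, 0)] - cond_x2_given_x1(0, 1) * marginal_x1(1)) < 1e-12

# Independence: P(X2 | X1) equals the marginal P(X2)
p_x2_is_0 = P[(0, 0)] + P[(1, 0)]
assert abs(cond_x2_given_x1(0, 1) - p_x2_is_0) < 1e-12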
• Bayesian rule: P(C | X) = P(X | C) P(C) / P(X), i.e. posterior = (likelihood × prior) / evidence
Example by Dieter Fox
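The worked example itself is not reproduced here; as a minimal sketch of the kind of single-step Bayesian update such an example illustrates (all numbers below are hypothetical, not taken from the original slide), consider a robot whose noisy sensor returns a reading z that bears on whether a door is open:

# Bayes' rule with hypothetical numbers: update the belief that a door
# is open after one sensor reading z.
p_open = 0.5            # prior P(open)
p_z_given_open = 0.6    # likelihood P(z | open)
p_z_given_closed = 0.3  # likelihood P(z | not open)

# Evidence by total probability: P(z) = P(z|open)P(open) + P(z|not open)P(not open)
p_z = p_z_given_open * p_open + p_z_given_closed * (1 - p_open)

# Posterior: P(open | z) = P(z | open) P(open) / P(z)
p_open_given_z = p_z_given_open * p_open / p_z
print(p_open_given_z)   # 0.666..., up from the 0.5 prior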
Probabilistic Classification
• Establishing a probabilistic model for classification
– Discriminative model: model the posterior P(C | X) directly, where C ∈ {c1, ..., cL} and X = (X1, ..., Xn)
– Generative model: model the class-conditional distribution P(X | C) for each class, where C ∈ {c1, ..., cL} and X = (X1, ..., Xn)
Feature Histograms
[Figure: class-conditional feature histograms P(x) for two classes, C1 and C2, plotted against the feature value x. Slide by Stephen Marsland]
Naïve Bayes
• Bayes classification
P(C | X) ∝ P(X | C) P(C) = P(X1, ..., Xn | C) P(C)
– Naïve assumption: the attributes are conditionally independent given the class, so
P(X1, ..., Xn | C) = P(X1 | C) × P(X2 | C) × ⋯ × P(Xn | C)
– MAP (maximum a posteriori) rule: assign x to the class c* that maximizes P(x | C = c) P(C = c)
Naïve Bayes
• Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S,
For each target value c_i (c_i ∈ {c1, ..., cL}):
P̂(C = c_i) ← estimate P(C = c_i) from the examples in S;
For every attribute value a_jk of each attribute X_j (j = 1, ..., n; k = 1, ..., N_j):
P̂(X_j = a_jk | C = c_i) ← estimate P(X_j = a_jk | C = c_i) from the examples in S;
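A minimal Python sketch of the learning phase above plus the matching MAP test phase, assuming discrete attribute vectors stored as tuples (the function and variable names are illustrative, not from the original slides):

from collections import Counter

def train_naive_bayes(examples):
    """Learning phase: estimate P(C=ci) and P(Xj=ajk | C=ci) by counting.

    `examples` is a list of (attributes, label) pairs, where `attributes`
    is a tuple of discrete values (x1, ..., xn).
    """
    class_counts = Counter(label for _, label in examples)
    # cond_counts[(j, value, label)] = #examples with Xj=value and C=label
    cond_counts = Counter()
    for attrs, label in examples:
        for j, value in enumerate(attrs):
            cond_counts[(j, value, label)] += 1

    n_total = len(examples)
    priors = {c: class_counts[c] / n_total for c in class_counts}
    conditionals = {
        key: count / class_counts[key[2]]   # key = (j, value, label)
        for key, count in cond_counts.items()
    }
    return priors, conditionals

def classify(x, priors, conditionals):
    """Test phase: MAP rule, pick the class maximizing P(C) * prod_j P(Xj|C)."""
    def score(c):
        p = priors[c]
        for j, value in enumerate(x):
            p *= conditionals.get((j, value, c), 0.0)  # unseen value -> 0
        return p
    return max(priors, key=score)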
Example
• Example: Play Tennis
Learning Phase
– Estimate the class prior P(Play=b) and one conditional probability table per attribute:
P(Outlook=o | Play=b), P(Temperature=t | Play=b), P(Humidity=h | Play=b), P(Wind=w | Play=b)
– Test Phase: given the unseen instance x′ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong), apply the MAP rule:
P(Yes | x′) ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
P(No | x′) ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) = 0.0206
– Since 0.0206 > 0.0053, the MAP rule labels x′ as No (don't play tennis).
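The two scores are easy to reproduce. The factors below are the relative-frequency estimates from the standard 14-example Play-Tennis training set (the individual counts are the textbook ones and are assumed here; only the final 0.0053 and 0.0206 appear above):

# Unnormalized posterior scores for x' = (Sunny, Cool, High, Strong).
# Each factor is a relative-frequency estimate from the training set.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # attribute terms, then P(Play=Yes)
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # attribute terms, then P(Play=No)
print(round(p_yes, 4), round(p_no, 4))           # 0.0053 0.0206 -> predict No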
Relevant Issues
• Violation of Independence Assumption
– For many real-world tasks, P(X1, ..., Xn | C) ≠ P(X1 | C) ⋯ P(Xn | C)
– Nevertheless, naïve Bayes works surprisingly well anyway!
• Zero conditional probability problem
– If no training example contains the attribute value X_j = a_jk, then P̂(X_j = a_jk | C = c_i) = 0
– In this circumstance, the whole product P̂(x | c_i) = P̂(x_1 | c_i) ⋯ P̂(a_jk | c_i) ⋯ P̂(x_n | c_i) = 0 during test
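A standard remedy (not spelled out in the text above) is to smooth the frequency estimates so that no conditional probability is exactly zero; Laplace (add-one) smoothing is the simplest instance. A minimal sketch:

def smoothed_conditional(count_jk_i, count_i, n_values, alpha=1.0):
    # Laplace-smoothed estimate of P(Xj = ajk | C = ci):
    #   count_jk_i : number of class-ci examples with Xj = ajk
    #   count_i    : number of class-ci examples
    #   n_values   : Nj, the number of possible values of attribute Xj
    #   alpha=1.0  : classic add-one smoothing
    return (count_jk_i + alpha) / (count_i + alpha * n_values)

# An attribute value never seen with class ci no longer zeroes the product:
print(smoothed_conditional(0, 9, 3))   # 1/12 = 0.0833... instead of 0.0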
Relevant Issues
• Continuous-valued Input Attributes
– A continuous attribute takes uncountably many values, so its conditional probabilities cannot be tabulated
– Instead, the conditional probability is modelled with a normal (Gaussian) distribution:
P̂(X_j | C = c_i) = 1 / (√(2π) σ_ji) · exp( −(X_j − μ_ji)² / (2 σ_ji²) )
μ_ji: mean (average) of the attribute values X_j of the examples for which C = c_i
σ_ji: standard deviation of the attribute values X_j of the examples for which C = c_i
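A minimal Python sketch of this continuous case, fitting μ_ji and σ_ji from one class's attribute values and evaluating the Gaussian density (the readings are made up for illustration):

import math

def fit_class_gaussian(values):
    # Estimate mu_ji and sigma_ji from the attribute values of one class.
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

def gaussian_likelihood(x, mu, sigma):
    # P-hat(Xj = x | C = ci) under the per-class normal model.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Hypothetical temperature readings for the examples of one class:
mu, sigma = fit_class_gaussian([20.0, 22.5, 19.0, 21.5])
print(gaussian_likelihood(21.0, mu, sigma))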
Conclusions
• Naïve Bayes is based on the conditional independence assumption
– Training is very easy and fast: it only requires estimating each attribute's conditional probabilities within each class, separately
– Testing is straightforward: just look up the learned tables, or evaluate the conditional probabilities under the fitted normal distributions
• A popular generative model
– Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated
– Many successful applications, e.g. spam mail filtering
– Apart from classification, naïve Bayes can do more…