
CLASSIFICATION: Bayesian Classifiers

Uses Bayes' theorem (Thomas Bayes, 1701-1761) to
build probabilistic models of the relationships between
attributes and classes
Statistical principle for combining prior class
knowledge with new evidence from the data
Multiple implementations

Naïve Bayes

Bayesian networks

CLASSIFICATION: Bayesian Classifiers


Requires the concept of conditional probability
Measures the probability of an event given that another
event is known (by evidence or information) to have
occurred
Notation: P(A|B) = probability of A given that B has
occurred
P(A|B) = P(A∩B)/P(B), if P(B) ≠ 0
Equivalently, P(A∩B) = P(A|B)P(B)
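
A quick numeric sketch of this definition in Python (the event counts are made up purely for illustration):

    # Estimate P(A|B) from hypothetical joint counts.
    n_total = 1000        # total observations (made-up number)
    n_B = 400             # observations where B occurred
    n_A_and_B = 100       # observations where both A and B occurred

    p_B = n_B / n_total              # P(B)
    p_A_and_B = n_A_and_B / n_total  # P(A ∩ B)
    p_A_given_B = p_A_and_B / p_B    # P(A|B) = P(A ∩ B)/P(B)
    print(p_A_given_B)               # 0.25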

BAYESIAN CLASSIFIERS: Conditional Probability
Example:

Suppose 1% of a specific population has a form of cancer

A new diagnostic test

produces correct positive results for those with the
cancer 99% of the time
produces correct negative results for those without the
cancer 98% of the time

P(cancer) = 0.01

P(positive test | cancer) = 0.99

P(negative test | no cancer) = 0.98

BAYESIAN CLASSIFIERS: Conditional Probability
Example:
But what if you tested positive? What is the
probability that you actually have cancer?
Bayes' theorem reverses the conditioning to
give us the answer.

BAYESIAN CLASSIFIERS: Bayes' Theorem


P(B|A) = P(B∩A)/P(A), if P(A) ≠ 0
= P(A∩B)/P(A)
= P(A|B)P(B)/P(A)
Application to our example

P(cancer | test positive) =
P(test positive | cancer)*P(cancer)/P(test positive) =
(0.99*0.01)/(0.99*0.01 + 0.02*0.99) = 0.0099/0.0297 = 0.3333
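
The same arithmetic as a short Python check, using only the rates stated in the example:

    # Bayes' theorem for the cancer-screening example.
    p_cancer = 0.01               # prior: 1% prevalence
    p_pos_given_cancer = 0.99     # sensitivity
    p_neg_given_healthy = 0.98    # specificity
    p_pos_given_healthy = 1 - p_neg_given_healthy   # 0.02 false-positive rate

    # Law of total probability for P(test positive).
    p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy

    print(p_cancer * p_pos_given_cancer / p_pos)    # 0.3333...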

BAYESIAN CLASSIFIERS: Bayes' Theorem

Probability tree for the example:
cancer (0.01): test positive (0.99), test negative (0.01)
no cancer (0.99): test positive (0.02), test negative (0.98)

BAYESIAN CLASSIFIERS: Naïve Bayes


Bayes' Theorem Interpretation
P(class C | F1, F2, …, Fn) =
P(class C) * P(F1, F2, …, Fn | C) / P(F1, F2, …, Fn)
posterior = prior * likelihood / evidence

BAYESIAN CLASSIFIERS: Naïve Bayes


Key concepts
Denominator independent of class C
Denominator effectively constant
Numerator equivalent to joint probability model
P(C, F1, F2, …, Fn)
Naïve conditional independence assumptions
P(C | F1, F2, …, Fn) ∝ P(C) * P(F1|C) * P(F2|C) * … * P(Fn|C)
= P(C) * Π(i = 1..n) P(Fi|C)
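
A minimal Python sketch of this scoring rule, with hypothetical priors and per-feature likelihoods (the numbers are invented for illustration):

    from math import prod

    # Hypothetical P(C) and P(Fi|C) for two classes and three features.
    prior = {"c1": 0.6, "c2": 0.4}
    likelihood = {"c1": [0.2, 0.7, 0.9],   # P(F1|c1), P(F2|c1), P(F3|c1)
                  "c2": [0.5, 0.1, 0.4]}

    # Score each class by P(C) * prod_i P(Fi|C); the evidence term is skipped
    # because it is the same for every class.
    score = {c: prior[c] * prod(likelihood[c]) for c in prior}
    print(max(score, key=score.get))       # class with the largest numerator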

BAYESIAN CLASSIFIERS: Naïve Bayes


Multiple distributional assumptions possible
Gaussian
Multinomial
Bernoulli
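
All three variants are available in scikit-learn; a small usage sketch (assuming scikit-learn and numpy are installed, with toy data):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB  # MultinomialNB, BernoulliNB also exist

    # Toy continuous features, so GaussianNB is the appropriate variant.
    X = np.array([[6.0, 180, 12], [5.9, 190, 11], [5.0, 100, 6], [5.5, 150, 8]])
    y = np.array(["male", "male", "female", "female"])

    model = GaussianNB().fit(X, y)
    print(model.predict([[6.0, 130, 8]]))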

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Training set (example from Wikipedia)
Sex      Height (feet)   Weight (pounds)   Foot size (inches)
male     6               180               12
male     5.92 (5'11")    190               11
male     5.58 (5'7")     170               12
male     5.92 (5'11")    165               10
female   5               100               6
female   5.5 (5'6")      150               8
female   5.42 (5'5")     130               7
female   5.75 (5'9")     150               9

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Assumptions
Continuous data
Gaussian (Normal) distribution
p(x) = (1/sqrt(2πσ^2)) * exp(-(x - μ)^2/(2σ^2))

P(male) = P(female) = 0.5
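
A direct transcription of this density into Python, for use in the calculations that follow:

    from math import sqrt, pi, exp

    def gaussian_pdf(x, mean, variance):
        """Normal density p(x) for the given mean and variance."""
        return exp(-(x - mean) ** 2 / (2 * variance)) / sqrt(2 * pi * variance)

    print(gaussian_pdf(6, 5.855, 0.035033))   # ~1.5789, used on a later slide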

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Classifier generated from training set
Sex      Height mean   Height variance   Weight mean   Weight variance   Foot size mean   Foot size variance
male     5.855         0.035033          176.25        122.92            11.25            0.91667
female   5.4175        0.097225          132.5         558.33            7.5              1.6667
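
These numbers can be reproduced from the training table with numpy; note ddof=1, i.e. the sample variance, which is what the table uses:

    import numpy as np

    # Rows: (height, weight, foot size) from the training table above.
    male = np.array([[6.00, 180, 12], [5.92, 190, 11],
                     [5.58, 170, 12], [5.92, 165, 10]])
    female = np.array([[5.00, 100, 6], [5.50, 150, 8],
                       [5.42, 130, 7], [5.75, 150, 9]])

    print(male.mean(axis=0), male.var(axis=0, ddof=1))      # 5.855 ...  0.035033 ...
    print(female.mean(axis=0), female.var(axis=0, ddof=1))  # 5.4175 ... 0.097225 ...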

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Test sample
Sex      Height   Weight   Foot size
sample   6        130      8

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Calculate posterior probabilities for both genders
Posterior(male) = P(male)P(height|male)P(weight|male)P(foot size|male)/evidence
Posterior(female) = P(female)P(height|female)P(weight|female)P(foot size|female)/evidence
Evidence is constant and the same for both, so we ignore the denominators

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Calculations for male
P(male) = 0.5 (assumed)
P(height|male) = (1/sqrt(2π*0.035033)) * exp(-(6 - 5.855)^2/(2*0.035033)) = 1.5789
P(weight|male) = (1/sqrt(2π*122.92)) * exp(-(130 - 176.25)^2/(2*122.92)) = 5.9881 × 10^-6
P(foot size|male) = (1/sqrt(2π*0.91667)) * exp(-(8 - 11.25)^2/(2*0.91667)) = 1.3112 × 10^-3
Posterior numerator (male) = 0.5 * 1.5789 * (5.9881 × 10^-6) * (1.3112 × 10^-3) = 6.1984 × 10^-9

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Calculations for female
P(female) = 0.5 (assumed)
P(height|female) = (1/sqrt(2π*0.097225)) * exp(-(6 - 5.4175)^2/(2*0.097225)) = 0.22346
P(weight|female) = (1/sqrt(2π*558.33)) * exp(-(130 - 132.5)^2/(2*558.33)) = 0.016789
P(foot size|female) = (1/sqrt(2π*1.6667)) * exp(-(8 - 7.5)^2/(2*1.6667)) = 0.28669
Posterior numerator (female) = 0.5 * 0.22346 * 0.016789 * 0.28669 = 5.3778 × 10^-4

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Conclusion
Posterior numerator (significantly) greater for
female classification than for male, so classify
sample as female
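
The full comparison in Python, restating the gaussian_pdf helper from earlier so the snippet stands alone:

    from math import sqrt, pi, exp

    def gaussian_pdf(x, mean, variance):
        return exp(-(x - mean) ** 2 / (2 * variance)) / sqrt(2 * pi * variance)

    sample = (6.0, 130.0, 8.0)   # height, weight, foot size
    params = {                   # (mean, variance) per feature, from the table above
        "male":   [(5.855, 0.035033), (176.25, 122.92), (11.25, 0.91667)],
        "female": [(5.4175, 0.097225), (132.5, 558.33), (7.5, 1.6667)],
    }

    numerator = {}
    for sex, stats in params.items():
        p = 0.5                  # prior P(sex)
        for x, (mean, var) in zip(sample, stats):
            p *= gaussian_pdf(x, mean, var)
        numerator[sex] = p

    print(numerator)   # ~6.20e-09 (male) vs ~5.38e-04 (female) -> female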

BAYESIAN CLASSIFIERS: Naïve Bayes


Example
Note
We did not calculate P(evidence) [the normalizing
constant] since it is not needed, but we could:
P(evidence) = P(male)P(height|male)P(weight|male)P(foot size|male) +
P(female)P(height|female)P(weight|female)P(foot size|female)
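
If we did want the normalized posteriors, the two numerators from the preceding slides are enough:

    # Posterior numerators computed on the preceding slides.
    num_male, num_female = 6.1984e-9, 5.3778e-4
    evidence = num_male + num_female

    print(num_male / evidence)     # ~1.15e-05 = P(male | sample)
    print(num_female / evidence)   # ~0.99999  = P(female | sample)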

BAYESIAN CLASSIFIERS: Bayesian Networks


Judea Pearl (UCLA Computer Science, Cognitive
Systems Lab): one of the pioneers of Bayesian
Networks
Author: Probabilistic Reasoning in Intelligent
Systems, 1988
Father of journalist Daniel Pearl
Kidnapped and murdered in Pakistan in 2002
by al-Qaeda

BAYESIAN CLASSIFIERS: Bayesian Networks


Probabilistic graphical model
Represents random variables and conditional
dependencies using a directed acyclic graph
(DAG)
Nodes of graph represent random variables

BAYESIAN CLASSIFIERS: Bayesian Networks


Edges of graph represent conditional
dependencies
Unconnected nodes conditionally independent
of each other
Does not require all attributes to be
conditionally independent

BAYESIAN CLASSIFIERS: Bayesian Networks


Probability table associating each node to its
immediate parent nodes
If node X has no immediate parents, table
contains only prior probability P(X)
If one parent Y, table contains P(X|Y)
If multiple parents {Y1, Y2, …, Yn}, table
contains P(X|Y1, Y2, …, Yn)
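
One simple way to hold such tables in code (a hypothetical dict-based sketch for Boolean variables; the numbers are invented):

    # No parents: just the prior P(X=T).
    P_X = 0.3
    # One parent Y: P(X=T | Y), keyed by Y's value.
    P_X_given_Y = {True: 0.9, False: 0.2}
    # Two parents Y1, Y2: P(X=T | Y1, Y2), keyed by their joint assignment.
    P_X_given_Y1Y2 = {(True, True): 0.95, (True, False): 0.5,
                      (False, True): 0.4, (False, False): 0.05}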

BAYESIAN CLASSIFIERS: Bayesian Networks


Model encodes relevant probabilities from
which probabilistic inferences can then be
calculated
Joint probability: P(G, S, R) = P(R)*P(S|R)*P(G|S, R)
G = Grass wet
S = Sprinkler on
R = Raining

BAYESIAN CLASSIFIERS: Bayesian Networks


We can then calculate, for example:
P(it is raining AND grass is wet)
P(it is raining | grass is wet)
P(grass is wet)

P(it is raining AND grass is wet) = Σ over sprinkler ∈ {T,F} of P(grass is wet = T, sprinkler, raining = T)
P(grass is wet) = Σ over sprinkler ∈ {T,F}, raining ∈ {T,F} of P(grass is wet = T, sprinkler, raining)

BAYESIAN CLASSIFIERS: Bayesian Networks


That is,
P(it is raining | grass is wet)
= [P(TTT) + P(TFT)] / [P(TTT) + P(TTF) + P(TFT) + P(TFF)]
= (0.00198 + 0.1584) / (0.00198 + 0.288 + 0.1584 + 0.0)
= 0.3577
(T/F triples in the order: grass wet, sprinkler, raining)
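
A sketch of this enumeration in Python, using the conditional probability tables of the classic wet-grass/sprinkler/rain example (these are the values that produce the joint probabilities above):

    from itertools import product

    # CPTs of the classic example: P(R), P(S=T|R), P(G=T|S,R).
    P_R = {True: 0.2, False: 0.8}
    P_S_given_R = {True: 0.01, False: 0.4}
    P_G_given_SR = {(True, True): 0.99, (True, False): 0.90,
                    (False, True): 0.80, (False, False): 0.0}

    def joint(g, s, r):
        """P(G=g, S=s, R=r) = P(R) * P(S|R) * P(G|S,R)."""
        p_s = P_S_given_R[r] if s else 1 - P_S_given_R[r]
        p_g = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
        return P_R[r] * p_s * p_g

    wet = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
    wet_and_rain = sum(joint(True, s, True) for s in (True, False))
    print(wet_and_rain / wet)   # ~0.3577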

BAYESIAN CLASSIFIERS: Bayesian Networks


Building the model
Create network structure (graph)
Determine probability values of tables

Simplest case
Network defined by user

Most real-world cases


Defining the network by hand is too complex
Use machine learning: many algorithms

BAYESIAN CLASSIFIERS: Bayesian Networks


Algorithms built into Weka

User defined network


Conditional independence tests
Genetic search
Hill climber
K2
Simulated annealing
Maximum weight spanning tree
Tabu search

BAYESIAN CLASSIFIERS: Bayesian Networks


Many other versions online
BNT (Bayes Net Toolbox) for Matlab
Kevin Murphy, University of British Columbia
http://www.cs.ubc.ca/~murphyk/Software/
