$$P(C \mid X) = \frac{P(C)\,P(X \mid C)}{P(X)}$$
posterior ∝ prior x likelihood
where P(C) is the prior, P(X | C) is the likelihood and P(C | X) is the posterior.
Before observing the data, our
prior beliefs can be expressed in a
prior probability distribution that
represents the knowledge we have
about the unknown features.
After observing the data our
revised beliefs are captured by a
posterior distribution over the
unknown features.
Aprendizagem Computacional Gladys Castillo, U.A. 6
Bayesian Classifier
How to determine P(c_j | x) for each class c_j? Bayes Theorem:
$$P(c_j \mid \mathbf{x}) = \frac{P(c_j)\,P(\mathbf{x} \mid c_j)}{P(\mathbf{x})}$$
P(x) can be ignored because it is the same for all the classes (a normalization constant).
Maximum a posteriori classification (Bayesian Classification Rule):
$$h_{Bayes}(\mathbf{x}) = c^{*} = \arg\max_{j=1 \ldots m} P(c_j)\,P(\mathbf{x} \mid c_j)$$
Classify x into the class with the highest posterior probability.
Naïve Bayes (NB) Classifier
Duda and Hart (1973); Langley (1992)
"Bayes" because the class c* attached to an example x is determined by the Bayes Theorem:
$$h_{Bayes}(\mathbf{x}) = c^{*} = \arg\max_{j=1 \ldots m} P(c_j)\,P(\mathbf{x} \mid c_j)$$
When the attribute space is high dimensional, direct estimation is hard unless we introduce some assumptions.
"Naïve" because of its very naïve independence assumption: all the attributes are conditionally independent given the class.
P(x | c_j) can be decomposed into a product of n terms, one term for each attribute.
NB Classification Rule:
$$h_{NB}(\mathbf{x}) = c^{*} = \arg\max_{j=1 \ldots m} P(c_j) \prod_{i=1}^{n} P(X_i = x_i \mid c_j)$$
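The step from the Bayesian rule to the NB rule can be written out explicitly. The short derivation below uses only the symbols already defined (x = (x_1, ..., x_n), classes c_1 ... c_m) and the conditional independence assumption; it is a sketch of the reasoning, not an additional result.

```latex
\begin{align*}
P(\mathbf{x} \mid c_j) &= P(X_1 = x_1, \ldots, X_n = x_n \mid c_j) \\
                       &= \prod_{i=1}^{n} P(X_i = x_i \mid c_j)
                          \quad \text{(attributes conditionally independent given the class)} \\
h_{NB}(\mathbf{x})     &= \arg\max_{j=1 \ldots m} P(c_j)\,P(\mathbf{x} \mid c_j)
                        = \arg\max_{j=1 \ldots m} P(c_j) \prod_{i=1}^{n} P(X_i = x_i \mid c_j)
\end{align*}
```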
Naïve Bayes (NB)
Learning Phase (Statistical Parameter Estimation)
Given a training dataset D of N labeled examples (assuming complete data):
1. Estimate P(c_j) for each class c_j
$$\hat{P}(c_j) = \frac{N_j}{N}$$
2. Estimate P(X_i = x_k | c_j) for each value x_k of the attribute X_i and for each class c_j
X_i discrete:
$$\hat{P}(X_i = x_k \mid c_j) = \frac{N_{ijk}}{N_j}$$
where
N_j - the number of examples of the class c_j
N_ijk - the number of examples of the class c_j having the value x_k for the attribute X_i
X_i continuous: two options
(a) The attribute is discretized and then treated as a discrete attribute.
(b) A Normal distribution is usually assumed:
$$\hat{P}(X_i = x_k \mid c_j) = g(x_k; \mu_{ij}, \sigma_{ij}), \qquad g(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
The mean μ_ij and the standard deviation σ_ij are estimated from D.
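The three estimators of the learning phase can be sketched in a few lines of Python. The function names are illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the slides only say that the mean and standard deviation are estimated from D.

```python
import math
from collections import Counter, defaultdict

def estimate_priors(labels):
    """P(c_j) = N_j / N for every class c_j."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in counts}

def estimate_discrete(values, labels):
    """P(X_i = x_k | c_j) = N_ijk / N_j for one discrete attribute."""
    class_counts = Counter(labels)          # N_j
    joint = Counter(zip(values, labels))    # N_ijk
    return {(v, c): joint[(v, c)] / class_counts[c]
            for v in set(values) for c in class_counts}

def estimate_gaussian(values, labels):
    """Per-class (mean, standard deviation) for one continuous attribute."""
    by_class = defaultdict(list)
    for v, c in zip(values, labels):
        by_class[c].append(v)
    params = {}
    for c, xs in by_class.items():
        mu = sum(xs) / len(xs)
        # Assumption: sample variance, which needs at least 2 examples per class.
        var = sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)
        params[c] = (mu, math.sqrt(var))
    return params
```

Each function makes a single pass over one attribute column, so calling them for every attribute stays linear in the size of the dataset.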
Continuous Attributes
Normal or Gaussian Distribution
2. Estimate P(X_i = x_k | c_j) for a value of the attribute X_i and for each class c_j
A Normal distribution is usually assumed:
X_i | c_j ~ N(μ_ij, σ²_ij) - the mean μ_ij and the standard deviation σ_ij are estimated from D
$$\hat{P}(X_i = x_k \mid c_j) = g(x_k; \mu_{ij}, \sigma_{ij}), \qquad g(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
[Figure: the probability density function of N(0, σ²), with the horizontal axis marked in multiples of σ from -3σ to 3σ]
f(x) is symmetrical around its mean.
For a variable X ~ N(74, 36), the probability density at the value 66 is given by:
f(66) = g(66; 74, 6) = 0.0273
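The worked value for X ~ N(74, 36) can be checked numerically. A minimal sketch, where `gaussian_pdf` is an illustrative name for g(x; μ, σ) and the second parameter is the standard deviation (6, the square root of the variance 36):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density g(x; mu, sigma) of a Normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

density = gaussian_pdf(66, 74, 6)   # N(74, 36) has standard deviation 6
print(round(density, 4))            # prints 0.0273
```

Because the density is symmetrical around the mean, `gaussian_pdf(82, 74, 6)` gives the same value.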
Naïve Bayes
Probability Estimates
Binary Classification Problem
Two classes: + (positive) and - (negative)
Two attributes:
X_1 discrete, which takes values a and b
X_2 continuous
Training dataset D
Class X_1 X_2
+ a 1.0
+ b 1.2
+ a 3.0
- b 4.4
- b 4.5
1. Estimate P(c_j) for each class c_j
P(C = +) = 3/5
P(C = -) = 2/5
2. Estimate P(X_i = x_k | c_j) for each attribute value and each class c_j
X_1 discrete:
P(X_1 = a | C = +) = 2/3
P(X_1 = b | C = +) = 1/3
P(X_1 = a | C = -) = 0/2
P(X_1 = b | C = -) = 2/2
X_2 continuous:
P(X_2 = x | C = +) = g(x; 1.73, 1.10^2)
P(X_2 = x | C = -) = g(x; 4.45, 0.07^2)
μ_2+ = 1.73, σ_2+ = 1.10
μ_2- = 4.45, σ_2- = 0.07
Example from John & Langley (1995)
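The estimates for this five-example dataset can be reproduced with the standard library. This is a sketch: the `estimate` helper and its dictionary keys are illustrative names, and the standard deviation is taken to be the sample standard deviation (n - 1 denominator), which is what matches the values 1.10 and 0.07.

```python
import statistics
from fractions import Fraction

# Training dataset D as (class, X1, X2) rows.
D = [("+", "a", 1.0), ("+", "b", 1.2), ("+", "a", 3.0),
     ("-", "b", 4.4), ("-", "b", 4.5)]

def estimate(data, c):
    """Prior, discrete conditionals for X1, and Gaussian parameters for X2, for class c."""
    rows = [r for r in data if r[0] == c]
    x2 = [r[2] for r in rows]
    return {
        "prior": Fraction(len(rows), len(data)),
        "P(a|c)": Fraction(sum(r[1] == "a" for r in rows), len(rows)),
        "P(b|c)": Fraction(sum(r[1] == "b" for r in rows), len(rows)),
        "mu": statistics.mean(x2),
        "sigma": statistics.stdev(x2),   # sample standard deviation
    }

plus = estimate(D, "+")
minus = estimate(D, "-")
print(plus["prior"], plus["P(a|c)"], round(plus["mu"], 2), round(plus["sigma"], 2))
print(minus["prior"], minus["P(a|c)"], round(minus["mu"], 2), round(minus["sigma"], 2))
```

Note that P(X_1 = a | C = -) comes out as exactly zero, the situation that the Laplace correction is designed to handle.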
Probability Estimates
Discrete Attributes
Column types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Evade (class)
Tid Refund MaritalStatus TaxableIncome Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
Adapted from Tan, Steinbach, Kumar, Introduction to Data Mining book
For each class:
$$\hat{P}(c_j) = \frac{N_j}{N}$$
P(No) = 7/10
P(Yes) = 3/10
For each attribute value and class:
$$\hat{P}(X_i = x_k \mid c_j) = \frac{N_{ijk}}{N_j}$$
Examples:
P(Status=Married|No) = 4/7
P(Refund=Yes|Yes) = 0
To compute N_ijk we need to count the number of examples of the class c_j having the value x_k for the attribute X_i.
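The counting behind these estimates can be sketched directly on the ten rows of the table. The helper names `prior` and `conditional` and the column constants are illustrative, not part of the original material.

```python
from fractions import Fraction

# (Refund, Marital Status, Evade) for Tid 1..10; the continuous income column is left out here.
rows = [
    ("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
    ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
    ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]
REFUND, STATUS, EVADE = 0, 1, 2

def prior(c):
    """P(c_j) = N_j / N."""
    return Fraction(sum(1 for r in rows if r[EVADE] == c), len(rows))

def conditional(col, value, c):
    """P(X_i = x_k | c_j) = N_ijk / N_j."""
    n_j = sum(1 for r in rows if r[EVADE] == c)
    n_ijk = sum(1 for r in rows if r[EVADE] == c and r[col] == value)
    return Fraction(n_ijk, n_j)

print(prior("No"), prior("Yes"))                 # 7/10 3/10
print(conditional(STATUS, "Married", "No"))      # 4/7
print(conditional(REFUND, "Yes", "Yes"))         # 0
```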
Probability Estimates
Continuous Attributes
For each pair attribute-class (X_i, c_j):
$$\hat{P}(X_i = x_k \mid c_j) = g(x_k; \mu_{ij}, \sigma_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}}\, e^{-\frac{(x_k-\mu_{ij})^2}{2\sigma_{ij}^2}}$$
Example for (Income, Class=No), using the Taxable Income column of the same ten-example training set (Adapted from Tan, Steinbach, Kumar, Introduction to Data Mining book):
If Class=No
sample mean = 110
sample standard deviation = 54.54 (sample variance = 2975)
$$P(\text{Income} = 120 \mid \text{No}) = \frac{1}{\sqrt{2\pi}\,(54.54)}\, e^{-\frac{(120-110)^2}{2(2975)}} = 0.0072$$
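The Income example can be recomputed from the seven Evade = No rows of the training table. A minimal sketch, with `gaussian_pdf` as an illustrative name for g(x; μ, σ) and the sample (n - 1) variance assumed, since that is what yields 2975:

```python
import math
import statistics

# Taxable Income (in K) of the seven examples with Evade = No (Tid 1, 2, 3, 4, 6, 7, 9).
income_no = [125, 100, 70, 120, 60, 220, 75]

mu = statistics.mean(income_no)
var = statistics.variance(income_no)   # sample variance
sigma = math.sqrt(var)

def gaussian_pdf(x, mu, sigma):
    """Density g(x; mu, sigma) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

p = gaussian_pdf(120, mu, sigma)
print(mu, var, round(sigma, 2), round(p, 4))
```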
The Balance-scale Problem
dataset from the UCI repository
The dataset was generated to model psychological experimental results.
Each example has 4 numerical attributes:
the left weight (Left_W)
the left distance (Left_D)
the right weight (Right_W)
the right distance (Right_D)
Each example is classified into 3 classes. The balance-scale:
tips to the right (Right)
tips to the left (Left)
is balanced (Balanced)
3 target rules:
If LD x LW > RD x RW, it tips to the left
If LD x LW < RD x RW, it tips to the right
If LD x LW = RD x RW, it is balanced
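The three target rules translate directly into a small function. This is only a sketch of the labeling rule that generated the data, not of the classifier; the function name and the sample inputs are illustrative.

```python
def balance_class(left_w, left_d, right_w, right_d):
    """Return the target class by comparing LD x LW with RD x RW."""
    left_product = left_d * left_w
    right_product = right_d * right_w
    if left_product > right_product:
        return "Left"
    if left_product < right_product:
        return "Right"
    return "Balanced"

print(balance_class(1, 5, 4, 2))   # 5 < 8, so "Right"
print(balance_class(2, 5, 3, 2))   # 10 > 6, so "Left"
print(balance_class(3, 4, 6, 2))   # 12 = 12, so "Balanced"
```

Naïve Bayes never sees this rule; it has to approximate it from per-attribute counts, which is why the example is a useful test of the independence assumption.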
The Balance-scale Problem
Balance-Scale DataSet
Left_W Left_D Right_W Right_D Class
1 5 4 2 Right
2 5 3 2 Left
... ... ... ... ...
3 4 6 2 Balanced
Adapted from João Gama's slides "Aprendizagem Bayesiana"
Discretization is applied: each attribute is mapped to 5 intervals
The Balance-scale Problem
Learning Phase
Build contingency tables
Class Counters
Left Balanced Right Total
260 45 260 565
Contingency Tables
Attribute: Left_W
Class I1 I2 I3 I4 I5
Left 14 42 61 71 72
Balanced 10 8 8 10 9
Right 86 66 49 34 25
Attribute: Left_D
Class I1 I2 I3 I4 I5
Left 16 38 59 70 77
Balanced 8 10 9 10 8
Right 90 57 49 37 27
Attribute: Right_W
Class I1 I2 I3 I4 I5
Left 87 63 49 33 28
Balanced 8 10 10 9 8
Right 16 37 58 70 79
Attribute: Right_D
Class I1 I2 I3 I4 I5
Left 91 65 44 35 25
Balanced 8 10 8 10 9
Right 17 37 57 67 82
565 examples
Assuming complete data, the computation of all the required estimates requires a simple scan through the data, an operation of time complexity O(N x n), where N is the number of training examples and n is the number of attributes.
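The single scan through the data can be sketched as follows. The function name `build_counts` and the three toy rows are illustrative; a real run would iterate over all 565 discretized examples.

```python
from collections import Counter

def build_counts(examples, n_attributes):
    """One pass over the data: class counters plus one contingency table per attribute."""
    class_counts = Counter()
    tables = [Counter() for _ in range(n_attributes)]
    for *attrs, label in examples:            # N examples ...
        class_counts[label] += 1
        for i, value in enumerate(attrs):     # ... times n attributes
            tables[i][(label, value)] += 1
    return class_counts, tables

# Toy input in (Left_W, Left_D, Right_W, Right_D, Class) order.
toy = [(1, 5, 4, 2, "Right"), (2, 5, 3, 2, "Left"), (3, 4, 6, 2, "Balanced")]
class_counts, tables = build_counts(toy, 4)
print(class_counts)
print(tables[0])   # contingency counts for Left_W
```

The nested loop touches each attribute value of each example exactly once, which is where the O(N x n) cost comes from.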
The Balance-scale Problem
Classification Phase
How does NB classify this example?
Left_W Left_D Right_W Right_D Class
1 5 4 2 Right
We need to estimate the posterior probabilities P(c_j | x) for each class:
P(c_j | x) ∝ P(c_j) x P(Left_W=1 | c_j) x P(Left_D=5 | c_j) x P(Right_W=4 | c_j) x P(Right_D=2 | c_j), c_j ∈ {Left, Balanced, Right}
The class counters and contingency tables are used to compute the posterior probabilities for each class:
P(Left|x) P(Balanced|x) P(Right|x)
0.277796 0.135227 0.586978 → max: Class = Right
The class for this example is the class with the highest posterior probability:
$$h_{NB}(\mathbf{x}) = c^{*} = \arg\max_{j=1 \ldots m} P(c_j) \prod_{i=1}^{n} P(X_i = x_i \mid c_j)$$
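The classification step can be sketched from the class counters and contingency tables. Two assumptions are made here and should be checked against the original slides: the counts are read in interval order I1..I5, and an attribute value v falls in interval Iv. Under these assumptions the ranking of the classes agrees with the result above, although the exact normalized values depend on how the tables and intervals are read.

```python
CLASS_COUNTS = {"Left": 260, "Balanced": 45, "Right": 260}

# TABLES[attribute][class] = counts for intervals I1..I5 (assumed reading order).
TABLES = {
    "Left_W":  {"Left": [14, 42, 61, 71, 72], "Balanced": [10, 8, 8, 10, 9],
                "Right": [86, 66, 49, 34, 25]},
    "Left_D":  {"Left": [16, 38, 59, 70, 77], "Balanced": [8, 10, 9, 10, 8],
                "Right": [90, 57, 49, 37, 27]},
    "Right_W": {"Left": [87, 63, 49, 33, 28], "Balanced": [8, 10, 10, 9, 8],
                "Right": [16, 37, 58, 70, 79]},
    "Right_D": {"Left": [91, 65, 44, 35, 25], "Balanced": [8, 10, 8, 10, 9],
                "Right": [17, 37, 57, 67, 82]},
}

def nb_scores(example):
    """Unnormalized posteriors P(c_j) * prod_i P(X_i = x_i | c_j)."""
    total = sum(CLASS_COUNTS.values())
    scores = {}
    for c, n_j in CLASS_COUNTS.items():
        score = n_j / total
        for attr, value in example.items():
            score *= TABLES[attr][c][value - 1] / n_j   # assumes value v lies in interval Iv
        scores[c] = score
    return scores

example = {"Left_W": 1, "Left_D": 5, "Right_W": 4, "Right_D": 2}
scores = nb_scores(example)
predicted = max(scores, key=scores.get)
print(predicted)
```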
Iris Dataset
From the statistician Ronald Fisher
Three flower types (classes):
Setosa
Virginica
Versicolour
Four continuous attributes
Sepal width and length
Petal width and length
Virginica. Robert H. Mohlenbrock. USDA
NRCS. 1995. Northeast wetland flora: Field
office guide to plant species. Northeast National
Technical Center, Chester, PA. Courtesy of
USDA NRCS Wetland Science Institute.
dataset from the UCI repository
Scatter Plot Array of Iris Attributes
The attributes
petal width and
petal length provide a
moderate separation of
the Iris species
Naïve Bayes
Iris Dataset Continuous Attributes
Normal Probability Density Functions
Attribute: PetalWidth
P(PetalWidth|Setosa)
P(PetalWidth|Versicolor)
P(PetalWidth|Virginica)
Class Iris-setosa
mean: 0.244, standard deviation: 0.107
Class Iris-versicolor
mean: 1.326, standard deviation: 0.198
Class Iris-virginica
mean: 2.026, standard deviation: 0.275
Naïve Bayes
Iris Dataset Continuous Attributes
Model
Class Iris-setosa (0.327):
==========================
Attribute sepallength
mean: 5.006, standard deviation: 0.352
Attribute sepalwidth
mean: 3.418, standard deviation: 0.381
Attribute petallength
mean: 1.464, standard deviation: 0.174
Attribute petalwidth
mean: 0.244, standard deviation: 0.107
Class Iris-versicolor (0.327):
==============================
Attribute sepallength
mean: 5.936, standard deviation: 0.516
Attribute sepalwidth
mean: 2.770, standard deviation: 0.314
Attribute petallength
mean: 4.260, standard deviation: 0.470
Attribute petalwidth
mean: 1.326, standard deviation: 0.198
Class Iris-virginica (0.327):
=============================
Attribute sepallength
mean: 6.588, standard deviation: 0.636
Attribute sepalwidth
mean: 2.974, standard deviation: 0.322
Attribute petallength
mean: 5.552, standard deviation: 0.552
Attribute petalwidth
mean: 2.026, standard deviation: 0.275
Naïve Bayes
Iris Dataset Continuous Attributes
How does NB classify this example?
Estimate P(c_j | x) for each class:
P(c_j | x) ∝ P(c_j) x P(sepalLength = 5 | c_j) x P(sepalWidth = 3 | c_j) x P(petalLength = 2 | c_j) x P(petalWidth = 2 | c_j), c_j ∈ {setosa, versicolor, virginica}
P(setosa|x) P(versicolor|x) P(virginica|x)
0 0.995 0.005 → max: Class = versicolor
The class for this example is the class with the highest posterior probability.
Naïve Bayes
Iris Dataset Continuous Attributes
Estimate P(c
j
| x) for the versicolor class
Class Iris-versicolor (0.327):
==============================
Attribute sepallength
mean: 5.936, standard deviation: 0.516
Attribute sepalwidth
mean: 2.770, standard deviation: 0.314
Attribute petallength
mean: 4.260, standard deviation: 0.470
Attribute petalwidth
mean: 1.326, standard deviation: 0.198
P(versicolor | x) ∝ P(versicolor) x P(sepalLength = 5 | versicolor) x P(sepalWidth = 3 | versicolor) x P(petalLength = 2 | versicolor) x P(petalWidth = 2 | versicolor)
P(versicolor | x) ∝ 0.327 x g(5; 5.936, 0.516) x g(3; 2.770, 0.314) x g(2; 4.260, 0.470) x g(2; 1.326, 0.198)
$$h_{NB}(\mathbf{x}) = c^{*} = \arg\max_{j=1 \ldots m} P(c_j) \prod_{i=1}^{n} P(X_i = x_i \mid c_j)$$
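The same product can be evaluated for all three classes and normalized. The sketch below copies the class prior (0.327) and the per-class means and standard deviations from the model listing; `gaussian_pdf` and `posteriors` are illustrative names, and the attribute order (sepal length, sepal width, petal length, petal width) is how the tuples are laid out here.

```python
import math

PRIOR = 0.327   # the same class probability is reported for all three classes

# (mean, standard deviation) per attribute, in the order
# sepallength, sepalwidth, petallength, petalwidth.
MODEL = {
    "setosa":     [(5.006, 0.352), (3.418, 0.381), (1.464, 0.174), (0.244, 0.107)],
    "versicolor": [(5.936, 0.516), (2.770, 0.314), (4.260, 0.470), (1.326, 0.198)],
    "virginica":  [(6.588, 0.636), (2.974, 0.322), (5.552, 0.552), (2.026, 0.275)],
}

def gaussian_pdf(x, mu, sigma):
    """Density g(x; mu, sigma) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def posteriors(x):
    """Normalized P(c_j | x) under the Gaussian Naive Bayes model above."""
    scores = {}
    for c, params in MODEL.items():
        score = PRIOR
        for value, (mu, sigma) in zip(x, params):
            score *= gaussian_pdf(value, mu, sigma)
        scores[c] = score
    total = sum(scores.values())   # plays the role of P(x)
    return {c: s / total for c, s in scores.items()}

post = posteriors([5, 3, 2, 2])
print({c: round(p, 3) for c, p in post.items()})
```

Dividing by the sum of the three scores is what turns the unnormalized products into posteriors that add up to 1.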
Naïve Bayes
Continuous Attributes - Discretization
For continuous attributes: after BinDiscretization, bins=3
Naïve Bayes
Continuous Attributes - Discretization
Model
Class Iris-versicolor (0.327):
==============================
Attribute sepallength
range1: 0.220 range2: 0.720 range3: 0.060
Attribute sepalwidth
range1: 0.460 range2: 0.540 range3: 0.000
Attribute petallength
range1: 0.000 range2: 0.960 range3: 0.040
Attribute petalwidth
range1: 0.000 range2: 0.980 range3: 0.020
Class Iris-setosa (0.327):
==========================
Attribute sepallength
range1: 0.940 range2: 0.060 range3: 0.000
Attribute sepalwidth
range1: 0.020 range2: 0.720 range3: 0.260
Attribute petallength
range1: 1.000 range2: 0.000 range3: 0.000
Attribute petalwidth
range1: 1.000 range2: 0.000 range3: 0.000
Class Iris-virginica (0.327):
=============================
Attribute sepallength
range1: 0.020 range2: 0.640 range3: 0.340
Attribute sepalwidth
range1: 0.380 range2: 0.580 range3: 0.040
Attribute petallength
range1: 0.000 range2: 0.120 range3: 0.880
Attribute petalwidth
range1: 0.000 range2: 0.100 range3: 0.900
We can build a conditional probability table (CPT) for each attribute
Class Probabilities
Setosa Versicolor Virginica
0.327 0.327 0.327
Attribute Sepallength
Class Range1 Range2 Range3
Setosa 0.940 0.060 0.000
Versicolor 0.220 0.720 0.060
Virginica 0.020 0.640 0.340
Attribute Sepalwidth
Class Range1 Range2 Range3
Setosa 0.020 0.720 0.260
Versicolor 0.460 0.540 0.000
Virginica 0.380 0.580 0.040
Attribute Petallength
Class Range1 Range2 Range3
Setosa 1.000 0.000 0.000
Versicolor 0.000 0.960 0.040
Virginica 0.000 0.120 0.880
Attribute Petalwidth
Class Range1 Range2 Range3
Setosa 1.000 0.000 0.000
Versicolor 0.000 0.980 0.020
Virginica 0.000 0.100 0.900
Naïve Bayes
Classification Phase
How to classify a new example?
sepallength sepalwidth petallength petalwidth
5 3 2 2
Example with discretized attributes:
sepallength sepalwidth petallength petalwidth
r1 r2 r1 r3
Compute the posterior probabilities for each class:
P(setosa | x) ∝ P(setosa) x P(sepalLength = r1 | setosa) x P(sepalWidth = r2 | setosa) x P(petalLength = r1 | setosa) x P(petalWidth = r3 | setosa)
P(versicolor | x) ∝ P(versicolor) x P(sepalLength = r1 | versicolor) x P(sepalWidth = r2 | versicolor) x P(petalLength = r1 | versicolor) x P(petalWidth = r3 | versicolor)
P(virginica | x) ∝ P(virginica) x P(sepalLength = r1 | virginica) x P(sepalWidth = r2 | virginica) x P(petalLength = r1 | virginica) x P(petalWidth = r3 | virginica)
P(setosa | x) ∝ P(setosa) x P(sepalLength = r1 | setosa) x P(sepalWidth = r2 | setosa) x P(petalLength = r1 | setosa) x P(petalWidth = r3 | setosa)
P(setosa | x) ∝ 0.327 x 0.940 x 0.720 x 1.000 x 0.000 = 0
P(versicolor | x) ∝ P(versicolor) x P(sepalLength = r1 | versicolor) x P(sepalWidth = r2 | versicolor) x P(petalLength = r1 | versicolor) x P(petalWidth = r3 | versicolor)
P(versicolor | x) ∝ 0.327 x 0.220 x 0.540 x 0.000 x 0.020 = 0
P(virginica | x) ∝ P(virginica) x P(sepalLength = r1 | virginica) x P(sepalWidth = r2 | virginica) x P(petalLength = r1 | virginica) x P(petalWidth = r3 | virginica)
P(virginica | x) ∝ 0.327 x 0.020 x 0.580 x 0.000 x 0.900 = 0
If all the class probabilities are zero, we cannot determine the class for this example.
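The failure can be reproduced by plugging the conditional probability tables into the NB product. In this sketch the dictionary layout and the name `nb_score` are illustrative; the probabilities are copied from the discretized model, and the example is (r1, r2, r1, r3).

```python
PRIOR = 0.327

# CPT[class][attribute] = (P(range1), P(range2), P(range3))
CPT = {
    "setosa": {
        "sepallength": (0.940, 0.060, 0.000), "sepalwidth": (0.020, 0.720, 0.260),
        "petallength": (1.000, 0.000, 0.000), "petalwidth": (1.000, 0.000, 0.000),
    },
    "versicolor": {
        "sepallength": (0.220, 0.720, 0.060), "sepalwidth": (0.460, 0.540, 0.000),
        "petallength": (0.000, 0.960, 0.040), "petalwidth": (0.000, 0.980, 0.020),
    },
    "virginica": {
        "sepallength": (0.020, 0.640, 0.340), "sepalwidth": (0.380, 0.580, 0.040),
        "petallength": (0.000, 0.120, 0.880), "petalwidth": (0.000, 0.100, 0.900),
    },
}

# Discretized example: range index (1-based) per attribute.
example = {"sepallength": 1, "sepalwidth": 2, "petallength": 1, "petalwidth": 3}

def nb_score(c):
    """P(c) times the product of the CPT entries selected by the example."""
    score = PRIOR
    for attr, r in example.items():
        score *= CPT[c][attr][r - 1]
    return score

scores = {c: nb_score(c) for c in CPT}
print(scores)   # every class scores 0.0, so the argmax is undefined
```

A single zero entry is enough to wipe out a class, regardless of how strongly the other attributes support it.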
Naïve Bayes
Laplace Correction
Attribute Sepallength
Class Range1 Range2 Range3
Setosa 0.940 0.060 0.000
Versicolor 0.220 0.720 0.060
Virginica 0.020 0.640 0.340
To avoid zero probabilities due to zero counters we can implement the Laplace correction:
$$\hat{P}(X_i = x_k \mid c_j) = \frac{N_{ijk}}{N_j} \quad \longrightarrow \quad \hat{P}(X_i = x_k \mid c_j) = \frac{N_{ijk} + 1}{N_j + k}$$
N_ijk - number of examples in D such that X_i = x_k and C = c_j
N_j - number of examples in D of class c_j
k - number of possible values of X_i
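The effect of the correction can be seen on one row of the Sepallength table. The counts below are hypothetical: 50 setosa examples split 47 / 3 / 0 over the three ranges is assumed only because it reproduces the uncorrected estimates 0.940, 0.060 and 0.000; the source does not state the class size.

```python
def laplace(n_ijk, n_j, k):
    """Laplace-corrected estimate (N_ijk + 1) / (N_j + k)."""
    return (n_ijk + 1) / (n_j + k)

# Hypothetical counts for setosa over Range1..Range3 of sepallength.
counts = [47, 3, 0]
n_j = sum(counts)   # N_j, examples of the class
k = len(counts)     # number of possible values of the attribute

raw = [n / n_j for n in counts]
smoothed = [laplace(n, n_j, k) for n in counts]

print(raw)
print([round(p, 3) for p in smoothed])
```

Adding 1 to every counter and k to the denominator keeps the estimates summing to 1 while guaranteeing that none of them is zero.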