
Data Mining and Machine Learning:

Fundamental Concepts and Algorithms


dataminingbook.info

Mohammed J. Zaki¹   Wagner Meira Jr.²

¹Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
²Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 18: Probabilistic Classification

Bayes Classifier
Let the training dataset D consist of n points x_i in a d-dimensional space, and
let y_i denote the class for each point, with y_i ∈ {c_1, c_2, ..., c_k}.

The Bayes classifier estimates the posterior probability P(c_i | x) for each class c_i,
and chooses the class that has the largest probability. The predicted class for x is
given as

    ŷ = arg max_{c_i} { P(c_i | x) }

According to Bayes' theorem, we have

    P(c_i | x) = P(x | c_i) · P(c_i) / P(x)

Because P(x) is fixed for a given point, Bayes' rule can be rewritten as

    ŷ = arg max_{c_i} { P(c_i | x) } = arg max_{c_i} { P(x | c_i) P(c_i) / P(x) } = arg max_{c_i} { P(x | c_i) P(c_i) }

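As a quick illustration of this decision rule, the following minimal Python sketch (with made-up likelihoods and priors for three classes) shows that dropping the constant P(x) does not change the arg max:

import numpy as np

# Hypothetical likelihoods P(x|c_i) and priors P(c_i) for k = 3 classes.
likelihood = np.array([0.02, 0.10, 0.05])
prior = np.array([0.5, 0.2, 0.3])

# Unnormalized scores P(x|c_i) P(c_i) and normalized posteriors P(c_i|x).
scores = likelihood * prior
posterior = scores / scores.sum()   # dividing by P(x) = sum of the scores

# Both give the same predicted class index.
assert np.argmax(scores) == np.argmax(posterior)
print(np.argmax(scores), posterior)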
Estimating the Prior Probability: P(ci )

Let D_i denote the subset of points in D that are labeled with class c_i:

    D_i = {x_j ∈ D | x_j has class y_j = c_i}

Let the size of the dataset D be given as |D| = n, and let the size of each
class-specific subset D_i be given as |D_i| = n_i.

The prior probability for class c_i can be estimated as follows:

    P̂(c_i) = n_i / n

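A minimal sketch of this estimate, assuming the class labels are held in a NumPy array y of integer class indices:

import numpy as np

# Hypothetical label vector: 0 encodes c1, 1 encodes c2.
y = np.array([0] * 50 + [1] * 100)

n = len(y)
priors = np.bincount(y) / n   # P_hat(c_i) = n_i / n
print(priors)                 # [0.333..., 0.666...]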
Estimating the Likelihood
Numeric Attributes, Parametric Approach

To estimate the likelihood P(x|c_i), we have to estimate the joint probability of x
across all the d dimensions, i.e., we have to estimate P(x = (x_1, x_2, ..., x_d)^T | c_i).

In the parametric approach we assume that each class c_i is normally distributed,
and we use the estimated mean μ̂_i and covariance matrix Σ̂_i to compute the
probability density at x:

    f̂_i(x) = f̂(x | μ̂_i, Σ̂_i) = 1 / ( (√(2π))^d |Σ̂_i|^(1/2) ) · exp{ −(x − μ̂_i)^T Σ̂_i^(−1) (x − μ̂_i) / 2 }

The posterior probability is then given as

    P(c_i | x) = f̂_i(x) P(c_i) / Σ_{j=1}^{k} f̂_j(x) P(c_j)

The predicted class for x is:

    ŷ = arg max_{c_i} { f̂_i(x) P(c_i) }

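A brief sketch of this density-based posterior, using scipy.stats.multivariate_normal for the Gaussian density; the parameter values below are placeholders, not the book's estimates:

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical per-class estimates for two classes in 2D.
means  = [np.array([5.0, 3.4]), np.array([6.3, 2.9])]
covs   = [np.array([[0.12, 0.10], [0.10, 0.14]]),
          np.array([[0.43, 0.12], [0.12, 0.11]])]
priors = [0.33, 0.67]

x = np.array([6.75, 4.25])

# f_hat_i(x) * P(c_i) for each class, then normalize to get the posteriors.
scores = np.array([multivariate_normal.pdf(x, mean=m, cov=S) * p
                   for m, S, p in zip(means, covs, priors)])
posteriors = scores / scores.sum()
print(posteriors, "predicted class index:", np.argmax(scores))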
Bayes Classifier Algorithm

BayesClassifier (D = {(x_j, y_j)}_{j=1}^{n}):
 1  for i = 1, ..., k do
 2      D_i ← {x_j | y_j = c_i, j = 1, ..., n}        // class-specific subsets
 3      n_i ← |D_i|                                   // cardinality
 4      P̂(c_i) ← n_i / n                              // prior probability
 5      μ̂_i ← (1/n_i) Σ_{x_j ∈ D_i} x_j               // mean
 6      Z_i ← D_i − 1_{n_i} · μ̂_i^T                   // centered data
 7      Σ̂_i ← (1/n_i) Z_i^T Z_i                       // covariance matrix
 8  return P̂(c_i), μ̂_i, Σ̂_i for all i = 1, ..., k

Testing (x and P̂(c_i), μ̂_i, Σ̂_i, for all i ∈ [1, k]):
 9  ŷ ← arg max_{c_i} { f̂(x | μ̂_i, Σ̂_i) · P̂(c_i) }
10  return ŷ

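A compact Python sketch of this training and testing procedure, assuming X is an (n, d) NumPy array, y holds integer class labels, and SciPy is available for the Gaussian density; this is an illustrative sketch, not the book's reference code:

import numpy as np
from scipy.stats import multivariate_normal

def bayes_fit(X, y):
    """Estimate prior, mean, and (biased) covariance for each class."""
    params = []
    n = len(y)
    for c in np.unique(y):
        Xc = X[y == c]                       # class-specific subset D_i
        prior = len(Xc) / n                  # P_hat(c_i) = n_i / n
        mean = Xc.mean(axis=0)               # mu_hat_i
        Z = Xc - mean                        # centered data
        cov = (Z.T @ Z) / len(Xc)            # Sigma_hat_i = Z^T Z / n_i
        params.append((c, prior, mean, cov))
    return params

def bayes_predict(x, params):
    """Predict the class maximizing f_hat(x | mu_i, Sigma_i) * P_hat(c_i)."""
    scores = [(c, multivariate_normal.pdf(x, mean=m, cov=S) * p)
              for c, p, m, S in params]
    return max(scores, key=lambda t: t[1])[0]

For example, bayes_predict(np.array([6.75, 4.25]), bayes_fit(X, y)) returns the class with the largest f̂_i(x) · P̂(c_i).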
Bayes Classifier: Iris Data
Consider the 2D Iris data (sepal length and sepal width). Class c1 , which
corresponds to iris-setosa (circles), has n1 = 50 points, whereas the other class
c2 (triangles) has n2 = 100 points.

    P̂(c_1) = n_1/n = 50/150 = 0.33        P̂(c_2) = n_2/n = 100/150 = 0.67

    μ̂_1 = (5.006, 3.418)^T                μ̂_2 = (6.262, 2.872)^T

    Σ̂_1 = | 0.1218  0.0983 |              Σ̂_2 = | 0.435   0.1209 |
          | 0.0983  0.1423 |                    | 0.1209  0.1096 |

The plot shows the contour or level curve (corresponding to 1% of the peak density)
of the multivariate normal distribution modeling the probability density for each
class.
Bayes Classifier: Iris Data
X1: sepal length versus X2: sepal width

(Figure: scatter plot of the 2D Iris data, with class c1 shown as circles and class c2 as triangles, and the test point x = (6.75, 4.25)^T marked as a white square.)
Bayes Classifier: Iris Data

Let x = (6.75, 4.25)^T be a test point (shown as a white square in the plot). The
posterior probabilities for c_1 and c_2 can be computed as:

    P̂(c_1 | x) ∝ f̂(x | μ̂_1, Σ̂_1) P̂(c_1) = (4.951 × 10⁻⁷) × 0.33 = 1.634 × 10⁻⁷
    P̂(c_2 | x) ∝ f̂(x | μ̂_2, Σ̂_2) P̂(c_2) = (2.589 × 10⁻⁵) × 0.67 = 1.735 × 10⁻⁵

Because P̂(c_2 | x) > P̂(c_1 | x), the class for x is predicted as ŷ = c_2.

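A short sketch that reproduces these numbers from the parameter estimates on the preceding slides (assuming SciPy is available; small rounding differences are expected):

import numpy as np
from scipy.stats import multivariate_normal

x = np.array([6.75, 4.25])

mu1, S1 = np.array([5.006, 3.418]), np.array([[0.1218, 0.0983], [0.0983, 0.1423]])
mu2, S2 = np.array([6.262, 2.872]), np.array([[0.435, 0.1209], [0.1209, 0.1096]])

s1 = multivariate_normal.pdf(x, mean=mu1, cov=S1) * 0.33   # ≈ 1.6e-7
s2 = multivariate_normal.pdf(x, mean=mu2, cov=S2) * 0.67   # ≈ 1.7e-5
print(s1, s2, "predict c2" if s2 > s1 else "predict c1")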
Bayes Classifier
Categorical Attributes

Let X_j be a categorical attribute over the domain dom(X_j) = {a_j1, a_j2, ..., a_jm_j}.

Each categorical attribute X_j is modeled as an m_j-dimensional multivariate
Bernoulli random variable X_j that takes on m_j distinct vector values
e_j1, e_j2, ..., e_jm_j, where e_jr is the r-th standard basis vector in R^{m_j} and
corresponds to the r-th value or symbol a_jr ∈ dom(X_j).

The entire d-dimensional dataset is modeled as the vector random variable
X = (X_1, X_2, ..., X_d)^T. Let d′ = Σ_{j=1}^{d} m_j; a categorical point
x = (x_1, x_2, ..., x_d)^T is therefore represented as the d′-dimensional binary vector

    v = (v_1, ..., v_d)^T = (e_1r_1, ..., e_dr_d)^T

where v_j = e_jr_j provided x_j = a_jr_j is the r_j-th value in the domain of X_j.

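As an illustration, a small sketch of this one-hot style representation; the attribute domains used here are hypothetical:

# Hypothetical domains for d = 2 categorical attributes.
domains = [["Very Short", "Short", "Long", "Very Long"],   # dom(X1), m1 = 4
           ["Short", "Medium", "Long"]]                    # dom(X2), m2 = 3

def encode(x):
    """Map a categorical point x to its d'-dimensional binary vector v."""
    v = []
    for value, dom in zip(x, domains):
        e = [0] * len(dom)
        e[dom.index(value)] = 1      # e_{j r_j}: standard basis vector for value x_j
        v.extend(e)
    return v

print(encode(["Short", "Medium"]))   # [0, 1, 0, 0, 0, 1, 0]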
Bayes Classifier
Categorical Attributes

The probability of the categorical point x is obtained from the joint probability
mass function (PMF) for the vector random variable X:

    P(x | c_i) = f(v | c_i) = f(X_1 = e_1r_1, ..., X_d = e_dr_d | c_i)

The joint PMF can be estimated directly from the data D_i for each class c_i as
follows:

    f̂(v | c_i) = n_i(v) / n_i

where n_i(v) is the number of times the value v occurs in class c_i.

However, to avoid zero probabilities we add a pseudo-count of 1 for each value:

    f̂(v | c_i) = (n_i(v) + 1) / (n_i + ∏_{j=1}^{d} m_j)

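A minimal sketch of this smoothed joint PMF estimate, counting whole value tuples per class; the data and domain sizes below are hypothetical:

from collections import Counter
from math import prod

# Hypothetical discretized training data for one class c_i.
D_i = [("Short", "Medium"), ("Short", "Medium"), ("Very Short", "Medium"),
       ("Short", "Long")]
domain_sizes = [4, 3]                      # m_1 = 4, m_2 = 3

counts = Counter(D_i)                      # n_i(v) for each observed tuple v
n_i = len(D_i)

def f_hat(v):
    """Smoothed estimate (n_i(v) + 1) / (n_i + prod_j m_j)."""
    return (counts[v] + 1) / (n_i + prod(domain_sizes))

print(f_hat(("Short", "Medium")), f_hat(("Long", "Long")))   # unseen tuples get non-zero mass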
Discretized Iris Data

Assume that the sepal length and sepal width attributes in the Iris dataset
have been discretized as shown below.
(a) Discretized sepal length:

    Bins         Domain
    [4.3, 5.2]   Very Short (a_11)
    (5.2, 6.1]   Short      (a_12)
    (6.1, 7.0]   Long       (a_13)
    (7.0, 7.9]   Very Long  (a_14)

(b) Discretized sepal width:

    Bins         Domain
    [2.0, 2.8]   Short  (a_21)
    (2.8, 3.6]   Medium (a_22)
    (3.6, 4.4]   Long   (a_23)

We have |dom(X1 )| = m1 = 4 and |dom(X2 )| = m2 = 3.

Discretized Iris Data
Class-specific Empirical Joint Probability Mass Function

Class c_1:
                                         X_2
    X_1                 Short (e_21)  Medium (e_22)  Long (e_23)   f̂_X1
    Very Short (e_11)       1/50          33/50          5/50      39/50
    Short      (e_12)          0           3/50          8/50      13/50
    Long       (e_13)          0              0             0          0
    Very Long  (e_14)          0              0             0          0
    f̂_X2                    1/50          36/50         13/50

Class c_2:
                                         X_2
    X_1                 Short (e_21)  Medium (e_22)  Long (e_23)   f̂_X1
    Very Short (e_11)      6/100             0             0       6/100
    Short      (e_12)     24/100        15/100             0      39/100
    Long       (e_13)     13/100        30/100             0      43/100
    Very Long  (e_14)      3/100         7/100         2/100      12/100
    f̂_X2                  46/100        52/100         2/100

Discretized Iris Data: Test Case

Consider a test point x = (5.3, 3.0)^T corresponding to the categorical point
(Short, Medium), which is represented as v = (e_12^T, e_22^T)^T.

The prior probabilities of the classes are P̂(c_1) = 0.33 and P̂(c_2) = 0.67.
The likelihood and posterior probability for each class are given as

    P̂(x | c_1) = f̂(v | c_1) = 3/50 = 0.06
    P̂(x | c_2) = f̂(v | c_2) = 15/100 = 0.15

    P̂(c_1 | x) ∝ 0.06 × 0.33 = 0.0198
    P̂(c_2 | x) ∝ 0.15 × 0.67 = 0.1005

In this case the predicted class is ŷ = c2 .

Discretized Iris Data: Test Case with Pseudo-counts

The test point x = (6.75, 4.25)^T corresponds to the categorical point
(Long, Long), and it is represented as v = (e_13^T, e_23^T)^T.

Unfortunately the probability mass at v is zero for both classes. We adjust the
PMF via pseudo-counts, noting that the number of possible values is
m_1 × m_2 = 4 × 3 = 12.

The likelihood and posterior probability can then be computed as

    P̂(x | c_1) = f̂(v | c_1) = (0 + 1)/(50 + 12) = 1.61 × 10⁻²
    P̂(x | c_2) = f̂(v | c_2) = (0 + 1)/(100 + 12) = 8.93 × 10⁻³

    P̂(c_1 | x) ∝ (1.61 × 10⁻²) × 0.33 = 5.32 × 10⁻³
    P̂(c_2 | x) ∝ (8.93 × 10⁻³) × 0.67 = 5.98 × 10⁻³

Thus, the predicted class is ŷ = c2 .

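A quick sketch reproducing these smoothed estimates, with the class sizes and domain sizes taken from the slides above:

# Laplace-smoothed likelihoods for the unseen tuple (Long, Long).
n1, n2 = 50, 100
num_values = 4 * 3                   # m1 * m2 possible tuples

lik1 = (0 + 1) / (n1 + num_values)   # ≈ 0.0161
lik2 = (0 + 1) / (n2 + num_values)   # ≈ 0.00893

post1 = lik1 * 0.33                  # ≈ 5.32e-3
post2 = lik2 * 0.67                  # ≈ 5.98e-3
print("predict c2" if post2 > post1 else "predict c1")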
Bayes Classifier: Challenges

The main problem with the Bayes classifier is the lack of enough data to reliably
estimate the joint probability density or mass function, especially for
high-dimensional data.

For numeric attributes we have to estimate O(d²) covariances, and as the
dimensionality increases, this requires us to estimate too many parameters.

For categorical attributes we have to estimate the joint probability for all the
possible values of v, of which there are ∏_j |dom(X_j)|. Even if each categorical
attribute has only two values, we would need to estimate the probability for 2^d
values. However, because there can be at most n distinct values for v, most of the
counts will be zero.

The naive Bayes classifier addresses these concerns.

Naive Bayes Classifier
Numeric Attributes

The naive Bayes approach makes the simple assumption that all the attributes are
independent, which implies that the likelihood can be decomposed into a product
of dimension-wise probabilities:

    P(x | c_i) = P(x_1, x_2, ..., x_d | c_i) = ∏_{j=1}^{d} P(x_j | c_i)

The likelihood for class c_i, for dimension X_j, is given as

    P(x_j | c_i) ∝ f(x_j | μ̂_ij, σ̂²_ij) = 1/(√(2π) σ̂_ij) · exp{ −(x_j − μ̂_ij)² / (2σ̂²_ij) }

where µ̂ij and σ̂ij2 denote the estimated mean and variance for attribute Xj , for
class ci .

Naive Bayes Classifier
Numeric Attributes

The naive assumption corresponds to setting all the covariances to zero in Σ̂_i,
that is,

    Σ_i = | σ²_i1    0    ...    0   |
          |   0    σ²_i2  ...    0   |
          |   ⋮      ⋮     ⋱     ⋮   |
          |   0      0    ...  σ²_id |

The naive Bayes classifier thus uses the sample mean μ̂_i = (μ̂_i1, ..., μ̂_id)^T and a
diagonal sample covariance matrix Σ̂_i = diag(σ̂²_i1, ..., σ̂²_id) for each class c_i. In
total 2d parameters have to be estimated, corresponding to the sample mean and
sample variance for each dimension X_j.

Naive Bayes Algorithm

NaiveBayes (D = {(x_j, y_j)}_{j=1}^{n}):
 1  for i = 1, ..., k do
 2      D_i ← {x_j | y_j = c_i, j = 1, ..., n}        // class-specific subsets
 3      n_i ← |D_i|                                   // cardinality
 4      P̂(c_i) ← n_i / n                              // prior probability
 5      μ̂_i ← (1/n_i) Σ_{x_j ∈ D_i} x_j               // mean
 6      D_i ← D_i − 1 · μ̂_i^T                         // centered data for class c_i
 7      for j = 1, ..., d do                          // class-specific variance for j-th attribute
 8          σ̂²_ij ← (1/n_i) (X_j^i)^T (X_j^i)         // variance, where X_j^i is the j-th column of the centered D_i
 9      σ̂_i ← (σ̂²_i1, ..., σ̂²_id)^T                   // class-specific attribute variances
10  return P̂(c_i), μ̂_i, σ̂_i for all i = 1, ..., k

Testing (x and P̂(c_i), μ̂_i, σ̂_i, for all i ∈ [1, k]):
11  ŷ ← arg max_{c_i} { P̂(c_i) ∏_{j=1}^{d} f(x_j | μ̂_ij, σ̂²_ij) }
12  return ŷ

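A compact NumPy sketch of this procedure, assuming X is an (n, d) array and y holds integer class labels; this is an illustrative sketch, not the book's reference code:

import numpy as np

def naive_bayes_fit(X, y):
    """Per class: prior, per-dimension mean and (biased) variance."""
    n = len(y)
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / n, Xc.mean(axis=0), Xc.var(axis=0))
    return params

def naive_bayes_predict(x, params):
    """Maximize P(c_i) * prod_j N(x_j | mu_ij, var_ij); use logs for stability."""
    best, best_score = None, -np.inf
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        score = np.log(prior) + log_lik
        if score > best_score:
            best, best_score = c, score
    return best

Working in log space avoids numerical underflow when d is large; the arg max is unchanged.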
Iris Data: Naive Bayes

Consider the 2D Iris data consisting of sepal length and sepal width. P̂(c_i) and
μ̂_i remain unchanged, while the covariance matrices are assumed to be diagonal:

    Σ̂_1 = | 0.1218    0    |              Σ̂_2 = | 0.435     0    |
          |   0     0.1423 |                    |   0     0.1096 |

The figure shows the contour or level curve (corresponding to 1% of the peak
density) of the multivariate normal distribution for both classes. The diagonal
assumption leads to contours that are axis-parallel ellipses.

For the test point x = (6.75, 4.25)^T, the posterior probabilities for c_1 and c_2 are
as follows:

    P̂(c_1 | x) ∝ f̂(x | μ̂_1, Σ̂_1) P̂(c_1) = (4.014 × 10⁻⁷) × 0.33 = 1.325 × 10⁻⁷
    P̂(c_2 | x) ∝ f̂(x | μ̂_2, Σ̂_2) P̂(c_2) = (9.585 × 10⁻⁵) × 0.67 = 6.422 × 10⁻⁵

Because P̂(c_2 | x) > P̂(c_1 | x), the predicted class for x is ŷ = c_2.

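These values can be checked with the per-dimension Gaussian product, as in the brief sketch below (small rounding differences are expected):

import numpy as np

def diag_gauss(x, mu, var):
    """Product of univariate normal densities (diagonal covariance)."""
    return np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var))

x = np.array([6.75, 4.25])
s1 = diag_gauss(x, np.array([5.006, 3.418]), np.array([0.1218, 0.1423])) * 0.33
s2 = diag_gauss(x, np.array([6.262, 2.872]), np.array([0.435, 0.1096])) * 0.67
print(s1, s2)   # ≈ 1.3e-7 vs ≈ 6.4e-5, so predict c2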
Naive Bayes versus Full Bayes Classifier
X1: sepal length versus X2: sepal width

(Figure: side-by-side scatter plots of the 2D Iris data with the test point x = (6.75, 4.25)^T marked as a white square: (a) Naive Bayes, (b) Full Bayes.)

Naive Bayes
Categorical Attributes

The independence assumption leads to a simplification:

    P(x | c_i) = ∏_{j=1}^{d} P(x_j | c_i) = ∏_{j=1}^{d} f(X_j = e_jr_j | c_i)

where f(X_j = e_jr_j | c_i) is the probability mass function for X_j, estimated as

    f̂(v_j | c_i) = n_i(v_j) / n_i

where n_i(v_j) is the observed frequency of the value v_j = e_jr_j corresponding to the
r_j-th categorical value a_jr_j of attribute X_j in class c_i.

If a count is zero, we can use the pseudo-count method to avoid zero probabilities.
The adjusted estimates with pseudo-counts are given as

    f̂(v_j | c_i) = (n_i(v_j) + 1) / (n_i + m_j)

where m_j = |dom(X_j)|.


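A minimal sketch of these per-attribute smoothed estimates; the class-specific data below are hypothetical, and the product over attributes replaces the full joint PMF:

from collections import Counter

# Hypothetical class-specific data: one column per categorical attribute.
D_i = [("Short", "Medium"), ("Very Short", "Medium"), ("Short", "Long")]
domain_sizes = [4, 3]                    # m_j = |dom(X_j)|
n_i = len(D_i)

# Per-attribute value counts n_i(v_j).
col_counts = [Counter(row[j] for row in D_i) for j in range(len(domain_sizes))]

def likelihood(x):
    """prod_j (n_i(v_j) + 1) / (n_i + m_j)."""
    p = 1.0
    for j, value in enumerate(x):
        p *= (col_counts[j][value] + 1) / (n_i + domain_sizes[j])
    return p

print(likelihood(("Long", "Long")))      # non-zero even though (Long, Long) was never seen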
Discretized Iris Data: Naive Bayes
Class-specific Empirical Joint Probability Mass Function

Class c_1:
                                         X_2
    X_1                 Short (e_21)  Medium (e_22)  Long (e_23)   f̂_X1
    Very Short (e_11)       1/50          33/50          5/50      39/50
    Short      (e_12)          0           3/50          8/50      13/50
    Long       (e_13)          0              0             0          0
    Very Long  (e_14)          0              0             0          0
    f̂_X2                    1/50          36/50         13/50

Class c_2:
                                         X_2
    X_1                 Short (e_21)  Medium (e_22)  Long (e_23)   f̂_X1
    Very Short (e_11)      6/100             0             0       6/100
    Short      (e_12)     24/100        15/100             0      39/100
    Long       (e_13)     13/100        30/100             0      43/100
    Very Long  (e_14)      3/100         7/100         2/100      12/100
    f̂_X2                  46/100        52/100         2/100

Discretized Iris Data: Naive Bayes

The test point x = (6.75, 4.25)^T, corresponding to (Long, Long) or v = (e_13, e_23),
is classified as follows:

    P̂(v | c_1) = P̂(e_13 | c_1) · P̂(e_23 | c_1) = (0 + 1)/(50 + 4) · (13/50) = 4.81 × 10⁻³
    P̂(v | c_2) = P̂(e_13 | c_2) · P̂(e_23 | c_2) = (43/100) · (2/100) = 8.60 × 10⁻³

    P̂(c_1 | v) ∝ (4.81 × 10⁻³) × 0.33 = 1.59 × 10⁻³
    P̂(c_2 | v) ∝ (8.60 × 10⁻³) × 0.67 = 5.76 × 10⁻³

Thus, the predicted class is ŷ = c2 .

Nonparametric Approach
K Nearest Neighbors Classifier

We consider a non-parametric approach for likelihood estimation using
nearest-neighbors density estimation.

Let D be a training dataset comprising n points x_i ∈ R^d, and let D_i denote the
subset of points in D that are labeled with class c_i, with n_i = |D_i|.

Given a test point x ∈ R^d, and K, the number of neighbors to consider, let r
denote the distance from x to its K-th nearest neighbor in D.

Consider the d-dimensional hyperball of radius r around the test point x,
defined as

    B_d(x, r) = {x_i ∈ D | δ(x, x_i) ≤ r}

Here δ(x, x_i) is the distance between x and x_i, which is usually assumed to be the
Euclidean distance, i.e., δ(x, x_i) = ‖x − x_i‖₂. We assume that |B_d(x, r)| = K.

Nonparametric Approach
K Nearest Neighbors Classifier

Let K_i denote the number of points among the K nearest neighbors of x that are
labeled with class c_i, that is

    K_i = |{x_j ∈ B_d(x, r) | y_j = c_i}|

The class conditional probability density at x can be estimated as the fraction of
points from class c_i that lie within the hyperball, divided by its volume, that is

    f̂(x | c_i) = (K_i / n_i) / V = K_i / (n_i V)

where V = vol(B_d(x, r)) is the volume of the d-dimensional hyperball.

The posterior probability P(c_i | x) can be estimated as

    P(c_i | x) = f̂(x | c_i) P̂(c_i) / Σ_{j=1}^{k} f̂(x | c_j) P̂(c_j)

However, because P̂(c_i) = n_i / n, we have

    f̂(x | c_i) P̂(c_i) = (K_i / (n_i V)) · (n_i / n) = K_i / (n V)
Nonparametric Approach
K Nearest Neighbors Classifier

The posterior probability is given as

    P(c_i | x) = (K_i / (nV)) / Σ_{j=1}^{k} (K_j / (nV)) = K_i / K

Finally, the predicted class for x is

    ŷ = arg max_{c_i} { P(c_i | x) } = arg max_{c_i} { K_i / K } = arg max_{c_i} { K_i }

Because K is fixed, the KNN classifier predicts the class of x as the majority class
among its K nearest neighbors.

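A minimal KNN classifier sketch following this majority-vote rule, using a brute-force distance computation and assuming X_train is an (n, d) NumPy array and y_train a NumPy array of labels:

import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, K=5):
    """Predict the majority class among the K nearest neighbors of x."""
    dists = np.linalg.norm(X_train - x, axis=1)     # Euclidean distances to all points
    nearest = np.argsort(dists)[:K]                 # indices of the K nearest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

Here y_train[nearest] selects the neighbor labels and Counter.most_common(1) returns the majority class among them.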
Iris Data
K Nearest Neighbors Classifier

Consider the 2D Iris dataset. Class c_1 (circles) has n_1 = 50 points and c_2
(triangles) has n_2 = 100 points.

We use KNN (K = 5) to classify the test point x = (6.75, 4.25)^T.

The distance from x to its 5th nearest neighbor, namely (6.2, 3.4)^T, is given as
r = √1.025 = 1.012.

The enclosing ball or circle of radius r encompasses K_1 = 1 point from c_1 and
K_2 = 4 points from c_2. Therefore, the predicted class for x is ŷ = c_2.

Iris Data
K Nearest Neighbors Classifier

(Figure: scatter plot of the 2D Iris data showing the test point x = (6.75, 4.25)^T as a white square and the enclosing circle of radius r around its K = 5 nearest neighbors; X1: sepal length, X2: sepal width.)
