
Bayes Decision Theory

Minimum-Error-Rate Classification

Classifiers, Discriminant Functions and Decision Surfaces

The Normal Density



Minimum-Error-Rate Classification

• Actions are decisions on classes
• If action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j
• Seek a decision rule that minimizes the probability of error, which is the error rate



Minimum Error Rate Classifier Derivation
• Zero-one loss function:

  $$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, c$$

• Therefore, the conditional risk is:

  $$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

• The risk corresponding to this loss function is the average probability of error
• Minimizing the risk requires maximizing P(ωi | x), since R(αi | x) = 1 − P(ωi | x)
• Minimum-error-rate rule: decide ωi if P(ωi | x) > P(ωj | x) ∀ j ≠ i
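A minimal sketch of this rule in Python, assuming the priors and class-conditional densities are known; the two Gaussian class-conditionals and prior values below are hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class problem: priors and Gaussian class-conditional densities
priors = np.array([0.6, 0.4])                  # P(w1), P(w2)
densities = [norm(loc=0.0, scale=1.0),         # p(x | w1)
             norm(loc=2.0, scale=1.0)]         # p(x | w2)

def classify(x):
    # Posteriors are proportional to p(x | wi) * P(wi); the normalizer is common to all classes
    scores = np.array([d.pdf(x) * p for d, p in zip(densities, priors)])
    # Minimum error rate: decide the class with the largest posterior
    return np.argmax(scores) + 1               # 1-based class index

print(classify(0.3))   # -> 1
print(classify(1.8))   # -> 2
```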



Likelihood Ratio Classification
• Decision regions and the zero-one loss function:

  $$\text{Let } \theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}, \text{ then decide } \omega_1 \text{ if } \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \theta_\lambda$$

• If λ is the zero-one loss function,

  $$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_\lambda = \frac{P(\omega_2)}{P(\omega_1)} = \theta_a$$

  $$\text{If } \lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_\lambda = \frac{2\,P(\omega_2)}{P(\omega_1)} = \theta_b$$

Figure: class-conditional pdfs and the likelihood ratio p(x | ω1)/p(x | ω2)
• If we use the zero-one loss function, the decision boundaries are determined by the threshold θa
• If the loss function penalizes miscategorizing ω2 as ω1 more than the converse, we get the larger threshold θb, and hence the region R1 becomes smaller
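As a sketch of how the loss matrix moves the threshold, reusing the same hypothetical priors and Gaussian class-conditionals as above:

```python
import numpy as np
from scipy.stats import norm

p1, p2 = 0.6, 0.4                              # hypothetical priors P(w1), P(w2)
d1, d2 = norm(0.0, 1.0), norm(2.0, 1.0)        # hypothetical class-conditional densities

def decide(x, loss):
    # loss[i][j] = lambda(a_{i+1} | w_{j+1}): cost of action a_{i+1} when the true class is w_{j+1}
    theta = (loss[0][1] - loss[1][1]) / (loss[1][0] - loss[0][0]) * (p2 / p1)
    ratio = d1.pdf(x) / d2.pdf(x)              # likelihood ratio p(x|w1)/p(x|w2)
    return 1 if ratio > theta else 2

zero_one = [[0, 1], [1, 0]]                    # threshold theta_a = P(w2)/P(w1)
asym     = [[0, 2], [1, 0]]                    # threshold theta_b = 2 P(w2)/P(w1), shrinks R1
print(decide(1.1, zero_one), decide(1.1, asym))   # -> 1 2: the stricter loss shifts the decision to w2
```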
Classifiers, Discriminant Functions
and Decision Surfaces

• There are many methods of representing pattern classifiers
• One is a set of discriminant functions gi(x), i = 1,…, c
• The classifier assigns feature vector x to class ωi if gi(x) > gj(x) ∀ j ≠ i
• The classifier is a machine that computes c discriminant functions

Figure: functional structure of a general statistical pattern classifier with d inputs and c discriminant functions gi(x)
Forms of Discriminant Functions

• Let gi(x) = −R(αi | x)
  (maximum discriminant corresponds to minimum risk)
• For minimum error rate, we take gi(x) = P(ωi | x)
  (maximum discriminant corresponds to maximum posterior)
• Any monotonically increasing transformation gives an equivalent discriminant, e.g.:
  gi(x) ≡ P(x | ωi) P(ωi)
  gi(x) = ln P(x | ωi) + ln P(ωi)
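A brief sketch showing that these forms are equivalent discriminants, i.e. they produce the same argmax decision; the priors and univariate Gaussian class-conditionals below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

priors = np.array([0.3, 0.7])                          # hypothetical P(w1), P(w2)
densities = [norm(-1.0, 1.0), norm(1.0, 2.0)]          # hypothetical p(x | wi)

def g_posterior(x):   # gi(x) = P(wi | x)
    joint = np.array([d.pdf(x) * p for d, p in zip(densities, priors)])
    return joint / joint.sum()

def g_joint(x):       # gi(x) = p(x | wi) P(wi)
    return np.array([d.pdf(x) * p for d, p in zip(densities, priors)])

def g_log(x):         # gi(x) = ln p(x | wi) + ln P(wi)
    return np.array([d.logpdf(x) + np.log(p) for d, p in zip(densities, priors)])

x = 0.25
# All three forms are monotone transformations of each other, so the argmax agrees
print(np.argmax(g_posterior(x)), np.argmax(g_joint(x)), np.argmax(g_log(x)))
```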



Decision Region
• Feature space divided into c decision regions
if gi(x) > gj(x) ∀j ≠ i then x is in Ri

Figure: a 2-D, two-category classifier with Gaussian class-conditional pdfs.
The decision boundary consists of two hyperbolas; hence decision region R2 is not simply connected.
Ellipses mark where the density is 1/e times that of the peak of the distribution.
The Two-Category case
• A two-category classifier is a dichotomizer: it has two discriminant functions g1 and g2
• Let g(x) ≡ g1(x) − g2(x)
• Decide ω1 if g(x) > 0; otherwise decide ω2
• Two equivalent forms of g(x):

  $$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

  $$g(x) = \ln \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
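A minimal dichotomizer sketch using the log form of g(x); the priors and Gaussian densities are again hypothetical:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical priors and univariate Gaussian class-conditionals
p1, p2 = 0.5, 0.5
d1, d2 = norm(0.0, 1.0), norm(3.0, 1.5)

def g(x):
    # g(x) = ln p(x|w1)/p(x|w2) + ln P(w1)/P(w2)
    return (d1.logpdf(x) - d2.logpdf(x)) + np.log(p1 / p2)

for x in (0.5, 2.5):
    print(x, "-> w1" if g(x) > 0 else "-> w2")
```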
The Normal Distribution
A bell-shaped distribution defined by the probability density function

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

If the random variable X follows a normal distribution, then
• The probability that X will fall into the interval (a, b) is given by
  $$\int_a^b p(x)\,dx$$
• The expected, or mean, value of X is
  $$E[X] = \int_{-\infty}^{\infty} x\, p(x)\,dx = \mu$$
• The variance of X is
  $$\mathrm{Var}(X) = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2\, p(x)\,dx = \sigma^2$$
• The standard deviation of X is σx = σ
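These definitions can be checked numerically; a small sketch using SciPy's normal density and numerical integration (the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 1.0, 2.0                      # arbitrary parameters
p = norm(loc=mu, scale=sigma).pdf         # the density p(x) defined above

# Probability that X falls in (a, b), by numerical integration of p(x)
prob, _ = quad(p, -1.0, 3.0)
print(prob)                               # ~0.683 for (mu - sigma, mu + sigma)

# Mean and variance recovered as integrals of x p(x) and (x - mu)^2 p(x)
mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)
var, _  = quad(lambda x: (x - mean) ** 2 * p(x), -np.inf, np.inf)
print(mean, var)                          # ~1.0, ~4.0
```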
Relationship between Entropy and Normal Density

• Entropy of a distribution:

  $$H(p(x)) = -\int_{-\infty}^{\infty} p(x) \ln p(x)\,dx$$

• Measured in nats; if log2 is used, the unit is bits
• Entropy measures the uncertainty in the values of points selected randomly from a distribution
• The normal distribution has maximum entropy over all distributions having a given mean and variance
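A short numerical check of the Gaussian's entropy against its closed form H = ½ ln(2πeσ²); the σ below is arbitrary, and the integration limits are chosen wide enough that the tails are negligible:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 2.0                                     # arbitrary standard deviation
p = norm(loc=0.0, scale=sigma).pdf

# Differential entropy H = -integral of p(x) ln p(x) dx, in nats
H_numeric, _ = quad(lambda x: -p(x) * np.log(p(x)), -20 * sigma, 20 * sigma)

# Closed form for a Gaussian: H = 0.5 * ln(2 * pi * e * sigma^2)
H_closed = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
print(H_numeric, H_closed)                      # the two values agree
```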
Normal Distribution with Mean 0, Standard Deviation 1

Figure: the standard normal density. With 80% confidence the r.v. will lie in the two-sided interval [−1.28, 1.28].
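A quick check of this interval using SciPy's standard normal:

```python
from scipy.stats import norm

# Two-sided 80% interval for a standard normal: leaves 10% in each tail
lo, hi = norm.ppf(0.10), norm.ppf(0.90)
print(lo, hi)                              # ~ -1.2816, 1.2816
print(norm.cdf(hi) - norm.cdf(lo))         # ~ 0.80
```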



The Normal Density in Pattern Recognition
• Univariate density
• Analytically tractable, continuous
• A lot of processes are asymptotically Gaussian
• Central Limit Theorem: the aggregate effect of a sum of a large number of small, independent random disturbances will lead to a Gaussian distribution
• Handwritten characters and speech sounds can be viewed as an ideal or prototype corrupted by a random process

  $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where:
  µ = mean (or expected value) of x
  σ² = expected squared deviation, or variance

The univariate normal distribution has roughly 95% of its area in the range |x − µ| < 2σ.
The peak of the distribution has value p(µ) = 1/(√(2π) σ).
Multivariate density

The multivariate normal density in d dimensions is:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t\,\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

abbreviated as p(x) ~ N(µ, Σ), where:
  x = (x1, x2, …, xd)t (t denotes the transpose)
  µ = (µ1, µ2, …, µd)t is the mean vector
  Σ is the d × d covariance matrix
  |Σ| and Σ⁻¹ are its determinant and inverse, respectively
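A small sketch evaluating this density directly and comparing it with SciPy's multivariate_normal; the 2-D mean and covariance below are hypothetical:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-D parameters
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def mvn_pdf(x, mu, Sigma):
    # Direct evaluation of the d-dimensional normal density formula
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

x = np.array([0.5, 0.0])
print(mvn_pdf(x, mu, Sigma))                        # explicit formula
print(multivariate_normal(mu, Sigma).pdf(x))        # SciPy reference, same value
```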



Mean and Covariance Matrix

• Formal definitions:

  $$\boldsymbol{\mu} = E[\mathbf{x}] = \int \mathbf{x}\, p(\mathbf{x})\,d\mathbf{x}$$

  $$\Sigma = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t] = \int (\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^t\, p(\mathbf{x})\,d\mathbf{x}$$

• The components of the mean vector are the means of the individual variables
• Covariance matrix:
  Diagonal elements are the variances of the variables
  Off-diagonal elements are the covariances of pairs of variables
  Statistical independence implies the off-diagonal elements are zero
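A short sketch estimating the mean vector and covariance matrix from samples, with sample averages standing in for the integrals; the true parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([2.0, -1.0])
Sigma_true = np.array([[1.0, 0.6],
                       [0.6, 2.0]])

# Draw samples from N(mu, Sigma) and estimate the mean vector and covariance matrix
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)
mu_hat = X.mean(axis=0)                       # sample mean vector
Sigma_hat = np.cov(X, rowvar=False)           # sample covariance matrix (d x d)
print(mu_hat)       # close to mu_true
print(Sigma_hat)    # diagonal ~ variances, off-diagonal ~ covariances
```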
Multivariate Normal Density

• Specified by d + d(d+1)/2 parameters: the mean and the independent elements of the covariance matrix
• Loci of points of constant density are hyperellipsoids

Figure: samples drawn from a 2-D Gaussian lie in a cloud centered at the mean µ. Ellipses show lines of equal probability density of the Gaussian.
Linear Combinations of Normally Distributed Variables are Normally Distributed

Whitening Transform
If Φ is the matrix whose columns are the orthonormal eigenvectors of Σ, and Λ is the diagonal matrix of the corresponding eigenvalues, then the transformation Aw = ΦΛ^(−1/2) applied to the coordinates ensures that the transformed distribution has a covariance matrix equal to the identity matrix.

Figure: the action of a linear transformation on the feature space will convert an arbitrary normal distribution into another normal distribution. One transformation, A, takes the source distribution into the distribution N(Atµ, AtΣA). Another linear transformation, a projection P onto a line defined by vector a, leads to N(µ, σ²) measured along that line. While the transforms yield distributions in a different space, they are shown superimposed on the original x1-x2 space. A whitening transform Aw leads to a circularly symmetric Gaussian, here shown displaced.
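A small numerical sketch of the whitening transform: eigendecompose an assumed covariance matrix, form Aw = ΦΛ^(−1/2), and verify that the transformed covariance is the identity:

```python
import numpy as np

# Hypothetical covariance matrix of the original distribution
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

# Eigendecomposition: columns of Phi are orthonormal eigenvectors, eigvals the eigenvalues
eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)          # whitening transform A_w = Phi Lam^(-1/2)

# Covariance after the transform y = A_w^t x is A_w^t Sigma A_w, which should be the identity
Sigma_white = A_w.T @ Sigma @ A_w
print(np.round(Sigma_white, 6))               # ~ identity matrix
```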
Mahalanobis Distance

$$r^2 = (\mathbf{x}-\boldsymbol{\mu})^t\,\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})$$

is the squared Mahalanobis distance from x to µ.

• Contours of constant density are hyperellipsoids of constant Mahalanobis distance
• For a given dimensionality, the scatter of the samples varies directly with |Σ|^(1/2)

Figure: samples drawn from a 2-D Gaussian lie in a cloud centered at the mean µ. Ellipses show lines of equal probability density of the Gaussian.
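A brief sketch computing the squared Mahalanobis distance for an assumed mean and covariance, checked against SciPy's implementation (which returns r, not r²):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Hypothetical mean and covariance
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

x = np.array([1.5, -1.0])
diff = x - mu
r2 = diff @ Sigma_inv @ diff                  # squared Mahalanobis distance
print(r2)
print(mahalanobis(x, mu, Sigma_inv) ** 2)     # same value via SciPy
```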

