
ECE 8443 – Pattern Recognition

LECTURE 09: LINEAR DISCRIMINANT ANALYSIS

• Objectives:
Fisher Linear Discriminant Analysis
Multiple Discriminant Analysis
Examples

• Resources:
D.H.S.: Chapter 3 (Part 2)
W.P.: Fisher
DTREG: LDA
S.S.: DFA

Component Analysis (Review)
• Previously introduced as a “whitening transformation”.

• Component analysis is a technique that combines features to reduce the dimension of the feature space (a small projection sketch follows the list below).

• Linear combinations are simple to compute and tractable.

• Project a high-dimensional space onto a lower-dimensional space.

• Three classical approaches for finding the optimal transformation:

 Principal Components Analysis (PCA): projection that best represents the data in a least-squares sense.

 Multiple Discriminant Analysis (MDA): projection that best separates the data in a least-squares sense.

 Independent Component Analysis (ICA): projection that minimizes the mutual information of the components.
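
As a minimal sketch of what a linear projection does (assuming NumPy; the data and the projection matrix W below are arbitrary placeholders, not one of the optimal transformations listed above):

```python
import numpy as np

# Synthetic data: n = 100 samples in d = 5 dimensions (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Any d x k matrix W defines a linear projection y = W^t x;
# PCA, MDA, and ICA differ only in how the columns of W are chosen.
W = rng.normal(size=(5, 2))      # placeholder projection, for illustration only
Y = X @ W                        # projected samples, now 2-dimensional

print(X.shape, "->", Y.shape)    # (100, 5) -> (100, 2)
```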



Discriminant Analysis
• Discriminant analysis seeks directions that are efficient for discrimination.

• Consider the problem of projecting data from d dimensions onto a line with
the hope that we can optimize the orientation of the line to minimize error.

• Consider a set of n d-dimensional samples x1, …, xn: n1 in the subset D1 labeled ω1 and n2 in the subset D2 labeled ω2.

• Define a linear combination of x:  $y = \mathbf{w}^t \mathbf{x}$

and a corresponding set of n samples y1, …, yn divided into Y1 and Y2.

• Our challenge is to find the w that maximizes separation. (Contrast this with the PCA criterion from the review, $J_0(\mathbf{x}_0) = \sum_{k=1}^{n} \lVert \mathbf{x}_0 - \mathbf{x}_k \rVert^2$, which measures representation error rather than class separation.)

• This can be done by considering the ratio of the between-class scatter to the within-class scatter (see the numerical sketch below).
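
A small numerical sketch of the idea (assuming NumPy; the two-class data and the candidate directions are synthetic, chosen only for illustration): different choices of w give very different separation of the projected means.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic two-class data in d = 2 dimensions (class means chosen arbitrarily).
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # subset D1
X2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(60, 2))   # subset D2

for w in (np.array([1.0, 0.0]),      # project onto the first axis
          np.array([0.0, 1.0]),      # project onto the second axis
          np.array([1.0, 1.0])):     # project onto a diagonal line
    y1, y2 = X1 @ w, X2 @ w          # y = w^t x for every sample
    gap = abs(y1.mean() - y2.mean()) # separation of the projected means
    print(f"w = {w}:  separation = {gap:.2f}")
```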



Separation of the Means and Scatter
• Define a sample mean for class i:  $\mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i} \mathbf{x}$
• The sample mean for the projected points is:  $\tilde{m}_i = \frac{1}{n_i}\sum_{y \in Y_i} y = \frac{1}{n_i}\sum_{\mathbf{x} \in D_i} \mathbf{w}^t\mathbf{x} = \mathbf{w}^t\mathbf{m}_i$
The sample mean for the projected points is just the projection of the mean
(which is expected since this is a linear transformation).

• It follows that the distance between the projected means is:  $|\tilde{m}_1 - \tilde{m}_2| = |\mathbf{w}^t\mathbf{m}_1 - \mathbf{w}^t\mathbf{m}_2| = |\mathbf{w}^t(\mathbf{m}_1 - \mathbf{m}_2)|$

• Define a scatter for the projected samples:  $\tilde{s}_i^2 = \sum_{y \in Y_i} (y - \tilde{m}_i)^2$
• An estimate of the variance of the pooled data is $\frac{1}{n}(\tilde{s}_1^2 + \tilde{s}_2^2)$; the quantity $\tilde{s}_1^2 + \tilde{s}_2^2$ is called the total within-class scatter of the projected samples.
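
A short sketch of these definitions (the same kind of synthetic two-class data as before, purely illustrative): it computes the projected means, checks that $\tilde{m}_i = \mathbf{w}^t\mathbf{m}_i$, and forms the projected scatters.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # class 1 samples (D1)
X2 = rng.normal([2.0, 1.0], 0.5, size=(60, 2))   # class 2 samples (D2)
w = np.array([1.0, 1.0])                         # an arbitrary direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)        # sample means m_i
y1, y2 = X1 @ w, X2 @ w                          # projected samples y = w^t x

# The projected mean is the projection of the mean: m~_i = w^t m_i.
assert np.isclose(y1.mean(), w @ m1) and np.isclose(y2.mean(), w @ m2)

# Projected scatters s~_i^2 and the pooled-variance estimate (1/n)(s~_1^2 + s~_2^2).
s1_sq = np.sum((y1 - y1.mean()) ** 2)
s2_sq = np.sum((y2 - y2.mean()) ** 2)
n = len(y1) + len(y2)
print("separation of projected means:", abs(y1.mean() - y2.mean()))
print("total within-class scatter   :", s1_sq + s2_sq)
print("pooled variance estimate     :", (s1_sq + s2_sq) / n)
```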



Fisher Linear Discriminant and Scatter
• The Fisher linear discriminant maximizes the criterion:  $J(\mathbf{w}) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$
• Define a scatter matrix for class i, Si:  $\mathbf{S}_i = \sum_{\mathbf{x} \in D_i} (\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^t$
• The within-class scatter matrix, SW, is:  $\mathbf{S}_W = \mathbf{S}_1 + \mathbf{S}_2$

• We can write the scatter for the projected samples as:  $\tilde{s}_i^2 = \sum_{\mathbf{x} \in D_i} (\mathbf{w}^t\mathbf{x} - \mathbf{w}^t\mathbf{m}_i)^2 = \sum_{\mathbf{x} \in D_i} \mathbf{w}^t(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^t\mathbf{w} = \mathbf{w}^t\mathbf{S}_i\mathbf{w}$

• Therefore, the sum of the scatters can be written as:  $\tilde{s}_1^2 + \tilde{s}_2^2 = \mathbf{w}^t\mathbf{S}_W\mathbf{w}$
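
A sketch that verifies these identities numerically (synthetic data and an arbitrary w, for illustration only): the projected scatters satisfy $\tilde{s}_i^2 = \mathbf{w}^t\mathbf{S}_i\mathbf{w}$, and their sum equals $\mathbf{w}^t\mathbf{S}_W\mathbf{w}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # class 1 samples (D1)
X2 = rng.normal([2.0, 1.0], 0.5, size=(60, 2))   # class 2 samples (D2)
w = np.array([1.0, 1.0])                         # an arbitrary direction

def scatter(X):
    """Class scatter matrix S_i = sum over D_i of (x - m_i)(x - m_i)^t."""
    d = X - X.mean(axis=0)
    return d.T @ d

S1, S2 = scatter(X1), scatter(X2)
SW = S1 + S2                                     # within-class scatter matrix

y1, y2 = X1 @ w, X2 @ w
s1_sq = np.sum((y1 - y1.mean()) ** 2)            # s~_1^2
s2_sq = np.sum((y2 - y2.mean()) ** 2)            # s~_2^2

assert np.isclose(s1_sq, w @ S1 @ w)             # s~_i^2 = w^t S_i w
assert np.isclose(s1_sq + s2_sq, w @ SW @ w)     # s~_1^2 + s~_2^2 = w^t S_W w
print("identities hold for this w")
```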



Separation of the Projected Means
• The separation of the projected means obeys:

$(\tilde{m}_1 - \tilde{m}_2)^2 = (\mathbf{w}^t\mathbf{m}_1 - \mathbf{w}^t\mathbf{m}_2)^2 = \mathbf{w}^t(\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^t\mathbf{w} = \mathbf{w}^t\mathbf{S}_B\mathbf{w}$

• Where the between-class scatter, SB, is given by:  $\mathbf{S}_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^t$

• SW is the within-class scatter and is proportional to the covariance of the pooled data.

• SB, the between-class scatter, is symmetric and positive semidefinite, but because it is the outer product of two vectors, its rank is at most one.

• This implies that for any w, SBw is in the direction of m1-m2.


• The criterion function, J(w), can be written as:  $J(\mathbf{w}) = \frac{\mathbf{w}^t\mathbf{S}_B\mathbf{w}}{\mathbf{w}^t\mathbf{S}_W\mathbf{w}}$
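
A minimal sketch of evaluating $J(\mathbf{w})$ for a few directions (synthetic data again; the candidate directions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # class 1 samples (D1)
X2 = rng.normal([2.0, 1.0], 0.5, size=(60, 2))   # class 2 samples (D2)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SB = np.outer(m1 - m2, m1 - m2)                  # between-class scatter (rank one)
SW = sum((X - X.mean(axis=0)).T @ (X - X.mean(axis=0)) for X in (X1, X2))

def J(w):
    """Generalized Rayleigh quotient J(w) = (w^t S_B w) / (w^t S_W w)."""
    return (w @ SB @ w) / (w @ SW @ w)

for w in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), m1 - m2):
    print(f"J({np.round(w, 2)}) = {J(w):.2f}")
```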



Linear Discriminant Analysis

• This ratio is the well-known generalized Rayleigh quotient, and the vector, w, that maximizes J(·) must satisfy:  $\mathbf{S}_B\mathbf{w} = \lambda\mathbf{S}_W\mathbf{w}$

• The solution is:  $\mathbf{w} = \mathbf{S}_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$

• This is Fisher’s linear discriminant, also known as the canonical variate.

• This solution maps the d-dimensional problem to a one-dimensional problem (in this case).

• From Chapter 2, when the conditional densities, p(x|ωi), are multivariate normal with equal covariances, the optimal decision boundary is given by:  $\mathbf{w}^t\mathbf{x} + w_0 = 0$, where $\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$, and $w_0$ is related to the prior probabilities.

• The computational complexity is dominated by the calculation of the within-class scatter and its inverse, an $O(d^2 n)$ calculation. But this is done offline!
• Let’s work some examples (class-independent PCA and LDA).
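
A hedged sketch of such an example (the two-class data are synthetic, invented here, and not the data sets used in the lecture): it computes Fisher's direction $\mathbf{w} = \mathbf{S}_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$ and compares its criterion value with that of the leading class-independent PCA direction.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))   # class 1 samples
X2 = rng.normal([2.0, 1.0], 0.5, size=(60, 2))   # class 2 samples

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SB = np.outer(m1 - m2, m1 - m2)                  # between-class scatter
SW = sum((X - X.mean(axis=0)).T @ (X - X.mean(axis=0)) for X in (X1, X2))

# Fisher's linear discriminant: w = S_W^{-1} (m1 - m2).
w_lda = np.linalg.solve(SW, m1 - m2)
w_lda /= np.linalg.norm(w_lda)

# Class-independent PCA: leading eigenvector of the pooled covariance.
X = np.vstack([X1, X2])
evals, evecs = np.linalg.eigh(np.cov(X.T))
w_pca = evecs[:, -1]                             # eigenvector of the largest eigenvalue

J = lambda w: (w @ SB @ w) / (w @ SW @ w)        # criterion from the previous slide
print("J(LDA direction):", J(w_lda))
print("J(PCA direction):", J(w_pca))
```

By construction the Fisher direction maximizes J(w), so the class-independent PCA direction, which ignores the class labels, can only do as well or (usually) worse.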



Multiple Discriminant Analysis
• For the c-class problem in a d-dimensional space, the natural generalization
involves c-1 discriminant functions.

• The within-class scatter is defined as:


$\mathbf{S}_W = \sum_{i=1}^{c}\mathbf{S}_i = \sum_{i=1}^{c}\sum_{\mathbf{x} \in D_i}(\mathbf{x} - \mathbf{m}_i)(\mathbf{x} - \mathbf{m}_i)^t$

• Define a total mean vector, m:


$\mathbf{m} = \frac{1}{n}\sum_{i=1}^{c} n_i\mathbf{m}_i$
and a total scatter matrix, ST, by:

$\mathbf{S}_T = \sum_{\mathbf{x}}(\mathbf{x} - \mathbf{m})(\mathbf{x} - \mathbf{m})^t$
• The total scatter is related to the within-class scatter (derivation omitted):
$\mathbf{S}_T = \mathbf{S}_W + \mathbf{S}_B, \qquad \text{where } \mathbf{S}_B = \sum_{i=1}^{c} n_i(\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^t$
• We have c-1 discriminant functions of the form:

$y_i = \mathbf{w}_i^t\mathbf{x}, \quad i = 1, 2, \ldots, c-1, \qquad \text{or in matrix form: } \mathbf{y} = \mathbf{W}^t\mathbf{x}$
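
A sketch of these definitions for c = 3 synthetic classes (the class means and sizes are arbitrary), checking the relation $\mathbf{S}_T = \mathbf{S}_W + \mathbf{S}_B$ and the rank of $\mathbf{S}_B$:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic c = 3 classes in d = 4 dimensions (means and sizes are arbitrary).
classes = [rng.normal(loc=mu, scale=1.0, size=(40, 4))
           for mu in ([0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1])]

n = sum(len(Xi) for Xi in classes)
means = [Xi.mean(axis=0) for Xi in classes]                       # class means m_i
m = sum(len(Xi) * mi for Xi, mi in zip(classes, means)) / n       # total mean m

SW = sum((Xi - mi).T @ (Xi - mi) for Xi, mi in zip(classes, means))
SB = sum(len(Xi) * np.outer(mi - m, mi - m) for Xi, mi in zip(classes, means))

X = np.vstack(classes)
ST = (X - m).T @ (X - m)                                          # total scatter

assert np.allclose(ST, SW + SB)                                   # S_T = S_W + S_B
print("rank of S_B:", np.linalg.matrix_rank(SB))                  # at most c - 1 = 2
```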



Multiple Discriminant Analysis (Cont.)
• The criterion function is:
$J(\mathbf{W}) = \frac{|\mathbf{W}^t\mathbf{S}_B\mathbf{W}|}{|\mathbf{W}^t\mathbf{S}_W\mathbf{W}|}$
• The solution to maximizing J(W) is once again found via an eigenvalue
decomposition:
$|\mathbf{S}_B - \lambda_i\mathbf{S}_W| = 0 \;\Rightarrow\; (\mathbf{S}_B - \lambda_i\mathbf{S}_W)\mathbf{w}_i = \mathbf{0}$

• Because SB is the sum of c matrices of rank one or less, and because only c-1 of these are independent, SB is of rank c-1 or less (see the sketch below).

• An excellent presentation on applications of LDA can be found at PCA Fails!
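
A sketch of the eigenvalue solution above (it reuses the synthetic three-class data from the previous snippet and assumes scipy.linalg.eigh is available for the generalized symmetric eigenproblem):

```python
import numpy as np
from scipy.linalg import eigh    # generalized symmetric eigensolver (assumes SciPy)

rng = np.random.default_rng(2)
classes = [rng.normal(loc=mu, scale=1.0, size=(40, 4))
           for mu in ([0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1])]

n = sum(len(Xi) for Xi in classes)
means = [Xi.mean(axis=0) for Xi in classes]
m = sum(len(Xi) * mi for Xi, mi in zip(classes, means)) / n
SW = sum((Xi - mi).T @ (Xi - mi) for Xi, mi in zip(classes, means))
SB = sum(len(Xi) * np.outer(mi - m, mi - m) for Xi, mi in zip(classes, means))

# Solve S_B w_i = lambda_i S_W w_i; eigenvalues are returned in ascending order.
evals, evecs = eigh(SB, SW)
W = evecs[:, -2:]                # keep the c - 1 = 2 leading discriminant directions
Y = np.vstack(classes) @ W       # projected data, one row per sample, c - 1 columns

print("eigenvalues:", np.round(evals, 3))   # only c - 1 of them are essentially nonzero
print("projected shape:", Y.shape)          # (n, c - 1)
```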



Summary
• Introduced Component Analysis.
• Defined a criterion that maximizes discrimination.
• Derived the solution to the two-class problem.
• Generalized this solution to c classes.
• Compared LDA and PCA on some interesting data sets.

