03/04/2011
# KL Transform

G.A. GIRALDI¹

¹LNCC – National Laboratory for Scientific Computing
Av. Getulio Vargas, 333, 25651-070, Petrópolis, RJ, Brazil
{gilson}@lncc.br
1 Introduction
In image databases it is straightforward to consider each data point (image) as a point in an n-dimensional space, where n is the number of pixels of each image. Therefore, dimensionality reduction may be necessary in order to discard redundancy and simplify further operations. The best-known technique for this task is Principal Components Analysis (PCA) [1, 2].

In what follows we review the basic theory behind PCA, the KL Transform, which is the best unitary transform from the viewpoint of compression.
2 KL Transform
The Karhunen-Loeve (or KL) transform can be seen as a method for data compression or dimensionality reduction (see also [3], section 5.11). Thus, let us suppose that the data to be compressed consist of N tuples or data vectors from an n-dimensional space. PCA then searches for k n-dimensional orthonormal vectors, with k ≤ n, that can best be used to represent the data. Figure 1.a,b pictures this idea using a bidimensional representation. If we suppose that the data points are distributed over the ellipse, it follows that the coordinate system (X, Y), shown in Figure 1.b, is more suitable for representing the data set, in a sense that will be formally described next.
Thus, let S = {u_1, u_2, ..., u_N} be the data set represented in Figure 1. For now, let us suppose that the centroid of the data set is the center of the coordinate system, that is:

C_M = \frac{1}{N} \sum_{i=1}^{N} u_i = 0.   (1)
Figure 1: (a) Dataset and original coordinate system. (b) Principal directions X and Y.
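As a minimal numerical sketch of the idea behind Figure 1 (assuming NumPy; the synthetic data and the tilt angle below are illustrative choices, not from the text), we can generate points spread along an ellipse and recover the principal directions X, Y as eigenvectors of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D points spread along an ellipse tilted by theta.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
data = (rng.normal(size=(500, 2)) * np.array([3.0, 0.5])) @ rot.T

# Center the data so the centroid is at the origin, as assumed in (1).
data -= data.mean(axis=0)

# Covariance matrix and its eigen-decomposition.
R = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]      # decreasing eigenvalue order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The first eigenvector approximates the long axis of the ellipse (X).
print(eigvecs[:, 0])
```

The first column of `eigvecs` is close to (cos θ, sin θ), the long axis of the ellipse, up to sign.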
To address the issue of compression, we need a vector basis that satisfies a proper optimization criterion (rotated axes in Figure 1.b). Following [3], consider the operations in Figure 2. The vector u_j is first transformed to a vector v_j by the matrix (transformation) A. Then, we truncate v_j by keeping only its first m elements. The obtained vector w_j is just the transformation of v_j by I_m, a matrix with 1s along the first m diagonal elements and zeros elsewhere. Finally, w_j is transformed to z_j by the matrix B.

Figure 2: KL Transform formulation. Reprinted from [3].

Let the square error be defined as follows:
J_m = \frac{1}{N} \sum_{j=1}^{N} \left\| u_j - z_j \right\|^2 = \frac{1}{N} \mathrm{Tr}\left[ \sum_{j=1}^{N} (u_j - z_j)(u_j - z_j)^{*T} \right],   (2)
where Tr means the trace of the matrix between the square brackets and the notation (*T) means the transpose of the complex conjugate of a matrix. Following Figure 2, we observe that z_j = B I_m A u_j. Thus we can rewrite (2) as:
J_m = \frac{1}{N} \mathrm{Tr}\left[ \sum_{j=1}^{N} (u_j - B I_m A u_j)(u_j - B I_m A u_j)^{*T} \right],   (3)
which yields:

J_m = \frac{1}{N} \mathrm{Tr}\left[ (I - B I_m A) \, R \, (I - B I_m A)^{*T} \right],   (4)
where:
R =
N
¸
i=0
u
j
u
∗T
j
. (5)
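The pipeline of Figure 2 and the equivalence between the sum form (2) and the trace form (4) can be checked with a short numerical sketch (assuming NumPy; the matrices A and B below are arbitrary illustrative choices, with real data so that *T reduces to T):

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, m = 4, 50, 2

U = rng.normal(size=(n, N))           # columns are the data vectors u_j
U -= U.mean(axis=1, keepdims=True)    # centroid at the origin, as in (1)

A = rng.normal(size=(n, n))           # an arbitrary invertible transform
B = np.linalg.inv(A)
Im = np.diag([1.0] * m + [0.0] * (n - m))  # keeps the first m components

# Square error (2): the average of ||u_j - z_j||^2 with z_j = B Im A u_j.
Z = B @ Im @ A @ U
J_sum = np.mean(np.sum((U - Z) ** 2, axis=0))

# Trace form (4), with R as in (5): (1/N) Tr[(I - B Im A) R (I - B Im A)^T].
R = U @ U.T
D = np.eye(n) - B @ Im @ A
J_tr = np.trace(D @ R @ D.T) / N

print(np.isclose(J_sum, J_tr))
```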
Following the literature, we call R the covariance matrix. We can now state the optimization problem: we want to find the matrices A, B that minimize J_m. The next theorem gives the solution for this problem.
Theorem 1: The error J_m in expression (4) is minimum when

A = \Phi^{*T}, \quad B = \Phi, \quad AB = BA = I,   (6)

where \Phi is the matrix formed by the orthonormalized eigenvectors of R, arranged according to the decreasing order of its eigenvalues.
Proof. To minimize J_m we first observe that J_m must be zero if m = n. Thus, the only possibility is

I = BA \Rightarrow A = B^{-1}.   (7)
Besides, by remembering that

\mathrm{Tr}(CD) = \mathrm{Tr}(DC),   (8)

we can also write:

J_m = \frac{1}{N} \mathrm{Tr}\left[ (I - B I_m A)^{*T} (I - B I_m A) \, R \right].   (9)
Again, this expression must be null if m = n. Thus:

J_n = \frac{1}{N} \mathrm{Tr}\left[ \left( I - BA - A^{*T}B^{*T} + A^{*T}B^{*T}BA \right) R \right].
This error is minimum if:

B^{*T} B = I, \quad A^{*T} A = I,   (10)

that is, if A and B are unitary matrices. The next condition comes from the differentiation of J_m with respect to the elements of A. We set the result to zero in order to obtain the necessary condition for a minimum of J_m. This yields:
I_m A^{*T} \left( I - A^{*T} I_m A \right) R = 0,   (11)
which renders:

J_m = \frac{1}{N} \mathrm{Tr}\left[ \left( I - A^{*T} I_m A \right) R \right].   (12)
By using the property (8), the last expression can be rewritten as

J_m = \frac{1}{N} \mathrm{Tr}\left[ R - I_m A R A^{*T} \right].
Since R is fixed, J_m will be minimized if

\tilde{J}_m = \mathrm{Tr}\left[ I_m A R A^{*T} \right] = \sum_{i=0}^{m-1} a_i^T R a_i^*,   (13)
is maximized, where a_i^T is the ith row of A. Since A is unitary, we must impose the constraint:

a_i^T a_i^* = 1.   (14)
Thus, we shall maximize \tilde{J}_m subject to the last condition. The Lagrangian has the form:

\tilde{J}_m = \sum_{i=0}^{m-1} a_i^T R a_i^* + \sum_{i=0}^{m-1} \lambda_i \left( 1 - a_i^T a_i^* \right),
where the \lambda_i are the Lagrange multipliers. By differentiating this expression with respect to a_i we get:

R a_i^* = \lambda_i a_i^*.   (15)
Thus, the a_i^* are orthonormalized eigenvectors of R. Substituting this result in expression (13) produces:

\tilde{J}_m = \sum_{i=0}^{m-1} \lambda_i,   (16)

which is maximized if \{a_i^*, \; i = 0, 1, ..., m-1\} correspond to the largest eigenvalues of R. (End of proof.)
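As a numerical check of the theorem (a sketch assuming NumPy, not part of the original text): with the optimal A = Φ^{*T}, B = Φ, expression (12) reduces to J_m = (1/N) Σ_{i=m}^{n-1} λ_i, i.e. the sum of the discarded eigenvalues of R, divided by N:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, m = 5, 200, 2

U = rng.normal(size=(n, N))
U -= U.mean(axis=1, keepdims=True)   # zero centroid, as in (1)
R = U @ U.T                          # covariance matrix, as in (5)

# Phi: orthonormal eigenvectors of R in decreasing eigenvalue order.
eigvals, Phi = np.linalg.eigh(R)
eigvals, Phi = eigvals[::-1], Phi[:, ::-1]

A, B = Phi.T, Phi                    # the optimal choice of Theorem 1
Im = np.diag([1.0] * m + [0.0] * (n - m))
Z = B @ Im @ A @ U                   # pipeline of Figure 2

J_m = np.mean(np.sum((U - Z) ** 2, axis=0))    # square error (2)
print(np.isclose(J_m, eigvals[m:].sum() / N))  # equals the discarded eigenvalues / N
```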
A straightforward variation of the above statement is obtained if we have a random vector u with zero mean. In this case, the pipeline of Figure 2 yields a random vector z and the square error can be expressed as:

J_m = \frac{1}{n} \mathrm{Tr}\left[ E\left[ (u - B I_m A u)(u - B I_m A u)^{*T} \right] \right],
which can be written as:

J_m = \frac{1}{n} \mathrm{Tr}\left[ (I - B I_m A) \, R \, (I - B I_m A)^{*T} \right],   (17)

where R = E\left[ u u^{*T} \right] is the covariance matrix. Besides, if C_M in expression (1) is not zero, we must translate the coordinate system to C_M before computing the matrix R, that is:
u'_j = u_j - C_M.   (18)

In this case, the matrix R will be given by:

R = \sum_{j=1}^{N} u'_j \, (u'_j)^{*T}.
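A minimal sketch of this centering step (assuming NumPy; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.normal(size=(4, 100)) + 5.0   # columns u_j with a nonzero centroid

C_M = U.mean(axis=1, keepdims=True)   # centroid, expression (1)
Up = U - C_M                          # translation (18)

R = Up @ Up.T                         # covariance matrix of the translated data
print(np.allclose(Up.mean(axis=1), 0.0))
```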
Also, it may sometimes be useful to consider in expression (2) some other norm, not necessarily the 2-norm. In this case, there will be a real, symmetric and positive-definite matrix M that defines the norm. Thus, the square error J_m is rewritten in the more general form:
J_m = \frac{1}{N} \sum_{j=1}^{N} \left\| u_j - z_j \right\|_M^2 = \frac{1}{N} \sum_{j=1}^{N} (u_j - z_j)^{*T} M (u_j - z_j).   (19)
Obviously, if M = I we recover expression (2). The link between this case and the previous one is easily obtained by observing that there is a non-singular real matrix W such that:

W^T M W = I.   (20)
The matrix W defines the transformation:

W \acute{u}_j = u_j, \quad W \acute{z}_j = z_j.   (21)
Thus, by inserting these expressions in equation (19) we obtain:

J_m = \frac{1}{N} \sum_{j=1}^{N} (\acute{u}_j - \acute{z}_j)^{*T} (\acute{u}_j - \acute{z}_j).   (22)
Expression (22) can be written as:

J_m = \frac{1}{N} \sum_{j=1}^{N} \left\| \acute{u}_j - \acute{z}_j \right\|^2,   (23)
now using the 2-norm, as in expression (2). Therefore:

J_m = \frac{1}{N} \mathrm{Tr}\left[ \sum_{j=1}^{N} (\acute{u}_j - \acute{z}_j)(\acute{u}_j - \acute{z}_j)^{*T} \right].   (24)
Following the same development performed above, we find that we must solve the equation:

\acute{R} \, \acute{a}_i^* = \lambda_i \, \acute{a}_i^*,   (25)
where:
´
R =
N
¸
j=0
´ u
j
´ u
j
∗T
. (26)
Thus, from the transformations (21), which give \acute{u}_j = W^{-1} u_j, it follows that:

\acute{R} = W^{-1} R \, W^{-T},   (27)

and, therefore, we must solve the following eigenvalue/eigenvector problem:

\left( W^{-1} R \, W^{-T} \right) \acute{a}_i^* = \lambda_i \, \acute{a}_i^*.   (28)
The eigenvectors, in the original coordinate system, are finally given by:

W \acute{a}_i^* = a_i^*.   (29)
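The M-norm case can be sketched numerically (assuming NumPy; taking W from the Cholesky factor of M is one concrete choice satisfying (20), not the only one). Combining (20), (27) and (29), the eigenproblem in the original coordinates amounts to the generalized problem R M a = λ a, which the last line checks:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 3, 100

# A real, symmetric, positive-definite matrix M defining the norm.
S = rng.normal(size=(n, n))
M = S @ S.T + n * np.eye(n)

# One W satisfying W^T M W = I (20): if M = L L^T (Cholesky), take W = L^{-T}.
L = np.linalg.cholesky(M)
W = np.linalg.inv(L.T)

# Transformed data as in (21): W u' = u, i.e. u' = W^{-1} u.
U = rng.normal(size=(n, N))
U -= U.mean(axis=1, keepdims=True)
Up = np.linalg.solve(W, U)

# Eigenproblem (25)-(26) in the transformed coordinates ...
Rp = Up @ Up.T
lam, Ap = np.linalg.eigh(Rp)
a = W @ Ap                            # eigenvectors in original coordinates, (29)

# ... which, in the original coordinates, amounts to R M a = lambda a.
R = U @ U.T
print(np.allclose(R @ M @ a, a * lam))
```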
Definition: Given an input vector u, its KL transform is defined as:

v = \Phi^{*T} u,   (30)

where \Phi is the matrix formed by the orthonormalized eigenvectors of the associated covariance matrix R, arranged according to the decreasing order of its eigenvalues.
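The definition translates directly into code (a sketch assuming NumPy; `kl_transform` is an illustrative name, not from the text):

```python
import numpy as np

def kl_transform(U):
    """KL transform of the data matrix U (columns are the vectors u_j).

    Returns (V, Phi) with V = Phi^{*T} U as in (30), where Phi holds the
    orthonormal eigenvectors of R in decreasing eigenvalue order.
    """
    R = U @ U.conj().T
    _, Phi = np.linalg.eigh(R)   # eigh returns increasing eigenvalues
    Phi = Phi[:, ::-1]           # reorder to decreasing
    return Phi.conj().T @ U, Phi

rng = np.random.default_rng(5)
U = rng.normal(size=(4, 300))
U -= U.mean(axis=1, keepdims=True)

V, Phi = kl_transform(U)
# Phi is unitary, so u = Phi v inverts the transform exactly.
print(np.allclose(Phi @ V, U))
```

Since the transform is unitary, compression consists of keeping only the first m rows of V, as in the pipeline of Figure 2.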
3 Exercises
1. Apply the KL transform to a database. Compare the compression efficiency with the DFT and DCT.
2. Show that the components of the vector v in expression (30) are uncorrelated and have zero mean, that is:

E[v(k)] = 0, \quad k = 0, 1, 2, ..., N-1,   (31)

E[v(k) \, v^*(l)] = 0, \quad k \neq l, \quad k, l = 0, 1, 2, ..., N-1.   (32)

3. Extend the KL transform to an N × N image u(m, n) (see [3], pages 164-165).
References
[1] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, 1990.
[2] T. Hastie, R. Tibshirani, and J.H. Friedman. The Elements of Statistical Learning. Springer, 2001.
[3] Anil K. Jain. Fundamentals of Digital Image Processing. Prentice-Hall, Inc., 1989.
