For a three-dimensional data set with dimensions x, y and z, the covariance matrix is:

$$C = \begin{pmatrix} \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\ \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\ \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z) \end{pmatrix}$$
January 2, 2012, JanKees van der Poel, D.Sc. Student, Mechanical Engineering
Covariance Matrix
Down the main diagonal, one can see that the covariance value is computed between one of the dimensions and itself (which is the variance of that dimension).
Since cov(a,b) = cov(b,a), the covariance matrix is
symmetrical about the main diagonal.
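The two properties above can be checked numerically. The following sketch uses NumPy in place of the slides' Matlab, with a made-up 3-dimensional data set:

```python
import numpy as np

# Hypothetical 3-dimensional data set (x, y, z): one row per dimension.
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 50))

# np.cov treats each row as one dimension by default.
C = np.cov(data)

# The main diagonal holds cov(a, a), i.e. the variance of each dimension.
assert np.allclose(np.diag(C), np.var(data, axis=1, ddof=1))

# Since cov(a, b) = cov(b, a), the matrix is symmetric.
assert np.allclose(C, C.T)
```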
Eigenvectors and Eigenvalues
A vector v is an eigenvector of a square (m by m) matrix M if M*v (the multiplication of the matrix M by the vector v) gives a multiple of v, i.e., λ*v (the multiplication of the scalar λ by the vector v).
In this case, λ is called the eigenvalue of M that is associated with the eigenvector v.
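This definition can be verified numerically. A quick NumPy sketch (the matrix values are made up for illustration):

```python
import numpy as np

# A small square matrix M (hypothetical values).
M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of `vecs` are the eigenvectors; `vals` the matching eigenvalues.
vals, vecs = np.linalg.eig(M)

# For every pair, M*v gives the same multiple of v, namely lambda*v.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(M @ v, lam * v)
print(vals)
```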
Eigenvector Properties
Eigenvectors can only be found for square
matrices.
Not every square matrix has eigenvectors.
An m by m matrix has m eigenvectors, given that
they exist.
For example, given a 3 by 3 matrix that has
eigenvectors, there are three of them.
Eigenvector Properties
Even if the eigenvector is scaled by some amount
before being multiplied, one still gets the same
multiple of it as a result.
This is because scaling a vector only changes its length, not its direction.
Eigenvector Properties
All the eigenvectors of a symmetric matrix (such as the covariance matrix) are perpendicular (orthogonal), i.e., at right angles to each other, no matter how many dimensions the matrix has.
This is important because it means that the data can be expressed in terms of these perpendicular eigenvectors, instead of in terms of the original axes.
The PCA Method
Step 1: Get some data to use in a simple
example.
I am going to use my own two dimensional data set.
I have chosen a two dimensional data set because I can
provide plots of the data to show what the PCA analysis is
doing at each step.
The data I have used is found in the next slide.
The PCA Method
The data used in this
example is shown
here.
Data =

  heights (alturas)   weights (pesos)
        183                 79
        173                 69
        120                 45
        168                 70
        188                 81
        158                 61
        201                 98
        163                 63
        193                 79
        167                 71
        178                 73
The PCA Method
Step 2: Subtract the mean.
For PCA to work properly, you have to subtract the
mean from each of the data dimensions.
The mean subtracted is the average across each
dimension.
All the x values have the x mean subtracted from them, and all the y values have the y mean subtracted from them.
This produces a data set whose mean is zero.
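The mean subtraction for the heights/weights data set of the previous slides can be sketched as follows (NumPy here rather than the slides' Matlab):

```python
import numpy as np

# Heights and weights from the slides: one row per data item.
data = np.array([[183, 79], [173, 69], [120, 45], [168, 70],
                 [188, 81], [158, 61], [201, 98], [163, 63],
                 [193, 79], [167, 71], [178, 73]], dtype=float)

# Subtract each dimension's mean from all of its values.
adjusted = data - data.mean(axis=0)

# The adjusted data set now has (numerically) zero mean in every dimension.
assert np.allclose(adjusted.mean(axis=0), 0.0)
print(adjusted[0])  # first adjusted item, roughly [11., 7.27]
```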
The PCA Method
The data with its
mean subtracted
(adjusted data) is
shown here.
Both the data and the
adjusted data are
plotted in the next
slide.
Adjusted Data =

  heights   weights
     11       7.27
      1      -2.72
    -52     -26.72
     -4      -1.72
     16       9.27
    -14     -10.72
     29      26.27
     -9      -8.72
     21       7.27
     -5      -0.72
      6       1.27
The PCA Method
[Figure: plots of the original data and of the mean-adjusted data]
The PCA Method
Step 3: Calculate the covariance matrix.
Since the data is two dimensional, the covariance matrix will have two rows and two columns:

$$C = \begin{pmatrix} 471.80 & 277.70 \\ 277.70 & 180.02 \end{pmatrix}$$
As the non-diagonal elements in this covariance matrix are positive, we should expect that both the x and y variables increase together.
One should notice that heights and weights do normally increase together.
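The covariance matrix above can be reproduced directly. An equivalent NumPy sketch of the Matlab computation:

```python
import numpy as np

# Heights and weights from the slides: one row per data item.
data = np.array([[183, 79], [173, 69], [120, 45], [168, 70],
                 [188, 81], [158, 61], [201, 98], [163, 63],
                 [193, 79], [167, 71], [178, 73]], dtype=float)

# rowvar=False: each column is one dimension (heights, weights).
C = np.cov(data, rowvar=False)
print(np.round(C, 2))  # [[471.8, 277.7], [277.7, 180.02]]
```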
The PCA Method
Step 4: Calculate the eigenvectors and
eigenvalues of the data matrix.
In Matlab, this step is performed using the eig (square matrices only) or svd (matrices of any shape) commands.
As the data matrix is not square, we can only use the svd command.
The eigenvectors and eigenvalues are rather
important, giving useful information about the data.
The PCA Method
Step 4: Calculate the eigenvectors and
eigenvalues of the data matrix.
Here are the eigenvalues, which are found along the diagonal of the matrix S (diag(S) in Matlab), and the eigenvectors (the columns of V):

$$\text{eigenvalues} = \begin{pmatrix} 623.1194 \\ 16.0392 \end{pmatrix} \qquad \text{eigenvectors} = \begin{pmatrix} 0.9220 & -0.3871 \\ 0.3871 & 0.9220 \end{pmatrix}$$
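The numbers above can be reproduced by applying an SVD to the data matrix, as the slides do with Matlab's svd; here is an equivalent NumPy sketch (the signs of singular vectors are arbitrary, so only magnitudes are compared):

```python
import numpy as np

# The data matrix from the slides: one row per data item.
data = np.array([[183, 79], [173, 69], [120, 45], [168, 70],
                 [188, 81], [158, 61], [201, 98], [163, 63],
                 [193, 79], [167, 71], [178, 73]], dtype=float)

# svd works for matrices of any shape; S holds the singular values and
# the rows of Vt (columns of V) hold the corresponding unit vectors.
U, S, Vt = np.linalg.svd(data, full_matrices=False)

print(np.round(S, 4))           # roughly [623.1194, 16.0392]
print(np.round(np.abs(Vt), 4))  # first row roughly [0.922, 0.3871]
```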
The PCA Method
Looking at the plot of the
adjusted data shown
here, one can see how it
has quite a strong
pattern.
As expected from the covariance matrix (and from common sense), both variables increase together.
The PCA Method
On top of the adjusted data I have plotted both
eigenvectors as well (appearing as a red and a
green line).
As stated earlier, they are perpendicular to each
other.
More important than this is that they provide
information about the data patterns.
One of the eigenvectors goes right through the
middle of the points, drawing a line of best fit.
The PCA Method
The first eigenvector (the one plotted in green) shows us that these two variables are closely related to each other along that line.
The second eigenvector (the one plotted in red)
gives the other, and less important, pattern in
the data.
It shows that all the points follow the main line, but
are off to its side by some amount.
The PCA Method
By the process of taking the eigenvectors of the
covariance matrix, we have been able to extract
lines that characterize the data.
The rest of the steps involve transforming the
data so that this data is expressed in terms of
these lines.
The PCA Method
Recalling the important aspects from the
previous figure:
The two lines are perpendicular (orthogonal) to each other;
The eigenvectors provide a way to see hidden patterns in the data;
One of the eigenvectors draws the line that best fits the data.
The PCA Method
Step 5: Choosing components and forming a
feature vector.
Here comes the notion of data compression and
reduced dimensionality.
Eigenvalues have different values: the highest one
corresponds to the eigenvector that is the principal
component of the data set (the most significant
relationship between the data dimensions).
The PCA Method
Once the eigenvectors are found from the data
matrix, they are ordered by their eigenvalues,
from the highest to the lowest.
This gives the components in order of significance.
The components which are less significant can
be ignored.
Some information is lost but, if the eigenvalues are
small, the amount lost is not too much.
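The ordering-and-truncation step might be sketched like this (NumPy, using the eigenvalues and eigenvectors of the running example; the array layout is an assumption):

```python
import numpy as np

# Unordered eigenvalues and their eigenvectors (one per column).
eigenvalues = np.array([16.0392, 623.1194])
eigenvectors = np.array([[-0.3871, 0.9220],
                         [ 0.9220, 0.3871]])

# Sort the pairs from the highest eigenvalue to the lowest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep only the p most significant components.
p = 1
feature_vector = eigenvectors[:, :p]
print(feature_vector.ravel())  # the principal component, [0.922, 0.3871]
```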
The PCA Method
If some components are left out, the final data
set will have less dimensions than the original.
If the original data set has n dimensions and n
eigenvectors are calculated (together with their
eigenvalues) and only the first p eigenvectors are
chosen, then the final data set will have only p
dimensions.
The PCA Method
Now, what needs to be done is to form a feature
vector (a fancy name for a matrix of vectors).
This feature vector is constructed by taking the eigenvectors that are to be kept from the list of eigenvectors and forming a matrix with them in the columns:

$$\text{Feature\_Vector} = \left( \text{eigenvector}_1,\ \text{eigenvector}_2,\ \ldots,\ \text{eigenvector}_n \right)$$
The PCA Method
Using the data set seen before, and the fact that
there are two eigenvectors, there are two
choices.
One is to form a feature vector with both of the eigenvectors:

$$\text{eigenvectors} = \begin{pmatrix} 0.9220 & -0.3871 \\ 0.3871 & 0.9220 \end{pmatrix}$$
The PCA Method
The other is to form a feature vector leaving out
the smaller, less significant, component and only
have a single column:
$$\text{eigenvalues} = \begin{pmatrix} 623.1194 \\ 16.0392 \end{pmatrix} \quad \begin{matrix} \leftarrow \text{most significant eigenvalue} \\ \leftarrow \text{less significant eigenvalue} \end{matrix}$$

$$\text{eigenvectors} = \begin{pmatrix} 0.9220 & -0.3871 \\ 0.3871 & 0.9220 \end{pmatrix} \quad \text{(most significant eigenvector in the first column, less significant in the second)}$$
The PCA Method
In other words, the result is a feature vector with
p vectors, selected from n eigenvectors (where p
< n).
This is the most common option.
$$\text{eigenvalue} = 623.1194 \qquad \text{eigenvector} = \begin{pmatrix} 0.9220 \\ 0.3871 \end{pmatrix}$$

(the most significant eigenvalue and its eigenvector)
The PCA Method
Step 6: Deriving the new data set.
This is the final step in PCA (and the easiest one).
Choose the components (eigenvectors) to be kept in the data set and form a feature vector.
Just remember that the eigenvector with the highest
eigenvalue is the principal component of the data set.
Take the transpose of the vector and multiply it on
the left of the transposed original data set.
The PCA Method
The matrix called RowFeatureVector is the feature vector transposed, so that the eigenvectors are now in its rows, with the most significant one at the top.
The matrix called RowDataAdjusted is the mean-adjusted data transposed, i.e., the data items are in the columns, with each row holding a separate dimension.

$$\text{Final\_Data} = \text{RowFeatureVector} \times \text{RowDataAdjusted}$$
The PCA Method
This sudden transpose of all data is confusing,
but equations from now on are easier if the
transpose of the feature vector and the data is
taken first.
Better than having to always carry a little T symbol above their names!
Final_Data is the final data set, with data items
in columns, and dimensions along rows.
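Step 6 can be put together as a NumPy sketch. The kept eigenvector is the most significant one from the slides, placed as the single row of RowFeatureVector:

```python
import numpy as np

# The data set from the slides, mean-adjusted as in Step 2.
data = np.array([[183, 79], [173, 69], [120, 45], [168, 70],
                 [188, 81], [158, 61], [201, 98], [163, 63],
                 [193, 79], [167, 71], [178, 73]], dtype=float)
adjusted = data - data.mean(axis=0)

# RowDataAdjusted: mean-adjusted data transposed (dimensions in rows).
row_data_adjusted = adjusted.T

# RowFeatureVector: the kept eigenvector(s) as rows, most significant first.
row_feature_vector = np.array([[0.9220, 0.3871]])

# Final_Data = RowFeatureVector * RowDataAdjusted
final_data = row_feature_vector @ row_data_adjusted
print(final_data.shape)  # (1, 11): one dimension left, eleven data items
```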
The PCA Method
The original data is now only given in terms of
the chosen vectors.
The original data set was written in terms of the x
and y axes.
The data can be expressed in terms of any axes,
but the expression is most efficient if these axes
are perpendicular.
This is why it was important that eigenvectors are
always perpendicular to each other.
The PCA Method
So, the original data (expressed in terms of the x
and y axes ) is now expressed in terms of the
eigenvectors found.
If a reduced dimension is needed (throwing
some of the eigenvectors out), the new data will
be expressed in terms of the vectors that were
kept.
The PCA Method
[Figure: the data re-expressed in terms of the eigenvector axes]
The PCA Method
Among all possible orthogonal transforms, PCA (also known as the Karhunen-Loève Transform, KLT) is optimal in the following sense:
The KLT completely decorrelates the signal; and
The KLT maximally compacts the energy (in other words, the information) contained in the signal.
But PCA is computationally expensive and should not be used carelessly.
Instead, one can use the Discrete Cosine Transform (DCT), which approaches the KLT in this sense.
The PCA Method Examples
Here, we switch to Matlab in order to run some examples that (I sincerely hope) may clarify things for you:
Project the data into the principal component axis,
show the rank one approximation, and compress an
image by reducing the number of its coefficients
(PCA.m), pretty much as by using the DCT.
Show the difference between the least squares and
the PCA and do the alignment of 3D models using the
PCA properties (SVD.m).
The PCA Method Examples
Some things should be noticed about the power
of PCA to compress an image (as seen in the
PCA.m example).
The amount of memory required to store an uncompressed image of size m × n is M_image = m*n.
So, notice that the amount of memory needed to store an image grows with the product of its dimensions (quadratically for a square image, not exponentially).
The PCA Method Examples
But the amount of memory required to store a rank-k SVD approximation of that image (also of size m × n) is M_approx = k(m + n + 1).
So, notice that the amount of memory required increases only linearly as the dimensions get larger, rather than with their product.
Thus, as the image gets larger, more memory is
saved by using SVD.
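A quick check of the two memory formulas (plain Python; the 512 × 512 size and rank 20 are arbitrary example values):

```python
# Number of stored values for an uncompressed m-by-n image.
def memory_image(m, n):
    return m * n

# Number of stored values for a rank-k SVD approximation:
# k columns of U (length m), k columns of V (length n), k singular values.
def memory_approx(m, n, k):
    return k * (m + n + 1)

print(memory_image(512, 512))       # 262144
print(memory_approx(512, 512, 20))  # 20500
```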
The PCA Method Examples
Perform face recognition using the Principal
Component Analysis approach!
This is accomplished using a technique known in the literature as the Eigenface Technique.
We will see an example of how to do it using a well-known face database, the AT&T Database of Faces.
Two Matlab functions: faceRecognitionExample.m
and loadFaceDatabase.m.
What is the Eigenface Technique?
The idea is that face images can be economically
represented by their projection onto a small
number of basis images derived by finding the
most significant eigenvectors of the pixelwise
covariance matrix for a set of training images.
A lot of people like to play with this technique, but in
my tutorial I will simply show how to get some
eigenfaces and play with them in Matlab.
AT&T Database of Faces
AT&T Database of Faces contains a set of face
images.
Database used in the context of a face recognition
project.
Ten different images of 40 distinct subjects taken
at different times (varying lighting, facial details
and expressions) and against a dark homogeneous
background with subjects in an upright, frontal
position (some side movement was tolerated).
AT&T Database of Faces
The images have a size of 92x112 pixels (in other
words, 10304 pixels) and 256 grey levels per pixel,
organized in 40 directories (one for each subject) and
each directory contains ten different images of a
subject.
Matlab can read PNG files and other formats
without help.
So, it is relatively easy to load the whole face database into Matlab's workspace and process it.
Getting The Faces Into One Big Matrix
First of all, we need to put all the faces of the
database in one huge matrix with 112*92 = 10304 rows and 400 columns.
This step is done by the function called
loadFaceDatabase.m.
It reads a bunch of images, makes a column vector out of each of them, puts them all together, and returns the result.
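What loadFaceDatabase.m does can be sketched as follows (NumPy, with random stand-in images instead of the PNG files of the AT&T database):

```python
import numpy as np

# 400 stand-in "face images" of 112 x 92 pixels with 256 grey levels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(112, 92)) for _ in range(400)]

# Turn each image into a 112*92 = 10304-element column vector and
# stack the columns into one big 10304 x 400 matrix.
big_matrix = np.column_stack([img.reshape(-1) for img in images])
print(big_matrix.shape)  # (10304, 400)
```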
Getting the Recognition to Work
Here we switch to Matlab directly, because the steps taken to perform the face recognition task are better explained by examining the function called faceRecognitionExample.m.
All the steps necessary to perform this task are done
in this function and it is ready to be executed and
commented.
Cases Where PCA Fails (1)
PCA projects data onto a set of orthogonal vectors (principal components).
This restricts the new input components to be linear combinations of the old ones.
However, there are cases where the intrinsic degrees of freedom of the data cannot be expressed as a linear combination of the input components.
In such cases, PCA will overestimate the input dimensionality.
Cases Where PCA Fails (1)
So, PCA is not capable of finding the nonlinear intrinsic dimension of the data (like the angle between the two vectors in the example above).
Instead, it will find two components with equal importance.
Cases Where PCA Fails (2)
In cases where components with small variability really matter, PCA will make mistakes due to its unsupervised nature.
In such cases, if we only consider the projections of the two classes of data as input, they will become indistinguishable.
Any (Reasonable) Doubts?