
Principal Component Analysis
Dr.-Ing. Maggie Mashaly
Contents
Ø Importance of PCA
Ø Dimensionality Reduction
Ø Linear Transformation
Ø PCA Steps
Importance of PCA

Correlation
[Figure: two scatter plots of $x_2$ versus $x_1$ — the left shows uncorrelated data, the right shows correlated data.]
Main Questions
› How to remove correlation?
› What are the dimensions with the most information?
› How can we provide a better scaling or normalization?
Principal Component Analysis
The goal is to identify the principal directions in which the data varies.
q Better data representation
Ø Reduce the number of features
Ø Reduce the relation between features
q Eliminate non-principal data
q Can be used for compression applications
Principal Component Analysis: 2D Example
[Figure: 2D scatter of data in the ($x_1$, $x_2$) plane with the principal direction of data variation $w_1$ drawn through the point cloud. The variation of the data along $w_1$ is very high, while the variation along the orthogonal direction is low, so projecting the data onto the principal direction preserves most of the information.]
Dimensionality Reduction
We want to find a linear transformation from an m-dimensional feature space to a k-dimensional one, i.e. reduce the number of features from m to k.
[Figure: 2D data in the ($x_1$, $x_2$) plane projected onto a single direction $w_1$ — a reduction from 2D to 1D.]
Linear Transformation

$$
x = \begin{bmatrix}
x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & \dots & x_m^{(1)} \\
x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & \dots & x_m^{(2)} \\
x_1^{(3)} & x_2^{(3)} & x_3^{(3)} & \dots & x_m^{(3)} \\
\vdots & & & & \vdots \\
x_1^{(n)} & x_2^{(n)} & x_3^{(n)} & \dots & x_m^{(n)}
\end{bmatrix}
\qquad
W = \begin{bmatrix}
w_{1,1} & w_{1,2} & w_{1,3} & \dots & w_{1,k} \\
w_{2,1} & w_{2,2} & w_{2,3} & \dots & w_{2,k} \\
w_{3,1} & w_{3,2} & w_{3,3} & \dots & w_{3,k} \\
\vdots & & & & \vdots \\
w_{m,1} & w_{m,2} & w_{m,3} & \dots & w_{m,k}
\end{bmatrix}
$$

$x$ is the $n \times m$ data matrix (n data points, m features) and $W$ is the $m \times k$ transformation matrix. The transformed data

$$Z = xW$$

is $n \times k$: the same n data points, now with k features.
Example

$$
x = \begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & -2 \end{bmatrix}
\qquad
w = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\qquad
Z = xw = \begin{bmatrix} 1 \\ 5 \\ -3 \end{bmatrix}
$$

[Figure: the three 2D data points plotted in the ($x_1$, $x_2$) plane, and their 1D projections $Z$ plotted on a line.]
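As a concrete check of this example, here is a minimal sketch in NumPy (assumed here; the slides do not prescribe any library) computing $Z = xW$:

```python
import numpy as np

# Data matrix: 3 data points (rows) with 2 features (columns)
x = np.array([[ 0,  1],
              [ 3,  2],
              [-1, -2]])

# Transformation matrix: maps 2 features to 1 feature
w = np.array([[1],
              [1]])

# Z = xW projects each 2D data point onto a single direction
Z = x @ w
print(Z.ravel())  # [ 1  5 -3]
```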
Finding the right Transformation
› Let us first forget about reducing the dimensions and focus on reducing the relation between features.
› The relation between features is described by the covariance matrix:
› $\mathrm{Cov}(x, x) = x^{T} x$
› The best transformation would give no relation between any two features:
– (this means that the covariance matrix is an identity matrix)
– $\mathrm{Cov}(z, z) = z^{T} z = I$
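As a small illustration (a sketch of my own, not taken from the slides), the covariance matrix of centered but correlated data has large off-diagonal entries, which is exactly what the transformation is meant to remove:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated features: x2 is roughly twice x1 plus some noise
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + 0.3 * rng.normal(size=500)
x = np.column_stack([x1, x2])

# Center each feature around its mean
x = x - x.mean(axis=0)

# Covariance matrix: large off-diagonal entries indicate correlated features
cov = x.T @ x / len(x)
print(cov)
```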
Finding the right Transformation
The correlated $n \times m$ data $x$ (m features) is transformed by $Z = xW$ into the uncorrelated $n \times k$ data $z$ (k features) with $z^{T} z = I$:

$$z^{T} z = W^{T} x^{T} x\, W = I \qquad [k \times m]\,[m \times n]\,[n \times m]\,[m \times k]$$
Finding the right Transformation
› $\mathrm{cov}(x, x)\, U = U\lambda$
› where $U$ is the matrix of eigenvectors and $\lambda$ is the diagonal matrix of eigenvalues
› $U^{T} \mathrm{cov}(x, x)\, U = \lambda$
› $U^{T} \mathrm{cov}(x, x)\, U = \lambda^{\frac{1}{2}}\, I\, \lambda^{\frac{1}{2}}$
› $\lambda^{-\frac{1}{2}}\, U^{T} \mathrm{cov}(x, x)\, U\, \lambda^{-\frac{1}{2}} = I$
› $\left(U \lambda^{-\frac{1}{2}}\right)^{T} \mathrm{cov}(x, x)\, \left(U \lambda^{-\frac{1}{2}}\right) = I$

so the transformation $W = U \lambda^{-\frac{1}{2}}$ gives uncorrelated features with $W^{T} x^{T} x\, W = I$.
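A quick numerical check of this derivation (a sketch with my own variable names): building $W = U\lambda^{-1/2}$ from the eigendecomposition of the covariance matrix does produce transformed data with identity covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-feature data, centered around zero
x = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 2.0]], size=2000)
x = x - x.mean(axis=0)

# Covariance as defined on the slides (no 1/n factor here)
cov = x.T @ x

# Eigendecomposition: cov U = U lambda
lam, U = np.linalg.eigh(cov)

# W = U lambda^(-1/2), the transformation derived above
W = U @ np.diag(lam ** -0.5)
z = x @ W

print(np.round(z.T @ z, 6))  # ~ identity matrix: uncorrelated, unit-variance features
```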
Singular Value Decomposition
› $\mathrm{cov}(x, x)\, U = U\lambda$
› $\mathrm{cov}(x, x) = U \lambda\, U^{T}$
› $\mathrm{SVD}(\mathrm{cov}(x, x)) = U S V^{T}$
› Because of the characteristics of the covariance matrix (it is symmetric and positive semi-definite), $V = U$ and $S = \lambda$, so the SVD directly yields the eigenvector matrix $U$ and the eigenvalues.
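For instance (a brief sketch), applying `np.linalg.svd` to a covariance matrix recovers the same eigenvectors and eigenvalues as the eigendecomposition, because the matrix is symmetric and positive semi-definite:

```python
import numpy as np

# A symmetric, positive semi-definite covariance matrix
cov = np.array([[3.0, 2.0],
                [2.0, 2.0]])

U_svd, S, Vt = np.linalg.svd(cov)   # cov = U S V^T
lam, U_eig = np.linalg.eigh(cov)    # cov U = U lambda (eigenvalues ascending)

print(S)                            # singular values, sorted descending
print(lam[::-1])                    # same values as S
print(np.allclose(U_svd, Vt.T))     # True: V = U for a symmetric PSD matrix
```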
PCA Steps
– Center each feature around the feature average:
$$x_i \leftarrow x_i - E(x_i)$$
– Normalize the data (optional):
$$x_i \leftarrow \frac{x_i - E(x_i)}{\sigma_i}$$
– Calculate the covariance matrix of $x$:
$$\mathrm{cov}(x, x) = \frac{1}{n}\, x^{T} x$$
– Use singular value decomposition to calculate the eigenvector matrix $U$.
– If reduction is to be applied, pick the first $k$ columns of the $U$ matrix to obtain $U_{\mathrm{reduce}}$.
– Obtain the new transformed data (for each data point $x$):
$$z = U_{\mathrm{reduce}}^{T}\, x$$
– Note that in doing so we ignored the eigenvalues, which represent the significance of each dimension.

A code sketch of these steps is given below.
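Putting the steps together, here is a minimal NumPy sketch of the procedure (the function name `pca_transform` and all variable names are mine, not from the slides; the projection is written in matrix form as `x_centered @ U_reduce`, which is $z = U_{\mathrm{reduce}}^{T} x$ applied to every data point at once):

```python
import numpy as np

def pca_transform(x, k, normalize=False):
    """Project the n x m data matrix x onto its first k principal components."""
    # Step 1: center each feature around its average
    mean = x.mean(axis=0)
    x_centered = x - mean

    # Step 2 (optional): normalize each feature by its standard deviation
    if normalize:
        x_centered = x_centered / x_centered.std(axis=0)

    # Step 3: covariance matrix of the centered data
    cov = x_centered.T @ x_centered / len(x)

    # Step 4: SVD of the covariance matrix gives the eigenvector matrix U
    U, S, Vt = np.linalg.svd(cov)

    # Step 5: keep only the first k columns of U
    U_reduce = U[:, :k]

    # Step 6: transform the data (z = U_reduce^T x for each data point)
    Z = x_centered @ U_reduce
    return Z, U_reduce, mean

# Example: reduce 3-feature data to 2 features
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
Z, U_reduce, mean = pca_transform(x, k=2)
print(Z.shape)  # (100, 2)
```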
Obtaining the Original Dimension Approximation
› $z = U_{\mathrm{reduce}}^{T}\, x$
› $x_{\mathrm{approx}} = U_{\mathrm{reduce}}\, z$
› In this case the approximation error can be calculated as
$$\frac{1}{n} \sum_{i=1}^{n} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^{2}$$
› Percentage error:
$$\frac{\dfrac{1}{n} \sum_{i=1}^{n} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^{2}}{\dfrac{1}{n} \sum_{i=1}^{n} \left\lVert x^{(i)} \right\rVert^{2}}$$
› We are interested in a very small percentage error.
[Figure: the reduction $z = U_{\mathrm{reduce}}^{T}\, x$ and the reconstruction $x_{\mathrm{approx}} = U_{\mathrm{reduce}}\, z$ shown side by side as a round trip between the original and the reduced space.]
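A rough, self-contained sketch of the reconstruction and the percentage error (the helper name `percentage_error` and the toy data are mine):

```python
import numpy as np

def percentage_error(x_centered, U_reduce):
    """Percentage reconstruction error after projecting onto U_reduce."""
    Z = x_centered @ U_reduce          # compress: z = U_reduce^T x per data point
    x_approx = Z @ U_reduce.T          # reconstruct: x_approx = U_reduce z
    # Ratio of average squared reconstruction error to average squared norm
    return np.sum((x_centered - x_approx) ** 2) / np.sum(x_centered ** 2)

# Toy example: strongly correlated 2D data compressed down to 1D
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1)) @ np.array([[1.0, 2.0]]) + 0.05 * rng.normal(size=(200, 2))
x_centered = x - x.mean(axis=0)

cov = x_centered.T @ x_centered / len(x)
U, S, Vt = np.linalg.svd(cov)
U_reduce = U[:, :1]

print(percentage_error(x_centered, U_reduce))  # very small, well below 0.01
```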
How to Choose K?
› Iteratively increase k from 1 to m and calculate the error.
› Pick the k that ensures an error smaller than a predefined value.
› Luckily, the eigenvalues returned for the covariance matrix provide the contribution of each dimension, and hence give the same information as calculating the error explicitly (see the sketch below).
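As a sketch of that shortcut (the helper name `choose_k` is mine), the eigenvalues can be used directly: the percentage error equals the fraction of eigenvalue mass that is discarded, so the smallest acceptable k can be read off a cumulative sum.

```python
import numpy as np

def choose_k(eigenvalues, max_error=0.01):
    """Smallest k whose discarded eigenvalue mass stays below max_error."""
    # Eigenvalues sorted descending = variance captured by each component
    lam = np.sort(eigenvalues)[::-1]
    retained = np.cumsum(lam) / np.sum(lam)      # variance kept by the first k components
    # Percentage error = 1 - retained variance; find the first k meeting the target
    return int(np.argmax(retained >= 1.0 - max_error)) + 1

# Example: most of the variance sits in the first two components
print(choose_k(np.array([5.0, 3.0, 0.05, 0.01])))  # -> 2
```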


Thank You
