
Principal Component Analysis (PCA)
What is it?
• It is a dimensionality reduction method.
• It analyzes the data and extracts the most important parts of it.
• It is used:
• to analyze all the dimensions of a given dataset and
• to reduce the dimensions as much as possible without incurring a significant loss of information.
• E.g., in machine learning, an nD (n > 3) feature vector can be transformed into a 2D or 3D feature vector.
Why do we use it?
• In machine learning:
• As unsupervised learning:
 for data analysis,
 for data compression, i.e., to compress information,
 to group data into dependent and independent variables.
• As preprocessing for both supervised and unsupervised learning:
 in data cleaning, data de-noising, and preparing a good feature vector for training.
• Used as preprocessing in:
 face recognition,
 image classification,
 pattern recognition,
 clustering, etc.
Steps in PCA
1. Standardizing the range of values in the data or feature vector.
2. Compute the Covariance matrix to identify correlations.
3. Compute the eigenvalues and eigenvectors of the covariance matrix to identify the principal
components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal component axes.

Standardize the range of values → Compute the covariance matrix → Compute eigenvalues and eigenvectors → Create a feature vector → Recast the data
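As a quick illustration of the whole pipeline, here is a minimal sketch using numpy and scikit-learn (assuming scikit-learn is available; note that StandardScaler divides by the population standard deviation, while the worked example below uses the sample standard deviation, so the numbers can differ slightly, and eigenvector signs may flip between tools):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# The example dataset used later in these slides: 5 samples, 4 features (F1..F4)
X = np.array([[1, 5, 3, 1],
              [4, 2, 6, 3],
              [1, 4, 3, 2],
              [4, 4, 1, 1],
              [5, 5, 2, 3]], dtype=float)

Z = StandardScaler().fit_transform(X)          # step 1: standardize
X_pca = PCA(n_components=2).fit_transform(Z)   # steps 2-5: project onto top 2 PCs
print(X_pca.shape)                             # (5, 2): from 4 features down to 2
```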
Standardizing
• The range of each variable is calculated and standardized
 to analyze the contribution of each variable equally.
Let
 $x$ be a value in a given input data or feature vector,
 $\bar{x}$ be the mean of the input data or feature vector,
 $\sigma(x)$ be the standard deviation of the input data, given by $\sigma(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $n$ is the size or number of values (terms) in the input data.
Then, the standardization or transformation is done using the z-score:
$$z = \frac{x - \bar{x}}{\sigma(x)}$$
The standardized data will have zero mean and unit standard deviation.
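A minimal numpy sketch of this z-score transform (it uses the sample standard deviation, ddof=1, which is what the worked example below uses):

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # sample std (divide by n-1), as in the example below
    return (X - mean) / std
```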
Example
• Let us assume a given dataset has four features (F1, F2, F3, and F4) with the following data collected.

F1  F2  F3  F4
 1   5   3   1
 4   2   6   3
 1   4   3   2
 4   4   1   1
 5   5   2   3

• Then, standardize the data.


Example (cont’d)
First calculate the mean and standard deviation of each feature.

       F1     F2     F3     F4
mean   3      4      3      2
std.   1.87   1.223  1.87   1

Standardization using the z-score:

  F1        F2        F3        F4
-1.0695    0.8196    0        -1
 0.5347   -1.6393    1.6042    1
-1.0695    0         0         0
 0.5347    0        -1.0695   -1
 1.0695    0.8196   -0.5347    1
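As a check, the tables above can be reproduced with numpy (a sketch; the printed values agree with the tables up to rounding):

```python
import numpy as np

X = np.array([[1, 5, 3, 1],
              [4, 2, 6, 3],
              [1, 4, 3, 2],
              [4, 4, 1, 1],
              [5, 5, 2, 3]], dtype=float)

print(X.mean(axis=0))           # [3. 4. 3. 2.]
print(X.std(axis=0, ddof=1))    # approx [1.87 1.22 1.87 1.]
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(Z.round(4))               # close to the standardized table above
```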
Covariance matrix computation
• It is used to capture the variation in the given dataset.
• For an n-feature dataset, the covariance matrix is an n×n matrix.
• E.g., for data with two variables (X and Y), the covariance matrix is
$$\begin{pmatrix} \mathrm{cov}(X,X) & \mathrm{cov}(X,Y) \\ \mathrm{cov}(Y,X) & \mathrm{cov}(Y,Y) \end{pmatrix}$$
• where $\mathrm{cov}(X,Y) = \frac{1}{M}\sum_{i=1}^{M}(X_i - \bar{X})(Y_i - \bar{Y})$ and $M$ is the number of data points (the length of the variables).
• Note: $\mathrm{cov}(X,X) = \mathrm{var}(X)$ and $\mathrm{cov}(X,Y) = \mathrm{cov}(Y,X)$; hence the covariance matrix is symmetric.
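A minimal numpy sketch of this computation; np.cov with bias=True divides by M, matching the formula above:

```python
import numpy as np

# Toy data: two variables X and Y as columns, M = 5 samples
data = np.array([[1.0, 2.0],
                 [2.0, 1.0],
                 [3.0, 4.0],
                 [4.0, 3.0],
                 [5.0, 5.0]])

# rowvar=False: variables are columns; bias=True: divide by M, as in the formula
C = np.cov(data, rowvar=False, bias=True)
print(C)   # [[var(X), cov(X,Y)], [cov(Y,X), var(Y)]] -- symmetric
```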
Covariance matrix computation (cont’d)
• If $\mathrm{cov}(X,Y) = 0$, the variables are uncorrelated.
• Else if $\mathrm{cov}(X,Y) > 0$, they are positively correlated, and the magnitude reflects the degree of their correlation.
• Else if $\mathrm{cov}(X,Y) < 0$, they are inversely (negatively) correlated.
• Note: if two variables are correlated, they carry redundant (non-unique) information.
Example
• The covariance matrix for the previous data.
• Note that due to standardization the mean and std. of each feature are 0 and 1
respectively.
• Accordingly, the covariance matrix is
       F1       F2       F3       F4
F1    0.78    -0.8586  -0.055    0.424
F2   -0.8586   0.78    -0.607   -0.326
F3   -0.055   -0.607    0.78     0.426
F4    0.424   -0.326    0.426    0.78
• $\mathrm{var}(F1) = \frac{(-1.0695-0)^2 + (0.5347-0)^2 + (-1.0695-0)^2 + (0.5347-0)^2 + (1.0695-0)^2}{5} = 0.78$
• $\mathrm{cov}(F1,F2) = \frac{(-1.0695)(0.8196) + (0.5347)(-1.6393) + (-1.0695)(0) + (0.5347)(0) + (1.0695)(0.8196)}{5} = -0.8586$
Etc.
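As a sketch, the matrix above can be reproduced from the standardized data Z of the earlier example (entries may differ slightly from the slide because the standardized values are rounded):

```python
import numpy as np

# Standardized data Z from the earlier example (rounded values)
Z = np.array([[-1.0695,  0.8196,  0.0,    -1.0],
              [ 0.5347, -1.6393,  1.6042,  1.0],
              [-1.0695,  0.0,     0.0,     0.0],
              [ 0.5347,  0.0,    -1.0695, -1.0],
              [ 1.0695,  0.8196, -0.5347,  1.0]])

# Columns have (approximately) zero mean, so Z.T @ Z / M is the covariance matrix
C = Z.T @ Z / len(Z)
print(C.round(4))   # compare with the matrix above
```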
Eigenvalue and eigenvector
• Each eigenvalue carries part of the information, variation, or energy in a given dataset.
• The corresponding eigenvector shows the direction of that information, variation, or energy.
• Let $A$ be a square matrix, e.g., the covariance matrix.
 A nonzero vector $v$ is an eigenvector of $A$ if $Av = \lambda v$, where $\lambda$ is the corresponding eigenvalue.
Note: $Av - \lambda v = 0$, where $0$ is the zero vector;
since $v$ is a nonzero vector, $\det(A - \lambda I) = 0$.
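A small numpy sketch of this definition; np.linalg.eigh applies here because a covariance matrix is symmetric:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # a small symmetric matrix as a stand-in for A

eigvals, eigvecs = np.linalg.eigh(A)  # for symmetric A; eigenvalues in ascending order
v, lam = eigvecs[:, 0], eigvals[0]    # one eigenpair (eigenvectors are the columns)
print(np.allclose(A @ v, lam * v))    # True: A v = lambda v
```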
Eigenvalue and eigenvector (cont’d)
• Steps (see the sketch below):
Determine the eigenvalues of the covariance matrix.
Sort the eigenvalues in descending order.
Pick the first k eigenvalues and determine the corresponding eigenvectors.
 The k eigenvectors are the principal components.
They represent the directions of maximum variation or information in the data.
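A minimal numpy sketch of these steps, assuming C is the covariance matrix:

```python
import numpy as np

def top_k_eigenpairs(C, k):
    """Return the k largest eigenvalues of a symmetric matrix C and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]      # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals[:k], eigvecs[:, :k]     # columns are the principal components
```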
Example
• Find the two largest eigenvalues of the previous examples and
determine the corresponding eigenvectors.
• $\det\begin{pmatrix} 0.78-\lambda & -0.8586 & -0.055 & 0.424 \\ -0.8586 & 0.78-\lambda & -0.607 & -0.326 \\ -0.055 & -0.607 & 0.78-\lambda & 0.426 \\ 0.424 & -0.326 & 0.426 & 0.78-\lambda \end{pmatrix} = 0$
• It gives us,
• 𝜆1 = 2.11691, 𝜆2 = 0.855413, 𝜆3 = 0.481689, 𝜆4 = 0.334007
• The two largest eigenvalues are 𝜆1 = 2.11691, 𝜆2 = 0.855413.
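In practice these roots are found numerically; here is a sketch (the computed values may differ slightly from those above because the covariance entries are rounded):

```python
import numpy as np

C = np.array([[ 0.78,   -0.8586, -0.055,   0.424],
              [-0.8586,  0.78,   -0.607,  -0.326],
              [-0.055,  -0.607,   0.78,    0.426],
              [ 0.424,  -0.326,   0.426,   0.78 ]])

eigvals = np.linalg.eigvalsh(C)[::-1]   # eigenvalues in descending order
print(eigvals)                          # keep the two largest
```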
Example (cont’d)
• The eigenvectors for the two largest eigenvalues:
1. For $\lambda_1 = 2.11691$: $Av_1 - \lambda_1 v_1 = 0 \Rightarrow (A - \lambda_1 I)v_1 = 0$
2. For $\lambda_2 = 0.855413$: $Av_2 - \lambda_2 v_2 = 0 \Rightarrow (A - \lambda_2 I)v_2 = 0$
Then, $v_1 = \begin{pmatrix} 0.515514 \\ -0.616625 \\ 0.399314 \\ 0.441098 \end{pmatrix}$ and $v_2 = \begin{pmatrix} -0.623014 \\ 0.113105 \\ 0.744256 \\ 0.212477 \end{pmatrix}$
Example
• The PCA feature vector (the two eigenvectors as columns):

    v1          v2
 0.515514   -0.623014
-0.616625    0.113105
 0.399314    0.744256
 0.441098    0.212477
Recast the data along the principal component axes
• This step reorients the data from the original axes to the ones calculated from the principal components.
• This is where the reduction of dimensions happens.
• It is done by the following formula:
• Final dataset = standardized original dataset × PCA feature vector.
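A sketch of this formula with the standardized data Z and the two eigenvectors stacked as the columns of a matrix W:

```python
import numpy as np

# Standardized data (5 samples x 4 features) from the earlier example
Z = np.array([[-1.0695,  0.8196,  0.0,    -1.0],
              [ 0.5347, -1.6393,  1.6042,  1.0],
              [-1.0695,  0.0,     0.0,     0.0],
              [ 0.5347,  0.0,    -1.0695, -1.0],
              [ 1.0695,  0.8196, -0.5347,  1.0]])

# PCA feature vector: the two eigenvectors v1, v2 as columns (4 x 2)
W = np.array([[ 0.515514, -0.623014],
              [-0.616625,  0.113105],
              [ 0.399314,  0.744256],
              [ 0.441098,  0.212477]])

final = Z @ W           # recast: 5 samples x 2 principal components
print(final.round(4))
```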
Example
• Recast the data.
• Final dataset = standardized data × PCA feature vector.

Standardized data:

  F1        F2        F3        F4
-1.0695    0.8196    0        -1
 0.5347   -1.6393    1.6042    1
-1.0695    0         0         0
 0.5347    0        -1.0695   -1
 1.0695    0.8196   -0.5347    1

PCA feature vector:

    v1          v2
 0.515514   -0.623014
-0.616625    0.113105
 0.399314    0.744256
 0.441098    0.212477
Example
• Final data: the recast dataset, with two columns Fpca1 and Fpca2.
• Note: from 4 features down to 2 features.
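Putting all five steps together, a minimal end-to-end numpy sketch (eigenvector signs are a convention, so the projected columns may be flipped relative to other tools):

```python
import numpy as np

X = np.array([[1, 5, 3, 1],
              [4, 2, 6, 3],
              [1, 4, 3, 2],
              [4, 4, 1, 1],
              [5, 5, 2, 3]], dtype=float)

# 1. Standardize (sample std, as in the slides)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix
C = np.cov(Z, rowvar=False, bias=True)

# 3. Eigenvalues / eigenvectors, sorted descending
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Feature vector: keep the top 2 principal components
W = eigvecs[:, :2]

# 5. Recast the data: from 4 features down to 2
final = Z @ W
print(final.round(4))   # 5 samples x 2 components (Fpca1, Fpca2)
```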
