
PRINCIPAL COMPONENT ANALYSIS

An Example of PCA

• In a hydrologic application, you want to estimate the runoff at a particular location; this is the dependent variable.
• The runoff depends on several independent variables; it may depend, say, on soil moisture, catchment area, etc.
• You want to relate the runoff at that location to all the independent variables.
• Let us say there are ten such variables, so y is a function of x1, x2, ..., x10.
• Each of the ten variables has 50 years of monthly data, i.e., 12 × 50 = 600 values per variable.
Principal Component Analysis
• PCA is a way of identifying patterns in data; the data are re-expressed in such a way that their similarities and differences are highlighted.
• Once the patterns are found, the data can be compressed (the number of dimensions reduced) without losing much information.
Matrix Algebra
Eigenvectors and Eigenvalues:
• Let A be a square matrix. If λ is a scalar and X is a non-zero column vector satisfying

    A X = λ X

then X is an eigenvector of A and λ is an eigenvalue of A.
• Eigenvalues and eigenvectors are defined only for square matrices.
• Eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal.
• If λ is an eigenvalue of an n × n matrix A with corresponding eigenvector X, then

    (A - λI) X = 0, with X ≠ 0

which has a non-zero solution X only when

    |A - λI| = 0

• There are at most n distinct eigenvalues of A (the roots of this characteristic equation).
Example – 1
Obtain the eigenvalues and eigenvectors of the matrix

    A = [ 1  2 ]
        [ 2  1 ]

The eigenvalues are obtained from |A - λI| = 0:

    | 1-λ    2  |
    |  2    1-λ | = 0
Example – 1 (Contd.)

    (1 - λ)(1 - λ) - 4 = 0
    λ² - 2λ - 3 = 0

Solving the equation,

    λ = 3, -1

The eigenvalues of matrix A are 3 and -1. The eigenvectors are obtained from

    (A - λI) X = 0
Example – 1 (Contd.)
For λ1 = 3:

    (A - λ1 I) X1 = 0

    [ 1-3    2  ] [ x1 ]   [ 0 ]
    [  2    1-3 ] [ y1 ] = [ 0 ]

    [ -2   2 ] [ x1 ]   [ 0 ]
    [  2  -2 ] [ y1 ] = [ 0 ]

    -2x1 + 2y1 = 0
     2x1 - 2y1 = 0
which has solution x1 = y1, with x1 arbitrary. The eigenvectors corresponding to λ1 = 3 are the vectors [x1, y1]ᵀ with x1 ≠ 0.

e.g., if we take x1 = 2 then y1 = 2, and the eigenvector is

    [ 2 ]
    [ 2 ]
Example – 1 (Contd.)
For λ2 = -1:

    (A - λ2 I) X2 = 0

    [ 2  2 ] [ x2 ]   [ 0 ]
    [ 2  2 ] [ y2 ] = [ 0 ]

    2x2 + 2y2 = 0
    2x2 + 2y2 = 0

which has solution x2 = -y2, with x2 arbitrary. The eigenvectors corresponding to λ2 = -1 are the vectors [x2, y2]ᵀ with x2 ≠ 0.
Principal Component Analysis (PCA)
• Data are available on 'p' variables; these variables may be correlated.
• Correlation indicates that information contained in one variable is also contained in some of the other p - 1 variables.
• PCA transforms the 'p' original correlated variables into 'p' uncorrelated components (also called orthogonal components or principal components).
• These components are linear functions of the original variables.
The transformation is written as

    Z = X A

where
    X is the n × p matrix of n observations on p variables,
    Z is the n × p matrix of n values for each of the p components,
    A is the p × p matrix of coefficients defining the linear transformation.

All X values are assumed to be deviations from their respective means, hence X is a matrix of deviations from the mean.
Steps for PCA:
• Start with the data for n observations on p variables.
• Form an n × p matrix of deviations from the mean for each of the variables.
• Calculate the covariance matrix (p × p).
• Calculate the eigenvalues and eigenvectors of the covariance matrix.
• Choose principal components and form a feature vector.
• Derive the new data set (the whole procedure is sketched in code below).
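A minimal NumPy sketch of these six steps (the function name pca and all variable names are illustrative, not from the slides):

import numpy as np

def pca(data, n_components):
    """PCA via eigen-decomposition of the covariance matrix.

    data: (n, p) array of n observations on p variables.
    Returns the (n, n_components) score matrix Z and the
    (p, n_components) feature vector A, so that Z = X A.
    """
    # Step 2: deviations from the mean of each variable
    X = data - data.mean(axis=0)
    # Step 3: p x p covariance matrix (rowvar=False: columns are variables)
    S = np.cov(X, rowvar=False)
    # Step 4: eigenvalues and eigenvectors; eigh suits symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(S)
    # Step 5: feature vector = eigenvectors with the largest eigenvalues
    order = np.argsort(eigvals)[::-1]
    A = eigvecs[:, order[:n_components]]
    # Step 6: derive the new data set
    Z = X @ A
    return Z, A

eigh is used rather than eig because a covariance matrix is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.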
The procedure is explained with a simple data set of the yearly rainfall and yearly runoff of a catchment for 15 years.

    Year            1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
    Rainfall (cm) 105  115  103   94   95  104  120  121  127   79  133  111  127  108   85
    Runoff (cm)    42   46   26   39   29   33   48   58   45   20   54   37   39   34   25

    Mean of Rainfall = 108.5 cm
    Mean of Runoff = 38.3 cm
Step 2: Form a matrix with deviations from the mean

Original matrix (rainfall, runoff):

    [ 105  42 ]
    [ 115  46 ]
    [ 103  26 ]
    [  94  39 ]
    [  95  29 ]
    [ 104  33 ]
    [ 120  48 ]
    [ 121  58 ]
    [ 127  45 ]
    [  79  20 ]
    [ 133  54 ]
    [ 111  37 ]
    [ 127  39 ]
    [ 108  34 ]
    [  85  25 ]

Matrix with deviations from the mean:

    X = [  -3.5    3.7 ]
        [   6.5    7.7 ]
        [  -5.5  -12.3 ]
        [ -14.5    0.7 ]
        [ -13.5   -9.3 ]
        [  -4.5   -5.3 ]
        [  11.5    9.7 ]
        [  12.5   19.7 ]
        [  18.5    6.7 ]
        [ -29.5  -18.3 ]
        [  24.5   15.7 ]
        [   2.5   -1.3 ]
        [  18.5    0.7 ]
        [  -0.5   -4.3 ]
        [ -23.5  -13.3 ]
Step 3: Calculate the covariance matrix

    cov(X, Y) = s_XY = Σ_{i=1..n} (x_i - x̄)(y_i - ȳ) / (n - 1)

    S = [ cov(X,X)  cov(X,Y) ]   [ 216.67  141.35 ]
        [ cov(Y,X)  cov(Y,Y) ] = [ 141.35  133.38 ]
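A short sketch of how a covariance matrix like this is computed with NumPy (np.cov uses the n - 1 denominator by default; ddof=1 is spelled out here for emphasis):

import numpy as np

rainfall = np.array([105, 115, 103, 94, 95, 104, 120, 121, 127, 79,
                     133, 111, 127, 108, 85], dtype=float)
runoff = np.array([42, 46, 26, 39, 29, 33, 48, 58, 45, 20,
                   54, 37, 39, 34, 25], dtype=float)

# Rows of the stacked array are the variables, so np.cov returns
# the 2 x 2 matrix [[cov(X,X), cov(X,Y)], [cov(Y,X), cov(Y,Y)]].
S = np.cov(np.stack([rainfall, runoff]), ddof=1)
print(S)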
Step 4: Calculate the eigenvalues and eigenvectors of the covariance matrix

    S = [ 216.67  141.35 ]
        [ 141.35  133.38 ]

Eigenvalues, from |S - λI| = 0:

    λ1 = 322.4,  λ2 = 27.7

Eigenvectors, from (S - λI)X = 0, as the columns corresponding to λ1 and λ2:

    X = [ 0.801  -0.599 ]
        [ 0.599   0.801 ]
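A quick check of this decomposition with NumPy; eigh returns eigenvalues in ascending order and unit-length eigenvector columns, each determined only up to sign:

import numpy as np

S = np.array([[216.67, 141.35],
              [141.35, 133.38]])

eigvals, eigvecs = np.linalg.eigh(S)
print(eigvals)   # approx. [ 27.67  322.38]
print(eigvecs)   # columns approx. [-0.599, 0.801] and [0.801, 0.599], up to sign

# Fraction of total variance carried by the largest eigenvalue (Step 5)
print(eigvals.max() / eigvals.sum())   # approx. 0.92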
Step 5: Choose components and form a feature vector

The fraction of the total variance accounted for by the j-th principal component is

    λj / Trace(S)

where

    Trace(S) = Σ λj = 322.4 + 27.7 = 350.1
The fraction of the total variance accounted for by the first principal component is

    λ1 / Trace(S) = 322.4 / 350.1 = 0.92

i.e., 92% of the total system variance is represented by the first principal component, and the remaining 8% by the second component. Hence the second principal component can be neglected and only the first one considered.
From the two eigenvectors, the feature vector is selected as

    A = [ 0.801 ]
        [ 0.599 ]
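In code, forming the feature vector amounts to sorting the eigenvalues in descending order and keeping the leading eigenvector columns; a self-contained sketch:

import numpy as np

S = np.array([[216.67, 141.35],
              [141.35, 133.38]])
eigvals, eigvecs = np.linalg.eigh(S)

# Sort eigenvalues in descending order and keep the leading column(s)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order[:1]]
print(A)   # approx. [[0.801], [0.599]], up to sign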
Step 6: Derive the new data set

    Z = X A

With X the 15 × 2 deviation matrix from Step 2 and A the single-column feature vector from Step 5, the new data set is the column of first-component scores (rounded to two decimals):

    Z = [  -0.59 ]
        [   9.82 ]
        [ -11.77 ]
        [ -11.20 ]
        [ -16.38 ]
        [  -6.78 ]
        [  15.02 ]
        [  21.81 ]
        [  18.83 ]
        [ -34.59 ]
        [  29.03 ]
        [   1.22 ]
        [  15.24 ]
        [  -2.98 ]
        [ -26.79 ]
Using both the eigenvectors, the new data set is Z = X A with the full 2 × 2 matrix of eigenvector columns:

    A = [ 0.801  -0.599 ]
        [ 0.599   0.801 ]

    Z = [  -0.59    5.06 ]
        [   9.82    2.27 ]
        [ -11.77   -6.56 ]
        [ -11.20    9.25 ]
        [ -16.38    0.64 ]
        [  -6.78   -1.55 ]
        [  15.02    0.88 ]
        [  21.81    8.29 ]
        [  18.83   -5.71 ]
        [ -34.59    3.01 ]
        [  29.03   -2.10 ]
        [   1.22   -2.54 ]
        [  15.24  -10.52 ]
        [  -2.98   -3.14 ]
        [ -26.79    3.42 ]
THANK YOU
