
Dimension Reduction
DSCI 5240 Data Mining and Machine Learning for Business
Javier Rubio-Herrero
Dimensionality Reduction
• Dimensionality reduction is a set of techniques used to reduce the amount of data necessary for prediction while still providing accurate models
• Many data mining techniques are not effective for high-dimensional data
• Two high-level approaches
  • Feature selection - selectively exclude dimensions from consideration
  • Feature extraction - mathematically combine dimensions to produce intrinsic/latent dimensions
[Figure: observed variables such as airplane deicing costs, snowplow costs, and heat stroke may all reflect a single latent dimension: temperature?]
Main Approaches

• Feature Selection
  • Manual Feature Selection – The data analyst can examine the available dimensions and exclude those they feel are not useful for modeling purposes
  • Feature Selection Based on an Objective Function – A modeling approach is used to identify the features that appear to have the most influence on the dependent variable

• Feature Extraction – Maps high-dimensional data onto a lower-dimensional subspace (i.e., combines variables)
The Curse of Dimensionality (Bellman 1961)
• The curse of dimensionality is based on the fact that statistical methods count observations that occur in a given space
• As dimensions increase
  • The data needed to make accurate inferences grows exponentially
  • The observations become sparser (more spread out)
• Model performance often suffers as dimensionality increases
[Figure: classifier performance plotted against dimensionality for two sample sizes, with n2 > n1]
Goal

• Simplify the data by removing unnecessary dimensions (noise)


• Improve speed of learning
• Improve predictive accuracy

Applications
• Dimension reduction is useful anytime there is high-dimensional data that may be simplified
• Text mining
• Image retrieval
• Intelligent character recognition
• Facial recognition
• Dimension reduction is used in the business world when producing
models with large datasets
• Customers
• Products
• Etc.

Principal Components Analysis

Dimension Reduction and PCA

• Principal Components Analysis (PCA) is a feature extraction method which takes a classical linear approach to dimension reduction
• PCA projects high-dimensional data onto a lower-dimensional subspace using a linear transformation
• All dimension reduction techniques involve some degree of information loss
• The goal of PCA is to reduce dimensionality while retaining as much information (variation) as possible in the dataset

$$\boldsymbol{X} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \;\Rightarrow\; \text{PCA} \;\Rightarrow\; \boldsymbol{X}' = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}, \quad \text{where } K \ll N$$
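For a concrete sense of this mapping, here is a minimal sketch using scikit-learn's PCA (not part of the slides; the data, `K`, and variable names are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 100 observations measured on N = 5 correlated dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

K = 2                                    # number of retained components (K << N)
pca = PCA(n_components=K)
X_prime = pca.fit_transform(X)           # projected data, shape (100, K)

print(X_prime.shape)                     # (100, 2)
print(pca.explained_variance_ratio_)     # share of the variation each component retains
```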
PCA Requirements
1. PCA only involves predictors, not target variables.
2. PCA can only be performed on dimensions which are numeric in nature.

Some Matrix Terminology

• Matrix: a rectangular array of rows and columns
  • If the number of rows is equal to the number of columns, the matrix is square
• Principal (diagonal): the diagonal from the upper left to the lower right of a matrix
  • Principal elements: the elements of that diagonal
• Trace: the sum of the principal elements

$$\begin{bmatrix} 5 & 2 & 4 \\ -3 & 6 & 2 \\ 3 & -3 & 1 \end{bmatrix} \qquad trace = (5 + 6 + 1) = 12$$

Useful reference for linear algebra: Lay, D. C. (2003). Linear Algebra and its Applications, 4th edition.
More Matrix Terminology

• Diagonal matrix: a matrix in which all non-diagonal elements are zero

$$\begin{bmatrix} 5 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Identity/unity matrix: a scalar matrix in which all diagonal elements equal 1

$$\boldsymbol{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• Scalar matrix: a diagonal matrix in which all diagonal elements are equal

$$k\boldsymbol{I} = \begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$$

• Transpose matrix: $\boldsymbol{A}^T$ is obtained from $\boldsymbol{A}$ by converting rows to columns and columns to rows

$$\boldsymbol{A} = \begin{bmatrix} 5 & 4 \\ 2 & 3 \\ 3 & 1 \end{bmatrix} \Rightarrow \boldsymbol{A}^T = \begin{bmatrix} 5 & 2 & 3 \\ 4 & 3 & 1 \end{bmatrix}$$
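These definitions map directly onto NumPy. A minimal sketch using the matrices above (the off-diagonal entries of the 3 x 3 example are reconstructed as best as the slide allows; only the trace is essential):

```python
import numpy as np

M = np.array([[5, 2, 4],
              [-3, 6, 2],
              [3, -3, 1]])
print(np.trace(M))        # sum of the principal elements: 5 + 6 + 1 = 12

I = np.eye(3)             # 3 x 3 identity matrix
kI = 7 * np.eye(3)        # scalar matrix with k = 7 (k chosen for illustration)

A = np.array([[5, 4],
              [2, 3],
              [3, 1]])
print(A.T)                # transpose: rows become columns
```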
Matrix Operations

$$\boldsymbol{A} = \begin{bmatrix} 3 & 6 \\ 5 & 8 \\ -2 & 9 \end{bmatrix} \qquad \boldsymbol{B} = \begin{bmatrix} -6 & 1 \\ 0 & 9 \\ 8 & 3 \end{bmatrix}$$

Addition: $\boldsymbol{Z} = \boldsymbol{A} + \boldsymbol{B} \iff z_{i,j} = a_{i,j} + b_{i,j}$

$$\boldsymbol{A} + \boldsymbol{B} = \begin{bmatrix} 3 & 6 \\ 5 & 8 \\ -2 & 9 \end{bmatrix} + \begin{bmatrix} -6 & 1 \\ 0 & 9 \\ 8 & 3 \end{bmatrix} = \begin{bmatrix} -3 & 7 \\ 5 & 17 \\ 6 & 12 \end{bmatrix}$$

Subtraction: $\boldsymbol{Z} = \boldsymbol{A} - \boldsymbol{B} \iff z_{i,j} = a_{i,j} - b_{i,j}$

$$\boldsymbol{A} - \boldsymbol{B} = \begin{bmatrix} 3 & 6 \\ 5 & 8 \\ -2 & 9 \end{bmatrix} - \begin{bmatrix} -6 & 1 \\ 0 & 9 \\ 8 & 3 \end{bmatrix} = \begin{bmatrix} 9 & 5 \\ 5 & -1 \\ -10 & 6 \end{bmatrix}$$
Matrix Operations

• Multiplication by a scalar $b$: $\boldsymbol{Z} = \boldsymbol{A} * b \iff z_{i,j} = a_{i,j} * b$

$$\boldsymbol{A} = \begin{bmatrix} 2 & 6 \\ 0 & 5 \end{bmatrix}, \quad b = 3$$

$$\boldsymbol{A} * b = \begin{bmatrix} 2 & 6 \\ 0 & 5 \end{bmatrix} * 3 = \begin{bmatrix} 6 & 18 \\ 0 & 15 \end{bmatrix}$$

$$2 * 3 = 6 \qquad 6 * 3 = 18 \qquad 0 * 3 = 0 \qquad 5 * 3 = 15$$
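A quick NumPy check of these element-wise operations, using the matrices from the two slides above:

```python
import numpy as np

A = np.array([[3, 6], [5, 8], [-2, 9]])
B = np.array([[-6, 1], [0, 9], [8, 3]])

print(A + B)        # element-wise addition:    [[-3  7] [ 5 17] [ 6 12]]
print(A - B)        # element-wise subtraction: [[ 9  5] [ 5 -1] [-10  6]]

C = np.array([[2, 6], [0, 5]])
print(C * 3)        # scalar multiplication:    [[ 6 18] [ 0 15]]
```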
Matrix Operations

• Multiplication: $\boldsymbol{A} * \boldsymbol{B}$ is defined if the number of columns in $\boldsymbol{A}$ equals the number of rows in $\boldsymbol{B}$

$$\boldsymbol{Z} = \boldsymbol{A} * \boldsymbol{B} \iff z_{i,j} = a_{i,1} * b_{1,j} + a_{i,2} * b_{2,j} + a_{i,3} * b_{3,j} + \dots + a_{i,m} * b_{n,j}$$

$$\boldsymbol{A} = \begin{bmatrix} 4 & 1 & 9 \\ 6 & 2 & 8 \\ 7 & 3 & 5 \\ 11 & 10 & 12 \end{bmatrix} \qquad \boldsymbol{B} = \begin{bmatrix} 2 & 9 \\ 5 & 12 \\ 8 & 10 \end{bmatrix}$$

$$\boldsymbol{A} * \boldsymbol{B} = \begin{bmatrix} 4 & 1 & 9 \\ 6 & 2 & 8 \\ 7 & 3 & 5 \\ 11 & 10 & 12 \end{bmatrix} * \begin{bmatrix} 2 & 9 \\ 5 & 12 \\ 8 & 10 \end{bmatrix} = \begin{bmatrix} 85 & 138 \\ 86 & 158 \\ 69 & 149 \\ 168 & 339 \end{bmatrix}$$

e.g., first row: $4*2 + 1*5 + 9*8 = 85$ and $4*9 + 1*12 + 9*10 = 138$
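The same product in NumPy, as a check of the worked example:

```python
import numpy as np

A = np.array([[4, 1, 9],
              [6, 2, 8],
              [7, 3, 5],
              [11, 10, 12]])          # 4 x 3
B = np.array([[2, 9],
              [5, 12],
              [8, 10]])               # 3 x 2

print(A @ B)    # 4 x 2 result: [[ 85 138] [ 86 158] [ 69 149] [168 339]]
```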
Determinant
• Determinant: A function that associates a scalar to a square matrix
• Singularity
  • A matrix with a nonzero determinant is called non-singular (it has an inverse)
  • Conversely, a matrix with a zero determinant is called singular (it has no inverse)
• The determinant of matrix A is denoted |A| or det(A)
• Laplace's Formula

$$|\boldsymbol{A}| = \sum_{j=1}^{n} a_{i,j} C_{i,j} = \sum_{j=1}^{n} a_{i,j} (-1)^{i+j} M_{i,j}$$

• Where
  • $M_{i,j}$: the $i,j$ minor of $\boldsymbol{A}$, obtained by removing row $i$ and column $j$
  • $C_{i,j}$: the scalar cofactor, $C_{i,j} = (-1)^{i+j} M_{i,j}$

Determinant Example

2 x 2 Matrix:

$$\boldsymbol{A} = \begin{bmatrix} 13 & 5 \\ 2 & 4 \end{bmatrix}$$

$$|\boldsymbol{A}| = a_{1,1}a_{2,2} - a_{1,2}a_{2,1} = (13)(4) - (5)(2) = 42$$

3 x 3 Matrix:

$$\boldsymbol{B} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

$$|\boldsymbol{B}| = 1 \cdot \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix} - 2 \cdot \begin{vmatrix} 4 & 6 \\ 7 & 9 \end{vmatrix} + 3 \cdot \begin{vmatrix} 4 & 5 \\ 7 & 8 \end{vmatrix}$$

$$|\boldsymbol{B}| = 1 \cdot ((5*9) - (8*6)) - 2 \cdot ((4*9) - (7*6)) + 3 \cdot ((4*8) - (7*5)) = 0$$
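Verifying both determinants with NumPy:

```python
import numpy as np

A = np.array([[13, 5], [2, 4]])
B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(np.linalg.det(A))   # 42.0 (up to floating-point rounding)
print(np.linalg.det(B))   # ~0.0, so B is singular and has no inverse
```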
PCA Methodology
• Step 1: Calculate the mean of each dimension (variable)
• Step 2: Calculate the variance/covariance matrix
  a. Calculate the variance of each attribute
  b. Calculate the covariance of the attributes
  c. Construct the matrix
• Step 3: Compute the Eigenvalues of the covariance matrix and order them from largest to smallest
• Step 4: Compute the Eigenvectors of the covariance matrix associated with those Eigenvalues
• Step 5: Keep the terms corresponding to the K largest Eigenvalues

Useful reference for PCA: Alpaydin, E. (2020). Introduction to machine learning. MIT press.
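A compact NumPy sketch of these five steps, applied to the 10-observation dataset used in the worked example that follows (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X, K):
    """Minimal PCA sketch following Steps 1-5 above."""
    # Step 1: mean of each dimension
    means = X.mean(axis=0)
    # Step 2: variance/covariance matrix of the centered data
    X_centered = X - means
    cov = np.cov(X_centered, rowvar=False)    # uses n-1 in the denominator
    # Steps 3-4: eigenvalues and eigenvectors, reordered from largest to smallest
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: covariance matrices are symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: keep the K components with the largest eigenvalues and project
    scores = X_centered @ eigvecs[:, :K]
    return scores, eigvals, eigvecs

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
scores, eigvals, eigvecs = pca(X, K=1)
print(eigvals)    # approximately [1.284, 0.049]
```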
Example Data

x1      x2
2.500   2.400
0.500   0.700
2.200   2.900
1.900   2.200
3.100   3.000
2.300   2.700
2.000   1.600
1.000   1.100
1.500   1.600
1.100   0.900

[Figure: scatter plot of x2 against x1 for the ten observations]
Step 1: Calculate Means

x1            x2
2.500         2.400
0.500         0.700
2.200         2.900
1.900         2.200
3.100         3.000
2.300         2.700
2.000         1.600
1.000         1.100
1.500         1.600
1.100         0.900
Sum: 18.100   19.100

$$\bar{x}_1 = \frac{\sum_{i=1}^{n} x_{1i}}{n} = \frac{18.1}{10} = 1.810 \qquad \bar{x}_2 = \frac{\sum_{i=1}^{n} x_{2i}}{n} = \frac{19.1}{10} = 1.910$$
Step 2a: Calculate Variances

(1) x1    (2) x2    (3) x1 − x̄1    (4) x2 − x̄2    (5) (x1 − x̄1)²    (6) (x2 − x̄2)²
2.500     2.400      0.690           0.490           0.476             0.240
0.500     0.700     -1.310          -1.210           1.716             1.464
2.200     2.900      0.390           0.990           0.152             0.980
1.900     2.200      0.090           0.290           0.008             0.084
3.100     3.000      1.290           1.090           1.664             1.188
2.300     2.700      0.490           0.790           0.240             0.624
2.000     1.600      0.190          -0.310           0.036             0.096
1.000     1.100     -0.810          -0.810           0.656             0.656
1.500     1.600     -0.310          -0.310           0.096             0.096
1.100     0.900     -0.710          -1.010           0.504             1.020
Sum:                                                  5.549             6.449

$$s_{x_1}^2 = \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2}{(n-1)} = \frac{5.549}{9} = 0.617 \qquad s_{x_2}^2 = \frac{\sum_{i=1}^{n}(x_{2i} - \bar{x}_2)^2}{(n-1)} = \frac{6.449}{9} = 0.717$$
Step 2b: Calculate Covariance

(1) x1    (2) x2    (3) x1 − x̄1    (4) x2 − x̄2    (5) (x1 − x̄1)(x2 − x̄2)
2.500     2.400      0.690           0.490           0.338
0.500     0.700     -1.310          -1.210           1.585
2.200     2.900      0.390           0.990           0.386
1.900     2.200      0.090           0.290           0.026
3.100     3.000      1.290           1.090           1.406
2.300     2.700      0.490           0.790           0.387
2.000     1.600      0.190          -0.310          -0.059
1.000     1.100     -0.810          -0.810           0.656
1.500     1.600     -0.310          -0.310           0.096
1.100     0.900     -0.710          -1.010           0.717
Sum:                                                  5.539

$$COV_{x_1 x_2} = \frac{\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{(n-1)} = \frac{5.539}{9} = 0.615$$

Note: $COV_{x_1 x_2} = COV_{x_2 x_1}$
Step 2c: Construct Variance/Covariance Matrix

$$\boldsymbol{A} = \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix}$$

• 0.617: variance of x1
• 0.717: variance of x2
• 0.615: covariance of x1 and x2
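Steps 1 and 2 can be verified in NumPy; `ddof=1` gives the n − 1 denominator used above, and `X` is the example dataset:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

print(X.mean(axis=0))               # [1.81 1.91]
print(X.var(axis=0, ddof=1))        # approximately [0.617 0.717]
print(np.cov(X, rowvar=False))      # approximately [[0.617 0.615] [0.615 0.717]]
```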
Step 3: Compute Eigenvalues

• An Eigenvalue is a measure of the variation within the data along a particular path (Eigenvector)
• Given
  • $\lambda$: scalar (this is what we are solving for... our eigenvalues)
  • $\boldsymbol{I}$: identity matrix
  • $\boldsymbol{A}$: non-singular matrix (our variance/covariance matrix)
• For what value of $\lambda$ is $\det(\boldsymbol{A} - \lambda\boldsymbol{I}) = 0$? In other words: for what value of $\lambda$ does the matrix $\boldsymbol{A} - \lambda\boldsymbol{I}$ not have an inverse?

$$\{1\} \quad \det(\boldsymbol{A} - \lambda\boldsymbol{I}) = \left| \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0$$

$$\{2\} \quad = \left| \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \right| = 0$$
Step 3: Compute Eigenvalues

$$\{1\} \quad \left| \begin{bmatrix} 0.617 - \lambda & 0.615 \\ 0.615 & 0.717 - \lambda \end{bmatrix} \right| = 0$$

$$\{2\} \quad (0.617 - \lambda)(0.717 - \lambda) - 0.615^2 = 0 \qquad \text{(Characteristic Equation)}$$

$$\{3\} \quad (0.442) - (0.617\lambda) - (0.717\lambda) + \lambda^2 - 0.615^2 = 0$$

$$\{4\} \quad \lambda^2 - 1.333\lambda + 0.063 = 0$$
Step 3: Compute Eigenvalues

$$\{1\} \quad 1\lambda^2 - 1.333\lambda + 0.063 = 0$$

$$\{2\} \quad \lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \qquad a = 1, \; b = -1.333, \; c = 0.063$$

$$\{3\} \quad \lambda_1 = \frac{-(-1.333) + \sqrt{(-1.333)^2 - 4(1)(0.063)}}{2(1)} = 1.284$$

$$\{4\} \quad \lambda_2 = \frac{-(-1.333) - \sqrt{(-1.333)^2 - 4(1)(0.063)}}{2(1)} = 0.050$$
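The same eigenvalues can be obtained directly in NumPy, either as roots of the characteristic polynomial or from the covariance matrix itself:

```python
import numpy as np

A = np.array([[0.617, 0.615],
              [0.615, 0.717]])

print(np.roots([1, -1.333, 0.063]))   # roots of the characteristic equation: about 1.284 and 0.049
print(np.linalg.eigvalsh(A))          # eigenvalues of A, ascending: about [0.050, 1.284]
```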
Observation

The sum of the Eigenvalues is the trace of the matrix:

$$\boldsymbol{A} = \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix}$$

$$trace(\boldsymbol{A}) = (0.617 + 0.717) = 1.334$$
$$(\lambda_1 + \lambda_2) = (1.284 + 0.050) = 1.334$$

You can write Eigenvalues as a percentage of trace(A):

$$\frac{\lambda_1}{trace(\boldsymbol{A})} * 100 = \frac{1.284}{1.334} * 100 = 96.25\%$$
$$\frac{\lambda_2}{trace(\boldsymbol{A})} * 100 = \frac{0.050}{1.334} * 100 = 3.75\%$$

The largest Eigenvalue ($\lambda_1$) is referred to as the principal Eigenvalue.
Another Observation

The product of the Eigenvalues is the determinant of the matrix:

$$\boldsymbol{A} = \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix}$$

$$det(\boldsymbol{A}) = (0.617 * 0.717) - (0.615 * 0.615) = 0.064$$
$$(\lambda_1 * \lambda_2) = (1.284 * 0.050) = 0.064$$
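Both observations are easy to confirm numerically:

```python
import numpy as np

A = np.array([[0.617, 0.615],
              [0.615, 0.717]])
eigvals = np.linalg.eigvalsh(A)              # ascending order

print(np.trace(A), eigvals.sum())            # both approximately 1.334
print(np.linalg.det(A), eigvals.prod())      # both approximately 0.064
print(eigvals / np.trace(A) * 100)           # approximately [3.75, 96.25] percent of the trace
```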
Step 4: Calculate Eigenvectors

• An Eigenvector is the magnitude and direction of a path through the data
• For each Eigenvalue $\lambda_i$, there is a set of associated Eigenvectors
  • The number of Eigenvectors in the set is infinite
  • Eigenvectors corresponding to different Eigenvalues are linearly independent
• Given $\lambda_i$, find a vector $\vec{Z}$ such that $(\boldsymbol{A} - \lambda_i \boldsymbol{I})\vec{Z} = 0$

$$\{1\} \quad \left( \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix} - \lambda_i \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) \vec{Z} = 0$$

$$\{2\} \quad \left( \begin{bmatrix} 0.617 & 0.615 \\ 0.615 & 0.717 \end{bmatrix} - \begin{bmatrix} \lambda_i & 0 \\ 0 & \lambda_i \end{bmatrix} \right) \vec{Z} = 0$$

$$\{3\} \quad \begin{bmatrix} 0.617 - \lambda_i & 0.615 \\ 0.615 & 0.717 - \lambda_i \end{bmatrix} \vec{Z} = 0$$
Step 4: Calculate Eigenvectors

• Consider $\lambda_1 = 1.284$

$$\{1\} \quad \begin{bmatrix} 0.617 - 1.284 & 0.615 \\ 0.615 & 0.717 - 1.284 \end{bmatrix} \vec{Z} = 0$$

$$\{2\} \quad \begin{bmatrix} -0.667 & 0.615 \\ 0.615 & -0.567 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

$$\{3\} \quad -0.667 z_1 + 0.615 z_2 = 0$$
$$\{4\} \quad 0.615 z_1 + (-0.567) z_2 = 0$$

These are linearly dependent ({4} is {3} multiplied by -0.922)

$$\{5\} \quad z_1 = 0.922 z_2$$

z2: 1, 2, 3, ...  →  z1: 0.922, 1.844, 2.766, ...
Step 4: Calculate Eigenvectors

• Consider $\lambda_2 = 0.050$

$$\{1\} \quad \begin{bmatrix} 0.617 - 0.050 & 0.615 \\ 0.615 & 0.717 - 0.050 \end{bmatrix} \vec{Z} = 0$$

$$\{2\} \quad \begin{bmatrix} 0.567 & 0.615 \\ 0.615 & 0.667 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

$$\{3\} \quad 0.567 z_1 + 0.615 z_2 = 0$$
$$\{4\} \quad 0.615 z_1 + 0.667 z_2 = 0$$

$$\{5\} \quad z_1 = -1.085 z_2$$

z2: 1, 2, 3, ...  →  z1: -1.085, -2.170, -3.255, ...
Normalizing Eigenvectors

• Dealing with infinite sets is difficult
• Statistical packages typically normalize Eigenvectors:

$$[z_{1s}, z_{2s}] = \frac{[z_1, z_2]}{\sqrt{z_1^2 + z_2^2}}$$

For $\lambda_1$, using $[z_1, z_2] = [0.922, 1]$:

$$z_{1s} = \frac{z_1}{\sqrt{z_1^2 + z_2^2}} = \frac{0.922}{\sqrt{0.922^2 + 1^2}} = 0.678 \qquad z_{2s} = \frac{z_2}{\sqrt{z_1^2 + z_2^2}} = \frac{1}{\sqrt{0.922^2 + 1^2}} = 0.735$$

For $\lambda_2$, using $[z_1, z_2] = [-1.085, 1]$:

$$z_{1s} = \frac{z_1}{\sqrt{z_1^2 + z_2^2}} = \frac{-1.085}{\sqrt{(-1.085)^2 + 1^2}} = -0.735 \qquad z_{2s} = \frac{z_2}{\sqrt{z_1^2 + z_2^2}} = \frac{1}{\sqrt{(-1.085)^2 + 1^2}} = 0.678$$
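In NumPy, `np.linalg.eigh` returns eigenvectors that are already normalized to unit length, so Step 4 and the normalization above collapse into one call (signs may be flipped, which does not change the component):

```python
import numpy as np

A = np.array([[0.617, 0.615],
              [0.615, 0.717]])
eigvals, eigvecs = np.linalg.eigh(A)       # columns of eigvecs are unit-length eigenvectors

print(eigvals)                             # about [0.050, 1.284] (ascending)
print(eigvecs[:, 1])                       # eigenvector for λ1, about ±[0.678, 0.735]
print(eigvecs[:, 0])                       # eigenvector for λ2, about ±[-0.735, 0.678]
print(np.linalg.norm(eigvecs, axis=0))     # [1. 1.] (already normalized)
```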
Eigenvalues and Eigenvectors

• For our example:

Component   Eigenvalue   Normalized Eigenvectors
PC1         1.284        z1s = 0.678,  z2s = 0.735
PC2         0.050        z1s = -0.735, z2s = 0.678

• It is clear that PC1 is associated with significantly more variation than PC2
• Discarding PC2 allows us to reduce dimensions (2 → 1) and doesn't cost us much information
Calculating Principal Component Values

New reduced dimensions are created by multiplying the centered data (a matrix of the original dimensions minus their means) by the matrix of retained Eigenvectors:

$$\underset{\text{Centered data}}{\begin{bmatrix} 0.69 & 0.49 \\ -1.31 & -1.21 \\ 0.39 & 0.99 \\ 0.09 & 0.29 \\ 1.29 & 1.09 \\ 0.49 & 0.79 \\ 0.19 & -0.31 \\ -0.81 & -0.81 \\ -0.31 & -0.31 \\ -0.71 & -1.01 \end{bmatrix}} \; \underset{\text{Eigenvector}}{\begin{bmatrix} 0.6779 \\ 0.7352 \end{bmatrix}} = \underset{\text{PC1}}{\begin{bmatrix} 0.828 \\ -1.778 \\ 0.992 \\ 0.274 \\ 1.676 \\ 0.913 \\ -0.099 \\ -1.145 \\ -0.438 \\ -1.224 \end{bmatrix}}$$
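The same projection in NumPy, where `X` is the example dataset and `pc1` is the retained normalized eigenvector from above:

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)

pc1 = np.array([0.6779, 0.7352])     # retained (normalized) eigenvector
scores = X_centered @ pc1            # one PC1 value per observation

print(np.round(scores, 3))
# [ 0.828 -1.778  0.992  0.274  1.676  0.913 -0.099 -1.145 -0.438 -1.224]
```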
Choosing K

• In an example with two dimensions this is easy... what if we had two hundred dimensions? How many should we keep?
• There are two common approaches for choosing how many principal components to keep (a short sketch of both follows below)
  • Threshold – Determine how much of the information you want to retain, and keep enough components to satisfy that threshold

$$\frac{\sum_{i=1}^{K} \lambda_i}{trace} > Threshold \quad (e.g.,\ 0.9\ \text{or}\ 0.95)$$

  • Scree Plot – Order the Eigenvalues in descending order and plot the kth Eigenvalue against k

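A minimal sketch of both approaches; here `eigvals` holds the two eigenvalues from the worked example, but in practice it would hold all eigenvalues of your covariance matrix (matplotlib is used only for the scree plot):

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([1.284, 0.050])            # eigenvalues of the covariance matrix
eigvals = np.sort(eigvals)[::-1]              # descending order

# Threshold approach: smallest K whose retained share of the trace exceeds the threshold
share = np.cumsum(eigvals) / eigvals.sum()    # trace = sum of the eigenvalues
K = int(np.argmax(share > 0.95)) + 1
print(K, share)                               # K = 1, shares approximately [0.963, 1.0]

# Scree plot: kth eigenvalue against k; look for the "elbow"
plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
plt.xlabel("k")
plt.ylabel("Eigenvalue")
plt.show()
```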
A Note on PCA and Classification


• PCA works well when the target variable is interval
• If the target is nominal, care must be used if PCA will be employed on the
independent variables
• Projection axes chosen by PCA may not give good discrimination power
• PCA maintains what is common in the data, not what differentiates the classes
• Thus, PCA may reduce the efficacy of classification algorithms if not used with
care
• Linear discriminant analysis may be advisable in these situations

Some Issues
• Covariance is sensitive to large values
• Dimensions with large scales dominate
• Such dimensions are likely to become principal components
• Normalization can help reduce this issue (we'll see this in the future; a short sketch follows below)
• PCA assumes the underlying subspace is linear (i.e., that variables are numeric) and
thus transformations may be required

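Because covariance is scale-sensitive, a common precaution is to standardize each dimension before running PCA (equivalently, work from the correlation matrix). A minimal scikit-learn sketch; the data and names are illustrative, not from the slides:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200),          # small-scale dimension
                     rng.normal(0, 1000, 200)])      # large-scale dimension that would otherwise dominate

pipeline = make_pipeline(StandardScaler(), PCA(n_components=1))
scores = pipeline.fit_transform(X)                   # PCA on standardized (unit-variance) dimensions
```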
Some Useful Resources


• Eigenvalue and Eigenvector Calculator
• Principal Component Analysis 4 Dummies: Eigenvectors, Eigenvalues and Dimension Reduction

