
Machine Learning

Dimensionality Reduction

Overview

Motivation
Principal Component Analysis (PCA) Problem Formulation
The Algorithm
Choosing k
Applying PCA
Motivation
Motivation :: Data Compression (1)

Reduce data from 2D to 1D

[Figure: scatter plot of Pilot Skill (x-axis) vs. Pilot Enjoyment (y-axis); the 2D points are projected onto a single direction.]
Motivation :: Data Compression (2)

Reduce data from 2D to 1D

[Figure: each 2D example $x^{(i)}$ is mapped to a single number $z^{(i)}$ on the projection line.]

This will help us run the learning algorithms more quickly.


Motivation :: Data Compression (3)
Reduce data from 3D to 2D, e.g., project points that lie close to a plane onto that plane, so each example becomes a 2D vector $z^{(i)} \in \mathbb{R}^2$.
Quiz
Dimensionality Reduction

Country   | GDP (Trillion $) | Per Capita GDP (K $) | HDI   | Life Expectancy | Poverty Index | Mean Household Income (K $) | …
----------|------------------|----------------------|-------|-----------------|---------------|-----------------------------|---
Canada    | 1.577            | 39.17                | 0.908 | 80.7            | 32.6          | 67.293                      | …
China     | 5.878            | 7.54                 | 0.687 | 73              | 46.9          | 10.22                       | …
India     | 1.632            | 3.41                 | 0.547 | 64.7            | 36.8          | 0.735                       | …
Russia    | 1.48             | 19.84                | 0.755 | 65.5            | 39.9          | 0.72                        | …
Singapore | 0.223            | 56.69                | 0.866 | 80              | 42.5          | 67.1                        | …
USA       | 14.527           | 46.86                | 0.91  | 78.3            | 40.8          | 84.3                        | …
…         | …                | …                    | …     | …               | …             | …                           | …

Each country $x^{(i)}$ is described by many such features (here, roughly 50).
Dimensionality Reduction
Country   | $z_1$ | $z_2$
----------|-------|------
Canada    | 1.6   | 1.2
China     | 1.7   | 0.3
India     | 1.6   | 0.2
Russia    | 1.4   | 0.5
Singapore | 0.5   | 1.7
USA       | 2     | 1.5
…         | …     | …

Data reduced from 50D to 2D: each country is now a point $z^{(i)} \in \mathbb{R}^2$ that can be plotted.
Data Visualization
Quiz
Principal Component Analysis (PCA)
PCA Problem Formulation (1)
Perform mean normalization / feature scaling before PCA.

[Figure: 2D scatter of the mean-normalized training data.]
PCA Problem Formulation (2)–(5)

[Figures: the same 2D data projected onto several candidate directions; the direction onto which the projections have the smallest error is labelled the Principal Component.]
PCA Problem Formulation (6), (7)

[Figure: data points and their projections onto the chosen direction; the projection error is the distance from each point to its projection.]

• Reduce from 2D to 1D: find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.
• Reduce from n-D to k-D: find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data so as to minimize the projection error.
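Formally (a standard statement of this objective; the equation itself did not survive extraction), PCA chooses the projection directions to minimize the average squared projection error:

$$\min_{u^{(1)}, \ldots, u^{(k)}} \; \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^2$$

where $x_{\mathrm{approx}}^{(i)}$ is the projection of $x^{(i)}$ onto the subspace spanned by $u^{(1)}, \ldots, u^{(k)}$.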
PCA is not Linear Regression

[Figure: the same 2D data fitted by linear regression (left) and by PCA (right).]

• Linear regression: fits a straight line to minimize the vertical distance between each point and the line (the error in predicting $y$).
• PCA: fits a direction to minimize the shortest orthogonal distance between each point and the line; there is no distinguished variable $y$ to predict.
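In symbols (a common way to state the contrast, following the course's notation):

$$\text{Linear regression:}\; \min_{\theta} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^{\top} x^{(i)} - y^{(i)} \right)^2 \qquad \text{PCA:}\; \min_{u} \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^2$$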
Question

[Figure: projected points $z^{(i)}$ on the principal direction.]

Given the projection, can we reconstruct the original data points?
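The answer (the standard reconstruction step from the course): only approximately. Given $z = U_{\mathrm{reduce}}^{\top} x$, the reconstruction is

$$x_{\mathrm{approx}} = U_{\mathrm{reduce}} \, z \approx x,$$

which recovers each point up to the projection error that was discarded.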


PCA Algorithm
Data Pre-processing

Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

Pre-processing (feature scaling / mean normalization):

○ Compute the mean of each feature, $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$, and replace each $x_j^{(i)}$ by $x_j^{(i)} - \mu_j$.

○ Scale features to have a comparable range of values (apply only if different features are on different scales), e.g. $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$, where $s_j$ is the standard deviation (or range) of feature $j$.
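A minimal NumPy sketch of this pre-processing (illustrative; `preprocess` is our own name, and `X` is assumed to be an (m, n) matrix of training examples):

```python
import numpy as np

def preprocess(X):
    """Mean-normalize each feature; scale only if feature ranges differ."""
    mu = X.mean(axis=0)         # per-feature means
    s = X.std(axis=0, ddof=1)   # per-feature scales (sample standard deviation)
    return (X - mu) / s, mu, s
```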

PCA Algorithm

1. Compute Covariance Matrix

2. Perform Eigen Value Decomposition

3. Select the Number of Principal Components

4. Reduce the Dimension
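The four steps can be sketched in a few lines of NumPy (an illustration under our own naming, not the original course code; `pca_fit` is a hypothetical helper):

```python
import numpy as np

def pca_fit(X, k):
    """Sketch of the four-step PCA algorithm on an (m, n) data matrix X."""
    Xc = X - X.mean(axis=0)                   # mean-normalize
    Sigma = Xc.T @ Xc / (len(X) - 1)          # 1. covariance matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # 2. eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]         #    sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_reduce = eigvecs[:, :k]                 # 3. keep the top-k principal components
    Z = Xc @ U_reduce                         # 4. project the data down to (m, k)
    return Z, U_reduce, eigvals
```

One common way to choose k (step 3) is to keep enough components to retain, say, 99% of the variance: `eigvals[:k].sum() / eigvals.sum() >= 0.99`.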


Numerical Example – I

X  | Y
---|---
3  | 5
9  | 7
6  | 5
11 | 8
8  | 6
Numerical Example – II, III, IV
1. Compute Covariance Matrix

$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{m} (x_i - \mu_X)(y_i - \mu_Y)}{m - 1}$$

With $\mu_X = 7.4$ and $\mu_Y = 6.2$ for the data above:

$$\Sigma = \begin{bmatrix} \mathrm{Cov}(X, X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(Y, X) & \mathrm{Cov}(Y, Y) \end{bmatrix} = \begin{bmatrix} 9.3 & 3.65 \\ 3.65 & 1.7 \end{bmatrix}$$
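A quick NumPy check of this matrix (an illustrative verification, not from the slides):

```python
import numpy as np

x = np.array([3, 9, 6, 11, 8], dtype=float)
y = np.array([5, 7, 5, 8, 6], dtype=float)
print(np.cov(x, y))  # [[9.3, 3.65], [3.65, 1.7]] -- np.cov uses the m - 1 denominator
```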
Numerical Example – V, VI
2. Perform Eigenvalue Decomposition
2.1 Eigenvalues

Solve $\det(\Sigma - \lambda I) = 0$:

$$\lambda^2 - 11\lambda + 2.4875 = 0 \quad \Rightarrow \quad \lambda_1 \approx 10.769, \; \lambda_2 \approx 0.231$$

The larger eigenvalue, $\lambda_1$, corresponds to the First Principal Component.
Numerical Example – VII, VIII
2.2 Eigenvector of $\lambda_1$

Solve $(\Sigma - \lambda_1 I)\, v = 0$:

$$(9.3 - 10.769)\, v_1 + 3.65\, v_2 = 0 \quad \Rightarrow \quad v_2 \approx 0.4025\, v_1$$

so one eigenvector is $v \approx (1, \; 0.4025)^{\top}$.
Numerical Example – IX
2.3 Normalize Eigenvector

$$u^{(1)} = \frac{v}{\lVert v \rVert} \approx \frac{1}{\sqrt{1 + 0.4025^2}} \begin{bmatrix} 1 \\ 0.4025 \end{bmatrix} \approx \begin{bmatrix} 0.9277 \\ 0.3734 \end{bmatrix}$$
Numerical Example – X, XI
3. Select the No. of Principal Components: keep only $u^{(1)}$ (reduce from 2D to 1D); $\lambda_1$ alone retains $\lambda_1 / (\lambda_1 + \lambda_2) = 10.769 / 11 \approx 98\%$ of the variance.
4. Reduce the Dimension: project each mean-normalized example, $z^{(i)} = (u^{(1)})^{\top} (x^{(i)} - \mu)$.
Numerical Example – XII
4. Reduce the Dimension

X  | Y | Z
---|---|--------
3  | 5 | -4.53
9  | 7 | 1.783
6  | 5 | -1.7468
11 | 8 | 4.012
8  | 6 | 0.4822
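The full worked example can be reproduced with NumPy (a verification sketch, not part of the original slides):

```python
import numpy as np

X = np.array([[3, 5], [9, 7], [6, 5], [11, 8], [8, 6]], dtype=float)
Xc = X - X.mean(axis=0)              # mean-normalize (mu = [7.4, 6.2])
Sigma = Xc.T @ Xc / (len(X) - 1)     # [[9.3, 3.65], [3.65, 1.7]]
vals, vecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
u1 = vecs[:, -1]                     # eigenvector of the largest eigenvalue
u1 *= np.sign(u1[0])                 # fix the sign so u1 ~ [0.9277, 0.3734]
print(Xc @ u1)                       # [-4.53, 1.783, -1.7468, 4.012, 0.4822]
```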
Applying PCA
Applying PCA
Supervised Learning Speedup

Training set: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ with very high-dimensional inputs (say, computer vision, where the input is a 100 × 100 image, so $x^{(i)} \in \mathbb{R}^{10000}$).

Extract inputs
○ Unlabelled data set: $x^{(1)}, \ldots, x^{(m)} \in \mathbb{R}^{10000}$

Apply PCA and get
○ $z^{(1)}, \ldots, z^{(m)} \in \mathbb{R}^{k}$ with $k \ll 10000$ (e.g. $k = 1000$)

New training set
○ $(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \ldots, (z^{(m)}, y^{(m)})$

Apply the hypothesis $h_{\theta}(z)$ to the reduced inputs. The mapping $x^{(i)} \mapsto z^{(i)}$ should be defined by running PCA only on the training set, and then applied unchanged to the cross-validation and test sets.
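A minimal sketch of this pipeline, assuming scikit-learn is available (the random stand-in data and the choice of 100 components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in data: 500 "images" of 100 x 100 = 10,000 raw features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10_000))
y_train = rng.integers(0, 2, size=500)
X_test = rng.normal(size=(100, 10_000))

pca = PCA(n_components=100)            # learn the x -> z mapping on training inputs only
Z_train = pca.fit_transform(X_train)
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

Z_test = pca.transform(X_test)         # apply the SAME mapping to the test set
predictions = clf.predict(Z_test)
```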
Applications of PCA

Compression
○ Reduce the memory/disk needed to store data
○ Speed up the learning algorithm

Visualization
○ Reduce data to $k = 2$ or $k = 3$ dimensions so it can be plotted
Bad Use of PCA (1)

Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features from $n$ to $k < n$.

Fewer features, so less likely to overfit. (BAD USE)

This might work, but it is not a good way to address overfitting: PCA discards information without ever looking at the labels $y^{(i)}$.

Use regularization instead.
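For comparison, regularization keeps all $n$ features and instead penalizes large parameters; the standard regularized cost from the same course (stated here as a reminder, not from this slide) is:

$$\min_{\theta} \; \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$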
Bad Use of PCA (2)

PCA is sometimes used where it should not be.

Design of ML systems:
○ Get training set $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$
○ Run PCA to reduce $x^{(i)}$ in dimension to get $z^{(i)}$
○ Train logistic regression on $(z^{(1)}, y^{(1)}), \ldots, (z^{(m)}, y^{(m)})$
○ Test on the test set: map $x_{\mathrm{test}}^{(i)}$ to $z_{\mathrm{test}}^{(i)}$, then run $h_{\theta}(z)$

How about doing the whole process without PCA?

Before implementing PCA, first try running whatever you want to do with the original (raw) data. Only if that does not work, implement PCA.
Acknowledgements
Acknowledgements

Material presented in these slides is obtained from Prof. Andrew Ng's course on Machine Learning.
