
Machine Learning

Dimensionality Reduction

Overview

Motivation
Principal Component Analysis (PCA) Problem Formulation
The Algorithm
Choosing k
Applying PCA
Motivation
Motivation :: Data Compression (1)

Reduce data from 2D to 1D

[Figure: scatter plot of Pilot Skill (x-axis) vs. Pilot Enjoyment (y-axis); the 2D points are projected onto a single direction.]
Motivation :: Data Compression (2)

Reduce data from 2D to 1D

[Figure: each 2D example $x^{(i)}$ is mapped to a single number $z^{(i)}$ on the projection line.]

This will help us run the learning algorithms more quickly.


Motivation :: Data Compression (3)
Reduce data from 3D to 2D, e.g., project points that lie close to a plane onto that plane, so each example becomes a 2D vector $z^{(i)} \in \mathbb{R}^2$.
Quiz
Dimensionality Reduction

Country   | GDP (Trillion $) | Per Capita GDP (K $) | HDI   | Life Expectancy | Poverty Index | Mean Household Income (K $) | …
----------|------------------|----------------------|-------|-----------------|---------------|-----------------------------|---
Canada    | 1.577            | 39.17                | 0.908 | 80.7            | 32.6          | 67.293                      | …
China     | 5.878            | 7.54                 | 0.687 | 73              | 46.9          | 10.22                       | …
India     | 1.632            | 3.41                 | 0.547 | 64.7            | 36.8          | 0.735                       | …
Russia    | 1.48             | 19.84                | 0.755 | 65.5            | 39.9          | 0.72                        | …
Singapore | 0.223            | 56.69                | 0.866 | 80              | 42.5          | 67.1                        | …
USA       | 14.527           | 46.86                | 0.91  | 78.3            | 40.8          | 84.3                        | …
…         | …                | …                    | …     | …               | …             | …                           | …

Each country $x^{(i)}$ is described by many such features (here, roughly 50).
Dimensionality Reduction
Country   | $z_1$ | $z_2$
----------|-------|------
Canada    | 1.6   | 1.2
China     | 1.7   | 0.3
India     | 1.6   | 0.2
Russia    | 1.4   | 0.5
Singapore | 0.5   | 1.7
USA       | 2     | 1.5
…         | …     | …

Data reduced from 50D to 2D: each country is now a point $z^{(i)} \in \mathbb{R}^2$ that can be plotted.
Data Visualization
Quiz
Principal Component Analysis (PCA)
PCA Problem Formulation (1)
Perform mean normalization / feature scaling before PCA.

[Figure: 2D scatter of the mean-normalized training data.]
PCA Problem Formulation (2)–(5)

[Figures: the same 2D data projected onto several candidate directions; the direction onto which the projections have the smallest error is labelled the Principal Component.]
PCA Problem Formulation (6), (7)

[Figure: data points and their projections onto the chosen direction; the projection error is the distance from each point to its projection.]

• Reduce from 2D to 1D: find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.
• Reduce from n-D to k-D: find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data so as to minimize the projection error.
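Formally (a standard statement of this objective; the equation itself did not survive extraction), PCA chooses the projection directions to minimize the average squared projection error:

$$\min_{u^{(1)}, \ldots, u^{(k)}} \; \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^2$$

where $x_{\mathrm{approx}}^{(i)}$ is the projection of $x^{(i)}$ onto the subspace spanned by $u^{(1)}, \ldots, u^{(k)}$.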
PCA is not Linear Regression

[Figure: the same 2D data fitted by linear regression (left) and by PCA (right).]

• Linear regression: fits a straight line to minimize the vertical distance between each point and the line (the error in predicting $y$).
• PCA: fits a direction to minimize the shortest orthogonal distance between each point and the line; there is no distinguished variable $y$ to predict.
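In symbols (a common way to state the contrast, following the course's notation):

$$\text{Linear regression:}\; \min_{\theta} \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^{\top} x^{(i)} - y^{(i)} \right)^2 \qquad \text{PCA:}\; \min_{u} \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\rVert^2$$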
Question

[Figure: projected points $z^{(i)}$ on the principal direction.]

Given the projection, can we reconstruct the original data points?
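The answer (the standard reconstruction step from the course): only approximately. Given $z = U_{\mathrm{reduce}}^{\top} x$, the reconstruction is

$$x_{\mathrm{approx}} = U_{\mathrm{reduce}} \, z \approx x,$$

which recovers each point up to the projection error that was discarded.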


PCA Algorithm
Data Pre-processing

Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

Pre-processing (feature scaling / mean normalization):

○ Compute the mean of each feature, $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$, and replace each $x_j^{(i)}$ by $x_j^{(i)} - \mu_j$.

○ Scale features to have a comparable range of values (apply only if different features are on different scales), e.g. $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$, where $s_j$ is the standard deviation (or range) of feature $j$.
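A minimal NumPy sketch of this pre-processing (illustrative; `preprocess` is our own name, and `X` is assumed to be an (m, n) matrix of training examples):

```python
import numpy as np

def preprocess(X):
    """Mean-normalize each feature; scale only if feature ranges differ."""
    mu = X.mean(axis=0)         # per-feature means
    s = X.std(axis=0, ddof=1)   # per-feature scales (sample standard deviation)
    return (X - mu) / s, mu, s
```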

PCA Algorithm

1. Compute Covariance Matrix

2. Perform Eigen Value Decomposition

3. Select the Number of Principal Components

4. Reduce the Dimension
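The four steps can be sketched in a few lines of NumPy (an illustration under our own naming, not the original course code; `pca_fit` is a hypothetical helper):

```python
import numpy as np

def pca_fit(X, k):
    """Sketch of the four-step PCA algorithm on an (m, n) data matrix X."""
    Xc = X - X.mean(axis=0)                   # mean-normalize
    Sigma = Xc.T @ Xc / (len(X) - 1)          # 1. covariance matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # 2. eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]         #    sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_reduce = eigvecs[:, :k]                 # 3. keep the top-k principal components
    Z = Xc @ U_reduce                         # 4. project the data down to (m, k)
    return Z, U_reduce, eigvals
```

One common way to choose k (step 3) is to keep enough components to retain, say, 99% of the variance: `eigvals[:k].sum() / eigvals.sum() >= 0.99`.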


Numerical Example – I

X  | Y
---|---
3  | 5
9  | 7
6  | 5
11 | 8
8  | 6
Numerical Example – II, III, IV
1. Compute Covariance Matrix

$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{m} (x_i - \mu_X)(y_i - \mu_Y)}{m - 1}$$

With $\mu_X = 7.4$ and $\mu_Y = 6.2$ for the data above:

$$\Sigma = \begin{bmatrix} \mathrm{Cov}(X, X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(Y, X) & \mathrm{Cov}(Y, Y) \end{bmatrix} = \begin{bmatrix} 9.3 & 3.65 \\ 3.65 & 1.7 \end{bmatrix}$$
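A quick NumPy check of this matrix (an illustrative verification, not from the slides):

```python
import numpy as np

x = np.array([3, 9, 6, 11, 8], dtype=float)
y = np.array([5, 7, 5, 8, 6], dtype=float)
print(np.cov(x, y))  # [[9.3, 3.65], [3.65, 1.7]] -- np.cov uses the m - 1 denominator
```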
Numerical Example – V, VI
2. Perform Eigenvalue Decomposition
2.1 Eigenvalues

Solve $\det(\Sigma - \lambda I) = 0$:

$$\lambda^2 - 11\lambda + 2.4875 = 0 \quad \Rightarrow \quad \lambda_1 \approx 10.769, \; \lambda_2 \approx 0.231$$

The larger eigenvalue, $\lambda_1$, corresponds to the First Principal Component.
Numerical Example – VII, VIII
2.2 Eigenvector of $\lambda_1$

Solve $(\Sigma - \lambda_1 I)\, v = 0$:

$$(9.3 - 10.769)\, v_1 + 3.65\, v_2 = 0 \quad \Rightarrow \quad v_2 \approx 0.4025\, v_1$$

so one eigenvector is $v \approx (1, \; 0.4025)^{\top}$.
Numerical Example – IX
2.3 Normalize Eigenvector

$$u^{(1)} = \frac{v}{\lVert v \rVert} \approx \frac{1}{\sqrt{1 + 0.4025^2}} \begin{bmatrix} 1 \\ 0.4025 \end{bmatrix} \approx \begin{bmatrix} 0.9277 \\ 0.3734 \end{bmatrix}$$
Numerical Example – X, XI
3. Select the No. of Principal Components: keep only $u^{(1)}$ (reduce from 2D to 1D); $\lambda_1$ alone retains $\lambda_1 / (\lambda_1 + \lambda_2) = 10.769 / 11 \approx 98\%$ of the variance.
4. Reduce the Dimension: project each mean-normalized example, $z^{(i)} = (u^{(1)})^{\top} (x^{(i)} - \mu)$.
Numerical Example – XII
4. Reduce the Dimension

X  | Y | Z
---|---|--------
3  | 5 | -4.53
9  | 7 | 1.783
6  | 5 | -1.7468
11 | 8 | 4.012
8  | 6 | 0.4822
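The full worked example can be reproduced with NumPy (a verification sketch, not part of the original slides):

```python
import numpy as np

X = np.array([[3, 5], [9, 7], [6, 5], [11, 8], [8, 6]], dtype=float)
Xc = X - X.mean(axis=0)              # mean-normalize (mu = [7.4, 6.2])
Sigma = Xc.T @ Xc / (len(X) - 1)     # [[9.3, 3.65], [3.65, 1.7]]
vals, vecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
u1 = vecs[:, -1]                     # eigenvector of the largest eigenvalue
u1 *= np.sign(u1[0])                 # fix the sign so u1 ~ [0.9277, 0.3734]
print(Xc @ u1)                       # [-4.53, 1.783, -1.7468, 4.012, 0.4822]
```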
Applying PCA
Applying PCA
Supervised Learning Speedup

Training set: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ with very high-dimensional inputs (say, computer vision, where the input is a 100 × 100 image, so $x^{(i)} \in \mathbb{R}^{10000}$).

Extract inputs
○ Unlabelled data set: $x^{(1)}, \ldots, x^{(m)} \in \mathbb{R}^{10000}$

Apply PCA and get
○ $z^{(1)}, \ldots, z^{(m)} \in \mathbb{R}^{k}$ with $k \ll 10000$ (e.g. $k = 1000$)

New training set
○ $(z^{(1)}, y^{(1)}), (z^{(2)}, y^{(2)}), \ldots, (z^{(m)}, y^{(m)})$

Apply the hypothesis $h_{\theta}(z)$ to the reduced inputs. The mapping $x^{(i)} \mapsto z^{(i)}$ should be defined by running PCA only on the training set, and then applied unchanged to the cross-validation and test sets.
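A minimal sketch of this pipeline, assuming scikit-learn is available (the random stand-in data and the choice of 100 components are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in data: 500 "images" of 100 x 100 = 10,000 raw features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10_000))
y_train = rng.integers(0, 2, size=500)
X_test = rng.normal(size=(100, 10_000))

pca = PCA(n_components=100)            # learn the x -> z mapping on training inputs only
Z_train = pca.fit_transform(X_train)
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

Z_test = pca.transform(X_test)         # apply the SAME mapping to the test set
predictions = clf.predict(Z_test)
```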
Applications of PCA

Compression
○ Reduce the memory/disk needed to store data
○ Speed up the learning algorithm

Visualization
○ Reduce data to $k = 2$ or $k = 3$ dimensions so it can be plotted
Bad Use of PCA (1)

Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features from $n$ to $k < n$.

Fewer features, so less likely to overfit. (BAD USE)

This might work, but it is not a good way to address overfitting: PCA discards information without ever looking at the labels $y^{(i)}$.

Use regularization instead.
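For comparison, regularization keeps all $n$ features and instead penalizes large parameters; the standard regularized cost from the same course (stated here as a reminder, not from this slide) is:

$$\min_{\theta} \; \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$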
Bad Use of PCA (2)

PCA is sometimes used where it should not be.

Design of ML systems:
○ Get training set $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$
○ Run PCA to reduce $x^{(i)}$ in dimension to get $z^{(i)}$
○ Train logistic regression on $(z^{(1)}, y^{(1)}), \ldots, (z^{(m)}, y^{(m)})$
○ Test on the test set: map $x_{\mathrm{test}}^{(i)}$ to $z_{\mathrm{test}}^{(i)}$, then run $h_{\theta}(z)$

How about doing the whole process without PCA?

Before implementing PCA, first try running whatever you want to do with the original (raw) data. Only if that does not work, implement PCA.
Acknowledgements
Acknowledgements

Material presented in these slides is obtained from Prof. Andrew Ng's course on Machine Learning.
