
STATISTICAL PROGRAMMING - PYTHON

Machine Learning:
dimensionality reduction

Pablo Monfort, Instituto de Empresa


Summary

▪ Machine Learning overview


▪ Unsupervised learning
▪ clustering
▪ KMeans
▪ hierarchical clustering
▪ dimensionality reduction
▪ feature selection
▪ feature extraction
▪ PCA: principal component analysis
▪ LDA: linear discriminant analysis
▪ association rules
▪ Supervised learning
▪ classification
▪ regression
Machine learning: algorithms

Supervised learning:
- Classification: classification trees…
- Regression: linear regression, logistic regression, stepwise regression,
regression trees…

Unsupervised learning:
- Clustering: k-means, k-medians, hierarchical clustering…
- Dimensionality reduction: principal component analysis, discriminant
analysis…
- Association rules: Apriori algorithm…

Machine Learning: use cases

Supervised learning:
- Classification: customer retention, fraud detection, image classification…
- Regression: market forecasting, population growth prediction…

Unsupervised learning:
- Clustering: customer segmentation, recommender systems…
- Dimensionality reduction: structure discovery, big data visualization…
- Association rules: targeted marketing…

Dimensionality reduction: overview
Goal:
reduce the number of variables under consideration by obtaining a set of principal variables.

How does it work?

The data are transformed from the high-dimensional space to a space with fewer dimensions. Depending on the transformation applied, several algorithms arise:
- PCA (does not consider labels. Unsupervised. Optimized over all the data)
- kernel PCA (nonlinear extension of PCA)
- LDA (considers labels. Supervised. Optimized per segment)
- GDA (nonlinear extension of LDA)…

Uses:
a dimensionality reduction algorithm can be applied:
- prior to a K-nearest neighbors algorithm, to avoid the curse of dimensionality (see the sketch below)
- prior to regressions, to avoid overfitting caused by strong correlations
- for image processing
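
A minimal sketch of the first use, assuming scikit-learn and its bundled digits dataset (the choice of 10 components is illustrative only): scale the data, project it onto a few principal components, then classify with K-nearest neighbors.

# Minimal sketch: PCA before K-nearest neighbors (scikit-learn assumed installed).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)          # 64 original variables (8x8 pixels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale, reduce the 64 variables to 10 principal components, then apply KNN.
model = make_pipeline(StandardScaler(), PCA(n_components=10), KNeighborsClassifier())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))           # accuracy on the held-out part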
PCA: overview

[Figure: PCA on a two-dimensional dataset with two components. The 1st principal component is PC1 = x − y; the 2nd component is x + y.]
PCA: description
Goal:
- Reduce the dimensionality (number of variables) while maximizing the variance explained.

How does it work?
- The first component is a linear combination of the original variables chosen to have the largest possible variance over the data.
- The second component is built in the same way, but orthogonal to the previous component(s); likewise for the third, fourth… components.
- At the end, an orthogonal basis of vectors (the principal components) has been created.
[The number of components will be the minimum of the number of variables and the number of observations minus one.]

Advantages:
- Reduces the number of variables, for simpler understanding and visualization
- Improves the application of a regression analysis or of K-nearest neighbors

Disadvantages:
- The meaning of a principal component can be hard to understand and explain
- High computational cost with large amounts of data (solution: incremental PCA)

[Figure: example with 1st component 2x + y and 2nd component x − 2y.]
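
To make the construction concrete, a minimal sketch on simulated data (NumPy and scikit-learn assumed) checking that the components are orthonormal linear combinations of the original variables and that transforming is just centering plus projecting:

# Minimal sketch: the principal components are orthonormal linear combinations
# of the original variables (simulated data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 100 observations, 3 variables

pca = PCA(n_components=3).fit(X)

print(pca.components_)                           # each row: coefficients of one component
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(3)))   # orthonormal basis

# Transforming by hand: center the data, then project onto the components.
X_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_manual, pca.transform(X)))   # True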
PCA: description

[Figure: three two-component examples.
- Example 1: total variance explained using only 1 PC: 50%
- Example 2: total variance explained using only 1 PC: 65%
- Example 3: total variance explained using only 1 PC: 90%]

Remember to scale the data!
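
PCA is variance-driven, so variables measured on large scales dominate the components. A minimal sketch on simulated data (StandardScaler from scikit-learn assumed) showing how the cumulative explained variance changes with and without scaling:

# Minimal sketch: scaling changes the explained-variance profile (simulated data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(scale=1000, size=200),            # one variable on a much larger scale
    rng.normal(scale=1, size=200),
    rng.normal(scale=1, size=200),
])

print(np.cumsum(PCA().fit(X).explained_variance_ratio_))
# Without scaling, the large-scale variable dominates the first component (~100%).

X_scaled = StandardScaler().fit_transform(X)
print(np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_))
# After scaling, the variance is spread roughly evenly across the three components.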

PCA

from sklearn.decomposition import PCA

pca = PCA(…)
Arguments in PCA:
- n_components = number of components to keep
- svd_solver = 'auto' (default), 'randomized'…
- whiten = True or False (default: False)

pca.fit(data)
Attributes:
- pca.explained_variance_ratio_ # fraction of variance explained by each component
  (cumulative: np.cumsum(pca.explained_variance_ratio_))
- pca.components_ # coefficients of the linear transformation of the original data
  that produce the components
- pca.n_components_ # number of components kept

data_pca = pca.transform(data) # data coordinates in the principal-component space
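
Putting the calls above together, a minimal runnable sketch on simulated data (the argument values are illustrative choices, not the only valid ones):

# Minimal end-to-end sketch of the PCA calls listed above (simulated data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(150, 4))                 # 150 observations, 4 variables

pca = PCA(n_components=2, svd_solver='randomized', whiten=False, random_state=0)
pca.fit(data)

print(pca.explained_variance_ratio_)             # variance explained by each component
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative explained variance
print(pca.components_)                           # linear-combination coefficients
print(pca.n_components_)                         # 2

data_pca = pca.transform(data)                   # coordinates in the new space
print(data_pca.shape)                            # (150, 2)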

PCA: exercise
Programming challenge DRED.1
Taking into consideration the iris dataset:

1. How many principal components can we consider?
2. How do you expect the cumulative percentage of explained variance to evolve with the number of components? Calculate it.
3. Consider the number of components necessary to explain at least 99% of the variance. Give the equations to calculate these components.
4. Calculate the new values for this decomposition and plot them.
5. Repeat steps 3 and 4 taking 95% of the variance.

Programming challenge DRED.2

1. Build a dataset with 4 variables where 100% of the explained variance is reached with only 2 components.
2. Can you describe the variance explained in a dataset with 4 variables (X, Y, Z, T) where cor(Z, T) = 1?
3. How many principal components should we take to explain 100% of the variance for a dataset with 4 variables (X, Y, Z, T) where Z = Y, T = 7, X ~ N(1, 1) and Y ~ N(1, 1)? And if X ~ N(1, 1000) and Y ~ N(1, 1000)?
PCA: exercise

Programming challenge DRED.3


The digits dataset contains 1,797 images of size 8×8 representing the digits 0 to 9.
Taking this dataset into consideration:

1. How many principal components are required to explain at least 45% of the variance?
2. Calculate the new values for this decomposition. What are the equations to calculate these new coordinates?

PCA: exercise

Programming challenge DRED.4


Taking into consideration the previous digits dataset:

1. How much of the variance is explained taking only 2 principal components?
2. Calculate the new values for this decomposition.
3. Plot all the digit records using these 2 principal components, coloring each point by its target class.

LDA vs PCA
LDA is a supervised classification method based on linear combinations of the variables, while PCA is a dimensionality reduction method per se.

PCA seeks the directions that maximize the total variance of the data, without using class labels. The idea is to project the data onto the dimensions along which the points spread out the most.

LDA seeks the directions that maximize the inter-class variance relative to the intra-class variance, using the labels. The idea is to make the different groups of data as distinguishable as possible.
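
A minimal sketch of the contrast, assuming scikit-learn and its bundled iris dataset: PCA never sees the labels, LDA requires them.

# Minimal sketch: PCA ignores class labels, LDA uses them (iris dataset assumed).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)     # unsupervised: y is not used
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # supervised

print(X_pca.shape, X_lda.shape)                  # both (150, 2), but the axes differ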

LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(…)
Arguments in LinearDiscriminantAnalysis:
- n_components = number of components (at most the number of classes minus one)
- solver = 'svd' (default), 'eigen'…

lda.fit(data, target_vector)
Attributes:
- lda.explained_variance_ratio_ # fraction of variance explained by each component
  (cumulative: np.cumsum(lda.explained_variance_ratio_))
- lda.scalings_ # coefficients of the linear transformation of the original data
  that produce the components (lda.coef_ holds the classifier weights)

projected_data = lda.transform(data) # new data coordinates using the LDA components
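
A minimal runnable sketch of these calls, assuming scikit-learn and its bundled iris dataset (three classes, so at most two discriminant components):

# Minimal sketch of the LDA calls listed above (iris dataset assumed).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

data, target_vector = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)     # at most n_classes - 1 = 2 here
lda.fit(data, target_vector)

print(lda.explained_variance_ratio_)                 # variance explained per discriminant
print(np.cumsum(lda.explained_variance_ratio_))
print(lda.scalings_)                                 # projection used by transform
print(lda.coef_)                                     # classifier weights, one row per class

projected_data = lda.transform(data)
print(projected_data.shape)                          # (150, 2)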

Programming challenge DRED.5


Repeat the previous exercise using LDA instead of PCA.
Session Wrap-up

Glossary

PCA(…)
.fit()
.explained_variance_ratio_
.components_
.n_components_
.transform(data)

