14 Machine Learning Dimensionality Reduction
PYTHON
Machine Learning:
dimensionality reduction
Supervised learning:
- Classification: classification trees…
- Regression: linear regression, logistic regression, stepwise regression,
regression trees…
Unsupervised learning:
- Clustering: k-means, k-medians, hierarchical clustering…
- Dimensionality reduction: principal component analysis, discriminant
analysis…
- Association rules: Apriori algorithm…
Machine Learning: use cases
Supervised learning:
- Classification: customer retention, fraud detection, image classification…
- Regression: market forecasting, population growth prediction…
Unsupervised learning:
- Clustering: customer segmentation, recommender systems…
- Dimensionality reduction: structure discovery, big data visualization…
- Association rules: targeted marketing…
Dimensionality reduction: overview
Goal:
reducing the number of variables under consideration by obtaining a set of
principal variables.
Uses:
A dimensionality reduction algorithm can be applied:
- prior to a k-nearest neighbors algorithm, to avoid the curse of dimensionality (see the sketch after this list)
- prior to a regression, to avoid overfitting caused by strongly correlated variables
- for image processing
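A minimal sketch of the first use case, assuming scikit-learn and the iris dataset; the pipeline composition and the parameter values are illustrative choices, not prescribed by the slides:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale, project onto 2 principal components, then classify with KNN.
knn_pipeline = make_pipeline(StandardScaler(),
                             PCA(n_components=2),
                             KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(knn_pipeline, X, y, cv=5).mean())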
Summary
[Figure: a two-component PCA example; first component PC1 = x - y, second component PC2 = x + y.]
PCA: description
Goal:
- Reduce the dimensionality (the number of variables) while maximizing the variance explained.
How does it work?
- The first component is a linear combination of the original variables, chosen to have the largest possible variance over the data.
- The second component is built in the same way, but orthogonal to the previous component(s).
- …third, fourth components…
- At the end, an orthogonal basis of vectors (the principal components) has been created (see the NumPy sketch below).
[The number of components is the minimum of the number of variables and the number of samples minus one.]
Advantages:
- Reduces the number of variables, for simpler understanding and visualization
- Improves the application of a regression analysis or of k-nearest neighbors
Disadvantages:
- The meaning of a principal component is hard to understand and explain
- High computational cost on large amounts of data (solution: incremental PCA)
[Figure: a two-component example; first component 2x + y, second component x - 2y.]
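The construction described above can be reproduced with NumPy alone; a minimal sketch, assuming centered data and the eigendecomposition of the covariance matrix (the random data is purely illustrative):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # 100 samples, 3 variables
Xc = X - X.mean(axis=0)            # center each variable

# The eigenvectors of the covariance matrix are the principal components;
# the eigenvalues are the variances along them.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]  # sort by decreasing variance
components = eigvecs[:, order].T   # one component per row

# The components form an orthogonal basis: the product is the identity.
print(np.round(components @ components.T, 6))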
PCA: description
[Figures: three two-component examples.
Example 1: total variance explained using only 1 PC: 50%.
Example 2: total variance explained using only 1 PC: 65%.
Example 3: total variance explained using 1 PC: 90%.]
Remember to scale the data! (See the sketch below.)
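The effect of scaling can be checked directly; a minimal sketch, assuming scikit-learn and the iris data (the dataset choice is illustrative):
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Without scaling, variables measured on larger scales dominate the first PC.
print(PCA().fit(X).explained_variance_ratio_)

# After standardization, every variable contributes on an equal footing.
print(PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_)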
PCA
from sklearn.decomposition import PCA
pca = PCA(…)
Arguments in PCA:
- n_components = number of components
- svd_solver = 'randomized'…
- whiten = True # True or False
pca.fit(data)
Attributes:
- pca.explained_variance_ratio_ # cumulative: np.cumsum(pca.explained_variance_ratio_)
- pca.components_ # coefficients of the linear transformation of the
original data to obtain the components
- pca.n_components_ # number of components
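Putting these calls together on a concrete dataset; a minimal sketch, assuming the iris data and an illustrative n_components=2:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
pca.fit(X)

print(pca.explained_variance_ratio_)             # variance ratio per component
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative variance ratio
print(pca.components_)                           # linear-combination coefficients
print(pca.n_components_)                         # number of components kept

X_reduced = pca.transform(X)                     # coordinates in the new basis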
PCA: exercise
Programming challenge DRED.1
Considering the iris dataset:
1. How many principal components are required to explain at least 45% of the
variance?
2. Calculate the new values for this decomposition. What are the equations to
calculate these new coordinates?
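One possible approach, not an official solution; a sketch assuming scikit-learn and the raw (unscaled) iris data:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Question 1: smallest number of components reaching at least 45% of the variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n = int(np.argmax(cumulative >= 0.45)) + 1
print(n, cumulative)

# Question 2: the new coordinates; the equations are the rows of
# pca.components_ applied to the centered data.
X_new = PCA(n_components=n).fit_transform(X)
print(X_new[:5])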
LDA vs PCA
LDA is a supervised classification method based on linear combinations of the variables, while PCA is a dimensionality reduction method per se.
PCA seeks the directions that maximize the overall variance of the data. The idea is to project the data along dimensions on which the points are as spread out as possible.
LDA seeks the directions that maximize inter-class variance relative to intra-class variance. The idea is to make the different classes as distinguishable as possible.
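The contrast can be seen by projecting the same labeled data both ways; a minimal sketch, assuming scikit-learn and the iris data:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the labels: directions of maximal overall variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses the labels: directions that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca[:3])
print(X_lda[:3])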
LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(…)
Arguments in LinearDiscriminantAnalysis:
- n_components = number of components
- solver = "svd' (default), "eigen"…
lda.fit(data, target_vector)
Attributes:
- lda.explained_variance_ratio_ # cumulative: np.cumsum(lda.explained_variance_ratio_)
- lda.coef_ # coefficients of the linear transformation of the
original data to obtain the components
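A minimal usage sketch for these calls, assuming the iris data and labels (the dataset choice is illustrative):
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X, y)

print(lda.explained_variance_ratio_)             # variance ratio per discriminant axis
print(np.cumsum(lda.explained_variance_ratio_))  # cumulative variance ratio
print(lda.coef_)                                 # linear-combination coefficients

X_reduced = lda.transform(X)                     # coordinates on the discriminant axes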
Glossary
PCA(…)
.fit()
.explained_variance_ratio_
.components_
.n_components_
.transform(data)