
CHAPTER 3


Dimensionality reduction

Hervé Gross, PhD - Reservoir Engineer


Advanced Resources and Risk Technology
hgross@ar2tech.com

What is dimensionality reduction and why?

o Reduction of the number of features used to predict a given outcome
o Identify “principal variables”, sometimes tuned specifically for a given outcome
o Goal: reduce training time/cost, facilitate interpretation, avoid overfitting
o Mitigate the curse of dimensionality: as the number of features grows, the data become sparse and far more samples are needed to cover the feature space

Dimensionality reduction

o Feature selection: identify a subset of the original variables to maximize information gain, accuracy, or impact on the decision
o Feature extraction: transform the original variables linearly (PCA) or non-linearly (kernel PCA, multidimensional scaling) to maximize the independence of the variables and minimize their number; a minimal code sketch contrasting the two approaches follows below
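As a rough illustration (assuming scikit-learn is available; the dataset and the choices of k and n_components are purely illustrative), feature selection keeps a subset of the original columns, while feature extraction builds new variables from all of them:

```python
# Feature selection vs. feature extraction on a toy dataset (illustrative only).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 150 samples, 4 original features

# Feature selection: keep the 2 original features most related to the outcome
X_sel = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: combine all 4 features into 2 new, uncorrelated variables
pca = PCA(n_components=2)
X_ext = pca.fit_transform(X)

print(X_sel.shape, X_ext.shape)            # (150, 2) (150, 2)
print(pca.explained_variance_ratio_)       # variance captured by each new variable
```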



Feature Extraction

Think of feature extraction as a smart way to solve almost the same problem with fewer
variables. It is a useful tool, not an end in itself.

We will focus on the following techniques:

o Principal Component Analysis
o Kernel Principal Component Analysis
o Independent Component Analysis
o Multidimensional Scaling



Principal Component Analysis

o Orthogonal transformation of the data into a set of uncorrelated variables called principal components, chosen to maximize the variance captured by each successive component.
o Principal components are linear combinations of the original variables.
o ~ Find the “best viewing angle/shadow projection” for maximum data separation.
o Components are sorted by importance (explained variance). If the least important ones are dropped, we reduce dimensionality.
o Uses the same matrix factorization mechanism as Singular Value Decomposition (rotation, scaling [diagonal matrix], rotation).
o Always possible to back-transform.
o Non-linear generalizations exist: Kernel PCA is one example (kernel = a function to measure similarity).

https://en.wikipedia.org/wiki/Principal_component_analysis
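A minimal sketch of these ideas, assuming scikit-learn's PCA and a synthetic dataset (sizes and values are illustrative): components come out sorted by explained variance, dropping the least important ones reduces dimensionality, and inverse_transform gives the back-transformation mentioned above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # 5 original variables
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # make two of them correlated

pca = PCA(n_components=2)                 # keep only the two dominant components
Z = pca.fit_transform(X)                  # coordinates in the reduced space
X_back = pca.inverse_transform(Z)         # back-transform (approximate: 2 of 5 kept)

print(pca.explained_variance_ratio_)      # sorted, most important component first
print(np.abs(X - X_back).mean())          # information lost by dropping components
```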



Kernel-PCA

o The kernel transformation projects the data into a much larger (sometimes infinite-dimensional) space.
o This alters the relative distances between data points and often allows for a better separation between them.
o PCA uses linear combinations of variables to form new, independent variables.
o One could also perform non-linear component analyses, but a simpler way is to distort the space instead.
o K-PCA is PCA in a non-linear transform of space (using the kernel eigenfunctions).
o Instead of transforming the variables, we transform the metric space.
o The goal is to make the data linearly separable in the kernel space (not in the Euclidean space).
o Almost always possible to back-transform.
o We can then compare the original data to the transformed-and-back data.
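A minimal sketch, assuming scikit-learn's KernelPCA and the classic two-circles toy dataset: the RBF kernel distorts the space so the classes become separable along the first kernel component, and fit_inverse_transform=True enables the approximate back-transform so the original data can be compared with the transformed-and-back data:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original (Euclidean) space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0,
                 fit_inverse_transform=True)
Z = kpca.fit_transform(X)                  # coordinates in the kernel space
X_back = kpca.inverse_transform(Z)         # approximate pre-image in the original space

# The two classes separate along the first kernel component
print(Z[y == 0, 0].mean(), Z[y == 1, 0].mean())
print(np.abs(X - X_back).mean())           # original vs. transformed-and-back data
```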



Ex. 4: PCA, K-PCA, Incremental-PCA
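A possible skeleton for this exercise (the dataset and parameter values are illustrative, not part of the original exercise statement): the three variants are run side by side, with IncrementalPCA fitted in mini-batches, which is useful when the data do not fit in memory:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA

X, _ = load_digits(return_X_y=True)        # 1797 samples, 64 features

Z_pca = PCA(n_components=2).fit_transform(X)
Z_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.001).fit_transform(X)

# Incremental PCA: fit in mini-batches instead of loading everything at once
ipca = IncrementalPCA(n_components=2)
for start in range(0, len(X), 200):
    ipca.partial_fit(X[start:start + 200])
Z_ipca = ipca.transform(X)

print(Z_pca.shape, Z_kpca.shape, Z_ipca.shape)   # each is (1797, 2)
```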



Independent Component Analysis

• PCA relies on a Gaussian distribution of features (principal components are built from the mean and covariance only). If the phenomenon is not Gaussian, ICA helps.
• ICA spreads the data according to higher-order moments.
• ICA usually “whitens” the data (removing correlations and equalizing variances) and then maximizes non-Gaussianity.
• Representing ICA in the feature space gives the view of ‘geometric ICA’: ICA is an algorithm that finds directions in the feature space corresponding to projections with high non-Gaussianity.
• These directions need not be orthogonal in the original feature space, but they are orthogonal in the whitened feature space, in which all directions correspond to the same variance.
• ICA is often used in signal processing to separate several uncorrelated signals.
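A minimal sketch of ICA as blind source separation, assuming scikit-learn's FastICA (the sources and mixing matrix are illustrative): two non-Gaussian signals are mixed linearly and ICA recovers them, up to ordering and scaling, by maximizing non-Gaussianity:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # sinusoidal source (non-Gaussian)
s2 = np.sign(np.cos(3 * t))                # square-wave source (non-Gaussian)
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                 # unknown mixing matrix
X = S @ A.T                                # observed, mixed signals

# whiten="unit-variance" assumes a recent scikit-learn; older versions use whiten=True
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X)               # recovered sources (order and scale arbitrary)

# Each recovered component correlates strongly with one of the true sources
print(np.corrcoef(S_est[:, 0], s1)[0, 1], np.corrcoef(S_est[:, 0], s2)[0, 1])
```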



Multidimensional scaling

o Instead of maximizing variance separation (PCA), MDS preserves the pairwise distance structure.
o It also creates a transformed version of space.
o Positions do not matter, only relative distances.
o Makes it possible to find representative models, or to select from ensembles, by relative positioning.
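A minimal sketch, assuming scikit-learn's MDS and a fabricated ensemble of model responses: only a matrix of pairwise distances is supplied, so the embedding preserves relative distances while absolute positions are arbitrary; picking the model closest to the centroid is one illustrative way to choose a representative from an ensemble:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
responses = rng.normal(size=(30, 50))       # e.g. 30 model responses over 50 time steps

D = squareform(pdist(responses))            # pairwise distances between models
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)                    # 2-D map; only relative distances matter

# One illustrative choice of a "representative" model: the one closest to the centroid
rep = int(np.argmin(np.linalg.norm(Z - Z.mean(axis=0), axis=1)))
print("representative model index:", rep)
```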



References

Schoelkopf, B., Smola, A. J., and Mueller, K.-R. (1999). Kernel principal component analysis. In Advances in Kernel Methods, MIT Press, Cambridge, MA, USA, 327-352.
Hyvarinen, A. and Oja, E. (2000). Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5), 411-430.
Scheidt, C. and Caers, J. (2009). Uncertainty Quantification in Reservoir Performance Using Distances and Kernel Methods—Application to a West Africa Deepwater Turbidite Reservoir. SPE Journal, SPEJ-118740.

