
CHAPTER 3


Dimensionality reduction

Hervé Gross, PhD - Reservoir Engineer


Advanced Resources and Risk Technology
hgross@ar2tech.com

What is dimensionality reduction and why?

o Reduction of the number of features used to predict a given outcome
o Identify “principal variables”, sometimes tuned specifically for a given outcome
o Goal: reduce training time/cost, facilitate interpretation, avoid overfitting
o Mitigate the curse of dimensionality: as the number of features grows, the data become sparse and far more samples are needed to cover the feature space

Dimensionality reduction

o Feature selection: identify a subset of the original variables to maximize information gain, accuracy, or impact on the decision
o Feature extraction: transform the original variables linearly (PCA) or non-linearly (kernel PCA, multidimensional scaling) to maximize the independence of the variables and minimize their number; a minimal code sketch contrasting the two approaches follows below
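As a rough illustration (assuming scikit-learn is available; the dataset and the choices of k and n_components are purely illustrative), feature selection keeps a subset of the original columns, while feature extraction builds new variables from all of them:

```python
# Feature selection vs. feature extraction on a toy dataset (illustrative only).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 150 samples, 4 original features

# Feature selection: keep the 2 original features most related to the outcome
X_sel = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: combine all 4 features into 2 new, uncorrelated variables
pca = PCA(n_components=2)
X_ext = pca.fit_transform(X)

print(X_sel.shape, X_ext.shape)            # (150, 2) (150, 2)
print(pca.explained_variance_ratio_)       # variance captured by each new variable
```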



Feature Extraction

Think of feature extraction as a smart way to solve almost the same problem with fewer
variables. It is a useful tool, not an end in itself.

We will focus on the following techniques:

o Principal Component Analysis
o Kernel Principal Component Analysis
o Independent Component Analysis
o Multidimensional Scaling



Principal Component Analysis

o Orthogonal transformation of the data into a set of uncorrelated variables called principal components, chosen to maximize the variance captured by each successive component.
o Principal components are linear combinations of the original variables.
o ~ Find the “best viewing angle/shadow projection” for maximum data separation.
o Components are sorted by importance (explained variance). If the least important ones are dropped, we reduce dimensionality.
o Uses the same matrix factorization mechanism as Singular Value Decomposition (rotation, scaling [diagonal matrix], rotation).
o Always possible to back-transform.
o Non-linear generalizations exist: Kernel PCA is one example (kernel = a function to measure similarity).

https://en.wikipedia.org/wiki/Principal_component_analysis
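A minimal sketch of these ideas, assuming scikit-learn's PCA and a synthetic dataset (sizes and values are illustrative): components come out sorted by explained variance, dropping the least important ones reduces dimensionality, and inverse_transform gives the back-transformation mentioned above:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # 5 original variables
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # make two of them correlated

pca = PCA(n_components=2)                 # keep only the two dominant components
Z = pca.fit_transform(X)                  # coordinates in the reduced space
X_back = pca.inverse_transform(Z)         # back-transform (approximate: 2 of 5 kept)

print(pca.explained_variance_ratio_)      # sorted, most important component first
print(np.abs(X - X_back).mean())          # information lost by dropping components
```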



Kernel-PCA

o The kernel transformation projects the data into a much larger (sometimes infinite-dimensional) space.
o This alters the relative distances between data points and often allows for a better separation between them.
o PCA uses linear combinations of variables to form new, independent variables.
o One could also perform non-linear component analyses, but a simpler way is to distort the space instead.
o K-PCA is PCA in a non-linear transform of space (using the kernel eigenfunctions).
o Instead of transforming the variables, we transform the metric space.
o The goal is to make the data linearly separable in the kernel space (not in the Euclidean space).
o Almost always possible to back-transform.
o We can then compare the original data to the transformed-and-back data.
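A minimal sketch, assuming scikit-learn's KernelPCA and the classic two-circles toy dataset: the RBF kernel distorts the space so the classes become separable along the first kernel component, and fit_inverse_transform=True enables the approximate back-transform so the original data can be compared with the transformed-and-back data:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original (Euclidean) space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0,
                 fit_inverse_transform=True)
Z = kpca.fit_transform(X)                  # coordinates in the kernel space
X_back = kpca.inverse_transform(Z)         # approximate pre-image in the original space

# The two classes separate along the first kernel component
print(Z[y == 0, 0].mean(), Z[y == 1, 0].mean())
print(np.abs(X - X_back).mean())           # original vs. transformed-and-back data
```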



Ex. 4: PCA, K-PCA, Incremental-PCA
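A possible skeleton for this exercise (the dataset and parameter values are illustrative, not part of the original exercise statement): the three variants are run side by side, with IncrementalPCA fitted in mini-batches, which is useful when the data do not fit in memory:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA, IncrementalPCA

X, _ = load_digits(return_X_y=True)        # 1797 samples, 64 features

Z_pca = PCA(n_components=2).fit_transform(X)
Z_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.001).fit_transform(X)

# Incremental PCA: fit in mini-batches instead of loading everything at once
ipca = IncrementalPCA(n_components=2)
for start in range(0, len(X), 200):
    ipca.partial_fit(X[start:start + 200])
Z_ipca = ipca.transform(X)

print(Z_pca.shape, Z_kpca.shape, Z_ipca.shape)   # each is (1797, 2)
```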



Independent Component Analysis

• PCA relies on a Gaussian distribution of features (principal components are built from the mean and covariance only). If the phenomenon is not Gaussian, ICA helps.
• ICA spreads the data according to higher-order moments.
• ICA usually “whitens” the data (removing correlations and equalizing variances) and then maximizes non-Gaussianity.
• Representing ICA in the feature space gives the view of ‘geometric ICA’: ICA is an algorithm that finds directions in the feature space corresponding to projections with high non-Gaussianity.
• These directions need not be orthogonal in the original feature space, but they are orthogonal in the whitened feature space, in which all directions correspond to the same variance.
• ICA is often used in signal processing to separate several uncorrelated signals.
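A minimal sketch of ICA as blind source separation, assuming scikit-learn's FastICA (the sources and mixing matrix are illustrative): two non-Gaussian signals are mixed linearly and ICA recovers them, up to ordering and scaling, by maximizing non-Gaussianity:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # sinusoidal source (non-Gaussian)
s2 = np.sign(np.cos(3 * t))                # square-wave source (non-Gaussian)
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])                 # unknown mixing matrix
X = S @ A.T                                # observed, mixed signals

# whiten="unit-variance" assumes a recent scikit-learn; older versions use whiten=True
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X)               # recovered sources (order and scale arbitrary)

# Each recovered component correlates strongly with one of the true sources
print(np.corrcoef(S_est[:, 0], s1)[0, 1], np.corrcoef(S_est[:, 0], s2)[0, 1])
```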



Multidimensional scaling

o Instead of maximizing variance separation (PCA), MDS preserves the pairwise distance structure.
o It also creates a transformed version of space.
o Positions do not matter, only relative distances.
o Makes it possible to find representative models, or to select from ensembles, by relative positioning.
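A minimal sketch, assuming scikit-learn's MDS and a fabricated ensemble of model responses: only a matrix of pairwise distances is supplied, so the embedding preserves relative distances while absolute positions are arbitrary; picking the model closest to the centroid is one illustrative way to choose a representative from an ensemble:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
responses = rng.normal(size=(30, 50))       # e.g. 30 model responses over 50 time steps

D = squareform(pdist(responses))            # pairwise distances between models
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)                    # 2-D map; only relative distances matter

# One illustrative choice of a "representative" model: the one closest to the centroid
rep = int(np.argmin(np.linalg.norm(Z - Z.mean(axis=0), axis=1)))
print("representative model index:", rep)
```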



References

Schoelkopf, B., Smola, A. J., and Mueller, K.-R. (1999). Kernel principal component analysis. In Advances in Kernel Methods, MIT Press, Cambridge, MA, USA, 327-352.
Hyvarinen, A. and Oja, E. (2000). Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5), 411-430.
Scheidt, C. and Caers, J. (2009). Uncertainty Quantification in Reservoir Performance Using Distances and Kernel Methods—Application to a West Africa Deepwater Turbidite Reservoir. SPE Journal, SPEJ-118740.

