Principal Component Analysis (PCA) and K-means Clustering
Class Note, PETE 630 Geostatistics
PCA
• Principal component analysis (PCA) finds a set of standardized linear combinations (called principal components) that are orthogonal and, taken together, explain all of the variance of the original data.
• In general there are as many principal components as variables. However, in most cases the first few principal components explain most of the data variance, so only they need to be considered.
PCA Details
• Orthogonal linear transformation
• First, normalize the data to a mean of zero and a variance of unity. This removes the undue impact of the size or dimensions of the variables involved.
• Next, construct the covariance matrix (this is actually a correlation matrix, since the data have been normalized).
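The normalization and covariance steps above can be sketched in NumPy; the `data` array is a hypothetical stand-in for a table of well-log measurements (rows are depth samples, columns are log variables with very different scales):

```python
import numpy as np

# Hypothetical stand-in for well-log measurements:
# 5 samples (rows) of 3 variables (columns) with very different scales.
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 3)) * [10.0, 0.1, 1.0]

# Step 1: normalize each variable to zero mean and unit variance.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Step 2: covariance matrix of the normalized data.  Because every
# variable now has unit variance, this equals the correlation matrix
# of the original data.
cov = np.cov(z, rowvar=False)

print(np.allclose(np.diag(cov), 1.0))                      # unit diagonal
print(np.allclose(cov, np.corrcoef(data, rowvar=False)))   # = correlation matrix
```

Both checks print `True`: normalizing first makes the covariance and correlation matrices coincide, which is why the slide calls the result "actually a correlation matrix".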
PCA Example: Well Log Data Analysis
Variance of variable $j$ is given as:
$$\sigma_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2$$
Covariance of variables $j$ and $k$ is given as:
$$\sigma_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)\left(x_{ik} - \bar{x}_k\right)$$
Covariance for Each Pair of Variables
Covariance matrix (for six well logs) given by:
$$\mathbf{C} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{16} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{26} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{61} & \sigma_{62} & \cdots & \sigma_6^2 \end{bmatrix}$$
Advantage:
Reduction of the data to a 6 × 6 covariance matrix instead of multiple well-log data sets
Eigenvalues
Eigenvalues of a matrix $A$ are the roots $\lambda$ of:
$$\det(A - \lambda I) = 0$$
Corresponding to each eigenvalue there is a nontrivial solution, i.e. $\mathbf{x} \ne \mathbf{0}$, of
$$A\mathbf{x} = \lambda\mathbf{x}$$
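As a minimal numerical check of the two relations above, using a small symmetric matrix as a stand-in for a covariance matrix:

```python
import numpy as np

# A small symmetric matrix standing in for a covariance matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Eigenvalues are the roots of det(A - lam*I) = 0; NumPy solves
# the eigenproblem numerically.
vals, vecs = np.linalg.eig(A)

# Each eigenpair satisfies A x = lam x with a nontrivial x.
for lam, x in zip(vals, vecs.T):
    assert np.allclose(A @ x, lam * x)

# The sum of the eigenvalues equals the trace of A (the total
# variance when A is a covariance matrix).
print(np.isclose(vals.sum(), np.trace(A)))  # True
```

For this matrix the eigenvalues are 3 and 1, and their sum equals the trace, 4.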
Eigenvalues
The sum of the eigenvalues (the trace of the covariance matrix) yields the overall variance of the data set.
PCA
• Each principal axis is a linear combination of the original variables:
$$PC_i = a_{i1} Y_1 + a_{i2} Y_2 + \dots + a_{in} Y_n$$
• The $a_{ij}$ are the coefficients for component $i$, multiplied by the measured value of variable $j$.
• PC 1 is simultaneously the direction of maximum variance and a least-squares "line of best fit" (the squared distances of points away from PC 1 are minimized).
[Figure: scatter plot of the data with the principal component axes PC 1 and PC 2 overlaid.]
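The construction above — normalize, eigendecompose the correlation matrix, project onto the eigenvectors — can be sketched end to end; the two-variable data set here is synthetic, not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-variable data with one dominant direction.
Y = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

# Normalize, then eigendecompose the covariance (= correlation) matrix.
Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
C = np.cov(Z, rowvar=False)
vals, vecs = np.linalg.eigh(C)           # eigh returns ascending order
order = np.argsort(vals)[::-1]           # sort descending
vals, vecs = vals[order], vecs[:, order]

# Each principal component PC_i = a_i1*Y_1 + ... + a_in*Y_n, i.e. a
# projection of the normalized data onto the eigenvectors.
PC = Z @ vecs

# PC 1 (largest eigenvalue) carries the largest share of the variance.
explained = vals / vals.sum()
print(explained[0] > explained[1])  # True
```

Keeping only the leading columns of `PC` is the dimension reduction the notes describe: the first few components explain most of the variance.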
K-means Algorithm
• Uses an iterative refinement technique.
• Given an initial set of k means $m_1^{(1)}, m_2^{(1)}, \dots, m_k^{(1)}$, the algorithm proceeds by alternating between two steps:
• Assignment step: assign each observation to the cluster with the nearest mean,
$$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\| \le \left\| x_p - m_j^{(t)} \right\| \;\; \forall\, 1 \le j \le k \right\}$$
where each $x_p$ goes into exactly one $S_i^{(t)}$.
• Update step: recompute each mean as the centroid of the observations in its cluster,
$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$$
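The two alternating steps can be sketched directly in NumPy (Lloyd's algorithm); the two-blob data set is synthetic and chosen so the clusters are easy to recover:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial means at random from the data set.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each mean becomes the centroid of its cluster
        # (an empty cluster keeps its previous mean).
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        if np.allclose(new_means, means):  # converged
            break
        means = new_means
    return means, labels

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.2, (50, 2)),
               rng.normal(5.0, 0.2, (50, 2))])
means, labels = kmeans(X, k=2)
print(set(labels[:50]) != set(labels[50:]))  # each blob gets its own cluster
```

The convergence test (`np.allclose`) corresponds to the stopping criterion on the following slides: once the means stop moving, the assignments no longer change.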
K-means Clustering: Step 1
K initial "means" (in this case k = 3) are randomly selected from the data set.
[Figure: scatter of the data with initial means k1, k2, k3 marked.]
K-means Clustering: Step 2
K clusters are created by associating every observation with the nearest mean.
[Figure: the same scatter, with points colored by their nearest mean k1, k2, k3.]
K-means Clustering: Step 3
The centroids of each of the k clusters become the new means.
[Figure: the same scatter, with the means k1, k2, k3 moved to the cluster centroids.]
K-means Clustering: Step 4
Steps 2 and 3 are repeated until convergence is reached.
[Figure: the converged clustering, with final means k1, k2, k3.]
Use of K-means Clustering
• Strength
• … for a given data set
• The global optimum solution may be searched for with other techniques (for example, genetic algorithms)
• Weakness
• No way of knowing how many clusters exist
• Different clustering results if different initial conditions are used
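The last weakness — sensitivity to the initial conditions — is commonly mitigated by restarting from several random initializations and keeping the lowest-inertia run. A minimal sketch on synthetic three-blob data (the helper `kmeans_inertia` is illustrative, not from the notes):

```python
import numpy as np

def kmeans_inertia(X, k, seed):
    """One k-means run from a random start; return the final
    within-cluster sum of squares (inertia)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(50):
        labels = np.linalg.norm(X[:, None] - means[None], axis=2).argmin(axis=1)
        means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                          else means[i] for i in range(k)])
    labels = np.linalg.norm(X[:, None] - means[None], axis=2).argmin(axis=1)
    return float(((X - means[labels]) ** 2).sum())

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in (0.0, 4.0, 8.0)])

# Restart from several random initial conditions and keep the best
# (lowest-inertia) run; this mitigates, but does not eliminate,
# sensitivity to the starting means.
inertias = [kmeans_inertia(X, k=3, seed=s) for s in range(10)]
best = min(inertias)
print(all(best <= v for v in inertias))  # True by construction
```

This multi-restart heuristic is what libraries such as scikit-learn do by default; it only improves the odds of finding the global optimum, which is why the notes mention global search techniques like genetic algorithms.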
Discriminant Analysis
Discriminant analysis aims at classifying unknown samples into one of several possible groups or classes.
Classical discriminant function analysis attempts to develop a linear equation (a linear weighted function of the variables) that best differentiates between two different classes.
The approach requires a training data set for which the assignments to population groups are already known (a supervised method).
Discriminant Analysis
If two data groups are plotted in a multidimensional space, they appear as two data clouds with either a distinct separation or some overlap.
An axis is located on which the distance between the clouds is maximized while the dispersion within each cloud is simultaneously minimized.
This new axis defines the linear discriminant function and is calculated from the multivariate means, variances, and covariances of the data groups.
The data points of the two groups may be projected onto this axis, collapsing the multidimensional data into a single discriminant variable.
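A minimal sketch of this projection, using Fisher's classic formula for the discriminant axis, $w = S_w^{-1}(\mu_A - \mu_B)$, on two synthetic "data clouds" (the groups and their parameters are illustrative assumptions, not course data):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two hypothetical groups ("data clouds") in 2-D.
A = rng.normal([0.0, 0.0], 0.5, (100, 2))
B = rng.normal([2.0, 1.5], 0.5, (100, 2))

# Fisher's linear discriminant axis, computed from the group means
# and the pooled within-group covariance: w = Sw^{-1} (mu_A - mu_B).
mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
Sw = np.cov(A, rowvar=False) + np.cov(B, rowvar=False)
w = np.linalg.solve(Sw, mu_a - mu_b)

# Project both clouds onto the axis: the multidimensional data
# collapse to a single discriminant variable.
pa, pb = A @ w, B @ w

# On this axis the group means are well separated relative to the
# within-group spread.
gap = abs(pa.mean() - pb.mean())
print(gap > 2 * max(pa.std(), pb.std()))  # True for these clouds
```

The scalar values `pa` and `pb` are the single discriminant variable described above; a threshold between the two projected means then classifies new samples.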
Discriminant Analysis
We assume that each data group is characterized by a group-specific probability density function $f_c(x)$.
If the prior probability $p_c$ of each group is also known, then the posterior distribution of the classes given the observation $x$ follows from Bayes' theorem.
The observation is allocated to the group with the maximal posterior probability.
Discriminant Analysis
• By Bayes' theorem, the posterior distribution of the classes given the observation $x$ is
$$p(c \mid x) = \frac{p_c \, f_c(x)}{p(x)}, \qquad p(x) = \sum_c p_c \, f_c(x)$$
• If each $f_c$ is Gaussian with mean $\mu_c$ and covariance $\Sigma_c$, maximizing the posterior is equivalent to minimizing the quadratic score
$$Q_c = -2\log f_c(x) - 2\log p_c = \left(x - \mu_c\right)^T \Sigma_c^{-1} \left(x - \mu_c\right) + \log\left|\Sigma_c\right| - 2\log p_c$$
• By the maximum likelihood rule, the observation $x$ is allocated to the class $c$ with the smallest $Q_c$.
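The quadratic score and the allocation rule can be sketched directly; the two Gaussian classes here have assumed (made-up) parameters for illustration:

```python
import numpy as np

def qda_score(x, mu, Sigma, prior):
    """Q_c = (x-mu)^T Sigma^{-1} (x-mu) + log|Sigma| - 2 log p_c.
    A smaller score means a larger posterior, so the observation is
    allocated to the class with the minimal Q_c."""
    d = x - mu
    return (d @ np.linalg.solve(Sigma, d)
            + np.log(np.linalg.det(Sigma))
            - 2.0 * np.log(prior))

# Two hypothetical Gaussian classes with known parameters.
mu0, S0, p0 = np.array([0.0, 0.0]), np.eye(2), 0.5
mu1, S1, p1 = np.array([3.0, 3.0]), 2.0 * np.eye(2), 0.5

x = np.array([0.4, -0.2])            # an observation near class 0's mean
scores = [qda_score(x, mu0, S0, p0),
          qda_score(x, mu1, S1, p1)]
print(int(np.argmin(scores)))        # → 0
```

With equal priors the $-2\log p_c$ terms cancel and the rule reduces to comparing Mahalanobis distances plus the $\log|\Sigma_c|$ penalty; unequal priors shift the decision boundary toward the less probable class.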