
Principal Component Analysis (PCA) and K-means Clustering

Class Notes, PETE 630 Geostatistics

PCA

Principal component analysis (PCA) finds a set of standardized linear combinations (called principal components) which are orthogonal and, taken together, explain all of the variance of the original data.

In general there are as many principal components as variables. In most cases, however, the first few principal components explain most of the data variance, so only they need be considered.


PCA Details

• Orthogonal linear transformation.

• First, normalize the data to a mean of zero and a variance of unity. This removes the undue impact of the size or dimensions of the variables involved (see the sketch after this list).

• Next, construct the covariance matrix (this is actually a correlation matrix, as the data have been normalized).
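As a minimal sketch of the normalization step (assuming the raw readings sit in an n-samples by n-variables NumPy array X; the function name is illustrative):

```python
import numpy as np

def standardize(X):
    """Rescale each variable (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # sample standard deviation
    return (X - mean) / std
```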


PCA Details

From matrix algebra, the covariance (correlation) matrix can be factored into a diagonal matrix and an orthogonal matrix, $L$ and $Q$ respectively:

$$C = Q\,L\,Q^{T}$$

The principal components are then given by projecting the normalized data onto the columns of $Q$, i.e. $Z = Q^{T}x$ for each observation $x$.
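A sketch of this factorization with NumPy (assuming Z holds the standardized data from the normalization sketch above):

```python
import numpy as np

# Z: standardized data, shape (n_samples, n_vars).
# For standardized data the covariance matrix equals the correlation matrix.
C = np.cov(Z, rowvar=False)

# Factor C = Q L Q^T: eigh handles symmetric matrices and returns
# eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]           # largest variance first
eigvals, Q = eigvals[order], Q[:, order]

scores = Z @ Q   # principal component scores: projections onto Q's columns
```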


PCA Example: Well Log Data Analysis

Variance given as:

$$\sigma_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

Covariance given as:

$$\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$

Covariance for Each Variable

Covariance matrix given by:

$$C = \begin{bmatrix}
\operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_6) \\
\operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_6) \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}(X_6, X_1) & \operatorname{Cov}(X_6, X_2) & \cdots & \operatorname{Var}(X_6)
\end{bmatrix}$$

Advantage: reduction of the data to a single 6 × 6 covariance matrix instead of the full suite of well log measurements.
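For instance (with a hypothetical array logs holding one column per log, in the order GR, NPHI, RHOB, DT, log(LLD), log(MSFL)), the entire data set collapses to one small matrix:

```python
import numpy as np

# logs: hypothetical (n_samples, 6) array of normalized well log readings.
C = np.cov(logs, rowvar=False)   # 6 x 6 covariance matrix
print(C.shape)                   # (6, 6), regardless of how many samples
```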


Eigenvalues

Eigenvalues of a matrix $A$ are given by the characteristic equation:

$$\det\left(A - \lambda I\right) = 0$$

Corresponding to each eigenvalue there is a non-trivial solution $\mathbf{x} \neq \mathbf{0}$ of

$$A\,\mathbf{x} = \lambda\,\mathbf{x}$$

Eigenvalues

The sum of the eigenvalues yields the overall variance of the data set. Since $\lambda_i$ is the variance carried by the $i$-th principal component, $\lambda_i / \sum_j \lambda_j$ is the fraction of the total variance it explains.
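This gives the percentage contribution of each component directly; a short sketch reusing the sorted eigenvalues eigvals from the factorization sketch above:

```python
import numpy as np

contribution = 100.0 * eigvals / eigvals.sum()   # % of total variance per PC
cumulative = np.cumsum(contribution)             # running total across PCs
```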

PCA

Each principal axis is a linear combination of the original variables:

$$PC_i = a_{i1} Y_1 + a_{i2} Y_2 + \cdots + a_{in} Y_n$$

The $a_{ij}$ are the coefficients for component $i$, multiplied by the measured value of variable $j$.

$PC_1$ is simultaneously the direction of maximum variance and a least-squares "line of best fit" (squared distances of points away from $PC_1$ are minimized).

[Figure: scatter plot of Variable $X_1$ (horizontal axis) versus Variable $X_2$ (vertical axis) with the rotated principal axes $PC_1$ and $PC_2$ overlaid on the point cloud.]


PCA

Loadings of the six log variables on each principal component:

            PC1     PC2     PC3     PC4     PC5     PC6
GR         -0.16    0.43   -0.09    0.06   -0.31   -0.02
NPHI       -0.42   -0.16    0.19    0.86   -0.15    0.08
RHOB        0.41    0.06   -0.76    0.42    0.27   -0.01
DT         -0.46    0.21    0.05   -0.04    0.86   -0.09
log(LLD)    0.46    0.13    0.45    0.24    0.13   -0.71
log(MSFL)   0.46    0.20    0.42    0.15    0.25    0.70

Contribution, %        64.5   13.4    8.1    7.4    4.1    2.5
Cum. contribution, %   64.5   77.9   86.0   93.4   97.5  100.0

For example:

PC2 = 0.43(GR) - 0.16(NPHI) + 0.06(RHOB) + 0.21(DT) + 0.13 log(LLD) + 0.20 log(MSFL)
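To illustrate the linear-combination form, a PC2 score for one depth sample is just a dot product (the standardized readings in z below are made up for the example):

```python
import numpy as np

# PC2 loadings from the table, ordered GR, NPHI, RHOB, DT, log(LLD), log(MSFL).
a2 = np.array([0.43, -0.16, 0.06, 0.21, 0.13, 0.20])

# Hypothetical standardized log readings for one depth sample (same order).
z = np.array([0.5, -1.2, 0.8, -0.3, 1.1, 0.9])

pc2_score = a2 @ z   # PC2 = sum of loading x standardized value
```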



PCA

[Figure: cross-plot of the $PC_1$ (horizontal) and $PC_2$ (vertical) scores for the well log data; an annotation marks the tightness of the point cluster.]

K-means Clustering

A method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a $d$-dimensional real vector, k-means clustering aims to partition the $n$ observations into $k$ sets ($k \le n$), $S = \{S_1, S_2, \ldots, S_k\}$, so as to minimize the within-cluster sum of squares:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2$$

where $\mu_i$ is the mean of the points in $S_i$.
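A sketch of evaluating this objective for a given partition (the array shapes are assumptions for the example):

```python
import numpy as np

def within_cluster_ss(X, labels, means):
    """Within-cluster sum of squares.

    X: (n, d) observations; labels: (n,) cluster index per point;
    means: (k, d) cluster means.
    """
    diffs = X - means[labels]     # each observation minus its cluster mean
    return float(np.sum(diffs ** 2))
```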


K-means Algorithm

Uses an iterative refinement technique.

Given an initial set of $k$ means $m_1^{(1)}, m_2^{(1)}, \ldots, m_k^{(1)}$, the algorithm proceeds by alternating between two steps:

• Assignment step: assign each observation to the cluster with the closest mean:

$$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\| \le \left\| x_p - m_j^{(t)} \right\| \ \forall\, 1 \le j \le k \right\}$$

where each $x_p$ goes into exactly one $S_i^{(t)}$, even if it could go into two of them.

• Updating step: calculate the new means to be the centroids of the observations in the clusters:

$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$$
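Putting the two steps together, a minimal NumPy sketch of the loop (random initialization as in Step 1 of the next slide; empty clusters are not handled in this sketch):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """Plain k-means by alternating assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initial means: k observations drawn at random from the data set.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest mean for every observation.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each mean to the centroid of its cluster
        # (assumes no cluster ends up empty).
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_means, means):   # convergence: means stopped moving
            break
        means = new_means
    return labels, means
```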


Steps for Clustering

Step 1: K initial "means" (in this case k = 3) are randomly selected from the data set.

Step 2: K clusters are created by associating every observation with the nearest mean.

Step 3: The centroid of each of the k clusters becomes the new mean.

Step 4: Steps 2 and 3 are repeated until convergence has been reached.
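The same four steps via scikit-learn's KMeans, as a usage sketch (assuming scikit-learn is installed; the data here are random toy points):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))   # toy 2-D data

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)            # cluster assignment of each observation
print(km.cluster_centers_)   # final means
print(km.inertia_)           # within-cluster sum of squares
```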


STEP 1

K initial "means" (in this case k = 3) are randomly selected from the data set.

[Figure: the data points with three randomly selected initial means $k_1$, $k_2$, $k_3$.]

STEP 2

K clusters are created by associating every observation with the nearest mean.

[Figure: every observation assigned to the nearest of $k_1$, $k_2$, $k_3$, forming three clusters.]

STEP 3

The centroid of each of the k clusters becomes the new mean.

[Figure: the means $k_1$, $k_2$, $k_3$ relocated to the centroids of their clusters.]

K-means Clustering: Step 4

Steps 2 and 3 are repeated until convergence has been reached.

[Figure: the converged clusters with final means $k_1$, $k_2$, $k_3$.]

Use of K-Means Clustering

Strength

• Relatively efficient, simple, and easy way to classify a given data set.

• May find and terminate at a local optimum solution; the global optimum may be sought with other techniques (for example, genetic algorithms).

Weakness

• The number of clusters, k, must be specified in advance; there is no way of knowing how many clusters exist.

• Sensitive to the initial conditions: a different clustering can result if different initial conditions are used.

• Clusters tend to come out roughly circular in shape because of the use of (Euclidean) distance.

Discriminant Analysis

Discriminant analysis aims at classifying unknown samples into one of several possible groups or classes.

Classical discriminant function analysis attempts to develop a linear equation (a linear weighted function of the variables) that best differentiates between two different classes.

The approach requires a training data set for which the assignments to population groups are already known (a supervised method).

Discriminant Analysis

If two data groups are plotted in a multidimensional space, they appear as two data clouds with either a distinct separation or some overlap.

An axis is located on which the distance between the clouds is maximized while the dispersion within each cloud is simultaneously minimized.

The new axis defines the linear discriminant function and is calculated from the multivariate means, variances, and covariances of the data groups.

The data points of the two groups may be projected onto this axis to collapse the multidimensional data into a single discriminant variable.
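A sketch of locating this axis for two groups, following Fisher's classical construction (X1 and X2 are assumed to be the two data clouds as NumPy arrays):

```python
import numpy as np

def discriminant_axis(X1, X2):
    """Axis maximizing between-group distance relative to
    within-group dispersion (Fisher's linear discriminant)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-group covariance of the two clouds.
    S_w = ((n1 - 1) * np.cov(X1, rowvar=False)
           + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return np.linalg.solve(S_w, m1 - m2)   # w proportional to S_w^{-1}(m1 - m2)

# Projecting any point x onto the axis (x @ w) collapses it to a
# single discriminant score.
```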


Discriminant Analysis

We assume that each data group is characterized by a group-specific probability density function $f_c(x)$.

If the prior probability $p_c$ of each group is known, then the posterior distribution of the classes given the observation $x$ is obtained from Bayes' theorem (below).

The observation is allocated to the group with the maximal posterior probability.

Discriminant Analysis

By Bayes' theorem, the posterior distribution of the classes given the observation is

$$p(c \mid x) = \frac{p_c\, f_c(x)}{p(x)}, \qquad p(x) = \sum_c p_c\, f_c(x)$$

For a multivariate normal $f_c$, taking $2 \log$ of the numerator and dropping constants common to all classes gives the quadratic discriminant score

$$Q_c \equiv 2 \log\bigl( f_c(x)\, p_c \bigr) = -\left( x - \boldsymbol{\mu}_c \right)^{T} \boldsymbol{\Sigma}_c^{-1} \left( x - \boldsymbol{\mu}_c \right) - \log\left| \boldsymbol{\Sigma}_c \right| + 2 \log p_c$$

By the maximum likelihood rule, the observation is allocated to the class with the largest $Q_c$.
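A sketch of evaluating $Q_c$ for one class with NumPy (the class mean, covariance, and prior are assumed already estimated from a training set):

```python
import numpy as np

def q_score(x, mu, Sigma, prior):
    """Quadratic discriminant score Q_c, constants dropped."""
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    _, logdet = np.linalg.slogdet(Sigma)         # log |Sigma|
    return -maha - logdet + 2.0 * np.log(prior)

# Allocation rule: pick the class whose score is largest, e.g.
# c_hat = max(classes, key=lambda c: q_score(x, mu[c], Sigma[c], p[c]))
```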