
# Theory and Toolkits of PCA

## 2009 5/4 IRLab Study Group

Presenter: Chin-Hui Chen
## Agenda

Theory:
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize squared error?
◦ 4. Dimensionality reduction
Toolkit:
◦ A list of PCA toolkits
◦ Demo
## Scenario (Point? Line?)

Consider a 2-dimensional space.

## What is PCA? (1)

Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called “principal components”.

## What is PCA? (2)

What can PCA do?
◦ Dimensionality reduction

For example:
◦ Assume N points in a D-dim space
◦ e.g. {x1, x2, x3, x4}; xi = (v1, v2)
◦ A set of M basis vectors for projection
◦ e.g. {u1}
They are orthonormal bases (each has length 1; the inner product of any two is 0)
M << D (represent the features in M dimensions)
◦ e.g. xi = (p1)
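The example above can be sketched in a few lines of Python. The four points and the direction used for u1 are made-up values for illustration, not data from the slides. Each 2-dim point xi is reduced to a single coordinate p1 by taking its inner product with the unit vector u1.

```python
import math

# Hypothetical data: four points in a 2-dim space (D = 2).
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.5), (4.0, 4.0)]

# Hypothetical basis with a single vector (M = 1), scaled to unit length.
direction = (1.0, 1.0)
norm = math.hypot(direction[0], direction[1])
u1 = (direction[0] / norm, direction[1] / norm)


def project(x, u):
    """Return the coordinate of x along the unit vector u (inner product)."""
    return sum(xi * ui for xi, ui in zip(x, u))


# Each 2-dim point xi = (v1, v2) becomes a 1-dim feature xi = (p1).
reduced = [project(x, u1) for x in points]
print(reduced)
```

Here u1 is chosen by hand; the rest of the talk is about how PCA chooses it.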
## How to minimize squared error?

Consider a D-dimensional space
◦ Given n points: {x1, x2, …, xn}
◦ Each xi is a D-dim vector

How to
◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error

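## How to? – Point

◦ Goal: find $x_0$ that minimizes $J_0(x_0) = \sum_{k=1}^{n} \lVert x_0 - x_k \rVert^2$

◦ Let $m = \frac{1}{n} \sum_{k=1}^{n} x_k$ (the sample mean). Then
$J_0(x_0) = \sum_{k=1}^{n} \lVert (x_0 - m) - (x_k - m) \rVert^2 = n \lVert x_0 - m \rVert^2 + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$,
because the cross term $-2 (x_0 - m)^t \sum_{k=1}^{n} (x_k - m)$ is zero. The last sum does not depend on $x_0$.

∴ $x_0 = m$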
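A small numerical check of this result, as a sketch: the points below are invented for illustration, and the helper names `mean` and `squared_error` are not from the slides. The script compares the squared error at the mean with the error at a few nearby candidate points.

```python
def mean(points):
    """Component-wise sample mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def squared_error(x0, points):
    """Sum of squared Euclidean distances from x0 to every point."""
    return sum(sum((a - b) ** 2 for a, b in zip(x0, p)) for p in points)


# Hypothetical 2-dim data.
points = [(1.0, 2.0), (3.0, 0.0), (4.0, 5.0), (0.0, 1.0)]
m = mean(points)
best = squared_error(m, points)

# Shift the candidate point away from m and record the error each time.
shifts = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5), (0.3, -0.2)]
errors = [squared_error((m[0] + dx, m[1] + dy), points) for dx, dy in shifts]

print("mean:", m, "error at mean:", best)
print("errors at shifted points:", errors)
```

Every shifted candidate gives a larger error than the mean, in line with $x_0 = m$.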

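## How to? – Point – Line

◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error

Line L through $x_0$ with unit direction $e$: $x_k' - x_0 = a_k e$
⇒ $x_k' = x_0 + a_k e$
⇒ $x_k' = m + a_k e$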

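## How to? – Line

L: $x_k' = m + a_k e$
Goal: minimize $J_1(a_1, \dots, a_n, e) = \sum_{k=1}^{n} \lVert (m + a_k e) - x_k \rVert^2$

First, find $a_1, \dots, a_n$.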
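Differentiating with respect to each $a_k$ (with $\lVert e \rVert = 1$) and setting the result to zero: $2 a_k - 2 e^t (x_k - m) = 0$, so $a_k = e^t (x_k - m)$.

What does it mean? $a_k$ is the projection of $x_k - m$ onto the direction $e$.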
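The step behind this derivative can be written out as follows. This is a sketch that assumes the squared-error criterion $J_1 = \sum_k \lVert (m + a_k e) - x_k \rVert^2$ for the line $x_k' = m + a_k e$, with $m$ the sample mean and $\lVert e \rVert = 1$.

```latex
\begin{aligned}
J_1(a_1,\dots,a_n,e)
  &= \sum_{k=1}^{n} \lVert a_k e - (x_k - m) \rVert^2 \\
  &= \sum_{k=1}^{n} a_k^2 \lVert e \rVert^2
     - 2 \sum_{k=1}^{n} a_k \, e^t (x_k - m)
     + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \\
\frac{\partial J_1}{\partial a_k}
  &= 2 a_k - 2\, e^t (x_k - m) = 0
  \quad\Longrightarrow\quad
  a_k = e^t (x_k - m)
\end{aligned}
```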
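Then, how about $e$?

Let $S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^t$.

Substituting $a_k = e^t (x_k - m)$ back into $J_1$ gives
$J_1(e) = -e^t S e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2$,
where the last term is independent of $e$.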
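The substitution can be expanded step by step. The sketch below assumes $a_k = e^t (x_k - m)$, $\lVert e \rVert = 1$, and the scatter matrix $S = \sum_k (x_k - m)(x_k - m)^t$.

```latex
\begin{aligned}
J_1(e)
  &= \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2
     + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \\
  &= - \sum_{k=1}^{n} \bigl[ e^t (x_k - m) \bigr]^2
     + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \\
  &= - \sum_{k=1}^{n} e^t (x_k - m)(x_k - m)^t e
     + \sum_{k=1}^{n} \lVert x_k - m \rVert^2 \\
  &= - e^t S e + \sum_{k=1}^{n} \lVert x_k - m \rVert^2
\end{aligned}
```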
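Dropping the term that is independent of $e$: $J_1'(e) = -e^t S e$, so minimizing $J_1$ means maximizing $e^t S e$.
Use a Lagrange multiplier: to optimize $f(x, y)$ subject to $g(x, y) = c$, optimize $f(x, y) - \lambda (g(x, y) - c)$ instead.

Because $\lVert e \rVert = 1$: $u = e^t S e - \lambda (e^t e - 1)$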

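◦ What is S?
The covariance matrix of the data (up to a constant factor, which does not change the eigenvectors).
◦ Assume D-dim data: S is then a D × D matrix.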

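Setting $\partial u / \partial e = 2 S e - 2 \lambda e = 0$ gives $S e = \lambda e$, and we know S.
Then, what is $e$? An eigenvector of S.

$S e = \lambda e$ has the same form as the eigenvalue equation $A X = \lambda X$.
Since $e^t S e = \lambda e^t e = \lambda$, the best line uses the eigenvector with the largest eigenvalue.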
## How to? – Conclusion

Summary:
◦ Find a line: $x_k' = m + a_k e$
◦ $a_k = e^t (x_k - m)$
◦ $S e = \lambda e$; $e$ is an eigenvector of the covariance matrix.
◦ A D-dim space yields D eigenvectors.
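For the 2-dim scenario, the summary can be sketched directly in Python. This is an illustration only: the function name `fit_line` and the sample points are made up, and the closed-form angle $\theta = \tfrac{1}{2}\,\mathrm{atan2}(2 S_{xy},\, S_{xx} - S_{yy})$ is a standard shortcut for the leading eigenvector of a symmetric 2 × 2 matrix rather than something taken from the slides.

```python
import math


def fit_line(points):
    """Fit the line x' = m + a*e that minimizes squared error in 2-D."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Entries of the 2 x 2 scatter matrix S = sum (x_k - m)(x_k - m)^t.
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)

    # Direction of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    e = (math.cos(theta), math.sin(theta))

    # Eigenvalue lambda = e^t S e, and coefficients a_k = e^t (x_k - m).
    lam = e[0] * (sxx * e[0] + sxy * e[1]) + e[1] * (sxy * e[0] + syy * e[1])
    a = [e[0] * (x - mx) + e[1] * (y - my) for x, y in points]
    return (mx, my), e, lam, a


# Hypothetical points lying exactly on the line y = x.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
m, e, lam, a = fit_line(points)
print("m =", m, "e =", e, "lambda =", lam)
print("a =", a)
```

With these points the fitted direction is the diagonal, and each point is recovered exactly as m + a_k e.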
## Dimensionality Reduction

Consider a 2-dim space …

Coordinates on the original axes:
X1 = (a, b)
X2 = (c, d)

Coordinates on the new axes:
X1 = (a’, b’)
X2 = (c’, d’)

We are going to keep only the first new coordinate:
X1 = (a’)
X2 = (c’)
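We want to prove:
◦ The new axes of the data are uncorrelated.

Consider N m-dim vectors
◦ {x1, x2, …, xn}
◦ Let $X = [x_1 - m \;\; x_2 - m \;\; \dots \;\; x_n - m]^T$, where $m$ inside the brackets is the mean vector (not the dimension m)
◦ $S = X^T X$; the eigen decomposition $S e = \lambda e$ gives the eigenvectors {e1, …, em}
◦ Let $E = [e_1 \;\; e_2 \;\; \dots \;\; e_m]$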
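$E = [e_1 \;\; e_2 \;\; \dots \;\; e_m]$

$SE = [S e_1 \;\; S e_2 \;\; \dots \;\; S e_m]$
$= [\lambda_1 e_1 \;\; \lambda_2 e_2 \;\; \dots \;\; \lambda_m e_m]$
$= E \, \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_m)$
$= ED$, where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$

$S = E D E^{-1}$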
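The identities SE = ED and S = EDE⁻¹ can be checked by hand on a tiny case. The matrix S below is a made-up 2 × 2 example whose eigenpairs are known exactly (λ = 3 with e = (1, 1)/√2, and λ = 1 with e = (1, −1)/√2). The helpers `matmul` and `transpose` are illustrative, and the check uses E⁻¹ = Eᵀ, which holds because the columns of E are orthonormal.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


# Hypothetical symmetric matrix with known eigenvalues 3 and 1.
S = [[2.0, 1.0], [1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
E = [[r, r], [r, -r]]          # columns are e1 = (r, r), e2 = (r, -r)
D = [[3.0, 0.0], [0.0, 1.0]]   # diag(lambda_1, lambda_2)

SE = matmul(S, E)
ED = matmul(E, D)
rebuilt = matmul(ED, transpose(E))  # E D E^T, with E^T = E^{-1}

print("SE =", SE)
print("ED =", ED)
print("E D E^T =", rebuilt)
```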
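We want to know the new covariance matrix of the projected vectors.

Let $Y = [y_1 \;\; y_2 \;\; \dots \;\; y_n]^T$ and $E = [e_1 \;\; e_2 \;\; \dots \;\; e_m]$.

Each vector is projected as $y_i = E^T (x_i - m)$; with the rows of $X$ holding the centered vectors, this is $Y = X E$.

$S_Y = Y^T Y = (XE)^T (XE) = E^T (X^T X) E = E^T S E = E^T E D E^{-1} E = D$,
using $S = X^T X = E D E^{-1}$ and $E^T E = I$ (the eigenvectors are orthonormal).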
$S_Y = D$

1. The covariance between any two different new axes is 0.
2. The more of the data an axis represents, the larger the variance along that axis, and the larger its λ.
Conclusion:
If we want to reduce the dimension from D to M (M << D):
1. Find S (the covariance matrix of the data)
2. Compute its eigenvalues and eigenvectors
3. Select the eigenvectors with the top M eigenvalues
4. Project the data onto them
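The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method used by any of the toolkits: the function names are invented, the sample data is made up, and the eigenvectors are found by power iteration with deflation, a simple technique swapped in here so the example needs no external library. It assumes the top M eigenvalues are distinct and the starting vector is not orthogonal to the eigenvector being sought.

```python
import math


def center(points):
    """Subtract the mean from every point; return (mean, centered rows)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    return mean, [[p[d] - mean[d] for d in range(dim)] for p in points]


def scatter_matrix(centered):
    """Step 1: S = sum of (x_k - m)(x_k - m)^t over all points."""
    dim = len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) for j in range(dim)]
            for i in range(dim)]


def power_iteration(S, iterations=1000):
    """Step 2: approximate the largest eigenvalue of S and its eigenvector."""
    dim = len(S)
    e = [float(i + 1) for i in range(dim)]      # arbitrary start vector
    norm = math.sqrt(sum(v * v for v in e))
    e = [v / norm for v in e]
    for _ in range(iterations):
        Se = [sum(S[i][j] * e[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in Se))
        if norm == 0.0:                          # S maps e to zero; stop
            break
        e = [v / norm for v in Se]
    lam = sum(e[i] * sum(S[i][j] * e[j] for j in range(dim))
              for i in range(dim))
    return lam, e


def pca_reduce(points, M):
    """Reduce D-dim points to M dims; return (components, projected data)."""
    _, centered = center(points)
    S = scatter_matrix(centered)
    dim = len(S)
    components = []
    for _ in range(M):                           # Step 3: top M eigenpairs
        lam, e = power_iteration(S)
        components.append((lam, e))
        # Deflation: remove the found direction, S <- S - lam * e e^t.
        S = [[S[i][j] - lam * e[i] * e[j] for j in range(dim)]
             for i in range(dim)]
    # Step 4: y = E^t (x - m) for every point.
    projected = [[sum(e[d] * row[d] for d in range(dim))
                  for _, e in components] for row in centered]
    return components, projected


# Hypothetical 3-dim data that varies mostly along the first axis.
points = [(2.0, 0.0, 1.0), (4.0, 0.1, 1.0), (6.0, -0.1, 1.0), (8.0, 0.0, 1.0)]
components, projected = pca_reduce(points, 1)
print("eigenvalue:", components[0][0])
print("eigenvector:", components[0][1])
print("1-dim data:", projected)
```

In practice a library eigen-solver would normally replace `power_iteration`; the loop is kept only to make each step of the conclusion visible.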
## A List of PCA Toolkits

C & Java
◦ Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦ http://astro.u-strasbg.fr/~fmurtagh/mda-sw/

Perl
◦ PDL::PCA

Matlab
◦ Statistics Toolbox™: princomp

Weka
◦ weka.attributeSelection.PrincipalComponents
(http://www.laps.ufpa.br/aldebaro/weka/feature_selection.html)

C & Java
◦ Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦ http://astro.u-strasbg.fr/~fmurtagh/mda-sw/

C:
Download: pca.c
Compile: cc pca.c -lm -o pcac
Run: ./pcac spectr.dat 36 8 R > pcaout.c.txt

Java :
Download: JAMA, PCAcorr.java
Compile: javac -classpath Jama-1.0.2.jar PCAcorr.java
Run: java PCAcorr iris.dat > pcaout.java.txt