Frank Wood

December 8, 2009

This lecture borrows and quotes from Jolliffe's Principal Component Analysis book. Go buy it!

The central idea of principal component analysis (PCA) is

to reduce the dimensionality of a data set consisting of a

large number of interrelated variables, while retaining as

much as possible of the variation present in the data set.

This is achieved by transforming to a new set of variables,

the principal components (PCs), which are uncorrelated,

and which are ordered so that the first few retain most of

the variation present in all of the original variables.

[Jolliffe, Principal Component Analysis, 2nd edition]

PCA rotation

PCA in a nutshell

Notation

- x is a vector of p random variables
- α_k is a vector of p constants
- α_k' x = Σ_{j=1}^p α_{kj} x_j

Procedural description

- Find a linear function of x, α_1' x, with maximum variance.
- Next find another linear function of x, α_2' x, uncorrelated with α_1' x and again with maximum variance.
- Iterate.

Goal

It is hoped, in general, that most of the variation in x will be accounted for by m PCs, where m << p.
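As a concrete illustration of this procedure, here is a minimal numpy sketch (not from the slides; the toy data reuse the example covariance that appears later in the lecture). It finds the variance-ordered, uncorrelated directions via an eigendecomposition of the sample covariance matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([2.0, 5.0], [[4.5, 1.5], [1.5, 1.0]], size=500)

    S = np.cov(X, rowvar=False)                   # sample covariance of the p variables

    # The alpha_k are the eigenvectors of S, ordered by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    lam, A = eigvals[order], eigvecs[:, order]

    Z = (X - X.mean(axis=0)) @ A                  # PC scores: uncorrelated, variance-ordered
    print(lam)                                    # variances of the PCs, largest first
    print(np.cov(Z, rowvar=False).round(6))       # approximately diag(lam)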

Derivation of PCA

Assumption and More Notation

- Assume that Σ, the covariance matrix of x, is known; in practice we use the sample covariance matrix S when Σ is unknown.

Shortcut to solution

- For k = 1, 2, ..., p, the kth PC is z_k = α_k' x, where α_k is an eigenvector of Σ corresponding to its kth largest eigenvalue λ_k.
- If α_k is chosen to have unit length (α_k' α_k = 1), then

Var(z_k) = λ_k

Derivation of PCA

First Step

- We want to find the α_k that maximizes Var(α_k' x) = α_k' Σ α_k. Without a constraint we could pick α_k arbitrarily large, so we impose the normalization constraint α_k' α_k = 1 (a unit-length vector).
- To maximize α_k' Σ α_k subject to α_k' α_k = 1 we use the technique of Lagrange multipliers. We maximize the function

α_k' Σ α_k - λ_k (α_k' α_k - 1)

w.r.t. α_k by differentiating w.r.t. α_k.

Derivation of PCA

Constrained maximization - method of Lagrange multipliers

- This results in

d/dα_k [ α_k' Σ α_k - λ_k (α_k' α_k - 1) ] = 0

Σ α_k - λ_k α_k = 0

Σ α_k = λ_k α_k

so α_k is an eigenvector of Σ and λ_k is the associated eigenvalue.

Derivation of PCA

Constrained maximization - method of Lagrange multipliers

- If we recognize that the quantity to be maximized is

α_k' Σ α_k = α_k' λ_k α_k = λ_k α_k' α_k = λ_k

then we should choose λ_k to be as big as possible. So, calling λ_1 the largest eigenvalue of Σ and α_1 the corresponding eigenvector, the solution to

Σ α_1 = λ_1 α_1

is the 1st principal component of x.
- The derivation of the remaining PCs is more involved, but similar.
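Before moving on, a quick numerical sanity check of the first-PC claim (my own sketch, not from the slides): among many random unit-length vectors α, none achieves a variance α' Σ α larger than the top eigenvalue λ_1, which the top eigenvector attains exactly.

    import numpy as np

    rng = np.random.default_rng(1)
    Sigma = np.array([[4.5, 1.5], [1.5, 1.0]])

    lam, V = np.linalg.eigh(Sigma)
    lam1, alpha1 = lam[-1], V[:, -1]              # largest eigenvalue and its eigenvector

    u = rng.normal(size=(10000, 2))               # random directions ...
    u /= np.linalg.norm(u, axis=1, keepdims=True) # ... normalized to unit length
    variances = np.einsum('ij,jk,ik->i', u, Sigma, u)   # alpha' Sigma alpha for each row

    print(alpha1 @ Sigma @ alpha1, lam1)          # the top eigenvector attains lambda_1
    print(variances.max())                        # no random unit vector exceeds lambda_1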

Derivation of PCA

Constrained maximization - more constraints

- The second PC, α_2' x, maximizes α_2' Σ α_2 subject to being uncorrelated with α_1' x.
- The uncorrelatedness constraint can be expressed using any of these equations

cov(α_1' x, α_2' x) = α_1' Σ α_2 = α_2' Σ α_1 = α_2' λ_1 α_1 = λ_1 α_2' α_1 = λ_1 α_1' α_2 = 0

- Of these, choosing the last and introducing a second Lagrange multiplier φ, the function to maximize α_2' Σ α_2 becomes

α_2' Σ α_2 - λ_2 (α_2' α_2 - 1) - φ α_2' α_1

Derivation of PCA

Constrained maximization - more constraints

- Differentiating (and setting the result equal to zero) yields

d/dα_2 [ α_2' Σ α_2 - λ_2 (α_2' α_2 - 1) - φ α_2' α_1 ] = 0

Σ α_2 - λ_2 α_2 - φ α_1 = 0

- Left-multiplying by α_1' gives

α_1' Σ α_2 - λ_2 α_1' α_2 - φ α_1' α_1 = 0

0 - 0 - φ · 1 = 0

then we can see that φ must be zero and that, when this is true, we are left with

Σ α_2 - λ_2 α_2 = 0

Derivation of PCA

Clearly

Σ α_2 - λ_2 α_2 = 0

is another eigenvalue equation, and the same strategy of choosing α_2 to be the eigenvector associated with the second largest eigenvalue yields the second PC of x, namely α_2' x.

This process can be repeated for k = 1, ..., p, yielding up to p different eigenvectors of Σ along with the corresponding eigenvalues λ_1, ..., λ_p.

Furthermore, the variance of each of the PCs is given by

Var[α_k' x] = λ_k,   k = 1, 2, ..., p
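Both facts on this slide are easy to confirm by simulation; a minimal sketch (my own, not from the slides) that draws from a known Σ and projects onto its eigenvectors:

    import numpy as np

    rng = np.random.default_rng(2)
    Sigma = np.array([[4.5, 1.5], [1.5, 1.0]])
    lam, A = np.linalg.eigh(Sigma)
    lam, A = lam[::-1], A[:, ::-1]                # eigenvalues/eigenvectors, largest first

    x = rng.multivariate_normal([2.0, 5.0], Sigma, size=200000)
    z = (x - x.mean(axis=0)) @ A                  # z_k = alpha_k' x (centered)

    print(np.cov(z, rowvar=False).round(3))       # ~ diag(lam): the PCs are uncorrelated
    print(lam.round(3))                           # and Var[alpha_k' x] ~ lambda_k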

Properties of PCA

For any integer q, 1 ≤ q ≤ p, consider the orthonormal linear transformation

y = B'x

where y is a q-element vector and B' is a q × p matrix, and let Σ_y = B' Σ B be the variance-covariance matrix for y. Then the trace of Σ_y, denoted tr(Σ_y), is maximized by taking B = A_q, where A_q consists of the first q columns of A (the matrix whose kth column is the kth eigenvector α_k of Σ).

What this means is that if you want to choose a lower-dimensional projection of x, the choice of B described here is probably a good one. It maximizes the (retained) variance of the resulting variables.
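To see the property in action, this sketch (hypothetical dimensions; random orthonormal comparison bases drawn via a QR factorization) compares tr(Σ_y) for B = A_q against other orthonormal choices of B:

    import numpy as np

    rng = np.random.default_rng(3)
    p, q = 5, 2

    M = rng.normal(size=(p, p))
    Sigma = M @ M.T                               # an arbitrary p x p covariance matrix

    lam, A = np.linalg.eigh(Sigma)
    Aq = A[:, ::-1][:, :q]                        # first q columns of A (top eigenvectors)

    def retained_trace(B):
        return np.trace(B.T @ Sigma @ B)          # tr(Sigma_y) for y = B'x

    print(retained_trace(Aq))                     # = lambda_1 + ... + lambda_q
    for _ in range(5):
        B, _ = np.linalg.qr(rng.normal(size=(p, q)))   # random orthonormal columns
        print(retained_trace(B))                  # never exceeds the PCA choice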

In fact, since the projections are uncorrelated, the percentage of

variance accounted for by retaining the first q PCs is given by

100 × ( Σ_{k=1}^q λ_k ) / ( Σ_{k=1}^p λ_k )
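In code this is just a ratio of eigenvalue sums; a small sketch (the function name is mine):

    import numpy as np

    def retained_variance_percent(S, q):
        """Percentage of total variance captured by the first q PCs of covariance S."""
        lam = np.sort(np.linalg.eigvalsh(S))[::-1]     # eigenvalues, largest first
        return 100.0 * lam[:q].sum() / lam.sum()

    S = np.array([[4.5, 1.5], [1.5, 1.0]])
    print(retained_variance_percent(S, 1))             # most of the variance is in the first PC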

If we recall that the sample covariance matrix (an unbiased

estimator for the covariance matrix of x) is given by

S = (1 / (n - 1)) X'X

where X is the n × p design matrix with each column centered at zero (in other words, X is a zero-mean design matrix).

We construct the matrix A by combining the p eigenvectors of S (or the eigenvectors of X'X, as they are the same), then we can define a matrix of PC scores

Z = XA

Of course, if we instead form Z by selecting the q eigenvectors corresponding to the q largest eigenvalues of S when forming A, then we can achieve an optimal (in some senses) q-dimensional projection of x.
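Putting these pieces together (centering, S = X'X/(n-1), eigenvectors A, scores Z = XA), a minimal sketch with made-up data:

    import numpy as np

    rng = np.random.default_rng(4)
    raw = rng.multivariate_normal([2.0, 5.0], [[4.5, 1.5], [1.5, 1.0]], size=200)

    X = raw - raw.mean(axis=0)                    # zero-mean design matrix
    n = X.shape[0]
    S = X.T @ X / (n - 1)                         # sample covariance matrix

    lam, A = np.linalg.eigh(S)
    lam, A = lam[::-1], A[:, ::-1]                # order by decreasing eigenvalue

    Z = X @ A                                     # matrix of PC scores
    q = 1
    Zq = X @ A[:, :q]                             # q-dimensional projection (first q PCs only)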

Given the sample covariance matrix

S = (1 / (n - 1)) X'X

one way to find its eigenvectors and eigenvalues is to utilize the singular value decomposition

S = A Λ A'

where A is a matrix consisting of the eigenvectors of S and Λ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to each eigenvector (for the symmetric matrix S, this coincides with its eigendecomposition).

Creating a reduced-dimensionality projection of X is accomplished by selecting the q largest eigenvalues in Λ and retaining the q corresponding eigenvectors from A.
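Because S is symmetric and positive semi-definite, numpy's SVD of S effectively returns this decomposition, with the eigenvalues already sorted in decreasing order; a short sketch (toy data of my own):

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 4))
    X -= X.mean(axis=0)
    S = X.T @ X / (X.shape[0] - 1)

    # For symmetric PSD S, the SVD S = A Lambda A' has eigenvectors in A and
    # eigenvalues on the diagonal of Lambda, sorted from largest to smallest.
    A, lam, _ = np.linalg.svd(S)

    q = 2
    Aq = A[:, :q]                                 # eigenvectors of the q largest eigenvalues
    X_reduced = X @ Aq                            # reduced-dimensionality projection of X
    print(lam[:q], lam[q:])                       # retained vs. discarded eigenvalues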

PCA is useful in linear regression in several ways

- Reduction in the dimension of the input space, leading to fewer parameters and easier regression.
- The variance of the regression coefficient estimator is minimized by the PCA choice of basis.

- x ~ N( [2 5]', Σ ), with

Σ = [ 4.5  1.5
      1.5  1.0 ]

[Figures: scatter plots of samples drawn from this distribution.]

The figures before showed the data without the third colinear

design matrix column. Plotting such data is not possible, but its

colinearity is obvious by design.

When PCA is applied to a design matrix of rank q less than p, the number of positive eigenvalues discovered is equal to q, the true rank of the design matrix.

If the number of PCs retained is at least q (and the data is perfectly colinear, etc.), all of the variance of the data is retained in the low-dimensional projection.

In this example, when PCA is run on the design matrix of rank 2,

the resulting projection back into two dimensions has exactly the

same distribution as before.
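This rank behaviour is easy to reproduce (a sketch; the third column is a hypothetical exact linear combination of the first two):

    import numpy as np

    rng = np.random.default_rng(6)
    X2 = rng.multivariate_normal([2.0, 5.0], [[4.5, 1.5], [1.5, 1.0]], size=500)

    x3 = 0.8 * X2[:, 0] - 0.3 * X2[:, 1]          # perfectly colinear third column
    X3 = np.column_stack([X2, x3])
    Xc = X3 - X3.mean(axis=0)

    S = Xc.T @ Xc / (Xc.shape[0] - 1)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    print(lam)                                    # only 2 eigenvalues are (numerically) positive
    print(100 * lam[:2].sum() / lam.sum())        # retaining 2 PCs keeps ~100% of the variance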

If we take the standard regression model

y = Xβ + ε

and consider instead the PCA rotation of X given by

Z = XA

then we can rewrite the regression model in terms of the PCs

y = Zγ + ε.

We can also consider the reduced model

y = Z_q γ_q + ε_q

where only the first q PCs are retained.

If we rewrite the regression relation as

y = Zγ + ε

then we can, because A is orthogonal, rewrite

Xβ = XAA'β = Zγ

where γ = A'β, i.e. β = Aγ.

Clearly, using least squares (or ML) to learn β̂ = Aγ̂ is equivalent to learning γ̂ directly.

And, like usual,

γ̂ = (Z'Z)^-1 Z'y

so

β̂ = A (Z'Z)^-1 Z'y
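A short check of this equivalence (the data and coefficients are invented for illustration): fitting γ̂ in the rotated basis and mapping back with A reproduces the direct least-squares estimate of β.

    import numpy as np

    rng = np.random.default_rng(7)
    n, p = 200, 3
    X = rng.normal(size=(n, p))
    X -= X.mean(axis=0)
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

    lam, A = np.linalg.eigh(X.T @ X)
    A = A[:, ::-1]
    Z = X @ A                                         # PCA rotation of X

    gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)     # gamma_hat = (Z'Z)^-1 Z'y
    beta_hat  = A @ gamma_hat                         # beta_hat  = A gamma_hat
    beta_ols  = np.linalg.solve(X.T @ X, X.T @ y)     # ordinary least squares on X

    print(np.allclose(beta_hat, beta_ols))            # True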

Without derivation we note that the variance-covariance matrix of β̂ is given by

Var(β̂) = σ^2 Σ_{k=1}^p l_k^-1 a_k a_k'

where l_k is the kth largest eigenvalue of X'X, a_k is the kth column of A, and σ^2 is the observation noise variance, i.e. ε ~ N(0, σ^2 I).

This sheds light on how multicolinearities produce large variances for the elements of β̂. If an eigenvalue l_k is small then the resulting variance of the estimator will be large.
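This spectral form agrees with the familiar expression σ^2 (X'X)^-1; a numerical sketch (made-up design matrix and noise level):

    import numpy as np

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 3))
    X -= X.mean(axis=0)
    sigma2 = 0.25                                     # observation noise variance

    l, A = np.linalg.eigh(X.T @ X)                    # eigenvalues l_k, eigenvectors a_k
    var_spectral = sigma2 * sum((1.0 / l[k]) * np.outer(A[:, k], A[:, k])
                                for k in range(X.shape[1]))
    var_direct = sigma2 * np.linalg.inv(X.T @ X)      # usual OLS covariance

    print(np.allclose(var_spectral, var_direct))      # True
    print(1.0 / l)                                    # a small l_k contributes a large term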

One way to avoid this is to ignore those PCs that are associated

with small eigenvalues, namely, to use the biased estimator

β̃ = Σ_{k=1}^m l_k^-1 a_k a_k' X'y

where l_{1:m} are the large eigenvalues of X'X and l_{m+1:p} are the small ones. Its variance is

Var(β̃) = σ^2 Σ_{k=1}^m l_k^-1 a_k a_k'

This estimator is biased, but since its variance is smaller it is possible that this could be an advantage.
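A minimal sketch of this truncated estimator (m, the near-colinear design, and the noise level are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(9)
    n, p, m = 200, 4, 2                               # keep only the m largest-eigenvalue PCs
    X = rng.normal(size=(n, p))
    X[:, -1] = X[:, 0] + 0.01 * rng.normal(size=n)    # nearly colinear last column
    X -= X.mean(axis=0)
    y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(scale=0.1, size=n)

    l, A = np.linalg.eigh(X.T @ X)
    l, A = l[::-1], A[:, ::-1]                        # l_1 >= ... >= l_p

    # Biased estimator: sum over the m large-eigenvalue terms only.
    beta_tilde = sum((1.0 / l[k]) * np.outer(A[:, k], A[:, k]) for k in range(m)) @ X.T @ y
    beta_ols = np.linalg.solve(X.T @ X, X.T @ y)      # unbiased, but high variance here

    print(beta_tilde.round(3))
    print(beta_ols.round(3))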

Homework: find the bias of this estimator. Hint: use the spectral decomposition of X'X.

PCA is not without its problems and limitations

- PCA assumes approximate normality of the input space distribution.
- PCA may still be able to produce a "good" low-dimensional projection of the data even if the data isn't normally distributed.

Extensions to consider

- Generalizations of PCA to the exponential family.
- Hyvärinen, A. and Oja, E., Independent component analysis: algorithms and applications.
- ISOMAP, LLE, maximum variance unfolding, etc.

Non-normal data

[Figures: examples of non-normal data.]
