

A Tutorial on Principal Component Analysis

Jonathon Shlens∗

Systems Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037 and Institute for Nonlinear Science, University of California, San Diego, La Jolla, CA 92093-0402

(Dated: December 10, 2005; Version 2)

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but poorly understood. The goal of this paper is to dispel the magic behind this black box. This tutorial focuses on building a solid intuition for how and why principal component analysis works; furthermore, it crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique.

I. INTRODUCTION

Principal component analysis (PCA) has been called one of the most valuable results from applied linear algebra. PCA is used abundantly in all forms of analysis - from neuroscience to computer graphics - because it is a simple, non-parametric method of extracting relevant information from confusing data sets. With minimal additional effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structure that often underlies it.

The goal of this tutorial is to provide both an intuitive feel for PCA, and a thorough discussion of this topic. We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra to provide an explicit solution. We will see how and why PCA is intimately related to the mathematical technique of singular value decomposition (SVD). This understanding will lead us to a prescription for how to apply PCA in the real world. We will discuss both the assumptions behind this technique as well as possible extensions to overcome these limitations.

The discussion and explanations in this paper are informal in the spirit of a tutorial. The goal of this paper is to educate. Occasionally, rigorous mathematical proofs are necessary although relegated to the Appendix. Although not as vital to the tutorial, the proofs are presented for the adventurous reader who desires a more complete understanding of the math. The only assumption is that the reader has a working knowledge of linear algebra. Please feel free to contact me with any suggestions, corrections or comments.

∗ Electronic address: shlens@salk.edu

II. MOTIVATION: A TOY EXAMPLE

Here is the perspective: we are experimenters. We are trying to understand some phenomenon by measuring various quantities (e.g. spectra, voltages, velocities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the data appears clouded, unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle in empirical science. Examples abound from complex systems such as neuroscience, photometry, meteorology and oceanography - the number of variables to measure can be unwieldy and at times even deceptive, because the underlying relationships can often be quite simple.

Take for example a simple toy problem from physics diagrammed in Figure 1. Pretend we are studying the motion of the physicist's ideal spring. This system consists of a ball of mass $m$ attached to a massless, frictionless spring. The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is "ideal," it oscillates indefinitely along the $x$-axis about its equilibrium at a set frequency.

FIG. 1 A diagram of the toy example.

This is a standard problem in physics in which the motion along the $x$ direction is solved by an explicit function of time. In other words, the underlying dynamics can be expressed as a function of a single variable $x$.

However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure. Thus, we decide to measure the ball's position in a three-dimensional space (since we live in a three-dimensional world). Specifically, we place three movie cameras around our system of interest. At 120 Hz each movie camera records an image indicating a two-dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what the real "$x$", "$y$" and "$z$" axes are, so we choose three camera axes $\{\vec{a}, \vec{b}, \vec{c}\}$ at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90°! Now, we record with the cameras for several minutes. The big question remains: how do we get from this data set to a simple equation of $x$?

We know a priori that if we were smart experimenters, we would have just measured the position along the $x$-axis with one camera. But this is not what happens in the real world. We often do not know which measurements best reflect the dynamics of our system in question. Furthermore, we sometimes record more dimensions than we actually need!

Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring. Noise contaminates our data set, only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face every day. We will refer to this example as we delve further into abstract concepts. Hopefully, by the end of this paper we will have a good understanding of how to systematically extract $x$ using principal component analysis.
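To make the toy example concrete, the recordings can be simulated. The following is only a sketch under stated assumptions: the function name `simulate_spring_recordings`, the oscillation frequency, the noise level, the duration, the default sampling rate and the random camera orientations are all illustrative choices, not values taken from the setup above. Each camera is modeled as two orthonormal directions in space onto which the ball's three-dimensional position is projected.

```python
import numpy as np

def simulate_spring_recordings(rate_hz=120.0, duration_s=10.0, freq_hz=1.0,
                               amplitude=1.0, noise_std=0.05, seed=0):
    """Simulate three cameras filming a ball that oscillates along the x-axis.

    All default values are assumptions chosen for illustration.
    Returns the sample times and a 6 x n matrix with one sample per column.
    """
    rng = np.random.default_rng(seed)
    n = int(round(rate_hz * duration_s))
    t = np.arange(n) / rate_hz

    # True dynamics: the ball moves along the x-axis only.
    position = np.zeros((3, n))
    position[0] = amplitude * np.cos(2.0 * np.pi * freq_hz * t)

    recordings = []
    for _camera in range(3):
        # A random orthogonal 3 x 3 matrix; two of its rows serve as the
        # image-plane axes of a camera placed at an arbitrary angle.
        q, _r = np.linalg.qr(rng.normal(size=(3, 3)))
        image_axes = q[:2]
        recordings.append(image_axes @ position)  # 2 x n projection

    X = np.vstack(recordings)                      # rows: xA, yA, xB, yB, xC, yC
    X = X + noise_std * rng.normal(size=X.shape)   # additive measurement noise
    return t, X

t, X = simulate_spring_recordings()
print(X.shape)  # (6, 1200)
```

With `noise_std=0.0` every row of `X` is a scaled copy of the same oscillation, which is the redundancy the experimenter does not yet know about.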

III. FRAMEWORK: CHANGE OF BASIS

The goal of principal component analysis is to compute the most meaningful basis to re-express a noisy data set. The hope is that this new basis will filter out the noise and reveal hidden structure. In the example of the spring, the explicit goal of PCA is to determine: "the dynamics are along the $x$-axis." In other words, the goal of PCA is to determine that $\hat{x}$ - the unit basis vector along the $x$-axis - is the important dimension. Determining this fact allows an experimenter to discern which dynamics are important, which are just redundant and which are just noise.

A. A Naive Basis

With a more precise definition of our goal, we need a more precise definition of our data as well. We treat every time sample (or experimental trial) as an individual sample in our data set. At each time sample we record a set of data consisting of multiple measurements (e.g. voltage, position, etc.). In our data set, at one point in time, camera $A$ records a corresponding ball position $(x_A, y_A)$. One sample or trial can then be expressed as a 6-dimensional column vector

$$\vec{X} = \begin{bmatrix} x_A \\ y_A \\ x_B \\ y_B \\ x_C \\ y_C \end{bmatrix}$$

where each camera contributes a 2-dimensional projection of the ball's position to the entire vector $\vec{X}$. If we record the ball's position for 10 minutes at 120 Hz, then we have recorded $10 \times 60 \times 120 = 72000$ of these vectors.

With this concrete example, let us recast this problem in abstract terms. Each sample $\vec{X}$ is an $m$-dimensional vector, where $m$ is the number of measurement types. Equivalently, every sample is a vector that lies in an $m$-dimensional vector space spanned by some orthonormal basis. From linear algebra we know that all measurement vectors form a linear combination of this set of unit length basis vectors. What is this orthonormal basis?

The answer to this question is usually a tacit assumption that is often overlooked. Pretend we gathered our toy example data above, but only looked at camera $A$. What is an orthonormal basis for $(x_A, y_A)$? A naive choice would be $\{(1,0), (0,1)\}$, but why select this basis over $\{(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}), (-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})\}$ or any other arbitrary rotation? The reason is that the naive basis reflects the method by which we gathered the data. Pretend we record the position $(2,2)$. We did not record $2\sqrt{2}$ in the $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ direction and 0 in the perpendicular direction. Rather, we recorded the position $(2,2)$ on our camera, meaning 2 units up and 2 units to the left in our camera window. Thus our naive basis reflects the method by which we measured our data.

How do we express this naive basis in linear algebra? In the two-dimensional case, $\{(1,0), (0,1)\}$ can be recast as individual row vectors. A matrix constructed out of these row vectors is the $2 \times 2$ identity matrix $\mathbf{I}$. We can generalize this to the $m$-dimensional case by constructing an $m \times m$ identity matrix

$$\mathbf{B} = \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \\ \vdots \\ \mathbf{b}_m \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \mathbf{I}$$

where each row is an orthonormal basis vector $\mathbf{b}_i$ with $m$ components. We can consider our naive basis as the effective starting point. All of our data has been recorded in this basis and thus it can be trivially expressed as a linear combination of $\{\mathbf{b}_i\}$.
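The single-camera example can be checked numerically. This is a small sketch, not part of the original derivation: the variable names are hypothetical, and the rotated basis takes $(-\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$ as the unit vector perpendicular to $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2})$. Stacking basis vectors as rows and multiplying by the recorded position gives the coordinates of that position in each basis.

```python
import numpy as np

# Position recorded by camera A, expressed in the naive (camera) basis.
recorded = np.array([2.0, 2.0])

# Naive basis: the rows of the 2 x 2 identity matrix.
naive_basis = np.eye(2)

# The same orthonormal basis rotated by 45 degrees, one basis vector per row.
s = np.sqrt(2.0) / 2.0
rotated_basis = np.array([[s, s],
                          [-s, s]])

# Coordinates are the dot products of the position with each basis vector.
coords_naive = naive_basis @ recorded      # unchanged: [2, 2]
coords_rotated = rotated_basis @ recorded  # [2*sqrt(2), 0]

print(coords_naive)
print(coords_rotated)
```

Both coordinate pairs describe the same point; only the naive pair matches what the camera actually reported.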


B. Change of Basis

With this rigor we may now state more precisely what PCA asks: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?

A close reader might have noticed the conspicuous addition of the word linear. Indeed, PCA makes one stringent but powerful assumption: linearity. Linearity vastly simplifies the problem by (1) restricting the set of potential bases, and (2) formalizing the implicit assumption of continuity in a data set.¹

With this assumption PCA is now limited to re-expressing the data as a linear combination of its basis vectors. Let $\mathbf{X}$ be the original data set, where each column is a single sample (or moment in time) of our data set (i.e. $\vec{X}$). In the toy example $\mathbf{X}$ is an $m \times n$ matrix where $m = 6$ and $n = 72000$. Let $\mathbf{Y}$ be another $m \times n$ matrix related by a linear transformation $\mathbf{P}$. $\mathbf{X}$ is the original recorded data set and $\mathbf{Y}$ is a re-representation of that data set.

$$\mathbf{P}\mathbf{X} = \mathbf{Y} \tag{1}$$

Also let us define the following quantities.²

• $\mathbf{p}_i$ are the rows of $\mathbf{P}$.
• $\mathbf{x}_i$ are the columns of $\mathbf{X}$ (or individual $\vec{X}$).
• $\mathbf{y}_i$ are the columns of $\mathbf{Y}$.

Equation 1 represents a change of basis and thus can have many interpretations.

1. $\mathbf{P}$ is a matrix that transforms $\mathbf{X}$ into $\mathbf{Y}$.
2. Geometrically, $\mathbf{P}$ is a rotation and a stretch which again transforms $\mathbf{X}$ into $\mathbf{Y}$.
3. The rows of $\mathbf{P}$, $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$, are a set of new basis vectors for expressing the columns of $\mathbf{X}$.

The last interpretation is not obvious but can be seen by writing out the explicit dot products of $\mathbf{P}\mathbf{X}$.

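$$\mathbf{P}\mathbf{X} = \begin{bmatrix} \mathbf{p}_1 \\ \vdots \\ \mathbf{p}_m \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_n \end{bmatrix}$$

$$\mathbf{Y} = \begin{bmatrix} \mathbf{p}_1 \cdot \mathbf{x}_1 & \cdots & \mathbf{p}_1 \cdot \mathbf{x}_n \\ \vdots & \ddots & \vdots \\ \mathbf{p}_m \cdot \mathbf{x}_1 & \cdots & \mathbf{p}_m \cdot \mathbf{x}_n \end{bmatrix}$$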

¹ A subtle point it is, but we have already assumed linearity by implicitly stating that the data set even characterizes the dynamics of the system. In other words, we are already relying on the superposition principle of linearity to believe that the data provides an ability to interpolate between individual data points.

² In this section $\mathbf{x}_i$ and $\mathbf{y}_i$ are column vectors, but be forewarned: in all other sections $\mathbf{x}_i$ and $\mathbf{y}_i$ are row vectors.

We can note the form of each column of $\mathbf{Y}$.

$$\mathbf{y}_i = \begin{bmatrix} \mathbf{p}_1 \cdot \mathbf{x}_i \\ \vdots \\ \mathbf{p}_m \cdot \mathbf{x}_i \end{bmatrix}$$

We recognize that each coefficient of $\mathbf{y}_i$ is a dot product of $\mathbf{x}_i$ with the corresponding row in $\mathbf{P}$. In other words, the $j$th coefficient of $\mathbf{y}_i$ is a projection onto the $j$th row of $\mathbf{P}$. This is in fact the very form of an equation where $\mathbf{y}_i$ is a projection onto the basis of $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$. Therefore, the rows of $\mathbf{P}$ are indeed a new set of basis vectors for representing the columns of $\mathbf{X}$.
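The dot-product reading of Equation 1 can be verified on a small example. The sketch below uses made-up data and a hypothetical helper `change_basis`. As a simplifying assumption it takes $\mathbf{P}$ to have orthonormal rows (a pure rotation or reflection, without the stretch allowed in general), so that each sample can also be rebuilt from its new coordinates.

```python
import numpy as np

def change_basis(P, X):
    """Return Y = P X: the columns of X re-expressed in the rows of P."""
    return P @ X

rng = np.random.default_rng(0)
m, n = 3, 5                        # small sizes chosen for illustration
X = rng.normal(size=(m, n))        # each column is one sample x_i

# Assumption: an orthogonal P, taken from a QR factorization of a
# random matrix, so its rows p_j are orthonormal.
Q, _R = np.linalg.qr(rng.normal(size=(m, m)))
P = Q.T

Y = change_basis(P, X)

# Entry (j, i) of Y is the dot product of row p_j with column x_i.
Y_explicit = np.array([[P[j] @ X[:, i] for i in range(n)]
                       for j in range(m)])
print(np.allclose(Y, Y_explicit))  # True

# With orthonormal rows, x_i is the sum over j of Y[j, i] * p_j.
X_rebuilt = P.T @ Y
print(np.allclose(X_rebuilt, X))   # True
```

The first check holds for any $\mathbf{P}$; the reconstruction in the second check relies on the orthonormality assumption.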

C. Questions Remaining

By assuming linearity the problem reduces to finding the appropriate change of basis. The row vectors $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$ in this transformation will become the principal components of $\mathbf{X}$. Several questions now arise.

• What is the best way to "re-express" $\mathbf{X}$?
• What is a good choice of basis $\mathbf{P}$?

These questions must be answered by next asking ourselves what features we would like $\mathbf{Y}$ to exhibit. Evidently, additional assumptions beyond linearity are required to arrive at a reasonable result. The selection of these assumptions is the subject of the next section.

IV. VARIANCE AND THE GOAL

Now comes the most important question: what does "best express" the data mean? This section will build up an intuitive answer to this question and along the way tack on additional assumptions. The end of this section will conclude with a mathematical goal for deciphering "garbled" data.

In a linear system "garbled" can refer to only three potential confounds: noise, rotation and redundancy. Let us deal with each situation individually.

A. Noise and Rotation

Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a system can be extracted. There exists no absolute scale for noise but rather all noise is measured relative to the measurement. A common measure is the signal-to-noise ratio (SNR), or a ratio of variances $\sigma^2$,

$$SNR = \frac{\sigma^2_{signal}}{\sigma^2_{noise}}. \tag{2}$$

A high SNR ($\gg 1$) indicates high precision data, while a low SNR indicates noise contaminated data.
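Equation 2 translates directly into code when the signal and the noise are available separately, which is an assumption that holds in a simulation but rarely in a real experiment. In this sketch the function name `signal_to_noise_ratio`, the sinusoidal signal and the noise level are illustrative choices.

```python
import numpy as np

def signal_to_noise_ratio(signal, noise):
    """Ratio of the variance of the signal to the variance of the noise."""
    return np.var(signal) / np.var(noise)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000, endpoint=False)

signal = np.sin(2.0 * np.pi * 5.0 * t)   # clean oscillation (assumed shape)
noise = 0.1 * rng.normal(size=t.shape)   # additive noise (assumed level)

snr = signal_to_noise_ratio(signal, noise)
print(snr > 1.0)  # True: the signal variance dominates the noise variance
```

Raising the noise amplitude toward the signal amplitude drives the ratio toward 1, the regime in which the dynamics become hard to separate from the noise.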
