
Multivariate Statistics

Statistics 2012: Module 5

Principal Components Analysis
• Consider a set of correlated variables x1, …, xp
• Replace this original set of variables by a set of uncorrelated variables y1, …, yp
• The new variables are linear combinations of the original variables
• The new variables are obtained in order of importance
• The new variables are called the principal components

PCA and variation
• PCA focuses on describing the variation in the data set
• Associations among the variables are reflected in their joint variation
• Components that vary less contain less information and are thus less important

PCA and dimension reduction
• If the original variables are highly correlated, then one can hope that only a few PCs will account for most of the variation in the original variables.
• The PCs are often called indices measuring different 'dimensions' of the data.

A first example
We consider a dataset of 49 female sparrows collected after a storm. The variables measured are

x1  Total length
x2  Alar length
x3  Length of beak and head
x4  Length of humerus
x5  Length of keel of sternum

Example: scatterplot matrix
[Scatterplot matrix of the five standardized sparrow measurements]

Example: correlation matrix
      x1   x2   x3   x4   x5
x1  1.00 0.73 0.66 0.63 0.61
x2  0.73 1.00 0.67 0.76 0.53
x3  0.66 0.67 1.00 0.72 0.53
x4  0.63 0.76 0.72 1.00 0.58
x5  0.61 0.53 0.53 0.58 1.00
Example: sparrows
The first PC is given by

  y1 = 0.45x1 + 0.47x2 + 0.45x3 + 0.46x4 + 0.40x5

This is essentially an average of the measurements (all coefficients are close to equal) and can be interpreted as an index of overall size.

The first PC
• The first PC y1 is a linear combination of x1, …, xp. Formally:

  y1 = a11 x1 + a12 x2 + ⋯ + a1p xp

  where the coefficients satisfy

  a11² + a12² + ⋯ + a1p² = 1

• The first PC maximizes its variance var(y1) among all linear combinations of x1, …, xp that satisfy the above condition.

The second PC
• The second PC y2 is a linear combination of x1, …, xp

  y2 = a21 x1 + a22 x2 + ⋯ + a2p xp

  where the coefficients satisfy

  a21² + a22² + ⋯ + a2p² = 1
  cov(y1, y2) = 0

• The second PC maximizes var(y2) among all linear combinations of x1, …, xp that satisfy the above conditions.
PCs in general
• The kth PC yk is a linear combination of x1, …, xp

  yk = ak1 x1 + ak2 x2 + ⋯ + akp xp

  where the coefficients satisfy

  ak1² + ak2² + ⋯ + akp² = 1
  cov(yj, yk) = 0 for all j < k

• The kth PC maximizes var(yk) among all linear combinations of x1, …, xp that satisfy the above conditions.
Some matrix results
Consider a square matrix Σ of size p.
• An eigenvalue-eigenvector pair of Σ consists of a scalar λ and a vector e = (e1, …, ep)ᵗ such that

  Σe = λe

• The eigenvector e can be scaled such that

  e1² + e2² + ⋯ + ep² = 1
Some matrix results
• A square matrix Σ of size p has at most p distinct eigenvalues
• A variance-covariance matrix has positive eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp > 0
• The eigenvectors e1, …, ep are orthogonal to each other
Finding PCs
The principal components are given by

  yk = ek1 x1 + ek2 x2 + ⋯ + ekp xp

with var(yk) = λk, where λk and ek = (ek1, …, ekp)ᵗ form the kth eigenvalue-eigenvector pair of the variance-covariance matrix S.
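This recipe (eigendecompose S, sort components by eigenvalue) can be sketched in a few lines of NumPy. The function below is an illustrative sketch on toy data, not the course's code:

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix S.

    Returns the eigenvalues (= component variances) in decreasing order,
    the matching unit-norm eigenvectors as columns, and the PC scores.
    """
    Xc = X - X.mean(axis=0)            # center the variables
    S = np.cov(Xc, rowvar=False)       # sample covariance matrix
    lam, E = np.linalg.eigh(S)         # eigh: symmetric S, ascending order
    order = np.argsort(lam)[::-1]      # reorder so lam[0] >= lam[1] >= ...
    lam, E = lam[order], E[:, order]
    return lam, E, Xc @ E              # scores: y_k = e_k' (x - x_bar)

# toy data: two strongly correlated variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])
lam, E, Y = pca_eig(X)

# var(y_k) equals the k-th eigenvalue of S
print(np.allclose(np.var(Y, axis=0, ddof=1), lam))   # → True
```

The eigenvectors are the loadings; note `eigh` already returns them with unit norm, matching the scaling condition on the coefficients.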

Example: Sparrows

y1 = 0.45x1 + 0.47x2 + 0.45x3 + 0.46x4 + 0.40x5
y2 = 0.07x1 − 0.31x2 − 0.29x3 − 0.23x4 + 0.87x5
y3 = 0.73x1 + 0.27x2 − 0.35x3 − 0.48x4 − 0.20x5
y4 = −0.23x1 + 0.48x2 − 0.73x3 + 0.42x4 + 0.05x5
y5 = 0.44x1 − 0.62x2 − 0.23x3 + 0.57x4 − 0.18x5

var(y1) = 3.58, var(y2) = 0.54, var(y3) = 0.38, var(y4) = 0.33, var(y5) = 0.18
Total variance
• The total variance is the sum of the variances of the components
• The sum of the eigenvalues equals the sum of the diagonal elements of the matrix:

  Total variance = ∑_{j=1}^p var(xj) = ∑_{j=1}^p λj = ∑_{j=1}^p var(yj)

→ PCs and original variables account for the same total variance
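This identity is just trace(S) = sum of the eigenvalues of S; a quick numerical check on random correlated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))  # correlated data
S = np.cov(X, rowvar=False)

total_var = np.trace(S)                  # sum of var(x_j): diagonal of S
eig_sum = np.linalg.eigvalsh(S).sum()    # sum of the eigenvalues lambda_j
print(np.isclose(total_var, eig_sum))    # → True
```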

Example: Sparrows

Total variance = ∑_{j=1}^p var(xj) = 5.00 = ∑_{j=1}^p var(yj)
Proportion of the variance explained
• The proportion of the total variance that the kth PC accounts for is

  pk = λk / total variance = λk / ∑_{j=1}^p λj

• The proportion of the total variance accounted for by the first m PCs is

  Pm = p1 + ⋯ + pm = (∑_{j=1}^m λj) / (∑_{j=1}^p λj)
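With the eigenvalues in hand, these proportions are one line each. Using the rounded eigenvalues from the sparrow example (the slides divide by a total of exactly 5, so third decimals may differ slightly):

```python
import numpy as np

# eigenvalues (component variances) from the sparrow example
lam = np.array([3.58, 0.54, 0.38, 0.33, 0.18])

p = lam / lam.sum()   # p_k: proportion explained by the k-th PC
P = np.cumsum(p)      # P_m: cumulative proportion for the first m PCs

print(p.round(3))     # roughly [0.715 0.108 0.076 0.066 0.036]
print(P.round(3))     # last entry is 1.0
```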

Example: Sparrows

p1 = 3.58/5 = 0.715   P1 = 0.715 (71.5%)
p2 = 0.54/5 = 0.107   P2 = 0.822 (82.2%)
p3 = 0.38/5 = 0.076   P3 = 0.898 (89.8%)
p4 = 0.33/5 = 0.066   P4 = 0.964 (96.4%)
p5 = 0.18/5 = 0.036   P5 = 1.00 (100%)
Number of PCs
• Trade-off between parsimony (low dimension)
and retaining enough relevant information

• The choice is usually based on ad hoc criteria

Number of PCs: criteria
• Retain a large percentage of the total variance (between 70% and 90%)
• Keep the PCs whose variance (eigenvalue) exceeds the average variance (eigenvalue)
• Use a graphical procedure: Screeplot
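The first two criteria are easy to automate. A sketch; the helper function and its interface are made up for illustration, not a standard API:

```python
import numpy as np

def n_components(lam, rule="avg", threshold=0.8):
    """Pick the number of PCs by an ad hoc rule (illustrative helper).

    rule="avg":  keep PCs whose eigenvalue exceeds the average eigenvalue.
    rule="prop": keep the smallest m explaining at least `threshold`
                 of the total variance.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    if rule == "avg":
        return int(np.sum(lam > lam.mean()))
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, threshold) + 1)

lam = [3.58, 0.54, 0.38, 0.33, 0.18]     # sparrow eigenvalues
print(n_components(lam, "avg"))          # → 1
print(n_components(lam, "prop", 0.8))    # → 2
```

The two rules can disagree, which is exactly why the choice is called ad hoc.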

Screeplot
• Plot the eigenvalues vs the PC number
• This is a decreasing curve
• Usually, there is an elbow point where the large eigenvalues cease and the small eigenvalues begin
• Retain the number of components corresponding to the elbow point.
Example: Sparrows
• The first PC already represents 71.52% of the total variance, the first two PCs 82.24%
• Average variance = 1. Only the variance of the first PC exceeds this average
• Screeplot: elbow at the first PC
→ Retain only 1 PC
Example: screeplot
Sparrows: screeplot

3.5
3.0
2.5
2.0
Variances

1.5
1.0
0.5
0.0

Comp.1 Comp.2 Comp.3 Comp.4 Comp.5

Standardizing the variables
• Usually, the variables are centered: xj := xj − x̄j
→ The PCs are also centered at zero
• Often the variables are standardized:

  zj = (xj − x̄j) / √var(xj),   j = 1, …, p

→ The PCs are then obtained from the correlation matrix R
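The effect of standardizing can be seen by comparing the eigenvalues of S and R on toy data with very different measurement scales (variables and scales below are invented for illustration):

```python
import numpy as np

def top_eig_share(M):
    """Share of the total variance carried by the largest eigenvalue of M."""
    lam = np.linalg.eigvalsh(M)     # ascending order
    return lam[-1] / lam.sum()

rng = np.random.default_rng(2)
x1 = 100.0 * rng.normal(size=300)          # variable on a large scale
x2 = rng.normal(size=300) + 0.005 * x1     # small scale, mildly correlated
X = np.column_stack([x1, x2])

share_S = top_eig_share(np.cov(X, rowvar=False))       # covariance: S
share_R = top_eig_share(np.corrcoef(X, rowvar=False))  # correlation: R

print(round(share_S, 3))   # close to 1: x1's scale dominates the first PC
print(round(share_R, 3))   # much smaller: variables treated equally
```

Without standardizing, the first PC is essentially x1 alone; after standardizing, both variables contribute.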

Example: sparrows
• Previous results were based on standardized variables
• var(x1) = 13.35, var(x2) = 25.68, var(x3) = 0.63, var(x4) = 0.32, var(x5) = 0.98
→ A PCA based on the original variables would be dominated by x1 and x2
Standardize?
• No: variables with the largest variance dominate the first PCs
→ Meaningful if the measurement scales are relevant
• Yes: all variables are treated equally
→ Advisable if the measurement scales are arbitrary
Example: Air pollution
Measurements of 6 variables at 41 U.S. cities

Temp    Average annual temperature (°F)
Manuf   Number of enterprises with at least 20 workers
Pop     Population size 1970 (thousands)
Wind    Average annual wind speed (miles/hour)
Precip  Average annual precipitation (inches)
Days    Average number of days with precipitation per year
Example: scatterplot matrix
[Scatterplot matrix of the six standardized air pollution variables]

Example: correlation matrix
        Temp  Manuf   Pop  Wind Precip  Days
Temp    1.00  -0.19 -0.06 -0.35   0.39 -0.43
Manuf  -0.19   1.00  0.96  0.24  -0.03  0.13
Pop    -0.06   0.96  1.00  0.21  -0.03  0.04
Wind   -0.35   0.24  0.21  1.00  -0.01  0.16
Precip  0.39  -0.03 -0.03 -0.01   1.00  0.50
Days   -0.43   0.13  0.04  0.16   0.50  1.00
Example: Air pollution
Importance of components:
                        Comp1 Comp2 Comp3 Comp4 Comp5 Comp6
Standard deviation       1.48  1.22  1.18  0.87  0.34  0.19
Proportion of Variance   0.37  0.25  0.23  0.13  0.02  0.01
Cumulative Proportion    0.37  0.62  0.85  0.98  0.99  1.00

Loadings (small loadings are suppressed in the output, so some rows have fewer entries):
        Comp1 Comp2 Comp3 Comp4 Comp5 Comp6
Temp     0.33 -0.13  0.67 -0.31 -0.56 -0.14
Manuf   -0.61 -0.17  0.27  0.14  0.10 -0.70
Pop     -0.58 -0.22  0.35  0.69
Wind    -0.35  0.13 -0.30 -0.87 -0.11
Precip   0.62  0.50 -0.17  0.57
Days    -0.24  0.71  0.31 -0.58
Example: Number of PCs
• Three PCs exceed average eigenvalue (= 1)

• Three PCs explain 85% of total variance

• Screeplot?

Example: screeplot
Screeplot

[Barplot of the variances (eigenvalues) of Comp.1–Comp.6]

Example: Interpretation of PCs
Requires examining and interpreting the loadings

• PC1: Quality of environment?

• PC2: Wet weather?

• PC3: Climate type?

−→ Subjective: do not overstate claims!

Displaying multivariate data: PCA
• Often the main goal of PCA is to create a useful graphical representation of the data
→ The first PCs contain most of the information
→ Fewer dimensions are needed
→ Correlations are removed
• (Scaled) Euclidean distances between PC scores approximate the distances between the original observations
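The distance-approximation claim can be checked numerically: on data that is nearly two-dimensional, pairwise distances in the PC1–PC2 plane track the full distances closely. A toy illustration (the data and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
# 5-dimensional data that is essentially 2-dimensional plus small noise
B = rng.normal(size=(5, 2))
X = rng.normal(size=(100, 2)) @ B.T + 0.05 * rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

# eigenvectors of S, sorted by decreasing eigenvalue
lam, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
E = E[:, ::-1]
Y2 = Xc @ E[:, :2]          # scores on the first two PCs

def pdists(A):
    """All pairwise Euclidean distances between the rows of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

r = np.corrcoef(pdists(Xc).ravel(), pdists(Y2).ravel())[0, 1]
print(r > 0.99)   # 2-D PC distances nearly reproduce the 5-D distances
```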

Example: Scatterplot of PC2 vs PC1

[Scatterplot of the PC2 vs PC1 scores for the 41 cities, labelled by abbreviated city name]

Example: Scatterplot of PC3 vs PC1

[Scatterplot of the PC3 vs PC1 scores for the 41 cities, labelled by abbreviated city name]

Example: Scatterplot of PC3 vs PC2

[Scatterplot of the PC3 vs PC2 scores for the 41 cities, labelled by abbreviated city name]

Biplots
• Scatterplot of the scores on the first two PCs
• Original variables are represented as arrows
• Arrow directions are given by the loadings of the first two PCs
Example: Biplot
[Biplot of Comp.1 vs Comp.2: cities as points, with arrows for Temp, Manuf, Pop, Wind, Precip and Days given by their loadings]

Geometry of PCA
• PCs are a translation and rotation of the coordinate axes
• The new axes are the principal axes of the ellipsoids corresponding to the variance-covariance matrix S or the correlation matrix R
Geometry of PCA

[Figure: contour ellipse f(x1, x2) = c² with principal axes y1 = e1ᵗx and y2 = e2ᵗx]
