
Multivariate Statistics

Statistics 2012: Module 5

Principal Components Analysis
• Consider a set of correlated variables x1, …, xp
• Replace this original set of variables by a set of uncorrelated variables y1, …, yp
• The new variables are linear combinations of the original variables
• The new variables are obtained in order of importance
• The new variables are called the principal components

PCA and variation
• PCA focuses on describing the variation in the data set
• Associations among the variables are reflected in their joint variation
• Components that vary less contain less information and are thus less important

PCA and dimension reduction
• If the original variables are highly correlated, then one can hope that only a few PCs will account for most of the variation in the original variables.
• The PCs are often called indices measuring different 'dimensions' of the data.

A first example
We consider a dataset of 49 female sparrows collected after a storm. The variables measured are

x1  Total length
x2  Alar length
x3  Length of beak and head
x4  Length of humerus
x5  Length of keel of sternum

Example: scatterplot matrix
[Scatterplot matrix of the five standardized sparrow measurements]

Example: correlation matrix
      x1   x2   x3   x4   x5
x1  1.00 0.73 0.66 0.63 0.61
x2  0.73 1.00 0.67 0.76 0.53
x3  0.66 0.67 1.00 0.72 0.53
x4  0.63 0.76 0.72 1.00 0.58
x5  0.61 0.53 0.53 0.58 1.00
Example: sparrows
The first PC is given by

  y1 = 0.45x1 + 0.47x2 + 0.45x3 + 0.46x4 + 0.40x5

This is essentially an average of the measurements (all coefficients are close to equal) and can be interpreted as an index of overall size.

The first PC
• The first PC y1 is a linear combination of x1, …, xp. Formally:

  y1 = a11 x1 + a12 x2 + ⋯ + a1p xp

  where the coefficients satisfy

  a11² + a12² + ⋯ + a1p² = 1

• The first PC maximizes its variance var(y1) among all linear combinations of x1, …, xp that satisfy the above condition.

The second PC
• The second PC y2 is a linear combination of x1, …, xp

  y2 = a21 x1 + a22 x2 + ⋯ + a2p xp

  where the coefficients satisfy

  a21² + a22² + ⋯ + a2p² = 1
  cov(y1, y2) = 0

• The second PC maximizes var(y2) among all linear combinations of x1, …, xp that satisfy the above conditions.
PCs in general
• The kth PC yk is a linear combination of x1, …, xp

  yk = ak1 x1 + ak2 x2 + ⋯ + akp xp

  where the coefficients satisfy

  ak1² + ak2² + ⋯ + akp² = 1
  cov(yj, yk) = 0 for all j < k

• The kth PC maximizes var(yk) among all linear combinations of x1, …, xp that satisfy the above conditions.
Some matrix results
Consider a square matrix Σ of size p.
• An eigenvalue-eigenvector pair of Σ consists of a scalar λ and a vector e = (e1, …, ep)ᵗ such that

  Σe = λe

• The eigenvector e can be scaled such that

  e1² + e2² + ⋯ + ep² = 1
Some matrix results
• A square matrix Σ of size p has at most p distinct eigenvalues
• A variance-covariance matrix has positive eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp > 0
• The eigenvectors e1, …, ep are orthogonal to each other
Finding PCs
The principal components are given by

  yk = ek1 x1 + ek2 x2 + ⋯ + ekp xp

with var(yk) = λk, where λk and ek = (ek1, …, ekp)ᵗ form the kth eigenvalue-eigenvector pair of the variance-covariance matrix S.
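This recipe (eigendecompose S, sort components by eigenvalue) can be sketched in a few lines of NumPy. The function below is an illustrative sketch on toy data, not the course's code:

```python
import numpy as np

def pca_eig(X):
    """PCA via eigendecomposition of the sample covariance matrix S.

    Returns the eigenvalues (= component variances) in decreasing order,
    the matching unit-norm eigenvectors as columns, and the PC scores.
    """
    Xc = X - X.mean(axis=0)            # center the variables
    S = np.cov(Xc, rowvar=False)       # sample covariance matrix
    lam, E = np.linalg.eigh(S)         # eigh: symmetric S, ascending order
    order = np.argsort(lam)[::-1]      # reorder so lam[0] >= lam[1] >= ...
    lam, E = lam[order], E[:, order]
    return lam, E, Xc @ E              # scores: y_k = e_k' (x - x_bar)

# toy data: two strongly correlated variables
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200)])
lam, E, Y = pca_eig(X)

# var(y_k) equals the k-th eigenvalue of S
print(np.allclose(np.var(Y, axis=0, ddof=1), lam))   # → True
```

The eigenvectors are the loadings; note `eigh` already returns them with unit norm, matching the scaling condition on the coefficients.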

Example: Sparrows

y1 = 0.45x1 + 0.47x2 + 0.45x3 + 0.46x4 + 0.40x5
y2 = 0.07x1 − 0.31x2 − 0.29x3 − 0.23x4 + 0.87x5
y3 = 0.73x1 + 0.27x2 − 0.35x3 − 0.48x4 − 0.20x5
y4 = −0.23x1 + 0.48x2 − 0.73x3 + 0.42x4 + 0.05x5
y5 = 0.44x1 − 0.62x2 − 0.23x3 + 0.57x4 − 0.18x5

var(y1) = 3.58, var(y2) = 0.54, var(y3) = 0.38, var(y4) = 0.33, var(y5) = 0.18
Total variance
• The total variance is the sum of the variances of the components
• The sum of the eigenvalues equals the sum of the diagonal elements of the matrix:

  Total variance = ∑_{j=1}^p var(xj) = ∑_{j=1}^p λj = ∑_{j=1}^p var(yj)

→ PCs and original variables account for the same total variance
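This identity is just trace(S) = sum of the eigenvalues of S; a quick numerical check on random correlated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))  # correlated data
S = np.cov(X, rowvar=False)

total_var = np.trace(S)                  # sum of var(x_j): diagonal of S
eig_sum = np.linalg.eigvalsh(S).sum()    # sum of the eigenvalues lambda_j
print(np.isclose(total_var, eig_sum))    # → True
```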

Example: Sparrows

Total variance = ∑_{j=1}^p var(xj) = 5.00 = ∑_{j=1}^p var(yj)
Proportion of the variance explained
• The proportion of the total variance that the kth PC accounts for is

  pk = λk / total variance = λk / ∑_{j=1}^p λj

• The proportion of the total variance accounted for by the first m PCs is

  Pm = p1 + ⋯ + pm = (∑_{j=1}^m λj) / (∑_{j=1}^p λj)
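With the eigenvalues in hand, these proportions are one line each. Using the rounded eigenvalues from the sparrow example (the slides divide by a total of exactly 5, so third decimals may differ slightly):

```python
import numpy as np

# eigenvalues (component variances) from the sparrow example
lam = np.array([3.58, 0.54, 0.38, 0.33, 0.18])

p = lam / lam.sum()   # p_k: proportion explained by the k-th PC
P = np.cumsum(p)      # P_m: cumulative proportion for the first m PCs

print(p.round(3))     # roughly [0.715 0.108 0.076 0.066 0.036]
print(P.round(3))     # last entry is 1.0
```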

Example: Sparrows

p1 = 3.58/5 = 0.715   P1 = 0.715 (71.5%)
p2 = 0.54/5 = 0.107   P2 = 0.822 (82.2%)
p3 = 0.38/5 = 0.076   P3 = 0.898 (89.8%)
p4 = 0.33/5 = 0.066   P4 = 0.964 (96.4%)
p5 = 0.18/5 = 0.036   P5 = 1.00 (100%)
Number of PCs
• Trade-off between parsimony (low dimension)
and retaining enough relevant information

• The choice is usually based on ad hoc criteria

Number of PCs: criteria
• Retain a large percentage of the total variance (between 70% and 90%)
• Keep the PCs whose variance (eigenvalue) exceeds the average variance (eigenvalue)
• Use a graphical procedure: Screeplot
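The first two criteria are easy to automate. A sketch; the helper function and its interface are made up for illustration, not a standard API:

```python
import numpy as np

def n_components(lam, rule="avg", threshold=0.8):
    """Pick the number of PCs by an ad hoc rule (illustrative helper).

    rule="avg":  keep PCs whose eigenvalue exceeds the average eigenvalue.
    rule="prop": keep the smallest m explaining at least `threshold`
                 of the total variance.
    """
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]
    if rule == "avg":
        return int(np.sum(lam > lam.mean()))
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, threshold) + 1)

lam = [3.58, 0.54, 0.38, 0.33, 0.18]     # sparrow eigenvalues
print(n_components(lam, "avg"))          # → 1
print(n_components(lam, "prop", 0.8))    # → 2
```

The two rules can disagree, which is exactly why the choice is called ad hoc.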

Screeplot
• Plot the eigenvalues vs the PC number
• This is a decreasing curve
• Usually, there is an elbow point where the large eigenvalues cease and the small eigenvalues begin
• Retain the number of components corresponding to the elbow point.
Example: Sparrows
• The first PC already represents 71.52% of the total variance, the first two PCs 82.24%
• Average variance = 1. Only the variance of the first PC exceeds this average
• Screeplot: elbow at the first PC
→ Retain only 1 PC
Example: screeplot
Sparrows: screeplot

3.5
3.0
2.5
2.0
Variances

1.5
1.0
0.5
0.0

Comp.1 Comp.2 Comp.3 Comp.4 Comp.5

Standardizing the variables
• Usually, the variables are centered: xj := xj − x̄j
→ The PCs are also centered at zero
• Often the variables are standardized:

  zj = (xj − x̄j) / √var(xj),   j = 1, …, p

→ The PCs are then obtained from the correlation matrix R
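The effect of standardizing can be seen by comparing the eigenvalues of S and R on toy data with very different measurement scales (variables and scales below are invented for illustration):

```python
import numpy as np

def top_eig_share(M):
    """Share of the total variance carried by the largest eigenvalue of M."""
    lam = np.linalg.eigvalsh(M)     # ascending order
    return lam[-1] / lam.sum()

rng = np.random.default_rng(2)
x1 = 100.0 * rng.normal(size=300)          # variable on a large scale
x2 = rng.normal(size=300) + 0.005 * x1     # small scale, mildly correlated
X = np.column_stack([x1, x2])

share_S = top_eig_share(np.cov(X, rowvar=False))       # covariance: S
share_R = top_eig_share(np.corrcoef(X, rowvar=False))  # correlation: R

print(round(share_S, 3))   # close to 1: x1's scale dominates the first PC
print(round(share_R, 3))   # much smaller: variables treated equally
```

Without standardizing, the first PC is essentially x1 alone; after standardizing, both variables contribute.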

Example: sparrows
• Previous results were based on standardized variables
• var(x1) = 13.35, var(x2) = 25.68, var(x3) = 0.63, var(x4) = 0.32, var(x5) = 0.98
→ A PCA based on the original variables would be dominated by x1 and x2
Standardize?
• No: variables with the largest variance dominate the first PCs
→ Meaningful if the measurement scales are relevant
• Yes: all variables are treated equally
→ Advisable if the measurement scales are arbitrary
Example: Air pollution
Measurements of 6 variables at 41 U.S. cities

Temp    Average annual temperature (°F)
Manuf   Number of enterprises with at least 20 workers
Pop     Population size 1970 (thousands)
Wind    Average annual wind speed (miles/hour)
Precip  Average annual precipitation (inches)
Days    Average number of days with precipitation per year
Example: scatterplot matrix
[Scatterplot matrix of the six standardized air pollution variables]

Example: correlation matrix
        Temp  Manuf   Pop  Wind Precip  Days
Temp    1.00  -0.19 -0.06 -0.35   0.39 -0.43
Manuf  -0.19   1.00  0.96  0.24  -0.03  0.13
Pop    -0.06   0.96  1.00  0.21  -0.03  0.04
Wind   -0.35   0.24  0.21  1.00  -0.01  0.16
Precip  0.39  -0.03 -0.03 -0.01   1.00  0.50
Days   -0.43   0.13  0.04  0.16   0.50  1.00
Example: Air pollution
Importance of components:
                        Comp1 Comp2 Comp3 Comp4 Comp5 Comp6
Standard deviation       1.48  1.22  1.18  0.87  0.34  0.19
Proportion of Variance   0.37  0.25  0.23  0.13  0.02  0.01
Cumulative Proportion    0.37  0.62  0.85  0.98  0.99  1.00

Loadings (small loadings are suppressed in the output, so some rows have fewer entries):
        Comp1 Comp2 Comp3 Comp4 Comp5 Comp6
Temp     0.33 -0.13  0.67 -0.31 -0.56 -0.14
Manuf   -0.61 -0.17  0.27  0.14  0.10 -0.70
Pop     -0.58 -0.22  0.35  0.69
Wind    -0.35  0.13 -0.30 -0.87 -0.11
Precip   0.62  0.50 -0.17  0.57
Days    -0.24  0.71  0.31 -0.58
Example: Number of PCs
• Three PCs exceed average eigenvalue (= 1)

• Three PCs explain 85% of total variance

• Screeplot?

Example: screeplot
Screeplot

[Barplot of the variances (eigenvalues) of Comp.1–Comp.6]

Example: Interpretation of PCs
Requires examining and interpreting the loadings

• PC1: Quality of environment?

• PC2: Wet weather?

• PC3: Climate type?

−→ Subjective: do not overstate claims!

Displaying multivariate data: PCA
• Often the main goal of PCA is to create a useful graphical representation of the data
→ The first PCs contain most of the information
→ Fewer dimensions are needed
→ Correlations are removed
• (Scaled) Euclidean distances between PC scores approximate the distances between the original observations
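The distance-approximation claim can be checked numerically: on data that is nearly two-dimensional, pairwise distances in the PC1–PC2 plane track the full distances closely. A toy illustration (the data and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
# 5-dimensional data that is essentially 2-dimensional plus small noise
B = rng.normal(size=(5, 2))
X = rng.normal(size=(100, 2)) @ B.T + 0.05 * rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

# eigenvectors of S, sorted by decreasing eigenvalue
lam, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
E = E[:, ::-1]
Y2 = Xc @ E[:, :2]          # scores on the first two PCs

def pdists(A):
    """All pairwise Euclidean distances between the rows of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

r = np.corrcoef(pdists(Xc).ravel(), pdists(Y2).ravel())[0, 1]
print(r > 0.99)   # 2-D PC distances nearly reproduce the 5-D distances
```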

Example: Scatterplot of PC2 vs PC1

[Scatterplot of the PC2 vs PC1 scores for the 41 cities, labelled by abbreviated city name]

Example: Scatterplot of PC3 vs PC1

[Scatterplot of the PC3 vs PC1 scores for the 41 cities, labelled by abbreviated city name]

Example: Scatterplot of PC3 vs PC2

[Scatterplot of the PC3 vs PC2 scores for the 41 cities, labelled by abbreviated city name]

Biplots
• Scatterplot of the scores on the first two PCs
• Original variables are represented as arrows
• Arrow directions are given by the loadings of the first two PCs
Example: Biplot
[Biplot of Comp.1 vs Comp.2: cities as points, with arrows for Temp, Manuf, Pop, Wind, Precip and Days given by their loadings]

Geometry of PCA
• PCs are a translation and rotation of the coordinate axes
• The new axes are the principal axes of the ellipsoids corresponding to the variance-covariance matrix S or the correlation matrix R
Geometry of PCA

[Figure: contour ellipse f(x1, x2) = c² with principal axes y1 = e1ᵗx and y2 = e2ᵗx]
