Steffen Unkel
Department of Medical Statistics
University Medical Center Göttingen
Outline
1 Principles of PCA
2 PCA biplots
3 Sparse PCA
1 Principles of PCA
Setting the scene
Principal components and dimensionality reduction
• The hope is that the first few PCs will account for a substantial proportion of the variation in the original variables $x_1, x_2, \ldots, x_p$.
The Olympic heptathlon data
library(HSAUR3)
data(heptathlon)
Score all seven events in the same direction
heptathlon[c(14,25),]
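The transformation step itself is not in the surviving text. A minimal sketch, assuming the three running events (where a lower time is better) are reversed by subtracting each time from the maximum time. The helper `reverse_direction` and the toy times are hypothetical; the real-data step only runs if the HSAUR3 package is installed.

```r
# Hypothetical helper: turn a "lower is better" variable into "higher is better"
reverse_direction <- function(x) max(x) - x

# Made-up 800 m times in seconds, for illustration only (not from the data set)
times <- c(130.2, 125.8, 140.5)
reverse_direction(times)   # the slowest time maps to 0

# Assumed application to the three running events of the heptathlon data
if (requireNamespace("HSAUR3", quietly = TRUE)) {
  data("heptathlon", package = "HSAUR3")
  for (v in c("hurdles", "run200m", "run800m")) {
    heptathlon[[v]] <- reverse_direction(heptathlon[[v]])
  }
}
```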
heptathlon[c(14,25),]
Scatterplot matrix
score <- which(colnames(heptathlon) == "score")
plot(heptathlon[, -score])
[Figure: scatterplot matrix of the seven events: hurdles, highjump, shot, run200m, longjump, javelin, run800m]
Correlation matrix
round(cor(heptathlon[,-score]), 2)
Removing the outlier
Finding the sample principal components
Eigendecomposition of the sample covariance matrix
PCA via the eigendecomposition
• The total variance of the r PCs equals the total variance of the original variables, so that $\sum_{j=1}^{r} \lambda_j = \operatorname{tr}(S)$.
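A small numerical check of this identity, as a sketch: the built-in USArrests data stand in for a generic data matrix, and the object names (`S`, `lambda`, `A`, `scores`) are chosen here for illustration.

```r
# Sketch: PCA by eigendecomposition of the sample covariance matrix S
X <- as.matrix(USArrests)              # built-in data, used only for illustration
S <- cov(X)                            # sample covariance matrix (divisor n - 1)
eig <- eigen(S, symmetric = TRUE)
lambda <- eig$values                   # PC variances, in decreasing order
A <- eig$vectors                       # columns are the coefficient vectors

Xc <- scale(X, center = TRUE, scale = FALSE)  # column-centred data
scores <- Xc %*% A                     # PC scores

total_var <- sum(diag(S))              # tr(S)
c(sum_lambda = sum(lambda), trace_S = total_var)
apply(scores, 2, var)                  # variances of the scores, equal to lambda
```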
Singular value decomposition of the data matrix
• The sample PCs can also be found using the singular value decomposition (SVD) of X.
PCA via the SVD
of the total variance in the data and the sum in the SVD of
X is therefore truncated after the first k terms.
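A sketch of the SVD route and of the truncation after k terms, assuming X is column-centred (here also scaled) and using the built-in USArrests data for illustration. The choice k = 2 and all object names are assumptions made for this example.

```r
# Sketch: PCA through the SVD of the centred and scaled data matrix
X <- scale(USArrests, center = TRUE, scale = TRUE)
n <- nrow(X)

sv <- svd(X)                        # X = U D V'
lambda <- sv$d^2 / (n - 1)          # PC variances from the singular values
scores <- sv$u %*% diag(sv$d)       # PC scores, the same as X %*% sv$v

# Keep only the first k terms of the SVD sum (rank-k approximation of X)
k <- 2
X_k <- sv$u[, 1:k] %*% diag(sv$d[1:k]) %*% t(sv$v[, 1:k])

# Share of the total variance retained by the first k PCs
prop_k <- sum(lambda[1:k]) / sum(lambda)
prop_k
```

The share `prop_k` equals one minus the relative squared error of the rank-k approximation, `1 - sum((X - X_k)^2) / sum(X^2)`.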
Finding the sample principal components in R
Correlations and covariances of variables and
components
• The covariance of variable i with component j is given by $\operatorname{Cov}(x_i, y_j) = \lambda_j a_{ji}$.
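The relation can be checked numerically. A sketch on the built-in USArrests data, where `A[i, j]` holds the coefficient of variable i in component j; the object names are chosen for this example only.

```r
# Sketch: covariances and correlations between variables and PCs
X <- as.matrix(USArrests)
S <- cov(X)
eig <- eigen(S, symmetric = TRUE)
lambda <- eig$values
A <- eig$vectors                       # A[i, j]: coefficient of variable i in PC j
Y <- scale(X, scale = FALSE) %*% A     # PC scores

cov_xy <- cov(X, Y)                    # matrix of Cov(x_i, y_j)
expected <- sweep(A, 2, lambda, "*")   # lambda_j * a_ji, column by column
max(abs(cov_xy - expected))            # numerically zero

# Dividing by the standard deviations gives the correlations
cor_xy <- cov_xy / outer(sqrt(diag(S)), sqrt(lambda))
```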
PCA using the function princomp()
Coefficients
The coefficients (also called loadings) for the first PC are
obtained as
a1 <- heptathlon_pca$loadings[,1]
a1
a1%*%a1
## [,1]
## [1,] 1
a2 <- heptathlon_pca$loadings[,2]
a1%*%a2
## [,1]
## [1,] 2.22e-16
## [,1]
## [1,] 4.324
The variance explained by the principal components
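• The total variance of the p PCs equals the total variance of the original variables, so that
$$\sum_{j=1}^{p} \lambda_j = s_1^2 + s_2^2 + \cdots + s_p^2 .$$
• The jth PC therefore accounts for a proportion
$$\frac{\lambda_j}{\sum_{j=1}^{p} \lambda_j}$$
of the total variation, and the first k PCs account for a proportion
$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j} .$$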
The summary() function
summary(heptathlon_pca)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Standard deviation 2.0793 0.9482 0.9109 0.68320 0.54619 0.33745
## Proportion of Variance 0.6177 0.1284 0.1185 0.06668 0.04262 0.01627
## Cumulative Proportion 0.6177 0.7461 0.8646 0.93131 0.97392 0.99019
## Comp.7
## Standard deviation 0.262042
## Proportion of Variance 0.009809
## Cumulative Proportion 1.000000
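A worked reading of the first column of this table. It assumes the PCA was carried out on the correlation matrix, so that the seven variances sum to 7; the reported proportions are consistent with that assumption.

```latex
\[
\lambda_1 = 2.0793^2 \approx 4.323, \qquad
\sum_{j=1}^{7} \lambda_j = 7, \qquad
\frac{\lambda_1}{\sum_{j=1}^{7} \lambda_j} \approx \frac{4.323}{7} \approx 0.618 ,
\]
\[
\frac{\lambda_1 + \lambda_2}{\sum_{j=1}^{7} \lambda_j} \approx 0.6177 + 0.1284 = 0.7461 .
\]
```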
Criteria for choosing the number of components
Scree plot
plot(heptathlon_pca$sdev^2, xlab="Component number",
ylab="Component variance", type="l")
[Figure: scree plot of component variance against component number, components 1 to 7]
Principal component scores
heptathlon_pca$scores[,1]
or
predict(heptathlon_pca)[,1]
The uncorrelatedness of the PC scores
t(heptathlon_pca$scores)%*%heptathlon_pca$scores/(24)
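The divisor 24 above matches the divisor n used by princomp(), presumably the number of athletes left after removing the outlier (an inference, since the fitting call is not in the surviving text). The same check as a self-contained sketch on the built-in USArrests data, with `cor = TRUE` as an assumed setting:

```r
# Sketch: PC scores from princomp() are uncorrelated
pc <- princomp(USArrests, cor = TRUE)
n <- nrow(USArrests)

# princomp() uses divisor n, so divide the cross-product of the scores by n
C <- crossprod(pc$scores) / n
round(C, 10)       # diagonal matrix: off-diagonal entries are numerically zero
pc$sdev^2          # the diagonal of C holds the PC variances
```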
The scores assigned to the athletes and the 1st PC
cor(heptathlon$score, heptathlon_pca$scores[,1])
## [1] -0.9931
plot(heptathlon$score, heptathlon_pca$scores[,1])
[Figure: scatterplot of heptathlon_pca$scores[, 1] against heptathlon$score]
The USArrests data
head(USArrests)
Examining the USArrests data
apply(USArrests, 2, mean)
apply(USArrests, 2, var)
PCA on a given data matrix
PCA using the function prcomp()
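The call that creates `pr.out` is not in the surviving text. A minimal sketch, assuming the variables were standardized: the standard deviations reported in the output have squares that sum to 4, the number of variables, which is what scaling to unit variance produces.

```r
# Assumed call: PCA of USArrests with variables scaled to unit variance
pr.out <- prcomp(USArrests, scale. = TRUE)   # 'scale = TRUE' also works via partial matching
round(pr.out$sdev, 4)                        # standard deviations of the PCs
sum(pr.out$sdev^2)                           # total variance, equal to 4 here
```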
The output of prcomp()
names(pr.out)
pr.out
## Standard deviations:
## [1] 1.5749 0.9949 0.5971 0.4164
##
## Rotation:
## PC1 PC2 PC3 PC4
## Murder -0.5359 0.4182 -0.3412 0.64923
## Assault -0.5832 0.1880 -0.2681 -0.74341
## UrbanPop -0.2782 -0.8728 -0.3780 0.13388
## Rape -0.5434 -0.1673 0.8178 0.08902
Principal component scores
dim(pr.out$x)
## [1] 50 4
Proportion of variance explained by the components
summary(pr.out)
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.57 0.995 0.5971 0.4164
## Proportion of Variance 0.62 0.247 0.0891 0.0434
## Cumulative Proportion 0.62 0.868 0.9566 1.0000
pr.out$sdev
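The plots that follow use a vector `pve` whose definition is not in the surviving text. A minimal sketch of how it can be computed from the standard deviations, assuming `pr.out` comes from prcomp() on the scaled USArrests data; `pr.var` is a name introduced here.

```r
# Proportion of variance explained (PVE) by each principal component
pr.out <- prcomp(USArrests, scale. = TRUE)   # assumed fit, see lead-in
pr.var <- pr.out$sdev^2                      # variance of each PC
pve <- pr.var / sum(pr.var)                  # proportions, summing to 1
round(pve, 4)
```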
Plot of the proportion of variance explained
plot(pve, xlab="Principal Component",
ylab="Proportion of Variance Explained",
ylim=c(0,1),type='b')
[Figure: proportion of variance explained against principal component]
Plot of the cumulative proportion of variance explained
plot(cumsum(pve), xlab="Principal Component",
ylab="Cumulative Proportion of Variance Explained",
ylim=c(0,1),type='b')
[Figure: cumulative proportion of variance explained against principal component]
2 PCA biplots
Motivation
Example of the traditional form of a PCA biplot
[Figure: traditional PCA biplot; samples labelled with lower-case letters plotted against PC 1 and PC 2, with the variables SPR, PLF, RGF and SLF]
[Figure: biplot of the USArrests data; states plotted against PC1 and PC2, with arrows for the variables]
The effect of scaling the variables
pr.noscale <- prcomp(USArrests, scale=FALSE)
par(mfrow = c(1,2))
biplot(pr.out, scale=0); biplot(pr.noscale, scale=0)
[Figure: two biplots of the USArrests data side by side, PC2 against PC1; left: scaled variables (pr.out), right: unscaled variables (pr.noscale)]
Calibrated axes
• The biplot axes are used in precisely the same way as the Cartesian axes they approximate.
PCA biplot with calibrated axes
[Figure: PCA biplot with calibrated axes for the variables RGF, SLF, SPR and PLF; samples labelled with lower-case letters]
library(BiplotGUI)
Biplots(USArrests)
Application to Quality control data
• Throughout the period of a calendar month, a manufacturing company monitors 15 different variables in a production process.
PCA biplot of the (scaled) quality monitoring data
[Figure: PCA biplot of the scaled quality monitoring data with calibrated axes; the points are the months Jan00 to Mar01 plus a Target point. Axis labels, with the value printed in parentheses on the plot: A1 (0.2), A2 (0.03), A3 (0.8), A4 (0.45), A5 (0.89), B5, C4 (0.53), C5 (0.47), C6 (0.62), C7 (0.82), C8 (0.85), D4 (0.48), D6 (0.74), D7 (0.72), E5 (0.49)]
[Figure: a second version of the biplot showing only the axes A1, A2, A3, A5, B5, C4, C5, D6, D7 and E5]
3 Sparse PCA
Motivation
Jeffers’ pitprops data
library(elasticnet)
data(pitprops)
dim(pitprops)
## [1] 13 13
The variables in Jeffers’ pitprops data
PCA of pitprops data
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Standard deviation 2.0539 1.5421 1.3705 1.05328 0.9540 0.90300
## Proportion of Variance 0.3245 0.1829 0.1445 0.08534 0.0700 0.06272
## Cumulative Proportion 0.3245 0.5074 0.6519 0.73726 0.8073 0.86999
## Comp.7 Comp.8 Comp.9 Comp.10 Comp.11
## Standard deviation 0.75917 0.66300 0.59387 0.43685 0.22487
## Proportion of Variance 0.04433 0.03381 0.02713 0.01468 0.00389
## Cumulative Proportion 0.91432 0.94813 0.97526 0.98994 0.99383
## Comp.12 Comp.13
## Standard deviation 0.20363 0.196785
## Proportion of Variance 0.00319 0.002979
## Cumulative Proportion 0.99702 1.000000
Loadings of the first six components
pitprop.pca$loadings[,1:6]
Rotation
The Varimax rotation criterion
Gradient projection algorithm
Gradient projection algorithm for orthogonal rotation
The Gradient projection algorithm visualized
[Figure: the manifold M of permissible matrices, showing the current V, the gradient step V − α ∂f/∂V, and the updated V]
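One common form of this iteration for orthogonal rotation, as a sketch rather than the exact scheme of these slides: take a gradient step from V, then map the result back to the nearest orthogonal matrix through its SVD, halving the step size α until the criterion decreases enough. The function names `varimax_crit` and `gp_orth`, the step-size rule and the stopping tolerances are assumptions made for this example.

```r
# Varimax criterion written as a function f to minimise, with its gradient in L
varimax_crit <- function(L) {
  QL <- sweep(L^2, 2, colMeans(L^2), "-")   # column-centred squared loadings
  list(f = -sum(QL^2) / 4, Gq = -L * QL)
}

# Hypothetical helper: gradient projection over orthogonal matrices V
gp_orth <- function(A, crit = varimax_crit, max_iter = 500, eps = 1e-6) {
  V <- diag(ncol(A))                        # start from the identity rotation
  alpha <- 1
  cur <- crit(A %*% V)
  for (iter in seq_len(max_iter)) {
    G <- crossprod(A, cur$Gq)               # gradient of f with respect to V
    M <- crossprod(V, G)
    Gp <- G - V %*% ((M + t(M)) / 2)        # part of G tangent to the manifold
    s <- sqrt(sum(Gp^2))
    if (s < eps) break                      # stationary point reached
    alpha <- 2 * alpha                      # try a longer step first
    repeat {
      sv <- svd(V - alpha * Gp)             # gradient step, leaves the manifold
      V_new <- sv$u %*% t(sv$v)             # nearest orthogonal matrix
      new <- crit(A %*% V_new)
      if (new$f < cur$f - 0.5 * s^2 * alpha || alpha < 1e-10) break
      alpha <- alpha / 2                    # step too long: halve and retry
    }
    if (new$f >= cur$f) break               # no further decrease found
    V <- V_new
    cur <- new
  }
  list(loadings = A %*% V, V = V, f = cur$f)
}

# Usage on the first two PCA loading vectors of the built-in USArrests data
A <- prcomp(USArrests, scale. = TRUE)$rotation[, 1:2]
rot <- gp_orth(A)
rot$loadings
```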
Iterative scheme
Using the Varimax criterion for Jeffers’ pitprops data
library(GPArotation)
A <- pitprop.pca$loadings[,1:6]
B <- GPForth(A, method="varimax")$loadings
B
Sparse PCA based on the “elastic net”
(k = 1, . . . , p).
Sparse PCA (SPCA) criterion based on the “elastic net”
• Optimization problem:
$$(\hat{A}, \hat{B}) = \arg\min_{A,B} \sum_{i=1}^{n} \| x_i - A B^\top x_i \|^2 + \lambda \sum_{j=1}^{k} \| \beta_j \|^2 + \sum_{j=1}^{k} \lambda_{1,j} \| \beta_j \|_1 \quad \text{subject to } A^\top A = I_k .$$
Alternating algorithm to minimize the SPCA criterion
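• B given A: For each j, let $Y_j^* = X \alpha_j$. Each $\hat{\beta}_j$ in $\hat{B} = (\hat{\beta}_1, \ldots, \hat{\beta}_k)$ is an elastic net estimate
$$\hat{\beta}_j = \arg\min_{\beta_j} \| Y_j^* - X \beta_j \|^2 + \lambda \| \beta_j \|^2 + \lambda_{1,j} \| \beta_j \|_1 .$$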
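The complementary step, updating A with B held fixed, is not in the surviving text. A sketch under the assumption that it minimises the first term of the criterion subject to A'A = I_k: writing the SVD of X'XB as UDV', the minimiser is UV'. The data, the choice k = 2, the crude thresholding used to obtain a sparse stand-in for B, and the name `spca_fit` are all illustrative assumptions, not the elastic net step.

```r
# Sketch of the "A given B" update on the built-in USArrests data
X <- scale(USArrests)                   # centred and scaled data
k <- 2
A0 <- svd(X)$v[, 1:k]                   # ordinary PCA loadings as starting A

B <- A0                                 # stand-in for the current sparse B
B[abs(B) < 0.3] <- 0                    # crude thresholding, illustration only

# First term of the SPCA criterion: sum_i || x_i - A B' x_i ||^2
spca_fit <- function(A, B) sum((X - X %*% B %*% t(A))^2)

sv <- svd(crossprod(X) %*% B)           # X'X B = U D V'
A_hat <- sv$u %*% t(sv$v)               # update with orthonormal columns

c(before = spca_fit(A0, B), after = spca_fit(A_hat, B))
```

Alternating this update with the elastic net step for B gives the kind of scheme named in the slide title.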
Implementation of sparse PCA in R
?spca
?arrayspc
Sparse PCA of Jeffers’ pitprops data
pitprop.spcap <- spca(pitprops,K = 6, type = "Gram", sparse = "penalty",
para=c(0.06,0.16,0.1,0.5,0.5,0.5))
pitprop.spcav <- spca(pitprops,K = 6, type = "Gram", sparse = "varnum",
para = c(7,4,4,1,1,1))
pitprop.spcap$loadings
pitprop.spcap$pev