
Chapter 10: Exercise 11

a
data = read.csv("./Ch10Ex11.csv", header=FALSE)
dim(data)

## [1] 1000 40

b
dd = as.dist(1 - cor(data))
plot(hclust(dd, method="complete"))

plot(hclust(dd, method="single"))
plot(hclust(dd, method="average"))
The samples fall into two or three groups depending on the linkage method, so the results do depend on the type of linkage used.
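One way to check the grouping without reading it off the dendrograms is to cut each tree with cutree() and tabulate the cluster labels. The sketch below runs on simulated data, not on Ch10Ex11.csv: the matrix sim, the shifted genes and the first20/last20 labels are all assumptions made for illustration.

```r
# Simulated stand-in for the data: 1000 genes (rows) x 40 samples (columns).
set.seed(1)
sim = matrix(rnorm(1000 * 40), nrow = 1000, ncol = 40)
# Hypothetical signal: shift 100 genes in the last 20 samples.
sim[1:100, 21:40] = sim[1:100, 21:40] + 2

# Correlation-based distance between samples, as in part b.
dd.sim = as.dist(1 - cor(sim))

# Assumed group labels, used only to tabulate the clusters against.
groups = rep(c("first20", "last20"), each = 20)

# Cut each dendrogram into two clusters, one cut per linkage method.
linkages = c("complete", "single", "average")
cuts = lapply(linkages, function(m) cutree(hclust(dd.sim, method = m), k = 2))
names(cuts) = linkages

# Compare cluster membership with the assumed groups.
for (m in linkages) {
  cat("linkage:", m, "\n")
  print(table(cuts[[m]], groups))
}
```

If the tables differ between linkage methods, the grouping depends on the linkage. Repeating the cut with k = 3 shows whether a third group appears.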

c
To see which genes differ the most between the healthy and diseased patients, we can look at the loading
vectors from PCA and find the genes that contribute most to the variance across samples.

pr.out = prcomp(t(data))
summary(pr.out)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 11.941 6.0682 5.9348 5.8312 5.7521 5.7003 5.6345
## Proportion of Variance 0.127 0.0327 0.0313 0.0302 0.0294 0.0289 0.0282
## Cumulative Proportion 0.127 0.1594 0.1907 0.2209 0.2503 0.2792 0.3074
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 5.5773 5.5494 5.5062 5.4885 5.4602 5.4023 5.3344
## Proportion of Variance 0.0276 0.0274 0.0269 0.0268 0.0265 0.0259 0.0253
## Cumulative Proportion 0.3350 0.3624 0.3893 0.4160 0.4425 0.4685 0.4938
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 5.2776 5.2159 5.200 5.1514 5.1160 5.0559 5.0384
## Proportion of Variance 0.0248 0.0242 0.024 0.0236 0.0232 0.0227 0.0226
## Cumulative Proportion 0.5185 0.5427 0.567 0.5903 0.6135 0.6362 0.6588
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 5.0187 4.9597 4.9139 4.864 4.8180 4.8081 4.7348
## Proportion of Variance 0.0224 0.0219 0.0215 0.021 0.0206 0.0205 0.0199
## Cumulative Proportion 0.6812 0.7030 0.7245 0.745 0.7661 0.7866 0.8066
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 4.7010 4.6556 4.6162 4.5673 4.5303 4.495 4.3650
## Proportion of Variance 0.0196 0.0193 0.0189 0.0185 0.0182 0.018 0.0169
## Cumulative Proportion 0.8262 0.8455 0.8644 0.8829 0.9012 0.919 0.9360
## PC36 PC37 PC38 PC39 PC40
## Standard deviation 4.3586 4.2670 4.2028 4.1392 5.25e-15
## Proportion of Variance 0.0169 0.0162 0.0157 0.0152 0.00e+00
## Cumulative Proportion 0.9529 0.9691 0.9848 1.0000 1.00e+00
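The "Proportion of Variance" row in this summary can be reproduced from the sdev component of the prcomp object. The sketch below uses simulated data (sim and pr.sim are hypothetical stand-ins for t(data) and pr.out), so the numbers will not match the output above.

```r
# Simulated stand-in for t(data): 40 samples (rows) x 1000 genes (columns).
set.seed(1)
sim = matrix(rnorm(40 * 1000), nrow = 40, ncol = 1000)
pr.sim = prcomp(sim)

# Proportion of variance explained by each component:
# squared standard deviation divided by the total variance.
pve = pr.sim$sdev^2 / sum(pr.sim$sdev^2)
round(pve[1:5], 4)
round(cumsum(pve)[1:5], 4)

# Centering 40 samples leaves at most 39 directions of variance,
# so the last standard deviation is numerically zero.
pr.sim$sdev[40]
```

This is also why PC40 above has a standard deviation of about 5e-15: with 40 centered samples only 39 components carry variance.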

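# Sum each gene's loadings across all 40 principal components
total_load = apply(pr.out$rotation, 1, sum)

# Rank genes by the absolute value of the summed loading
indices = order(abs(total_load), decreasing=TRUE)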
indices[1:10]

## [1] 865 68 911 428 624 11 524 803 980 822

total_load[indices[1:10]]

## [1] 0.7765 0.7138 -0.7100 -0.6364 -0.6196 0.5885 0.5583 0.5535
## [9] -0.5217 0.4982

This gives one possible list of the 10 genes (the top 1% of 1000) that differ the most.

(*) I’m not sure this is the correct way to aggregate the loading vector.
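One reason for the doubt: the sign of each loading vector is arbitrary, so summing signed loadings over all 40 components mixes directions that have no common orientation. The summary above also shows PC1 explaining far more variance (0.127) than any other component. Two alternatives are sketched below on simulated data: ranking genes by their absolute PC1 loading alone, and ranking them by a per-gene two-sample t-statistic. The split into first 20 and last 20 columns is an assumption taken from the exercise statement, and sim, shifted, top.pc1 and top.t are hypothetical names. This is not a claim about the intended solution.

```r
# Simulated stand-in for the data: 1000 genes x 40 samples. The first 20
# columns play the healthy group and the last 20 the diseased group (assumption).
set.seed(1)
sim = matrix(rnorm(1000 * 40), nrow = 1000, ncol = 40)
shifted = 1:50                      # hypothetical genes that truly differ
sim[shifted, 21:40] = sim[shifted, 21:40] + 2

pr.sim = prcomp(t(sim))

# Option 1: rank genes by the absolute loading on PC1 only.
top.pc1 = order(abs(pr.sim$rotation[, 1]), decreasing = TRUE)[1:10]

# Option 2: Welch two-sample t-statistic per gene between the two groups.
tstat = apply(sim, 1, function(g) t.test(g[1:20], g[21:40])$statistic)
top.t = order(abs(tstat), decreasing = TRUE)[1:10]

top.pc1
top.t
```

The t-statistic route uses the group labels directly, which matches the question more closely. The PC1 route is unsupervised and only works when the first component lines up with the healthy/diseased split.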
